## ABSTRACT

Hacking Python? Need to import some Excel data and run a detailed data analysis? Got Pandas? Pandas has become a staple of the Python data science stack, with particular strengths in data manipulation and analysis. In this workshop, we will focus on real-world data analysis scenarios that showcase the library's strengths. We'll cover the basic Pandas data structures, core import/export and I/O functionality, data manipulation in Pandas, and the basics of Pandas data visualization. We will keep the focus practical so that you leave ready to apply your skills. We assume a basic working knowledge of Python and some exposure to Jupyter Notebooks.

## ABOUT

This talk was originally given on August 17, 2017, at the Rocky Mountain Advanced Computing Consortium (RMACC) 2017 Symposium.

There are two parts to this repository:

1. **the slides**, which are best [viewed here](http://keithmaull.com/talks/20170817/slides), though the HTML source is [here](./slides). NOTE: _the slides were prepared using the [RISE plugin](https://github.com/damianavila/RISE) for Jupyter Notebooks and [NBExtensions](http://jupyter-contrib-nbextensions.readthedocs.io/en/latest/install.html)._

2. **the notebooks**, which supplement the slides (and are also the basis for their content). They are [best viewed with NBViewer starting here](http://nbviewer.jupyter.org/urls/code.keithmaull.net/kmaull/talk_2017_08_RMACC_GotPandas/raw/master/nb/0_introduction.ipynb), but you are free to clone the repo and work on the notebooks from it or from NB

## SECTION 0: INTRODUCTION

| ~10m | [notebook](http://nbviewer.jupyter.org/urls/code.keithmaull.net/kmaull/talk_2017_08_RMACC_GotPandas/raw/master/nb/0_introduction.ipynb) | [slides](http://keithmaull.com/talks/20170817/slides/0_introduction.slides.html) |
|-------------:|:-------------------------------------------------------------------|:------|
| **Content** | what is pandas; why pandas; pandas vs. numpy; installing pandas |
| **Expected<br>Outcomes** | • basic introduction to the Pandas ecosystem |
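
The "pandas vs. numpy" comparison can be made concrete with a minimal sketch. It uses toy values that are not taken from the workshop notebooks. NumPy combines arrays by position, while pandas aligns `Series` on their index labels before combining them.

```python
import numpy as np
import pandas as pd

# NumPy: arithmetic is purely positional
a = np.array([1, 2, 3])
b = np.array([10, 20, 30])
positional = a + b  # element 0 + element 0, element 1 + element 1, ...

# pandas: arithmetic first aligns on index labels, regardless of order
s1 = pd.Series([1, 2, 3], index=["x", "y", "z"])
s2 = pd.Series([10, 20, 30], index=["z", "y", "x"])
aligned = s1 + s2  # x -> 1 + 30, y -> 2 + 20, z -> 3 + 10

print(positional)
print(aligned)
```

The same two lists of numbers produce different sums, because pandas matches rows by label rather than by where they happen to sit.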

## SECTION 1: PANDAS DATA STRUCTURES

| ~20m | [notebook](http://nbviewer.jupyter.org/urls/code.keithmaull.net/kmaull/talk_2017_08_RMACC_GotPandas/raw/master/nb/1_data_structures.ipynb) | [slides](http://keithmaull.com/talks/20170817/slides/1_data_structures.slides.html) |
|-------------:|:-------------------------------------------------------------------|:------|
| **Content** | core pandas data structures: series, dataframe (optionally panel); basic concepts of data structures and manipulation strategies |
| **Expected<br>Outcomes** | • identify and utilize series and dataframe structures<br>• perform basic manipulation operations<br>• understand basic Pythonic manipulation concepts |
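
As a rough illustration of the two core structures, here is a small sketch. The temperature and rainfall values are made up. A `Series` is a one-dimensional labeled array, and a `DataFrame` is a table whose columns are `Series` sharing one index. The sketch also shows two basic manipulations: a derived column and a boolean filter.

```python
import pandas as pd

# A Series: one-dimensional, labeled data (made-up readings)
temps = pd.Series([21.5, 23.0, 19.8], index=["mon", "tue", "wed"], name="temp_c")

# A DataFrame: two-dimensional; each column is a Series sharing one index
df = pd.DataFrame(
    {"temp_c": [21.5, 23.0, 19.8], "rain_mm": [0.0, 2.4, 5.1]},
    index=["mon", "tue", "wed"],
)

# Pulling one column out of a DataFrame gives back a Series
same_values = temps.equals(df["temp_c"])

# Basic manipulation: derive a new column with vectorized arithmetic
df["temp_f"] = df["temp_c"] * 9 / 5 + 32

# Boolean masks select rows without an explicit Python loop
rainy_days = df[df["rain_mm"] > 1.0]

print(df.shape)                # (3, 3)
print(list(rainy_days.index))  # ['tue', 'wed']
```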

## SECTION 2: IMPORTING DATA

| ~20m | [notebook](http://nbviewer.jupyter.org/urls/code.keithmaull.net/kmaull/talk_2017_08_RMACC_GotPandas/raw/master/nb/2_dataframe_operations.ipynb) | [slides](http://keithmaull.com/talks/20170817/slides/2_importing_data.slides.html) |
|-------------:|:-------------------------------------------------------------------|:------|
| **Content** | importing data: CSV and Excel; JSON; SQL; other supported data formats |
| **Expected<br>Outcomes** | • import data of various formats<br>• perform data imports into dataframes<br>• perform various conversions in Pandas |
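
The import paths listed above can be sketched with in-memory sources, so the snippet runs without any data files. The CSV/JSON text and the `visits` table are made up for illustration. Excel works analogously through `pd.read_excel`, but it needs an optional engine (for example `openpyxl` for `.xlsx` files), so it is left out here.

```python
import io
import sqlite3

import pandas as pd

# CSV: read_csv accepts a file path, URL, or any file-like object
csv_text = "name,score\nada,90\nbob,85\n"
from_csv = pd.read_csv(io.StringIO(csv_text))

# JSON: with orient="records", each object in the list becomes one row
json_text = '[{"name": "ada", "team": "red"}, {"name": "bob", "team": "blue"}]'
from_json = pd.read_json(io.StringIO(json_text), orient="records")

# SQL: read_sql_query runs a query over a DB-API connection (in-memory SQLite here)
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE visits (name TEXT, visits INTEGER)")
con.executemany("INSERT INTO visits VALUES (?, ?)", [("bob", 3), ("ada", 5)])
from_sql = pd.read_sql_query("SELECT name, visits FROM visits ORDER BY name", con)
con.close()

# Conversions: change a column's dtype, then export the frame back to JSON text
from_csv["score"] = from_csv["score"].astype("float64")
as_json = from_csv.to_json(orient="records")
```

Using `io.StringIO` keeps the sketch self-contained. With real data, you would pass a path or URL to the same `read_*` functions.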

## SECTION 3: MANIPULATING DATA

| ~20m | [notebook](http://nbviewer.jupyter.org/urls/code.keithmaull.net/kmaull/talk_2017_08_RMACC_GotPandas/raw/master/nb/3_importing_data.ipynb) | [slides](http://keithmaull.com/talks/20170817/slides/3_dataframe_operations.slides.html) |
|-------------:|:-------------------------------------------------------------------|:------|
| **Content** | basic terminology; selecting data; slicing dataframes; setting and assigning operations; built-in summary statistics |
| **Expected<br>Outcomes** | • understand the basic terminology<br>• select data by row and column<br>• select data by label/index and with boolean selection<br>• perform slicing, merging and subsetting<br>• work with multi-indexes<br>• access basic statistics and summaries |
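
The outcomes above map onto a handful of everyday operations. The following sketch walks through them on a tiny made-up sales table. The region names, quarters, unit counts and manager names are all hypothetical.

```python
import pandas as pd

# Made-up sample data for illustration only
sales = pd.DataFrame(
    {
        "region": ["east", "east", "west", "west"],
        "quarter": ["q1", "q2", "q1", "q2"],
        "units": [10, 14, 7, 12],
    }
)

# Selecting by column label and by row position
units = sales["units"]      # one column -> Series
first_row = sales.iloc[0]   # first row by integer position

# Selecting and slicing by label with .loc (label slices include both ends)
middle = sales.loc[1:2, ["region", "units"]]

# Boolean selection: keep only rows matching a condition
big = sales[sales["units"] > 10]

# Setting/assigning: add a derived column
sales["units_x2"] = sales["units"] * 2

# Merging: join a lookup table on a shared key column
managers = pd.DataFrame({"region": ["east", "west"], "manager": ["Ana", "Raj"]})
merged = sales.merge(managers, on="region", how="left")

# Multi-indexing: index rows by (region, quarter) pairs
indexed = sales.set_index(["region", "quarter"])
west_q2 = indexed.loc[("west", "q2"), "units"]

# Built-in summary statistics
summary = sales["units"].describe()
per_region = sales.groupby("region")["units"].sum()
```

Note that `.loc` slices by label and includes the end point, whereas `.iloc` slices by position and excludes it, as ordinary Python slicing does.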

## SECTION 4: WRAPPING UP

| ~15m | [notebook](http://nbviewer.jupyter.org/urls/code.keithmaull.net/kmaull/talk_2017_08_RMACC_GotPandas/raw/master/nb/4_wrapping_up.ipynb) | [slides](http://keithmaull.com/talks/20170817/slides/4_wrapping_up.slides.html) |
|-------------:|:-------------------------------------------------------------------|:------|
| **Content** | putting it all together; recognizing when Pandas is needed; integrating Pandas into data engineering workflows |
| **Expected<br>Outcomes** | • identify real-world use cases for Pandas<br>• navigate and utilize key online resources for further study |

## RESOURCES

Resources for learning more about Pandas:

* the [pydata documentation](http://pandas.pydata.org) is comprehensive, though it can be overwhelming for beginners
* [Pandas Cookbook](https://github.com/jvns/pandas-cookbook) on GitHub by Julia Evans (also on pydata.org)
* [Data Wrangling with Pandas cheat sheet](https://github.com/pandas-dev/pandas/blob/master/doc/cheatsheet/Pandas_Cheat_Sheet.pdf) by pydata.org
* [Pandas for Data Science cheat sheet](https://www.datacamp.com/community/blog/python-pandas-cheat-sheet) by DataCamp.com

## LICENSE

Originally created by Keith E. Maull, 2017.

CC-BY-4.0

![CC BY 4.0](https://i.creativecommons.org/l/by/4.0/88x31.png)

This work is licensed under a [Creative Commons Attribution 4.0 International License](https://creativecommons.org/licenses/by/4.0/).