Hacking Python? Need to import some Excel data and run a detailed data analysis? Got Pandas? Pandas has become a staple in the Python data science stack with strengths in data manipulation and analysis. In this workshop, we will focus on real-world data analysis scenarios that show the strengths of this library. We’ll cover basic Pandas data structures, core import/export and I/O functionality, manipulation of data in Pandas and the basics of Pandas data visualization. We will focus on the practical so that you can leave ready to apply your skills. We assume a basic working knowledge of Python and exposure to Jupyter Notebooks.
This talk was originally given on August 17, 2017 at the Rocky Mountain Advanced Computing Consortium (RMACC) 2017 Symposium.
There are two parts to this repository:
- the slides, which are best viewed here, though the HTML source is here (NOTE: the slides were prepared using the RISE plugin for Jupyter Notebooks and NBExtensions)
- the notebooks, which supplement the slides (and are also the basis for their content); they are best viewed with NBViewer starting here, but you are free to clone the repo and work on the notebooks from it or from NB

| ~10m | notebook, slides |
|---|---|
| Content | what is pandas; why pandas; pandas v numpy; installing pandas |
| Expected Outcomes | • basic introduction to the Pandas ecosystem |
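
To make the "pandas v numpy" item concrete, here is a minimal sketch of one difference: NumPy combines arrays by position, while pandas aligns values by label. The arrays, labels, and values are illustrative assumptions and are not taken from the workshop notebooks.

```python
import numpy as np
import pandas as pd

# NumPy adds element by element, purely by position.
a = np.array([1, 2, 3])
b = np.array([10, 20, 30])
positional = a + b

# pandas matches values by index label before adding,
# so the order of the labels in each Series does not matter.
s1 = pd.Series([1, 2, 3], index=["x", "y", "z"])
s2 = pd.Series([10, 20, 30], index=["z", "y", "x"])
aligned = s1 + s2

print(positional)
print(aligned)
```
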

| ~20m | notebook, slides |
|---|---|
| Content | core pandas data structures; series, dataframe, (optionally panel); basic concepts of data structures and manipulation strategies |
| Expected Outcomes | • identify and utilize series and dataframe structures • perform basic manipulation operations • understand basic Pythonic manipulation concepts |
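
As a rough illustration of the two core structures named above, the sketch below builds a Series and a DataFrame and applies one basic manipulation. The column names and values are made up for the example; Panel is left out because it has been removed from recent pandas releases.

```python
import pandas as pd

# A Series is a one-dimensional labeled array.
temps = pd.Series([21.5, 19.0, 23.1], index=["mon", "tue", "wed"], name="temp_c")

# A DataFrame is a two-dimensional table; each column is a Series
# and all columns share the same index.
df = pd.DataFrame(
    {"temp_c": [21.5, 19.0, 23.1], "rain_mm": [0.0, 4.2, 0.5]},
    index=["mon", "tue", "wed"],
)

# Vectorized manipulation: derive a new column without an explicit loop.
df["temp_f"] = df["temp_c"] * 9 / 5 + 32

print(temps["tue"])   # label-based access on a Series
print(df.loc["wed"])  # one row of the DataFrame, returned as a Series
```
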

| ~20m | notebook, slides |
|---|---|
| Content | importing data; csv and excel; json; sql; other supported data formats |
| Expected Outcomes | • import data of various formats • perform data imports into dataframes • perform various conversions in Pandas |
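
The import topics above might look roughly like the following sketch, which reads CSV, JSON, and SQL data into DataFrames. To stay self-contained it uses in-memory text and an in-memory SQLite database instead of real files; the table name `samples` and all data values are hypothetical. Excel files can be read in a similar way with `pd.read_excel`, which needs an additional Excel engine installed and is therefore omitted here.

```python
import io
import sqlite3

import pandas as pd

# CSV: read_csv accepts a path or any file-like object.
csv_text = "sample,value\na,10\nb,3\n"
df_csv = pd.read_csv(io.StringIO(csv_text))

# JSON: a list of records becomes one row per record.
json_text = '[{"sample": "a", "group": "g1"}, {"sample": "b", "group": "g2"}]'
df_json = pd.read_json(io.StringIO(json_text))

# SQL: write a DataFrame to a database table, then query it back.
conn = sqlite3.connect(":memory:")
try:
    df_csv.to_sql("samples", conn, index=False)
    df_sql = pd.read_sql("SELECT sample FROM samples WHERE value > 5", conn)
finally:
    conn.close()

# Conversion: change a column's dtype after import.
df_csv["value"] = df_csv["value"].astype(float)

print(df_csv.dtypes)
print(df_json)
print(df_sql)
```
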

| ~20m | notebook, slides |
|---|---|
| Content | basic terminology; selecting data; slicing dataframes; setting and assigning operations; built-in summary statistics |
| Expected Outcomes | • understand the basic terminology • select data by row and column • select data by label/index and with boolean selections • perform slicing, merging and subsetting • perform multi-indexing • access basic statistics and summaries |
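
The selection and summary operations listed above can be sketched as follows. The frame, its labels, and the lookup table `labels` are illustrative assumptions, not data from the workshop notebooks.

```python
import pandas as pd

df = pd.DataFrame(
    {"group": ["x", "y", "x", "y"], "value": [10, 20, 30, 40]},
    index=["a", "b", "c", "d"],
)

col = df["value"]                    # select a column
row_by_label = df.loc["b"]           # select a row by index label
row_by_pos = df.iloc[0]              # select a row by integer position
big = df[df["value"] > 15]           # boolean selection
window = df.loc["b":"c", ["value"]]  # label slices include both endpoints

# Setting: assign a single cell by row label and column name.
df.loc["a", "value"] = 11

# Merging: attach a lookup table on a shared key column.
labels = pd.DataFrame({"group": ["x", "y"], "label": ["first", "second"]})
merged = df.merge(labels, on="group", how="left")

# Multi-indexing: add "group" as a second index level,
# then select with a (row label, group) tuple.
multi = df.set_index("group", append=True)
cell = multi.loc[("a", "x"), "value"]

# Built-in summary statistics.
summary = df["value"].describe()
print(summary)
```

Note that `merge` returns a new frame with a fresh integer index, so the original row labels are not carried over unless the index is reset into a column first.
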

| ~15m | notebook, slides |
|---|---|
| Content | putting it all together; finding the need for Pandas; integrating Pandas into data engineering workflows |
| Expected Outcomes | • identify real-world use cases for Pandas • navigate and utilize key online resources for further study |

Resources to use to learn more about Pandas:
Originally created by Keith E. Maull, 2017.
CC-BY-4.0
This work is licensed under a Creative Commons Attribution 4.0 International License.