Talk given at RMACC August 17, 2017 titled "Practical Data Wrangling in Pandas".


Practical Data Wrangling with Pandas

ABSTRACT

Hacking Python? Need to import some Excel data and run a detailed data analysis? Got Pandas? Pandas has become a staple in the Python data science stack with strengths in data manipulation and analysis. In this workshop, we will focus on real-world data analysis scenarios that show the strengths of this library. We’ll cover basic Pandas data structures, core import/export and I/O functionality, manipulation of data in Pandas and the basics of Pandas data visualization. We will focus on the practical so that you can leave ready to apply your skills. We assume a basic working knowledge of Python and exposure to Jupyter Notebooks.

ABOUT

This talk was originally given on August 17, 2017 at the Rocky Mountain Advanced Computing Consortium (RMACC) 2017 Symposium.

There are two parts to this repository:

the slides, which are best viewed here, though the HTML source is here; NOTE: the slides were prepared using the RISE plugin for Jupyter Notebooks and NBExtensions
the notebooks, which supplement the slides (and are also the basis for their content); they are best viewed with NBViewer starting here, but you are free to clone the repo and work on the notebooks from it or from NB

SECTION 0: INTRODUCTION

~10m	notebook | slides
Content	what is pandas; why pandas; pandas v numpy; installing pandas
Expected Outcomes	• basic introduction to the Pandas ecosystem

SECTION 1: PANDAS DATA STRUCTURES

~20m	notebook | slides
Content	core pandas data structures; series, dataframe, (optionally panel); basic concepts of data structures and manipulation strategies
Expected Outcomes	• identify and utilize series and dataframe structures • perform basic manipulation operations • understand basic Pythonic manipulation concepts
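
As a rough illustration of the two core structures named above, the sketch below builds a Series and a DataFrame and performs one basic manipulation. The city names, column names and values are hypothetical sample data, not taken from the workshop notebooks.

```python
import pandas as pd

# Hypothetical sample data, not from the workshop notebooks.
# A Series is a one-dimensional labeled array.
temps = pd.Series(
    [61.5, 72.0, 68.3],
    index=["Boulder", "Denver", "Laramie"],
    name="temp_f",
)

# A DataFrame is a two-dimensional table of labeled columns.
df = pd.DataFrame(
    {
        "city": ["Boulder", "Denver", "Laramie"],
        "state": ["CO", "CO", "WY"],
        "temp_f": [61.5, 72.0, 68.3],
    }
)

# Selecting a single column returns a Series.
temp_column = df["temp_f"]

# Arithmetic applies to the whole column at once; no explicit loop is needed.
df["temp_c"] = (df["temp_f"] - 32) * 5 / 9

# Series values can be looked up by label.
denver_temp = temps["Denver"]
```

Looking values up by label rather than by position is one of the main differences from a plain NumPy array.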

SECTION 2: IMPORTING DATA

~20m	notebook | slides
Content	importing data; csv and excel; json; sql; other supported data formats
Expected Outcomes	• import data of various formats • perform data imports into dataframes • perform various conversions in Pandas
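
The import formats listed above can be sketched as follows. To stay self-contained, the example reads CSV and JSON from in-memory strings and SQL from an in-memory SQLite database; the station names, column names and values are hypothetical and do not come from the workshop notebooks.

```python
import io
import sqlite3

import pandas as pd

# CSV: read from an in-memory string; a file path works the same way.
csv_text = "station,date,precip_in\nA1,2017-08-15,0.10\nA2,2017-08-16,0.00\n"
csv_df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# JSON: a list of records becomes one row per record.
json_text = '[{"station": "A1", "elev_ft": 5430}, {"station": "A2", "elev_ft": 5280}]'
json_df = pd.read_json(io.StringIO(json_text))

# SQL: any DB-API connection to SQLite can be queried into a DataFrame.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obs (station TEXT, wind_mph REAL)")
conn.executemany("INSERT INTO obs VALUES (?, ?)", [("A1", 12.5), ("A2", 7.0)])
sql_df = pd.read_sql("SELECT station, wind_mph FROM obs", conn)
conn.close()

# Excel files are read with pd.read_excel(path); that call needs an Excel
# engine such as openpyxl installed, so it is not executed in this sketch.

# A simple conversion after import: change a column's dtype.
json_df["elev_ft"] = json_df["elev_ft"].astype("float64")
```

Each reader returns an ordinary DataFrame, so everything downstream is the same regardless of the source format.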

SECTION 3: MANIPULATING DATA

~20m	notebook | slides
Content	basic terminology; selecting data; slicing dataframes; setting and assigning operations; built-in summary statistics
Expected Outcomes	• understand the basic terminology • select data by row and column • select data by label/index and with boolean selections • perform slicing, merging and subsetting • perform multi-indexing • access basic stats and summaries
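
The selection, slicing, merging, multi-indexing and summary operations listed above might look roughly like this. It is a minimal sketch on hypothetical sample data (the cities, temperatures and the `states` lookup table are invented for illustration), not the workshop's own exercises.

```python
import pandas as pd

# Hypothetical sample data, indexed by city name.
df = pd.DataFrame(
    {
        "state": ["CO", "CO", "WY", "UT"],
        "temp_f": [61.5, 72.0, 68.3, 80.1],
    },
    index=pd.Index(["Boulder", "Denver", "Laramie", "Moab"], name="city"),
)

# Selecting by label (.loc) and by integer position (.iloc).
by_label = df.loc["Denver", "temp_f"]
by_position = df.iloc[0, 1]

# Boolean selection keeps only the rows where the condition is True.
warm = df[df["temp_f"] > 70]

# Positional slices exclude the end point; label slices include it.
first_two = df.iloc[:2]
label_slice = df.loc["Boulder":"Laramie"]

# Setting a single value by label.
df.loc["Laramie", "temp_f"] = 69.0

# Merging with a hypothetical lookup table on a shared column.
states = pd.DataFrame(
    {"state": ["CO", "WY", "UT"], "state_name": ["Colorado", "Wyoming", "Utah"]}
)
merged = df.reset_index().merge(states, on="state", how="left")

# Multi-indexing: index by (state, city), then select one outer level.
indexed = merged.set_index(["state", "city"]).sort_index()
co_rows = indexed.loc["CO"]

# Built-in summary statistics.
mean_temp = df["temp_f"].mean()
summary = df["temp_f"].describe()
by_state = df.groupby("state")["temp_f"].mean()
```

Using `.loc` and `.iloc` explicitly, rather than chained `[]` indexing, keeps it clear whether a lookup is by label or by position.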

SECTION 4: WRAPPING UP

~15m	notebook | slides
Content	putting it all together; finding the need for Pandas; integrating Pandas into data engineering workflows
Expected Outcomes	• identify real-world use cases for Pandas • navigate and utilize key online resources for further study

RESOURCES

Resources to use to learn more about Pandas:

the pydata documentation is complete, if somewhat overwhelming for the beginner
Pandas Cookbook on Github by Julia Evans (also on pydata.org)
Data Wrangling with Pandas cheat sheet by pydata.org
Pandas for Data Science cheat sheet by DataCamp.com

LICENSE

Originally created by Keith E. Maull, 2017.

CC-BY-4.0

This work is licensed under a Creative Commons Attribution 4.0 International License.