Talk given at RMACC August 17, 2017 titled "Practical Data Wrangling in Pandas".

Practical Data Wrangling with Pandas

ABSTRACT

Hacking Python? Need to import some Excel data and run a detailed data analysis? Got Pandas? Pandas has become a staple in the Python data science stack with strengths in data manipulation and analysis. In this workshop, we will focus on real-world data analysis scenarios that show the strengths of this library. We’ll cover basic Pandas data structures, core import/export and I/O functionality, manipulation of data in Pandas and the basics of Pandas data visualization. We will focus on the practical so that you can leave ready to apply your skills. We assume a basic working knowledge of Python and exposure to Jupyter Notebooks.

ABOUT

This talk was originally given on August 17, 2017 at the Rocky Mountain Advanced Computing Consortium (RMACC) 2017 Symposium.

There are two parts to this repository:

  1. the slides, which are best viewed here, though the HTML source is here. NOTE: the slides were prepared using the RISE plugin for Jupyter Notebooks and NBExtensions

  2. the notebooks, which supplement the slides (and are also the basis for their content); they are best viewed with NBViewer starting here, but you are free to clone the repo and work on the notebooks from it or from NB

SECTION 0: INTRODUCTION

~10m notebook | slides
Content: what is pandas; why pandas; pandas vs. NumPy; installing pandas
Expected outcomes:
• basic introduction to the Pandas ecosystem
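
As a rough illustration of the "pandas vs. NumPy" item above, the sketch below contrasts positional arithmetic on NumPy arrays with label-aligned arithmetic on pandas Series. It is not taken from the talk's notebooks; the variable names and values are made up for the example.

```python
import numpy as np
import pandas as pd

# NumPy arrays combine element by element, purely by position.
a = np.array([1.0, 2.0, 3.0])
b = np.array([10.0, 20.0, 30.0])
by_position = a + b

# pandas Series carry an index and align on labels before combining,
# so the order of the labels in each Series does not matter.
s1 = pd.Series([1.0, 2.0, 3.0], index=["x", "y", "z"])
s2 = pd.Series([10.0, 20.0, 30.0], index=["z", "y", "x"])
by_label = s1 + s2

print(by_position)
print(by_label)
```

The same numbers give different results because pandas matches "x" with "x" rather than first with first; this labeled alignment is one common reason to reach for pandas on top of NumPy.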



SECTION 1: PANDAS DATA STRUCTURES

~20m notebook | slides
Content: core pandas data structures; Series, DataFrame (optionally Panel); basic concepts of data structures and manipulation strategies
Expected outcomes:
• identify and utilize series and dataframe structures
• perform basic manipulation operations
• understand basic Pythonic manipulation concepts
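
A minimal sketch of the two core structures named above, assuming a recent pandas release. The data and names (`temps`, `readings`) are invented for illustration and are not from the talk's notebooks. Panel is left out because it was removed from pandas in later releases.

```python
import pandas as pd

# A Series is a one-dimensional labeled array.
temps = pd.Series([21.5, 23.0, 19.8], index=["mon", "tue", "wed"], name="temp_c")

# A DataFrame is a two-dimensional table; each column is a Series
# and all columns share the same row index.
readings = pd.DataFrame(
    {"temp_c": [21.5, 23.0, 19.8], "humidity": [40, 35, 55]},
    index=["mon", "tue", "wed"],
)

# Basic manipulation: vectorized arithmetic creates a new column
# without an explicit Python loop.
readings["temp_f"] = readings["temp_c"] * 9 / 5 + 32

# Selecting a single column returns a Series.
humidity = readings["humidity"]

print(temps)
print(readings)
```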



SECTION 2: IMPORTING DATA

~20m notebook | slides
Content: importing data; CSV and Excel; JSON; SQL; other supported data formats
Expected outcomes:
• import data of various formats
• perform data imports into dataframes
• perform various conversions in Pandas
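
The import paths listed above can be sketched as follows. This is an illustrative example, not the notebook's own code: the inline CSV and JSON text, the `readings` table, and all variable names are assumptions made so that the block runs without external files. Excel import works the same way through `pd.read_excel`, which needs an extra engine package such as openpyxl and is therefore not run here.

```python
import io
import sqlite3

import pandas as pd

# CSV: read_csv accepts a path or any file-like object.
csv_text = "station,date,temp_c\nA,2017-08-17,21.5\nB,2017-08-18,19.8\n"
from_csv = pd.read_csv(io.StringIO(csv_text))

# JSON: a list of records becomes one row per record.
json_text = '[{"station": "A", "temp_c": 21.5}, {"station": "B", "temp_c": 19.8}]'
from_json = pd.read_json(io.StringIO(json_text))

# SQL: run a query against an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (station TEXT, temp_c REAL)")
conn.executemany("INSERT INTO readings VALUES (?, ?)", [("A", 21.5), ("B", 19.8)])
conn.commit()
from_sql = pd.read_sql_query("SELECT station, temp_c FROM readings", conn)
conn.close()

# Conversions after import: parse date strings and change a numeric dtype.
from_csv["date"] = pd.to_datetime(from_csv["date"])
compact = from_csv["temp_c"].astype("float32")

print(from_csv.dtypes)
print(from_json)
print(from_sql)
```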



SECTION 3: MANIPULATING DATA

~20m notebook | slides
Content: basic terminology; selecting data; slicing dataframes; setting and assigning operations; built-in summary statistics
Expected outcomes:
• understand the basic terminology
• select data by row and column
• select data by label/index and with boolean selections
• perform slicing, merging and subsetting
• perform multi-indexing
• access basic stats and summary
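
The operations in this list can be sketched on one small frame. The example is a hedged illustration rather than the notebook's code: the cities, rainfall figures, and elevations are placeholder values, and every variable name is hypothetical.

```python
import pandas as pd

# Placeholder data for illustration only.
df = pd.DataFrame(
    {
        "city": ["Denver", "Boulder", "Denver", "Boulder"],
        "year": [2016, 2016, 2017, 2017],
        "rain_mm": [380, 510, 410, 480],
    }
)

# Row selection by position (iloc) and cell selection by label (loc).
first_row = df.iloc[0]
label_cell = df.loc[2, "rain_mm"]

# Boolean selection keeps the rows where the mask is True.
wet = df[df["rain_mm"] > 400]

# Slicing and subsetting: loc slices include both endpoints.
middle = df.loc[1:2, ["city", "rain_mm"]]

# Setting values: assign through loc using a boolean mask.
df.loc[df["city"] == "Denver", "rain_mm"] += 10

# Merging: join a second frame on a shared key column.
elev = pd.DataFrame({"city": ["Denver", "Boulder"], "elev_m": [1609, 1624]})
merged = df.merge(elev, on="city", how="left")

# Multi-indexing: a two-level index allows selection by the outer level.
indexed = df.set_index(["city", "year"]).sort_index()
denver = indexed.loc["Denver"]

# Built-in summary statistics.
summary = df["rain_mm"].describe()
mean_by_city = df.groupby("city")["rain_mm"].mean()

print(merged)
print(summary)
```

Note that `wet` and `middle` are computed before the assignment step, so they reflect the original values.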



SECTION 4: WRAPPING UP

~15m notebook | slides
Content: putting it all together; finding the need for Pandas; integrating Pandas into data engineering workflows
Expected outcomes:
• identify real-world use cases for Pandas
• navigate and utilize key online resources for further study



RESOURCES

Resources to use to learn more about Pandas:

LICENSE

Originally created by Keith E. Maull, 2017.

CC-BY-4.0

This work is licensed under a Creative Commons Attribution 4.0 International License.