Talk given at RMACC August 17, 2017 titled "Practical Data Wrangling in Pandas".

## ABSTRACT

Hacking Python? Need to import some Excel data and run a detailed data analysis? Got Pandas? Pandas has become a staple in the Python data science stack with strengths in data manipulation and analysis. In this workshop, we will focus on real-world data analysis scenarios that show the strengths of this library. We’ll cover basic Pandas data structures, core import/export and I/O functionality, manipulation of data in Pandas and the basics of Pandas data visualization. We will focus on the practical so that you can leave ready to apply your skills. We assume a basic working knowledge of Python and exposure to Jupyter Notebooks.

## ABOUT

This talk was originally given on August 17, 2017 at the Rocky Mountain Advanced Computing Consortium (RMACC) 2017 Symposium.

There are two parts to this repository:

1. **the slides** which can best be [viewed here](http://keithmaull.com/talks/20170817/slides), though the HTML source is [here](./slides); NOTE: _slides prepared using the [RISE plugin](https://github.com/damianavila/RISE) for Jupyter Notebooks and [NBExtensions](http://jupyter-contrib-nbextensions.readthedocs.io/en/latest/install.html)_
2. **the notebooks** which are supplemental to the slides (and also the basis for their content); they are [best viewed with NBViewer starting here](http://nbviewer.jupyter.org/urls/code.keithmaull.net/kmaull/talk_2017_08_RMACC_GotPandas/raw/master/nb/0_introduction.ipynb), but you are free to clone the repo and work on the notebooks from it or from NB

## SECTION 0: INTRODUCTION

| ~10m | [notebook](http://nbviewer.jupyter.org/urls/code.keithmaull.net/kmaull/talk_2017_08_RMACC_GotPandas/raw/master/nb/0_introduction.ipynb) &#124; [slides](http://keithmaull.com/talks/20170817/slides/0_introduction.slides.html) |
|-------------:|:-------------------------------------------------------------------|
| **Content** | what is pandas; why pandas; pandas v numpy; installing pandas |
| **Expected<br/>Outcomes** | &#8226; basic introduction to the Pandas ecosystem<br/> |

<br/><br/>

## SECTION 1: PANDAS DATA STRUCTURES

| ~20m | [notebook](http://nbviewer.jupyter.org/urls/code.keithmaull.net/kmaull/talk_2017_08_RMACC_GotPandas/raw/master/nb/1_data_structures.ipynb) &#124; [slides](http://keithmaull.com/talks/20170817/slides/1_data_structures.slides.html) |
|-------------:|:-------------------------------------------------------------------|
| **Content** | core pandas data structures; series, dataframe, (optionally panel); basic concepts of data structures and manipulation strategies |
| **Expected<br/>Outcomes** | &#8226; identify and utilize series and dataframe structures<br/>&#8226; perform basic manipulation operations<br/>&#8226; understand basic Pythonic manipulation concepts<br/> |

<br/><br/>

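
As a quick taste of the two structures named above, here is a minimal sketch that is not taken from the notebooks. The weather values and the names `temps` and `weather` are made up for illustration. `Panel` is left out because it has since been removed from pandas.

```python
import pandas as pd

# A Series is a one-dimensional array of values with a label for each entry.
temps = pd.Series([21.5, 19.0, 23.2], index=["mon", "tue", "wed"], name="temp_c")

# A DataFrame is a two-dimensional table; each column is a Series,
# and all columns share the same row index.
weather = pd.DataFrame(
    {"temp_c": [21.5, 19.0, 23.2], "rain_mm": [0.0, 4.2, 0.0]},
    index=["mon", "tue", "wed"],
)

# Selecting a single column from a DataFrame gives back a Series.
column = weather["temp_c"]
print(type(column).__name__)  # Series

# Values are looked up by label rather than by position.
print(temps["tue"])
print(weather.shape)  # (3, 2)
```
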
## SECTION 2: IMPORTING DATA

| ~20m | [notebook](http://nbviewer.jupyter.org/urls/code.keithmaull.net/kmaull/talk_2017_08_RMACC_GotPandas/raw/master/nb/2_importing_data.ipynb) &#124; [slides](http://keithmaull.com/talks/20170817/slides/2_importing_data.slides.html) |
|-------------:|:-------------------------------------------------------------------|
| **Content** | importing data; csv and excel; json; sql; other supported data formats |
| **Expected<br/>Outcomes** | &#8226; import data of various formats<br/>&#8226; perform data imports into dataframes<br/>&#8226; perform various conversions in Pandas<br/> |

<br/><br/>

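
The import step can be sketched as follows. This is an illustrative example rather than the notebook's own code: the station data is invented, and in-memory strings stand in for real files so the snippet runs without any external data. In practice you would pass a file path or URL to the same functions.

```python
import io

import pandas as pd

# Hypothetical CSV content; normally this would live in a file on disk.
csv_text = "station,temp_c\nBoulder,21.5\nLaramie,19.0\n"
stations = pd.read_csv(io.StringIO(csv_text))

# The same records expressed as JSON (a list of objects, one per row).
json_text = (
    '[{"station": "Boulder", "temp_c": 21.5},'
    ' {"station": "Laramie", "temp_c": 19.0}]'
)
stations_json = pd.read_json(io.StringIO(json_text))

# Both readers return a DataFrame, so downstream code does not need
# to care which format the data came from.
print(stations.shape)
print(stations_json.shape)

# A simple conversion: export the DataFrame back to CSV text.
csv_out = stations.to_csv(index=False)
```

Excel and SQL sources follow the same pattern through `pd.read_excel` and `pd.read_sql`, though both need extra dependencies (an Excel engine or a database connection) that are not shown here.
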
## SECTION 3: MANIPULATING DATA

| ~20m | [notebook](http://nbviewer.jupyter.org/urls/code.keithmaull.net/kmaull/talk_2017_08_RMACC_GotPandas/raw/master/nb/3_dataframe_operations.ipynb) &#124; [slides](http://keithmaull.com/talks/20170817/slides/3_dataframe_operations.slides.html) |
|-------------:|:-------------------------------------------------------------------|
| **Content** | basic terminology; selecting data; slicing dataframes; setting and assigning operations; built-in summary statistics |
| **Expected<br/>Outcomes** | &#8226; understand the basic terminology<br/>&#8226; select data by row and column<br/>&#8226; select data by label/index and with boolean selections<br/>&#8226; perform slicing, merging and subsetting<br/>&#8226; perform multi-indexing<br/>&#8226; access basic stats and summaries<br/> |

<br/><br/>

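
The selection and summary operations listed above might look like the following. This is a small sketch on made-up data, not an excerpt from the notebook, and the column names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {"temp_c": [21.5, 19.0, 23.2, 17.8], "rain_mm": [0.0, 4.2, 0.0, 1.1]},
    index=["mon", "tue", "wed", "thu"],
)

# Select by label with .loc and by integer position with .iloc.
by_label = df.loc["tue", "temp_c"]
by_position = df.iloc[0, 1]

# Label slices with .loc include both endpoints.
label_slice = df.loc["tue":"wed"]

# Boolean selection keeps only the rows where the condition is True.
rainy = df[df["rain_mm"] > 0]

# Assigning to a new column name adds that column to the DataFrame.
df["temp_f"] = df["temp_c"] * 9 / 5 + 32

# Built-in summary statistics.
mean_temp = df["temp_c"].mean()
summary = df.describe()

print(list(rainy.index))  # ['tue', 'thu']
print(summary.loc["mean"])
```

Note that `.loc` slicing by label is inclusive on both ends, unlike ordinary Python slicing, which is a common source of off-by-one surprises.
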
## SECTION 4: WRAPPING UP

| ~15m | [notebook](http://nbviewer.jupyter.org/urls/code.keithmaull.net/kmaull/talk_2017_08_RMACC_GotPandas/raw/master/nb/4_wrapping_up.ipynb) &#124; [slides](http://keithmaull.com/talks/20170817/slides/4_wrapping_up.slides.html) |
|-------------:|:-------------------------------------------------------------------|
| **Content** | putting it all together; finding the need for Pandas; integrating Pandas into data engineering workflows |
| **Expected<br/>Outcomes** | &#8226; identify real-world use cases for Pandas<br/>&#8226; navigate and utilize key online resources for further study<br/> |

<br/><br/>

## RESOURCES

Resources for learning more about Pandas:

* the [pydata documentation](http://pandas.pydata.org) is complete, if somewhat overwhelming for the beginner
* [Pandas Cookbook](https://github.com/jvns/pandas-cookbook) on GitHub by Julia Evans (also on pydata.org)
* [Data Wrangling with Pandas cheat sheet](https://github.com/pandas-dev/pandas/blob/master/doc/cheatsheet/Pandas_Cheat_Sheet.pdf) by pydata.org
* [Pandas for Data Science cheat sheet](https://www.datacamp.com/community/blog/python-pandas-cheat-sheet) by DataCamp.com

## LICENSE

Originally created by Keith E. Maull, 2017.

CC-BY-4.0

![](https://i.creativecommons.org/l/by/4.0/88x31.png)

This work is licensed under a [Creative Commons Attribution 4.0 International License](https://creativecommons.org/licenses/by/4.0/).