|
@@ -0,0 +1,172 @@ |
|
|
|
|
|
{ |
|
|
|
|
|
"cells": [ |
|
|
|
|
|
{ |
|
|
|
|
|
"cell_type": "markdown", |
|
|
|
|
|
"metadata": { |
|
|
|
|
|
"toc": "true" |
|
|
|
|
|
}, |
|
|
|
|
|
"source": [ |
|
|
|
|
|
"# Table of Contents\n", |
|
|
|
|
|
" <p><div class=\"lev1 toc-item\"><a href=\"#Pandas\" data-toc-modified-id=\"Pandas-1\"><span class=\"toc-item-num\">1 </span>Pandas</a></div><div class=\"lev2 toc-item\"><a href=\"#Why-Pandas?\" data-toc-modified-id=\"Why-Pandas?-11\"><span class=\"toc-item-num\">1.1 </span>Why Pandas?</a></div><div class=\"lev2 toc-item\"><a href=\"#How-Pandas?\" data-toc-modified-id=\"How-Pandas?-12\"><span class=\"toc-item-num\">1.2 </span>How Pandas?</a></div>" |
|
|
|
|
|
] |
|
|
|
|
|
}, |
|
|
|
|
|
{ |
|
|
|
|
|
"cell_type": "markdown", |
|
|
|
|
|
"metadata": {}, |
|
|
|
|
|
"source": [ |
|
|
|
|
|
"**NAVIGATION**\n", |
|
|
|
|
|
"\n", |
|
|
|
|
|
"**Got Pandas? _Practical Data Wrangling with Pandas_**\n", |
|
|
|
|
|
"\n", |
|
|
|
|
|
"* **Introduction**\n", |
|
|
|
|
|
"1. [Data Structures](./0_data_structures.ipynb)\n", |
|
|
|
|
|
"2. [Importing Data](./1_importing_data.ipynb)\n", |
|
|
|
|
|
"3. [Manipulating DataFrames](./2_dataframe_operations.ipynb)\n", |
|
|
|
|
|
"4. [Wrap Up](./3_wrapping_up.ipynb)\n", |
|
|
|
|
|
"\n", |
|
|
|
|
|
"---" |
|
|
|
|
|
] |
|
|
|
|
|
}, |
|
|
|
|
|
{ |
|
|
|
|
|
"cell_type": "markdown", |
|
|
|
|
|
"metadata": {}, |
|
|
|
|
|
"source": [ |
|
|
|
|
|
"**NOTEBOOK OBJECTIVES**\n", |
|
|
|
|
|
"\n", |
|
|
|
|
|
"In this notebook we'll:\n", |
|
|
|
|
|
"\n", |
|
|
|
|
|
"* explore the purpose of Pandas,\n", |
|
|
|
|
|
"* understand where Pandas fits in the scientific data analysis ecosystem,\n", |
|
|
|
|
|
"* understand installation options." |
|
|
|
|
|
] |
|
|
|
|
|
}, |
|
|
|
|
|
{ |
|
|
|
|
|
"cell_type": "markdown", |
|
|
|
|
|
"metadata": {}, |
|
|
|
|
|
"source": [ |
|
|
|
|
|
"# Pandas\n", |
|
|
|
|
|
"\n", |
|
|
|
|
|
"Pandas is a fantastic library, and if you don't _Got Pandas?_ ... perhaps it is time you do.\n", |
|
|
|
|
|
"\n", |
|
|
|
|
|
"Pandas is fast and built on top of [NumPy](http://www.numpy.org/), and it works well alongside libraries such as [statsmodels](http://www.statsmodels.org/stable/index.html). If you are familiar with NumPy, Pandas might be what you've always wanted and never knew you did!\n", |
|
|
|
|
|
"\n", |
|
|
|
|
|
"For readers who are familiar with R and considering Python, Pandas may be the right tool for a smooth transition, as its core DataFrame structure is modeled after R's `data.frame`.\n", |
|
|
|
|
|
"\n", |
|
|
|
|
|
"\n", |
|
|
|
|
|
"Pandas has many strengths, but here are a few that might pique your interest:\n", |
|
|
|
|
|
"\n", |
|
|
|
|
|
"* flexible, consistent data import and export from a wide array of sources, including SQL, CSV, Excel, etc.\n", |
|
|
|
|
|
"* tabular / matrix data representation with heterogeneous labeled or unlabeled columns\n", |
|
|
|
|
|
"* intuitive handling of missing data \n", |
|
|
|
|
|
"* import and conversion of data to / from NumPy\n", |
|
|
|
|
|
"* sophisticated slicing, indexing and subsetting of data\n", |
|
|
|
|
|
"* support for hierarchical labeling of data\n", |
|
|
|
|
|
"* support for time series data, including time/date conversion, moving windows, etc.\n", |
|
|
|
|
|
"* and much more ...\n", |
|
|
|
|
|
"\n", |
|
|
|
|
|
"\n" |
|
|
|
|
|
] |
|
|
|
|
|
}, |
|
|
|
|
|
{ |
|
|
|
|
|
"cell_type": "markdown", |
|
|
|
|
|
"metadata": {}, |
|
|
|
|
|
"source": [ |
|
|
|
|
|
"```\n", |
|
|
|
|
|
"picture\n", |
|
|
|
|
|
"```" |
|
|
|
|
|
] |
|
|
|
|
|
}, |
|
|
|
|
|
{ |
|
|
|
|
|
"cell_type": "markdown", |
|
|
|
|
|
"metadata": {}, |
|
|
|
|
|
"source": [ |
|
|
|
|
|
"## Why Pandas?\n", |
|
|
|
|
|
"\n", |
|
|
|
|
|
"Pandas has become known as the go-to library in the Python data science stack. With its strong support for importing various data formats, it can be the _first tool_ you might use to work with, manipulate, convert, reorganize and prepare data for analysis.\n", |
|
|
|
|
|
"\n", |
|
|
|
|
|
"Pandas is not a replacement for NumPy, but rather a supplement to it. Its sophisticated indexing offers a more powerful way to access and prepare data for analysis in NumPy, and in many cases it becomes a necessary complement to the features NumPy already provides.\n", |
|
|
|
|
|
"\n", |
|
|
|
|
|
"Pandas brings the fun back into data engineering and, once mastered, is one of the many tools you will need for high-quality data analysis in Python.\n", |
|
|
|
|
|
"\n", |
|
|
|
|
|
"Everything you'd ever want to know about Pandas can be found at:\n", |
|
|
|
|
|
"\n", |
|
|
|
|
|
"* [pandas.pydata.org](http://pandas.pydata.org): go here for complete, up-to-date documentation on the latest and greatest of Pandas\n", |
|
|
|
|
|
"* [github.com/pandas-dev/pandas](http://github.com/pandas-dev/pandas): if you want to browse source code for the project\n", |
|
|
|
|
|
"\n", |
|
|
|
|
|
"There are also many great tutorials around the web and in the blogosphere." |
|
|
|
|
|
] |
|
|
|
|
|
}, |
|
|
|
|
|
{ |
|
|
|
|
|
"cell_type": "markdown", |
|
|
|
|
|
"metadata": {}, |
|
|
|
|
|
"source": [ |
|
|
|
|
|
"## How Pandas?\n", |
|
|
|
|
|
"\n", |
|
|
|
|
|
"Use Python 3 to install Pandas: Python 2 reached end of life in January 2020, and Pandas dropped Python 2 support as of version 0.25.\n", |
|
|
|
|
|
"\n", |
|
|
|
|
|
"Pandas can be installed in several ways.\n", |
|
|
|
|
|
"\n", |
|
|
|
|
|
"If you've installed [Anaconda](https://www.continuum.io/what-is-anaconda) then you need do nothing -- Pandas is installed by default in the conda stack.\n", |
|
|
|
|
|
"\n", |
|
|
|
|
|
"If you prefer, you can install Pandas from the [binaries on PyPI](http://pypi.python.org/pypi/pandas), or install it via `pip`:\n", |
|
|
|
|
|
"\n", |
|
|
|
|
|
"```bash\n", |
|
|
|
|
|
"pip install pandas\n", |
|
|
|
|
|
"```\n", |
|
|
|
|
|
"\n", |
|
|
|
|
|
"That should get you going.\n", |
|
|
|
|
|
"\n", |
|
|
|
|
|
"For more about installation, please see:\n", |
|
|
|
|
|
"\n", |
|
|
|
|
|
"* [complete Pandas installation documentation on pydata.org](http://pandas.pydata.org/pandas-docs/stable/install.html)" |
|
|
|
|
|
] |
|
|
|
|
|
} |
|
|
|
|
|
], |
|
|
|
|
|
"metadata": { |
|
|
|
|
|
"anaconda-cloud": {}, |
|
|
|
|
|
"kernelspec": { |
|
|
|
|
|
"display_name": "Python [conda root]", |
|
|
|
|
|
"language": "python", |
|
|
|
|
|
"name": "conda-root-py" |
|
|
|
|
|
}, |
|
|
|
|
|
"language_info": { |
|
|
|
|
|
"codemirror_mode": { |
|
|
|
|
|
"name": "ipython", |
|
|
|
|
|
"version": 3 |
|
|
|
|
|
}, |
|
|
|
|
|
"file_extension": ".py", |
|
|
|
|
|
"mimetype": "text/x-python", |
|
|
|
|
|
"name": "python", |
|
|
|
|
|
"nbconvert_exporter": "python", |
|
|
|
|
|
"pygments_lexer": "ipython3", |
|
|
|
|
|
"version": "3.6.1" |
|
|
|
|
|
}, |
|
|
|
|
|
"toc": { |
|
|
|
|
|
"colors": { |
|
|
|
|
|
"hover_highlight": "#DAA520", |
|
|
|
|
|
"navigate_num": "#000000", |
|
|
|
|
|
"navigate_text": "#333333", |
|
|
|
|
|
"running_highlight": "#FF0000", |
|
|
|
|
|
"selected_highlight": "#FFD700", |
|
|
|
|
|
"sidebar_border": "#EEEEEE", |
|
|
|
|
|
"wrapper_background": "#FFFFFF" |
|
|
|
|
|
}, |
|
|
|
|
|
"moveMenuLeft": true, |
|
|
|
|
|
"nav_menu": { |
|
|
|
|
|
"height": "67px", |
|
|
|
|
|
"width": "251px" |
|
|
|
|
|
}, |
|
|
|
|
|
"navigate_menu": true, |
|
|
|
|
|
"number_sections": false, |
|
|
|
|
|
"sideBar": true, |
|
|
|
|
|
"threshold": 4, |
|
|
|
|
|
"toc_cell": true, |
|
|
|
|
|
"toc_section_display": "block", |
|
|
|
|
|
"toc_window_display": false, |
|
|
|
|
|
"widenNotebook": false |
|
|
|
|
|
} |
|
|
|
|
|
}, |
|
|
|
|
|
"nbformat": 4, |
|
|
|
|
|
"nbformat_minor": 2 |
|
|
|
|
|
} |