Got Pandas? Practical Data Wrangling with Pandas
NOTEBOOK OBJECTIVES
In this notebook we'll:
Pandas is a fantastic library, and if you haven't "Got Pandas?" yet, perhaps it is time you did.
Pandas is fast and built on top of NumPy (it does not require statsmodels, although the two are often used together), so if you are familiar with NumPy, Pandas might be what you've always wanted and never knew you did!
For readers who are familiar with R and considering Python, Pandas may be the right tool for a smooth transition, as the core DataFrame structure in Pandas is modeled after R's data.frame.
Pandas has many strengths, but here are a few that might pique your interest:
Pandas has become known as the go-to library in the Python data science stack. With its strong support for importing various data formats, it can be the first tool you might use to work with, manipulate, convert, reorganize and prepare data for analysis.
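As a rough illustration of the import, reorganize and convert steps mentioned above, here is a minimal sketch. The CSV text, the column names (`name`, `quarter`, `units`) and the values are all made up for this example; an in-memory string stands in for a file on disk.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a CSV file on disk.
csv_text = """name,quarter,units
widget,Q1,10
widget,Q2,15
gadget,Q1,7
gadget,Q2,9
"""

# Import: read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# Reorganize: one row per name, one column per quarter.
wide = df.pivot(index="name", columns="quarter", values="units")

# Convert: serialize the reshaped table as JSON text.
as_json = wide.to_json()

print(wide)
```

The same pattern should carry over to other readers such as `pd.read_excel` or `pd.read_json`, depending on which optional packages are installed.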
Pandas is not a replacement for NumPy, but rather a supplement to it. Its sophisticated indexing gives you a more powerful way to access and prepare data for analysis in NumPy, and in many cases it becomes a necessary complement to the features NumPy already provides.
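To make the indexing point concrete, the sketch below selects rows by label and by condition, then hands the result to NumPy. The frame, its column names (`height`, `weight`) and its index labels are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical measurements, labeled by sample name.
df = pd.DataFrame(
    {"height": [1.0, 2.0, 3.0, 4.0], "weight": [10.0, 20.0, 30.0, 40.0]},
    index=["a", "b", "c", "d"],
)

# Label-based selection: rows "b" through "d" (inclusive), one column.
subset = df.loc["b":"d", "height"]

# Boolean selection: keep rows where weight exceeds 15.
heavy = df[df["weight"] > 15]

# Hand the prepared values to NumPy for numerical work.
values = heavy.to_numpy()
col_means = values.mean(axis=0)

print(col_means)
```

Note that label slices with `.loc` include both endpoints, unlike positional slices in NumPy.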
Pandas brings the fun back into data engineering and, once mastered, it is one of the tools you will need for high-quality data analysis in Python.
Everything you'd ever want to know about Pandas can be found here:
There are also many great tutorials around the web and in the blogosphere.
Older Pandas releases can be installed on Python 2, but you should use Python 3: Python 2 reached end of life in January 2020, and Pandas 0.25 and later support Python 3 only.
Pandas can be installed in several ways.
If you've installed Anaconda then you need do nothing -- Pandas is installed by default in the conda stack.
If you prefer, you can download Pandas binaries from PyPI or install with pip:
pip install pandas
should get you going.
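Once the install finishes, a quick check like the following can confirm that Pandas imports and works. This is only a minimal smoke test, not an official verification procedure; the variable names are arbitrary.

```python
import pandas as pd

# Printing the version confirms that the import succeeded.
print(pd.__version__)

# A tiny Series as a smoke test that the library actually works.
s = pd.Series([1, 2, 3])
total = s.sum()
print(total)
```

If the import fails, the most common cause is that pip installed into a different Python environment than the one running the notebook.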
For more about installation, please see: