Table of Contents¶

1 Pandas

1.1 Why Pandas?

1.2 How Pandas?

NAVIGATION

Got Pandas? Practical Data Wrangling with Pandas

Introduction

NOTEBOOK OBJECTIVES

In this notebook we'll:

explore the purpose of Pandas,
understand where Pandas fits in the scientific data analysis ecosystem,
understand installation options.

Pandas¶

Pandas is a fantastic library, and if you haven't "Got Pandas?" yet, perhaps it is time you did.

Pandas is fast and built on top of NumPy, and libraries such as statsmodels work directly with its data structures, so if you are familiar with NumPy, Pandas might be what you've always wanted and never knew you did!

For readers who are familiar with R and considering Python, Pandas may be the right tool to make the transition smoothly as the core DataFrame structure in Pandas is modeled after that of R's data.frame.

Pandas has many strengths, but here are a few that might pique your interest:

flexible, consistent data import and export from a wide array of sources, including SQL, CSV, Excel, etc.
tabular / matrix data representation with heterogeneous labeled or unlabeled columns
intuitive handling of missing data
import and conversion of data to / from NumPy
sophisticated slicing, indexing and subsetting of data
support for hierarchical labeling of data
support for time series data, including time/date conversion, moving windows, etc.
and much more ...
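A few of the strengths listed above can be sketched in a handful of lines. This is only an illustrative example: the column names, values, and dates are made up for the demonstration and do not come from a real dataset.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sample data: heterogeneous labeled columns, one missing value,
# and a daily DatetimeIndex.
df = pd.DataFrame(
    {
        "city": ["Oslo", "Lima", "Pune"],
        "temp_c": [4.5, np.nan, 31.0],
        "visits": [10, 20, 30],
    },
    index=pd.date_range("2024-01-01", periods=3, freq="D"),
)

# Missing data: NaN values are skipped by default in reductions.
mean_temp = df["temp_c"].mean()

# Missing data can also be filled explicitly.
filled = df["temp_c"].fillna(mean_temp)

# Conversion to NumPy.
visits_array = df["visits"].to_numpy()

# Label-based slicing on the datetime index (both endpoints are included).
first_two_days = df.loc["2024-01-01":"2024-01-02"]

# Time series support: a moving window of two days.
rolling_visits = df["visits"].rolling(window=2).sum()

# Export and import: round-trip through CSV text held in memory.
csv_text = df.to_csv()
round_trip = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)
```

Here the CSV round-trip uses an in-memory buffer so the example does not touch the file system; with real data you would normally pass a file path instead.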


Why Pandas?¶

Pandas has become known as the go-to library in the Python data science stack. With its strong support for importing various data formats, it is often the first tool you will use to manipulate, convert, reorganize and prepare data for analysis.

Pandas is not a replacement for NumPy, but rather a supplement to it. Its sophisticated indexing gives you a more powerful way to access and prepare data for analysis in NumPy, and in many cases it becomes a necessary complement to the features NumPy already provides.
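To make the relationship with NumPy concrete, here is a minimal sketch that wraps a NumPy array in a DataFrame, selects data by label, and hands the result back to NumPy. The row labels (`a`, `b`, `c`) and column names (`s1`, `s2`) are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical measurements: rows are samples, columns are sensors.
values = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# NumPy alone: access is positional.
numpy_cell = values[1, 0]

# Pandas: the same array, now with row and column labels.
df = pd.DataFrame(values, index=["a", "b", "c"], columns=["s1", "s2"])
pandas_cell = df.loc["b", "s1"]

# Select rows with a boolean condition and a column by label,
# then return to NumPy for the numerical work.
selected = df.loc[df["s2"] > 2.0, "s1"].to_numpy()
total = np.sum(selected)
```

The labels carry meaning that bare positions do not, which is why Pandas is often used to prepare and subset data before the numerical step runs in NumPy.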

Pandas brings the fun back into data engineering and, once mastered, is one of the many tools you will need for high-quality data analysis in Python.

Everything you'd ever want to know about Pandas can be found at:

pandas.pydata.org: go here for complete, up-to-date documentation on the latest and greatest of Pandas
github.com/pandas-dev/pandas: if you want to browse source code for the project

There are also many great tutorials around the web and in the blogosphere.

How Pandas?¶

Pandas requires Python 3. Python 2 reached end of life in January 2020, and Pandas dropped Python 2 support starting with version 0.25.

Pandas can be installed through a variety of mechanisms.

If you've installed Anaconda, you don't need to do anything: Pandas is included by default in the Anaconda distribution.

Otherwise, you can install Pandas from PyPI with pip:

pip install pandas

should get you going.
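Once installed, a quick way to confirm that everything works is to import the library and print its version. The snippet below is just a sanity check, not an official verification procedure; the version numbers printed will depend on your environment.

```python
import numpy as np
import pandas as pd

# Report the installed versions (output varies by environment).
print("pandas", pd.__version__)
print("numpy", np.__version__)

# Tiny smoke test: build a Series and run one reduction.
smoke = pd.Series([1, 2, 3])
smoke_total = smoke.sum()
print("smoke test total:", smoke_total)
```

If the import fails with `ModuleNotFoundError`, Pandas is most likely installed into a different Python environment than the one running the notebook.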

For more about installation, please see:

complete Pandas installation documentation on pydata.org
