Introduction to pandas

The name pandas is derived from “panel data”, a term for data sets that include observations over multiple time periods for the same individuals. It is an open source library that provides data manipulation and analysis tools. Wes McKinney began developing pandas in 2008, and it was released as open source in 2009.

Using pandas, we can accomplish five typical steps in the processing and analysis of data, regardless of the origin of the data: load, prepare, manipulate, model, and analyze. It can work with data from many sources, for instance .csv, .tsv, and .txt files or SQL databases.
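
As a rough illustration of these steps, here is a minimal sketch that loads, prepares, manipulates, and analyzes a tiny data set (the modeling step is left out). The CSV text, the column names city and temp_c, and the derived column temp_f are all made up for this example; in practice the data would come from a file.

```python
import io

import pandas as pd

# Made-up CSV text standing in for a real file on disk.
csv_text = """city,temp_c
Pune,31.5
Delhi,
Mumbai,29.0
"""

# Load: read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# Prepare: drop the row whose temperature is missing.
clean = df.dropna(subset=["temp_c"])

# Manipulate: add a derived column in Fahrenheit.
clean = clean.assign(temp_f=clean["temp_c"] * 9 / 5 + 32)

# Analyze: compute a simple summary statistic.
mean_temp_c = clean["temp_c"].mean()

print(clean)
print(mean_temp_c)
```

The same calls work on a real file by passing its path to pd.read_csv instead of the StringIO object.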

Tabular data can be stored in a 2D NumPy array, but every element of an array must share the same data type. To work with several data types at the same time, we need the pandas package. It is a high-level data manipulation tool built on NumPy, so much of NumPy's structure is used or replicated in pandas. Data in pandas is often used to feed statistical analysis in SciPy, plotting functions from Matplotlib, and machine learning algorithms in scikit-learn.
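
The difference in type handling can be seen in a small sketch. The values and the column names id and name are invented for illustration.

```python
import numpy as np
import pandas as pd

# A NumPy array holds a single dtype, so mixing numbers and text
# coerces every element to a string.
arr = np.array([[1, "Alice"], [2, "Bob"]])
print(arr.dtype)

# A DataFrame stores each column with its own dtype.
df = pd.DataFrame({"id": [1, 2], "name": ["Alice", "Bob"]})
print(df.dtypes)
```

In the array the integers become text, while in the DataFrame the id column stays numeric.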

tabular data

When working with pandas, each row is an observation and each column is a variable. In the table here, AIRLINEID, AIRNAME, CONTACTNO, and EMAILID are the variables, and there are 4 observations.
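
A table like this could be built as follows. The column names come from the text, but every value is a made-up placeholder, since the original table's contents are not reproduced here.

```python
import pandas as pd

# Column names are from the text; all values are placeholders.
airlines = pd.DataFrame(
    {
        "AIRLINEID": [1, 2, 3, 4],
        "AIRNAME": ["Air A", "Air B", "Air C", "Air D"],
        "CONTACTNO": ["000-0001", "000-0002", "000-0003", "000-0004"],
        "EMAILID": [
            "a@example.com",
            "b@example.com",
            "c@example.com",
            "d@example.com",
        ],
    }
)

print(airlines.shape)          # (number of rows, number of columns)
print(list(airlines.columns))  # the variables

first_row = airlines.iloc[0]   # one observation
names = airlines["AIRNAME"]    # one variable
```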

Pandas has two main data structures, Series and DataFrame. A third, Panel, existed in older versions but was deprecated and then removed in pandas 0.25. These data structures are built on top of NumPy, which makes them fast.
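
A minimal sketch of a Series and a DataFrame, using invented values and labels:

```python
import pandas as pd

# A Series is a one-dimensional labeled array.
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# A DataFrame is a two-dimensional table with named columns.
df = pd.DataFrame({"x": [1, 2, 3], "y": [4.0, 5.0, 6.0]})

# Selecting a single column of a DataFrame gives back a Series.
col = df["x"]

print(type(s).__name__, type(df).__name__, type(col).__name__)
print(s["b"])
```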

How To Install and Import Pandas

  • To install pandas, follow these 2 steps (for Windows).
  1. Open Command Prompt.
  2. Type pip install pandas.
  • If you are working in a Jupyter Notebook, run !pip install pandas in a cell.


The ! at the beginning runs the rest of the line as a shell command, as if it were typed in a terminal.

  • To import pandas, use import pandas as pd.

Here pd is a short alias for pandas. It saves typing, because the package name is used many times.
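
As a quick check that the installation worked, a short sketch like the following imports pandas under the pd alias, prints the installed version, and uses the alias once. The sample values are arbitrary.

```python
import pandas as pd

# Confirm that the import worked and see which version is installed.
print(pd.__version__)

# The alias is used in place of the full package name.
s = pd.Series([1, 2, 3])
print(s.sum())
```

If the import raises ModuleNotFoundError, pandas is not installed in the environment that is running the code.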

