The name Pandas is derived from “panel data”, a term for data sets that include observations over multiple time periods for the same individuals. It is an open-source library that provides tools for data manipulation and analysis. Wes McKinney began developing Pandas in 2008, and it was open-sourced in 2009.

Using Pandas, we can accomplish five typical steps in the processing and analysis of data, regardless of its origin: load, prepare, manipulate, model, and analyze. It can work with data from almost any source, for instance .csv, .tsv, and .txt files or SQL databases.

To hold tabular data we could use a 2D **NumPy** array, but all of its elements must share one data type. To work with several types of data at the same time, we need the pandas package. It is a high-level data manipulation tool built on the **NumPy** package, meaning that a lot of the structure of **NumPy** is used or replicated in Pandas. Data in pandas is often used to feed statistical analysis in **SciPy**, plotting functions from **Matplotlib**, and machine learning algorithms in **Scikit-learn**.
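The difference between the two containers can be sketched as follows. The airline values are made up for illustration; only the column names come from this article.

```python
import numpy as np
import pandas as pd

# A NumPy array has a single dtype, so mixing numbers and text
# coerces every element to a string.
mixed = np.array([[1, "Alpha Air"], [2, "Beta Jet"]])
print(mixed.dtype.kind)  # "U" means a Unicode string dtype

# A DataFrame keeps a separate dtype for each column.
# (Sample values are hypothetical.)
df = pd.DataFrame({"AIRLINEID": [1, 2], "AIRNAME": ["Alpha Air", "Beta Jet"]})
print(df.dtypes)
```

In the array the IDs are no longer numbers, while in the DataFrame the ID column stays numeric and the name column stays text.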

When working with pandas, each row is an observation and each column is a variable. Here AIRLINEID, AIRNAME, CONTACTNO, and EMAILID are the variables, and each has 4 observations.

**Pandas** has three main data structures: **Series**, **DataFrame**, and **Panel**. These data structures are built on top of **NumPy**, which makes them fast.

## Series

A **Series** is a one-dimensional labeled array capable of holding any data type (strings, integers, floats, Python objects, etc.). The axis labels are collectively referred to as the index. A **Series** contains homogeneous data, for example, a **Series** whose values are all integers.

- A **Series** holds homogeneous data.
- A **Series** is size-immutable.
- The values in a **Series** are mutable.
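As a small illustration of these properties, the sketch below builds a Series of integers and changes one value in place. The numbers are arbitrary.

```python
import pandas as pd

# A Series of integers: every value shares one dtype.
s = pd.Series([10, 20, 30, 40])
print(s.dtype)  # an integer dtype

# Values are mutable: assign to a position in place.
s.iloc[1] = 25
print(s.tolist())  # [10, 25, 30, 40]

# The assignment changed a value, not the size.
print(len(s))  # 4
```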

The basic way to create a **Series** is to call the **Series** constructor of the pandas module.

**pd.Series(data, index, dtype, copy)**

- data: the data, which can take various forms such as an ndarray, a list, or a constant.
- index: index values must be hashable and have the same length as the data. If no index is passed, it defaults to a RangeIndex (0, 1, …, n-1), equivalent to **np.arange(n)**.
- dtype: the data type. If nothing is passed, the data type is inferred.
- copy: whether to copy the input data. Defaults to False.

Note: Labels of a **Series** need not be unique, but they must be of a hashable type.
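A minimal sketch of the constructor parameters described above, using arbitrary values. It also shows that repeated labels are accepted.

```python
import pandas as pd

# Explicit index and dtype; the label "b" appears twice on purpose.
s = pd.Series(data=[1, 2, 3], index=["a", "b", "b"], dtype="float64")
print(s.index.is_unique)  # False: duplicate labels are allowed
print(s["b"])             # a Series holding both rows labeled "b"

# Without an index, pandas falls back to a RangeIndex.
default = pd.Series([1, 2, 3])
print(default.index)      # RangeIndex(start=0, stop=3, step=1)
```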

Here data can be:

- a Python dictionary
- an ndarray
- a constant value like 5

The index is the list of axis labels, and its length must equal the length of the data. When the data is a constant, the value is repeated for every label in the index.
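The three kinds of input can be sketched like this. The labels and values are placeholders chosen for illustration.

```python
import numpy as np
import pandas as pd

# From a dictionary: the keys become the index.
from_dict = pd.Series({"a": 1, "b": 2, "c": 3})

# From an ndarray: the index must have the same length as the array.
from_array = pd.Series(np.array([1.5, 2.5]), index=["x", "y"])

# From a constant: the value is repeated once per index label.
from_scalar = pd.Series(5, index=["p", "q", "r"])

# A length mismatch between data and index raises ValueError.
try:
    pd.Series([1, 2], index=["only_one"])
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

print(from_dict, from_array, from_scalar, mismatch_raised, sep="\n")
```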

The object supports both integer- and label-based indexing and provides a host of methods for performing operations involving the index. Statistical methods from ndarray have been overridden to automatically exclude missing data (currently represented as NaN).
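For instance, the following sketch (with made-up values) contrasts label-based and position-based access and shows a statistical method skipping a missing value.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0], index=["a", "b", "c"])

# Label-based access uses .loc, position-based access uses .iloc.
print(s.loc["c"])   # 3.0
print(s.iloc[0])    # 1.0

# mean() ignores NaN by default, so only 1.0 and 3.0 are averaged.
print(s.mean())     # 2.0

# Passing skipna=False keeps the NaN, so the result is NaN.
print(s.mean(skipna=False))
```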


## DataFrame

**DataFrame** is a 2-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet or SQL table, or a dict of Series objects. It is generally the most commonly used pandas object. Like Series, **DataFrame** accepts many different kinds of input:

- Dict of 1D ndarrays, lists, dicts, or Series
- 2-D numpy.ndarray
- Structured or record ndarray
- A Series
- Another **DataFrame**

We can also say that a **DataFrame** is a collection of Series that share the same index. Along with the data, we can pass index (row) labels and column labels to a **DataFrame**.
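A short sketch of this idea, assuming two hypothetical Series that share the same row labels (the values are invented):

```python
import pandas as pd

rows = ["r1", "r2", "r3"]
ids = pd.Series([1, 2, 3], index=rows)
names = pd.Series(["Alpha Air", "Beta Jet", "Gamma Wings"], index=rows)

# Each dictionary key becomes a column label; the Series index becomes the row index.
df = pd.DataFrame({"AIRLINEID": ids, "AIRNAME": names})
print(df)

# Selecting one column gives back a Series.
print(type(df["AIRNAME"]))
```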

**Why DataFrames?**

Data science involves processing, analyzing, and visualizing data. While some tools like Microsoft Excel allow us to perform basic data science tasks, they’re limited to the functionality built into the user interface. If you want to work with datasets that aren’t structured like a spreadsheet or create entirely new data visualizations from scratch, you’ll need to become proficient in **programming**. Instead of using a program written by others that can solve a narrow set of tasks, you can create your own programs that can solve your specific problems.

Programming involves organizing a collection of instructions into a program for a computer to carry out. To express these instructions, we use a **programming language** like Python, whose Pandas library provides the DataFrame data structure to help with various data analysis tasks.

Pandas **DataFrames** make manipulating your data easy, from selecting or replacing columns and indices to reshaping your data.

Pandas is a popular Python package for data science, and with good reason: it offers powerful, expressive and flexible data structures that make data manipulation and analysis easy, among many other things. The **DataFrame** is one of these structures.

One of the main advantages of using pandas **DataFrames** instead of NumPy arrays is that **DataFrames** allow you to have columns of various data types.

Basic data analysis tasks can also be done in Excel spreadsheets, but a pandas **DataFrame** offers many more operations that are easy to perform, as well as more types of visualization.

We can create a **DataFrame** in several ways.

**pd.DataFrame()**

*class pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=False)*

Two-dimensional size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). Arithmetic operations align on both row and column labels. Can be thought of as a dict-like container for Series objects. The primary pandas data structure.

**data**: The input data. It can be a dict, a list-like object, an ndarray, another DataFrame, etc.

**index**: Index to use for the resulting frame. Defaults to a RangeIndex if the input data carries no indexing information and no index is provided.

**columns**: Column labels to use for the resulting **DataFrame**. Defaults to a RangeIndex (0, 1, 2, …) if no column labels are provided.

**dtype**: Data type to force. Only a single dtype is allowed. If not given, the types are inferred.
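The parameters above can be tried out with a small sketch. The array contents and the labels are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

data = np.array([[1, 2], [3, 4], [5, 6]])

# Explicit row labels, column labels, and a forced dtype.
df = pd.DataFrame(data, index=["a", "b", "c"], columns=["x", "y"], dtype="float64")
print(df)

# With only the data, both axes fall back to a RangeIndex.
default = pd.DataFrame(data)
print(default.index)    # RangeIndex(start=0, stop=3, step=1)
print(default.columns)  # RangeIndex(start=0, stop=2, step=1)
```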


## Panel

A **Panel** is a container for 3-D data. It was used less often than the other structures, but it is responsible for the name pandas: pan(el)-da(ta)-s. **Panel** is no longer available in newer versions of pandas: it was deprecated in version 0.20.0 and removed in version 0.25.0. **Panels** have been replaced by the MultiIndex capabilities of the DataFrame. Its 3 axes are named so that they describe the operations performed on them.

- **items**: axis 0; each item corresponds to a DataFrame contained inside.
- **major_axis**: axis 1; it is the **index** (rows) of each of the DataFrames.
- **minor_axis**: axis 2; it is the **columns** of each of the DataFrames.

In the versions that still included it, a **Panel** was constructed much like the other structures, for example from a 3-D ndarray or from a dict of DataFrames.
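Since Panel is gone from current pandas, one way to hold the same 3-D data is a DataFrame with a two-level MultiIndex. The sketch below is only an illustration of that idea, not a drop-in replacement for the old Panel API. The labels `item1`, `r0`, `c0`, and so on are hypothetical.

```python
import numpy as np
import pandas as pd

# 3-D data with shape (items, major_axis, minor_axis) = (2, 3, 2).
values = np.arange(12).reshape(2, 3, 2)

# The first two axes become the two levels of the row index.
index = pd.MultiIndex.from_product(
    [["item1", "item2"], ["r0", "r1", "r2"]],
    names=["items", "major_axis"],
)
df = pd.DataFrame(values.reshape(6, 2), index=index, columns=["c0", "c1"])

# Selecting one item returns an ordinary 2-D DataFrame.
item1 = df.loc["item1"]
print(item1)

# A single cell is addressed by (item, row) and column.
print(df.loc[("item2", "r1"), "c1"])  # 9
```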
