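pandas has three main types of data structures: Series, DataFrame, and Panel (Panel was deprecated and later removed from pandas, so current code works with Series and DataFrame). These data structures are built on top of NumPy, which means operations on them run on fast array code.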
Dimension and Description
The best way to think of these data structures is that a higher-dimensional structure is a container for the lower-dimensional one: for example, a DataFrame is a container of Series and a Panel is a container of DataFrames.
Building and handling arrays of two or more dimensions is tedious, because the burden is placed on the user to consider the orientation of the data set when writing functions. With pandas data structures, that mental effort is reduced.
For example, with tabular data (DataFrame) it is more semantically helpful to think of the index (the rows) and the columns rather than axis 0 and axis 1.
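To make the point about rows and columns concrete, here is a minimal sketch. The table contents and the column names (height_cm, weight_kg) are made up for illustration and are not from the original article.

```python
import pandas as pd

# Hypothetical sample data: two measurements for three people.
df = pd.DataFrame(
    {"height_cm": [170, 182, 165], "weight_kg": [65, 80, 58]},
    index=["alice", "bob", "carol"],
)

row = df.loc["bob"]          # one row, selected by index label
column = df["height_cm"]     # one column; each column of a DataFrame is a Series

# The same column-wise mean, written with an axis number and an axis name.
by_number = df.mean(axis=0)
by_name = df.mean(axis="index")
```

Both calls to mean give the same result; the named form simply states the intent more directly.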
A Series is a one-dimensional labeled array capable of holding any data type (strings, integers, floats, Python objects, etc.). The axis labels are collectively referred to as the index. A Series holds homogeneous data: every element shares a single dtype, as in a collection of integers.
- Series hold homogeneous data
- Their size is immutable
- Their values are mutable
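As a small illustration of these properties, the sketch below builds a Series of integers and changes one value in place. The numbers are arbitrary example values.

```python
import pandas as pd

# Illustrative values only: a Series of integers with the default index.
s = pd.Series([10, 20, 30, 40])

length_before = len(s)
s.iloc[1] = 25               # values can be changed in place
length_after = len(s)        # the assignment does not change the length
```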
The basic way to create a Series is to call the Series constructor of the pandas module, which accepts the following parameters:
- data: takes various forms, such as an ndarray, a list, a dict, or a constant.
- index: index values must be hashable and the same length as data. Defaults to 0, 1, ..., n-1 (as in np.arange(n)) if no index is passed.
- dtype: the data type of the values. If nothing is passed, the data type is inferred.
- copy: whether to copy the input data. Default False.
Note: the labels of a Series need not be unique, but they must be of a hashable type.
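A minimal sketch of the constructor with all four parameters might look like this. The data values and labels are invented for the example; the duplicated label "b" is there only to show that duplicates are accepted.

```python
import numpy as np
import pandas as pd

data = np.array([1.5, 2.5, 3.5, 4.5])

# Explicit labels; "b" appears twice to show that duplicates are accepted.
s = pd.Series(data, index=["a", "b", "b", "c"], dtype="float64", copy=True)

# With no index, the labels default to 0, 1, ..., n-1.
default_index = pd.Series(data)

dupes = s["b"]   # a duplicated label returns every matching element
```

Because copy=True was passed, later changes to the data array do not affect s.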
Here data can be:
- a Python dictionary
- an ndarray or a list
- a constant value like 5
The index is the list of axis labels and the length of the index should be equal to the length of the data.
The object supports both integer- and label-based indexing and provides a host of methods for performing operations involving the index. Statistical methods from ndarray have been overridden to automatically exclude missing data (currently represented as NaN).
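The two indexing styles and the NaN-skipping behaviour can be sketched as follows. The values are arbitrary, and the comparison with plain NumPy is included only to show the contrast.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, 5.0], index=["a", "b", "c", "d"])

by_label = s.loc["c"]            # label-based lookup
by_position = s.iloc[2]          # integer (positional) lookup of the same element

total = s.sum()                  # NaN is skipped: 1.0 + 3.0 + 5.0
average = s.mean()               # mean over the three non-missing values
raw_total = s.to_numpy().sum()   # plain NumPy propagates NaN instead
```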
Series from Dict
A Series can also be created from a dictionary. In this case, the keys of the dictionary become the row labels.
If no index is provided, the Series keeps the order of the dictionary. If an index is passed and its order differs from that of the dictionary, the resulting Series follows the order of the provided index.
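A short sketch of both cases, using a made-up dictionary. The extra label "z" is an addition for illustration: it is not a key of the dictionary, so its value comes out as NaN.

```python
import pandas as pd

d = {"b": 1, "a": 0, "c": 2}

# Keys become labels, in the dictionary's own order.
from_dict = pd.Series(d)

# The result follows the order of the given index; "z" has no matching key.
reordered = pd.Series(d, index=["c", "a", "b", "z"])
```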
If no data is provided, every label in the index is filled with NaN (Not a Number).
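For instance, passing only an index gives a Series of missing values. The explicit dtype below is an assumption added for the sketch, so that the result does not depend on how a given pandas version infers the type of empty data.

```python
import pandas as pd

# dtype is passed explicitly (an assumption of this sketch) so the result
# does not depend on version-specific inference for missing data.
empty = pd.Series(index=["a", "b", "c"], dtype="float64")

all_missing = empty.isna().all()   # every label holds NaN
```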
Series From Scalar Value
A Series can also be created from a scalar value. If data is a scalar, that value is broadcast to every label in the index.
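A minimal example of this broadcasting, with arbitrary labels:

```python
import pandas as pd

# The scalar 5 is repeated once for each label in the index.
s = pd.Series(5, index=["a", "b", "c", "d"])
```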
Indexing and Slicing on Series
Just like an ndarray, a Series supports indexing and slicing.
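The sketch below shows a few common forms. It uses .iloc for positions and .loc for labels, which is a choice made here to keep the two kinds of access explicit; the values are arbitrary.

```python
import pandas as pd

s = pd.Series([1, 3, 5, 7, 9], index=["a", "b", "c", "d", "e"])

first = s.iloc[0]            # single element by position
head = s.iloc[:3]            # positional slice, end excluded
by_labels = s.loc["b":"d"]   # label slice, both ends included
large = s[s > s.median()]    # boolean mask, as with an ndarray
```

Note that a label slice includes its end label, unlike a positional slice.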
Series is Dict Like
A Series is like a fixed-size dict in that you can get and set values by index label:
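A brief sketch of the dict-like interface, using arbitrary labels and values:

```python
import pandas as pd

s = pd.Series([1, 3, 5], index=["a", "b", "c"])

value = s["a"]               # get by label, like d["a"]
s["c"] = 12                  # set by label
has_b = "b" in s             # membership tests the labels, not the values
missing = s.get("z")         # like dict.get: None instead of a KeyError
fallback = s.get("z", -1)    # or a caller-supplied default
```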
Vectorized Operation On Series
Arithmetic operations can be applied to a whole Series at once, so there is no need to loop over the elements one by one.
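For example, the following sketch applies several element-wise operations without any explicit loop. The values are arbitrary, and np.sqrt stands in for any NumPy function applied element-wise.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])

doubled = s * 2              # every element multiplied by 2
summed = s + s               # element-wise addition
squared = s ** 2             # element-wise power
roots = np.sqrt(squared)     # NumPy functions also work and keep the index
```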
A key difference between Series and ndarray is that operations between Series automatically align the data based on the label. Thus, you can write computations without considering whether the Series involved have the same labels.
Operations between Series (+, -, *, /) align values based on their associated index values; the Series need not be the same length. The resulting index will be the sorted union of the two indexes, and a label found in only one of the Series gets the value NaN.
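Alignment can be seen in a small sketch with two Series whose labels only partly overlap. The labels and values are made up for the example.

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=["a", "b", "c"])
b = pd.Series([10, 20, 30], index=["b", "c", "d"])

# Values are matched by label, not by position.
result = a + b

# Labels present in only one operand ("a" and "d") come out as NaN.
unmatched = result[result.isna()]
```

Here the shared labels "b" and "c" are added pairwise, and the result carries the union of both indexes.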
Naming a Series
Series can also have a name attribute.
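A minimal sketch of setting and changing the name; the names "temperature" and "pressure" are placeholders chosen for illustration.

```python
import pandas as pd

s = pd.Series([1, 2, 3], name="temperature")

# rename with a scalar returns a new Series; the original keeps its name.
renamed = s.rename("pressure")
```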