Computational Tools

pandas provides a variety of computational tools for its data structures (Series, DataFrame and, in older releases, Panel) that compute numerical or statistical values over rows and columns. The main kinds of computational tools are:

  • Statistical Functions
  • Window Functions
  • Aggregation
  • Expanding Windows

Statistical Functions

Various statistical functions are available for Series and DataFrame objects (and for Panel in older releases) to give a better understanding of the data.

Percent Change

pct_change() computes the percent change over a given number of periods. The fill_method parameter can be used to fill null values before the percent change is computed (recent pandas releases deprecate this parameter in favour of filling nulls explicitly beforehand). The method compares each value with the one before it while going down the Series: the second value is compared with the first, the third with the second, and so on. This is why the first entry of the result is NaN: there is no earlier value to compare it with.
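As a minimal sketch of the behaviour described above, assuming a small made-up Series of prices (the values and variable names are illustrative and not taken from the original example):

```python
import pandas as pd

# Hypothetical price series, chosen so the changes are easy to read.
prices = pd.Series([100.0, 110.0, 99.0, 148.5])

# Change relative to the previous row; the first row has no predecessor.
one_step = prices.pct_change()

# Change relative to the value two rows earlier; the first two rows are NaN.
two_step = prices.pct_change(periods=2)

print(one_step)
print(two_step)
```

The result is expressed as a fraction, so 0.10 means a 10% rise; multiply by 100 if percentages are needed.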

Covariance

Series.cov() computes the covariance between two Series. Analogously, DataFrame.cov() computes the pairwise covariances among the columns of a DataFrame. Both exclude NA/null values.

In probability theory and statistics, covariance is a measure of the joint variability of two random variables. If the greater values of one variable mainly correspond with the greater values of the other variable, and the same holds for the lesser values, (i.e., the variables tend to show similar behavior), the covariance is positive. In the opposite case, when the greater values of one variable mainly correspond to the lesser values of the other, (i.e., the variables tend to show the opposite behavior), the covariance is negative. The sign of the covariance, therefore, shows the tendency in the linear relationship between the variables. The magnitude of the covariance is not easy to interpret because it is not normalized and hence depends on the magnitudes of the variables.
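The sign and scale behaviour described above can be sketched with a small made-up DataFrame (the column names x, y and z are assumptions for illustration): y rises with x, z falls as x rises, and rescaling a column rescales its covariance.

```python
import pandas as pd

# Illustrative data: y moves with x, z moves against x.
df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0],
    "y": [2.0, 4.0, 6.0, 8.0],
    "z": [4.0, 3.0, 2.0, 1.0],
})

# Covariance between two Series.
cov_xy = df["x"].cov(df["y"])

# Pairwise covariance matrix for all columns.
cov_matrix = df.cov()

# Covariance is not normalized: scaling y by 10 scales the covariance by 10.
cov_x_scaled_y = df["x"].cov(df["y"] * 10)

print(cov_xy)
print(cov_matrix)
print(cov_x_scaled_y)
```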

Correlation

Correlation may be computed using the corr() method. Correlation is a statistical measure of whether, and how strongly, pairs of variables are related. Its value lies between -1 and 1. A positive value indicates that the variables tend to increase together, a negative value indicates that one tends to decrease as the other increases, and 0 indicates that there is no linear relationship between them. The strength of the relationship is given by how close the correlation is to +1 or to -1.
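A minimal sketch of corr(), assuming made-up columns in which y is an exact multiple of x and z decreases exactly as x increases, so the correlations sit at the two extremes:

```python
import pandas as pd

# Illustrative data with perfectly linear relationships.
df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 5.0],
    "y": [2.0, 4.0, 6.0, 8.0, 10.0],
    "z": [5.0, 4.0, 3.0, 2.0, 1.0],
})

# Correlation between two Series (Pearson by default).
corr_xy = df["x"].corr(df["y"])

# Pairwise correlation matrix for all columns.
corr_matrix = df.corr()

# Unlike covariance, correlation does not change when a column is rescaled.
corr_x_scaled_y = df["x"].corr(df["y"] * 10)

print(corr_matrix)
```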

Window Functions

For working with data, a number of window functions are provided for computing common window or rolling statistics. Among these are count, sum, mean, median, correlation, variance, covariance, standard deviation, skewness, and kurtosis.

The rolling and expanding functions can be used directly from DataFrameGroupBy objects.
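As a rough sketch of rolling on a grouped object, assuming a hypothetical DataFrame with a store column to group by and a sales column to average:

```python
import pandas as pd

# Hypothetical data: two stores with two observations each.
df = pd.DataFrame({
    "store": ["a", "a", "b", "b"],
    "sales": [1.0, 3.0, 10.0, 20.0],
})

# The window is applied within each group, so it never spans two stores.
grouped_mean = df.groupby("store")["sales"].rolling(window=2).mean()

print(grouped_mean)
```

The result is indexed by group key and original row label, and the first row of each group is NaN because its window is not yet full.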

With a window size of 5, the first 4 entries of the resulting DataFrame are NaN. Each output value is the mean of the current row and the four rows before it, so the first complete window ends at the 5th row and its mean is placed there. The window then slides forward one row at a time, and the same computation is repeated for every later row.
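A minimal sketch of a rolling mean with a window of 5, assuming a made-up single-column DataFrame holding the values 1 to 10:

```python
import pandas as pd

# Illustrative data: the values 1.0 .. 10.0 in one column.
df = pd.DataFrame({"A": [float(v) for v in range(1, 11)]})

# Mean over the current row and the four rows before it.
rolled = df.rolling(window=5).mean()

print(rolled)
```

The first four rows are NaN, the fifth row holds the mean of 1 to 5, the sixth the mean of 2 to 6, and so on.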

Various methods can be applied to the windows.

  • count(): Number of non-null observations
  • sum(): Sum of values
  • mean(): Mean of values
  • median(): The arithmetic median of values
  • min(): Minimum
  • max(): Maximum
  • std(): Bessel-corrected sample standard deviation
  • var(): Unbiased variance
  • skew(): Sample skewness (3rd moment)
  • kurt(): Sample kurtosis (4th moment)
  • quantile(): Sample quantile (value at %)
  • apply(): Generic apply
  • cov(): Unbiased covariance (binary)
  • corr(): Correlation (binary)
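A few of these methods can be sketched on one rolling window, assuming a small made-up Series; the lambda passed to apply() is a hypothetical helper that returns the range (max minus min) of each window:

```python
import pandas as pd

# Illustrative values.
s = pd.Series([1.0, 4.0, 2.0, 8.0, 5.0])

# One window object can be reused for several statistics.
win = s.rolling(window=3)

totals = win.sum()
highs = win.max()
medians = win.median()

# apply() runs an arbitrary function on each full window.
ranges = win.apply(lambda w: w.max() - w.min(), raw=True)

print(pd.DataFrame({"sum": totals, "max": highs, "median": medians, "range": ranges}))
```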

 

Aggregation

Aggregation applies one or more computations to the data. We use it when we want to reduce a whole DataFrame to summary values. We can aggregate by passing a function to the entire DataFrame, or select a Series (or multiple Series) via standard __getitem__, in which case the function is applied to that selection only.
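As a minimal sketch, assuming a made-up DataFrame with columns A, B and C, the same function can be applied to the whole frame, to one selected column, or to several selected columns:

```python
import pandas as pd

# Illustrative data.
df = pd.DataFrame({
    "A": [1, 2, 3],
    "B": [10, 20, 30],
    "C": [100, 200, 300],
})

# Aggregate every column of the DataFrame.
all_totals = df.agg("sum")

# Select one Series with [] and aggregate only that column.
a_total = df["A"].agg("sum")

# Select several columns and aggregate just those.
ab_totals = df[["A", "B"]].agg("sum")

print(all_totals)
print(a_total)
print(ab_totals)
```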

We can also apply several functions in one aggregation: call the agg() method and pass every function we want to apply as a list.
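A minimal sketch of passing a list of functions to agg(), assuming the same kind of made-up three-column DataFrame:

```python
import pandas as pd

# Illustrative data.
df = pd.DataFrame({
    "A": [1, 2, 3],
    "B": [10, 20, 30],
    "C": [100, 200, 300],
})

# One row per function, one column per original column.
summary = df.agg(["sum", "mean"])

print(summary)
```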

We can also perform multiple computations on a windowed DataFrame. In that case the result has a separate column for each combination of original column and function. For example, applying sum and mean to a windowed DataFrame with 3 columns yields 6 result columns: 3 for the sum and 3 for the mean.
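The six-column result can be sketched as follows, assuming a made-up DataFrame with three columns and a rolling window of 2:

```python
import pandas as pd

# Illustrative data with three columns.
df = pd.DataFrame({
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [10.0, 20.0, 30.0, 40.0],
    "C": [100.0, 200.0, 300.0, 400.0],
})

# Apply two functions to every column of the rolling window.
windowed = df.rolling(window=2).agg(["sum", "mean"])

# Columns are (original column, function) pairs.
print(windowed.columns.tolist())
print(windowed)
```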

Expanding Window

A common alternative to rolling statistics is to use an expanding window, which yields the value of the statistic with all the data available up to that point in time.

These follow a similar interface to .rolling, with the .expanding method returning an Expanding object.

As these calculations are a special case of rolling statistics, they are implemented in pandas such that the following two calls are equivalent:
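A minimal sketch of one such equivalent pair, assuming a small made-up DataFrame: a rolling window as long as the data with min_periods=1 covers everything seen so far, which is what an expanding window with min_periods=1 does.

```python
import pandas as pd

# Illustrative data.
df = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0, 5.0]})

# Rolling window wide enough to cover the whole frame.
via_rolling = df.rolling(window=len(df), min_periods=1).mean()

# Expanding window: every row uses all data up to and including itself.
via_expanding = df.expanding(min_periods=1).mean()

print(via_rolling)
print(via_expanding)
```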

Various methods can be applied to expanding windows.

  • count(): Number of non-null observations
  • sum(): Sum of values
  • mean(): Mean of values
  • median(): The arithmetic median of values
  • min(): Minimum
  • max(): Maximum
  • std(): Bessel-corrected sample standard deviation
  • var(): Unbiased variance
  • skew(): Sample skewness (3rd moment)
  • kurt(): Sample kurtosis (4th moment)
  • quantile(): Sample quantile (value at %)
  • apply(): Generic apply
  • cov(): Unbiased covariance (binary)
  • corr(): Correlation (binary)

 

Aside from not having a window parameter, these functions have the same interface as their .rolling counterparts. The parameters they all accept are:

  • min_periods: threshold of non-null data points to require. Defaults to the minimum needed to compute the statistic. No NaNs will be output once min_periods non-null data points have been seen.
  • center: boolean, whether to set the labels at the center (default is False). Recent pandas releases no longer accept this argument for expanding windows, so check the version in use.
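The effect of min_periods can be sketched on a small made-up Series; only min_periods is shown here, since it is the parameter the expanding interface is certain to accept:

```python
import pandas as pd

# Illustrative values.
s = pd.Series([1.0, 2.0, 3.0, 4.0])

# Default threshold: a result is produced from the first row onward.
lenient = s.expanding().mean()

# Require at least 3 non-null observations before producing a result.
strict = s.expanding(min_periods=3).mean()

print(lenient)
print(strict)
```

With min_periods=3 the first two rows are NaN; from the third row on, both versions give the same running mean.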