pandas provides a number of window functions for computing common window or rolling statistics. Among these are count, sum, mean, median, correlation, variance, covariance, standard deviation, skewness, and kurtosis.
The rolling and expanding functions can be used directly from DataFrameGroupBy objects.
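As a minimal sketch of the groupby usage, assuming a small made-up DataFrame with a hypothetical `group` column and a `value` column, the window is restarted for each group:

```python
import pandas as pd

# Hypothetical data: two groups with three observations each.
df = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "b"],
    "value": [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
})

# Rolling mean over 2 rows, computed separately within each group.
rolling_mean = df.groupby("group")["value"].rolling(window=2).mean()

# Expanding sum, also restarted at the start of each group.
expanding_sum = df.groupby("group")["value"].expanding().sum()

print(rolling_mean)
print(expanding_sum)
```

The result is indexed by the group key and the original row label, so the first row of each group is NaN for the rolling mean.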
Window Functions in pandas
- Windows identify sub-periods of your time series
- Calculate metrics for sub-periods inside the window.
- Create a new time series of metrics
Two types of windows:
- Rolling: same size, sliding.
- Expanding: contain all prior values.
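The difference between the two window types can be seen on a tiny series. This is only an illustrative sketch; the values and the window size of 3 are arbitrary choices:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Rolling: a fixed-size window of 3 values that slides along the series.
rolling_sum = s.rolling(window=3).sum()

# Expanding: the window starts at the first value and grows by one each step.
expanding_sum = s.expanding().sum()

print(pd.DataFrame({"value": s, "rolling": rolling_sum, "expanding": expanding_sum}))
```

The rolling column is NaN until three values are available, while the expanding column produces a result from the first row onward.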
Calculating a Rolling Average
We can use the rolling function to create a rolling window of size 30 so that, for daily data, each result is roughly a monthly average. The first 29 results are NaN, because a window of size 30 does not yet have enough values to compute a mean. The 30th result is the mean of the first 30 values (1-30), the 31st is the mean of the next 30 values (2-31), and so on.
In the same way, we can create a window of any size and compute the statistics for that window.
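A rough sketch of the 30-value rolling average described above, assuming a made-up daily series whose values are simply 1 to 60 so that the results are easy to check by hand:

```python
import numpy as np
import pandas as pd

# Hypothetical daily series: 60 days with values 1.0, 2.0, ..., 60.0.
dates = pd.date_range("2020-01-01", periods=60, freq="D")
prices = pd.Series(np.arange(1.0, 61.0), index=dates)

# Rolling mean over a window of 30 observations.
rolling_avg = prices.rolling(window=30).mean()

print(rolling_avg.iloc[27:32])
```

Changing `window` to any other integer gives a rolling statistic for that window size, and `.mean()` can be swapped for another statistic such as `.sum()` or `.std()`.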
Multiple Rolling Metrics
Aggregation is used to perform one or more computations on the data. We use it when we want to apply a function to the whole DataFrame, or we can select a Series (or multiple Series) via standard __getitem__ and apply the function to that selection only.
We can also apply several functions in one aggregation by passing them as a list to the agg() method.
We can also perform multiple computations on a windowed DataFrame. In that case, a separate column is created for each function applied to each original column. For example, applying sum and mean to a windowed DataFrame with three columns produces 6 columns: 3 for the sum and 3 for the mean.
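The aggregation patterns above might look like the following sketch. The three-column DataFrame and its column names `A`, `B`, and `C` are assumptions made for illustration:

```python
import pandas as pd

# Hypothetical three-column DataFrame.
df = pd.DataFrame({
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [10.0, 20.0, 30.0, 40.0],
    "C": [100.0, 200.0, 300.0, 400.0],
})

# Several functions applied to the whole DataFrame: one row per function.
totals = df.agg(["sum", "mean"])

# The same functions applied to a single selected column.
a_stats = df["A"].agg(["sum", "mean"])

# Several functions applied to a rolling window of 2 rows:
# each original column gets one result column per function.
windowed = df.rolling(window=2).agg(["sum", "mean"])

print(totals)
print(a_stats)
print(windowed)
```

The windowed result has two-level column labels, pairing each original column with each function name, which is where the 6 columns come from.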
Expanding Window Function
- From rolling to expanding windows
- Calculate metrics for periods up to the current date
- New time series reflects all historical values
- Useful for running rate of return, running min/max
- Two options with pandas:
  - .expanding(), used just like .rolling()
  - .cumsum(), .cumprod(), .cummin(), .cummax()
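The two options can be compared side by side. This is a small sketch on made-up values, not a complete tour of either API:

```python
import pandas as pd

s = pd.Series([3.0, 1.0, 4.0, 1.0, 5.0])

# Option 1: an expanding window followed by an aggregation.
expanding_sum = s.expanding().sum()

# Option 2: the dedicated cumulative method gives the same running total.
cumulative_sum = s.cumsum()

# An expanding window also covers statistics that have no cumulative
# shortcut, such as the running mean.
running_mean = s.expanding().mean()

print(pd.DataFrame({"expanding": expanding_sum, "cumsum": cumulative_sum, "mean": running_mean}))
```

The cumulative methods are shorter to write for sums, products, minima, and maxima, while `.expanding()` is the more general of the two.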
How To Calculate Running Return
- Single period return r: current price over last price minus 1
$r_t = \frac{P_t}{P_{t-1}} - 1$
- Multi-period return: product of (1 + r) for all periods, minus 1:
$R_T = (1 + r_1)(1 + r_2) \cdots (1 + r_T) - 1$
- For the period return: .pct_change()
- For basic math: .add(), .sub(), .mul(), .div()
- For cumulative product: .cumprod()
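Putting the three pieces above together, a running return could be computed as in this sketch. The price values are invented for illustration:

```python
import pandas as pd

# Hypothetical price series.
prices = pd.Series([100.0, 110.0, 99.0, 108.9])

# Single-period return: current price over last price, minus 1.
# The first entry is NaN because there is no previous price.
period_return = prices.pct_change()

# Multi-period return: cumulative product of (1 + r), minus 1.
running_return = period_return.add(1).cumprod().sub(1)

print(pd.DataFrame({"price": prices, "r": period_return, "R": running_return}))
```

As a sanity check, each running return should match the current price divided by the first price, minus 1.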
Running Min and Max
Use the min() and max() methods of .expanding() to calculate the running minimum and maximum.
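A short sketch of a running minimum and maximum, assuming a made-up series of values:

```python
import pandas as pd

s = pd.Series([3.0, 1.0, 4.0, 1.0, 5.0, 2.0])

# Smallest and largest value seen so far at each position.
running_min = s.expanding().min()
running_max = s.expanding().max()

print(pd.DataFrame({"value": s, "running_min": running_min, "running_max": running_max}))
```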