One of the essential pieces of NumPy is the ability to perform quick element-wise operations, both with basic arithmetic (addition, subtraction, multiplication, etc.) and with more sophisticated operations (trigonometric functions, exponential and logarithmic functions, etc.). Pandas inherits much of this functionality from NumPy, and the ufuncs that we introduced in Computation on NumPy Arrays: Universal Functions are key to this.
Pandas includes a couple of useful twists, however: for unary operations like negation and trigonometric functions, these ufuncs will preserve index and column labels in the output, and for binary operations such as addition and multiplication, Pandas will automatically align indices when passing the objects to the ufunc. This means that keeping the context of data and combining data from different sources (both potentially error-prone tasks with raw NumPy arrays) become essentially foolproof with Pandas. We will additionally see that there are well-defined operations between one-dimensional Series structures and two-dimensional DataFrame structures.
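As a minimal sketch of both behaviors, the snippet below applies a unary ufunc to a Series and then adds two Series whose indices only partly overlap. The names ser, a, b, and the sample values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Unary ufunc: the index labels are carried over to the result.
ser = pd.Series([6, 3, 7, 4], index=["a", "b", "c", "d"])
exp_ser = np.exp(ser)

# Binary operation: values are matched by label, not by position.
a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20, 30], index=["y", "z", "w"])
total = a + b  # labels present in only one input come out as NaN

print(exp_ser)
print(total)
```

Here only the labels "y" and "z" appear in both inputs, so only those two entries of total hold numbers.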
With binary operations between pandas data structures, there are two key points of interest:
- Broadcasting behavior between higher- (e.g. DataFrame) and lower-dimensional (e.g. Series) objects.
- Missing data in computations.
We will demonstrate how to manage these issues independently, though they can be handled simultaneously.
DataFrame has the methods add(), sub(), mul(), div() and the related functions radd(), rsub(), … for carrying out binary operations. For broadcasting behavior, Series input is of primary interest. Using these functions, you can match on either the index or the columns via the axis keyword:
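The following sketch illustrates the axis keyword with sub(). The frame df and its labels are invented for the example; row and column are Series taken from it.

```python
import pandas as pd

df = pd.DataFrame(
    {"one": [1.0, 2.0, 3.0], "two": [4.0, 5.0, 6.0]},
    index=["a", "b", "c"],
)

row = df.loc["b"]    # Series indexed by the column labels
column = df["two"]   # Series indexed by the row labels

# Match the Series index against the DataFrame's columns:
# the row is subtracted from every row.
by_columns = df.sub(row, axis="columns")

# Match the Series index against the DataFrame's index:
# the column is subtracted from every column.
by_index = df.sub(column, axis="index")

print(by_columns)
print(by_index)
```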
Similarly, we can use other functions such as add() and mul(). Series and Index also support the divmod() builtin. This function performs floor division and the modulo operation at the same time, returning a two-tuple whose elements have the same type as the left-hand side. For example:
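A small sketch of divmod() on a Series and on an Index; the values and variable names are chosen only for illustration.

```python
import pandas as pd

s = pd.Series([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
quotient, remainder = divmod(s, 3)   # two Series

idx = pd.Index([0, 1, 2, 3, 4, 5])
idx_quotient, idx_remainder = divmod(idx, 2)   # two Index objects

print(quotient.tolist(), remainder.tolist())
print(idx_quotient.tolist(), idx_remainder.tolist())
```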
Operations Between DataFrame and Series
When performing operations between a DataFrame and a Series, the index and column alignment is similarly maintained. Operations between a DataFrame and a Series are similar to operations between a two-dimensional and a one-dimensional NumPy array. Consider one common operation, where we find the difference of a two-dimensional array and one of its rows:
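A sketch of the NumPy case, assuming an arbitrary 3x4 integer array (the seed and shape are illustrative choices, not values from the text):

```python
import numpy as np

rng = np.random.default_rng(42)
A = rng.integers(10, size=(3, 4))

# A[0] has shape (4,), so it is broadcast across each row of A.
diff = A - A[0]

print(A)
print(diff)
```

The first row of diff is all zeros, since the row is subtracted from itself.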
According to NumPy’s broadcasting rules, subtraction between a two-dimensional array and one of its rows is applied row-wise. In Pandas, the convention similarly operates row-wise by default:
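The same subtraction can be sketched with a DataFrame. The column labels Q, R, S, T and the use of np.arange are assumptions made for a reproducible example.

```python
import numpy as np
import pandas as pd

A = np.arange(12).reshape(3, 4)
df = pd.DataFrame(A, columns=list("QRST"))

# Default behavior: the Series index is matched against the columns,
# so the first row is subtracted from every row.
row_wise = df - df.iloc[0]

# To operate column-wise instead, use the method form with axis=0.
col_wise = df.subtract(df["R"], axis=0)

print(row_wise)
print(col_wise)
```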
Note that these DataFrame and Series operations, like the operations discussed above, will automatically align indices between the two elements. This preservation and alignment of indices and columns means that operations on data in Pandas will always maintain the data context, which prevents the types of silly errors that might come up when working with heterogeneous and/or misaligned data in raw NumPy arrays.
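To illustrate the alignment, the sketch below subtracts a Series that covers only some of the columns. The frame and the name halfrow are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=list("QRST"))

# Take every second entry of the first row: only columns Q and S.
halfrow = df.iloc[0, ::2]

# Columns Q and S are matched by label; R and T have no partner
# in halfrow, so they are filled with NaN.
result = df - halfrow

print(result)
```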
Filling Missing Values
If using NaN values is not the desired behavior, the fill value can be modified using appropriate object methods in place of the operators. For example, calling A.add(B) is equivalent to calling A + B, but allows the optional explicit specification of the fill value for any elements in A or B that might be missing:
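A minimal sketch with two Series whose indices only partly overlap; the values are invented for the example.

```python
import pandas as pd

A = pd.Series([2, 4, 6], index=[0, 1, 2])
B = pd.Series([1, 3, 5], index=[1, 2, 3])

# Operator form: labels 0 and 3 exist in only one input, giving NaN.
plain = A + B

# Method form: a missing entry is treated as 0 before adding.
filled = A.add(B, fill_value=0)

print(plain)
print(filled)
```

Note that fill_value substitutes only for an entry missing from one of the two inputs; a label missing from both would still produce NaN.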
Series and DataFrame have the binary comparison methods eq, ne, lt, gt, le, and ge, whose behavior is analogous to the binary arithmetic operations described above:
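As a sketch, the comparison methods can be applied to two small frames with matching labels (df and df2 are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"one": [1.0, 2.0, 3.0], "two": [3.0, 2.0, 1.0]})
df2 = pd.DataFrame({"one": [1.0, 5.0, 0.0], "two": [3.0, 0.0, 4.0]})

greater = df.gt(df2)    # element-wise df > df2
not_equal = df2.ne(df)  # element-wise df2 != df

print(greater)
print(not_equal)
```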
These operations produce a pandas object of the same type as the left-hand-side input that is of dtype bool. These boolean objects can be used in indexing operations; see the section on Boolean indexing.
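For instance, a boolean Series produced by a comparison method can be used directly as a mask. This is a brief sketch with made-up values:

```python
import pandas as pd

s = pd.Series([1, 5, 3, 8], index=["a", "b", "c", "d"])

mask = s.ge(4)       # boolean Series with the same index as s
selected = s[mask]   # keeps only the entries where mask is True

print(mask.dtype)
print(selected)
```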
You can apply the reductions empty, any(), all(), and bool() to summarize a boolean result.
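A short sketch of these reductions on a boolean DataFrame, using invented values. It deliberately leaves out bool(), which newer pandas releases deprecate.

```python
import pandas as pd

df = pd.DataFrame({"one": [1.0, 2.0, 3.0], "two": [-1.0, 2.0, 5.0]})
positive = df > 0

all_per_column = positive.all()      # True where every value in a column is True
any_per_column = positive.any()      # True where at least one value is True
any_overall = positive.any().any()   # reduce twice to get a single bool

# empty is a property: True when the object holds no elements.
is_empty = df.iloc[:0].empty

print(all_per_column)
print(any_per_column)
print(any_overall, is_empty)
```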