Sorting means arranging values in a particular order: ascending or descending for numbers, and (usually) alphabetical for strings.
Often you want to sort a pandas data frame in a specific way. Typically, you sort it based on the values of one or more columns, or based on the values of its row index (row names). A pandas data frame has two useful functions for this:
- sort_values(): to sort a pandas data frame by one or more columns
- sort_index(): to sort a pandas data frame by its row index
Each of these functions comes with numerous options, such as sorting in a specific order (ascending or descending), sorting in place, controlling where missing values are placed, choosing the sorting algorithm, and so on.
We will use the gapminder dataset to look at some of these sorting options.
The pandas sort_values() function sorts a data frame in ascending or descending order of the column passed to it. It differs from Python's built-in sorted() function, which cannot sort a data frame or select a particular column to sort by. To sort a data frame according to the values in a particular column, pass the column name as the argument to sort_values().
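As a minimal sketch of this call, the example below sorts by a single column. The small data frame is a hypothetical stand-in for the gapminder data: the values are made up for illustration, and the `country` and `year` column names are assumptions (only `fertility` is named in the text).

```python
import pandas as pd

# Hypothetical stand-in for the gapminder data; values are made up.
df = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "year": [2000, 2000, 2001, 2001],
    "fertility": [3.1, 1.8, 5.2, 2.4],
})

# Sort rows by the values in the "fertility" column (ascending by default).
sorted_df = df.sort_values("fertility")
print(sorted_df)
```

Each row keeps its original index label, so the index of the result is no longer in order.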
To sort in descending order instead, pass False as the value of the ascending parameter. By default, ascending is True.
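A short sketch of a descending sort, again on a hypothetical data frame with made-up values (the `country` column name is an assumption):

```python
import pandas as pd

# Hypothetical data; values are made up for illustration.
df = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "fertility": [3.1, 1.8, 5.2, 2.4],
})

# ascending=False puts the largest fertility values first.
desc_df = df.sort_values("fertility", ascending=False)
print(desc_df)
```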
When sorting by a column that has missing values, we often want to see the missing values first. To do that, we pass 'first' as the argument to the na_position parameter.
As shown in the output, the NaN values are at the top, followed by the sorted values of fertility.
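The sketch below illustrates na_position on a hypothetical data frame in which two fertility values are deliberately missing. The values and the `country` column name are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical data with two missing fertility values.
df = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "fertility": [3.1, float("nan"), 1.8, float("nan")],
})

# na_position="first" places NaN rows before the sorted values;
# the default, "last", places them at the end.
na_first = df.sort_values("fertility", na_position="first")
print(na_first)
```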
By default, sort_values() and sort_index() return a new dataframe and leave the original unchanged. To modify the original dataframe instead of creating a new one, pass True as the argument to the inplace parameter. In that case the method returns None.
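A small sketch contrasting the two behaviours, using a hypothetical data frame with made-up values:

```python
import pandas as pd

# Hypothetical data; values are made up for illustration.
df = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "fertility": [3.1, 1.8, 5.2, 2.4],
})

# Without inplace, a new sorted dataframe is returned and df is untouched.
new_df = df.sort_values("fertility")
order_before = df["country"].tolist()

# With inplace=True, df itself is reordered and the call returns None.
result = df.sort_values("fertility", inplace=True)
print(result)
print(df)
```

Because the in-place call returns None, assigning its result back to `df` would discard the data frame.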
Sorting by multiple columns
So far we have sorted on the values of a single column, but we can also sort on multiple columns by passing the column names as a list to the sort_values method.
Note that when sorting by multiple columns, pandas sort_values() sorts by the first column in the list first and uses the second column to order rows that tie on the first. We can see the difference by switching the order of the column names in the list.
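To make the effect of column order concrete, here is a sketch on a hypothetical data frame (made-up values; the `country` and `year` column names are assumptions) that sorts by the same two columns in both orders:

```python
import pandas as pd

# Hypothetical data; values are made up for illustration.
df = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "year": [2001, 2000, 2001, 2000],
    "fertility": [3.1, 1.8, 1.2, 2.4],
})

# Year is the primary key; fertility orders rows within each year.
by_year_then_fert = df.sort_values(["year", "fertility"])

# Fertility is the primary key; year only matters for tied fertility values.
by_fert_then_year = df.sort_values(["fertility", "year"])

print(by_year_then_fert)
print(by_fert_then_year)
```

The ascending parameter also accepts a list with one boolean per column, for example `ascending=[True, False]`, when the columns should be sorted in different directions.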
We can use sort_index() to sort a pandas dataframe by its row index or row names. In this example, the row index consists of numbers, and because we earlier sorted the data frame by fertility, the row index is jumbled up. We can sort by the row index (with the inplace=True option) to restore the original row order.
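The round trip can be sketched as follows, on a hypothetical data frame with made-up values and a default integer index:

```python
import pandas as pd

# Hypothetical data; values are made up for illustration.
original = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "fertility": [3.1, 1.8, 5.2, 2.4],
})

# Sorting by fertility leaves the index labels out of order.
shuffled = original.sort_values("fertility")
jumbled_index = shuffled.index.tolist()

# Sorting by the row index in place restores the original row order.
shuffled.sort_index(inplace=True)
print(shuffled)
```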