Indexing and Selecting Data

Indexing is the process of selecting particular rows and columns from a DataFrame. There are three main ways to select data from a DataFrame:

  • Square Brackets
  • .loc accessor
  • .iloc accessor

Square Brackets

Square brackets work much as they do when indexing or slicing a string or a list.

Column Access

Passing a single column name in square brackets selects all the rows of that column. The result of this kind of selection is not a regular DataFrame but a pandas Series.

A Series is like a 1-D array that can be labeled, just like a DataFrame. If we put two or more Series together as columns, we get a DataFrame.

To get the result as a DataFrame instead, we have to use double square brackets, that is, pass a list of column names.

We can also select multiple columns at a time. The columns are returned in the order in which we list them, which does not have to match their order in the DataFrame.
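As a minimal sketch of single-column access, assume a small made-up Titanic-style DataFrame called df (the variable name, the passenger names and the values are hypothetical, chosen only for illustration):

```python
import pandas as pd

# Hypothetical Titanic-style data; values are illustrative only.
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol", "Dave"],
    "pclass": [1, 3, 1, 3],
    "age": [29.0, 22.0, 38.0, 26.0],
})

# Single square brackets with one column name return a Series.
names = df["name"]
print(type(names))
print(names)

# Putting two Series side by side gives back a DataFrame.
combined = pd.concat([df["name"], df["pclass"]], axis=1)
print(type(combined))
```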
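The double-bracket form can be sketched as follows, again on a small made-up DataFrame (df and its contents are assumptions for illustration):

```python
import pandas as pd

# Hypothetical Titanic-style data; values are illustrative only.
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol", "Dave"],
    "pclass": [1, 3, 1, 3],
    "age": [29.0, 22.0, 38.0, 26.0],
})

# A list with one column name returns a one-column DataFrame.
one_col = df[["name"]]
print(type(one_col))

# A list with several names returns those columns in the listed order.
two_cols = df[["pclass", "name"]]
print(two_cols)
```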

Row Access

Square brackets can also select rows, but only through slicing with the ':' operator, just as we do with strings. A slice of integer positions returns the matching rows, whereas a single integer is looked up as a column label rather than as a row.

For example, the slice 0:20 selects the first 20 rows of the DataFrame. Square brackets offer limited access because a single pair can select either rows or columns, but not both.

If we want to pick both rows and columns this way, we have to chain two pairs of square brackets, one for the rows and the other for the columns.

The first bracket handles the row access (a bare ':' selects all rows) and the second one handles the column access (for example, name and pclass). Notice that selecting multiple columns at a time still requires double square brackets.
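A minimal sketch of row slicing with square brackets, assuming a made-up 30-row DataFrame (df and its contents are hypothetical):

```python
import pandas as pd

# Hypothetical 30-row frame; values are illustrative only.
df = pd.DataFrame({
    "name": [f"passenger_{i}" for i in range(30)],
    "pclass": [1 + i % 3 for i in range(30)],
})

# A slice inside square brackets selects rows by position.
first_20 = df[0:20]
print(first_20.shape)

# A single integer is treated as a column label, not a row position,
# so it fails here because there is no column named 0.
try:
    df[0]
    single_int_failed = False
except KeyError:
    single_int_failed = True
print(single_int_failed)
```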
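The chained form described above might look like this on a small made-up DataFrame (df and its values are assumptions for illustration):

```python
import pandas as pd

# Hypothetical Titanic-style data; values are illustrative only.
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol", "Dave"],
    "pclass": [1, 3, 1, 3],
    "age": [29.0, 22.0, 38.0, 26.0],
})

# First bracket: all rows. Second bracket: a list of two columns.
all_rows = df[:][["name", "pclass"]]
print(all_rows)

# First bracket: the first two rows. Second bracket: the same columns.
two_rows = df[0:2][["name", "pclass"]]
print(two_rows)
```

Chaining works for reading data, but .loc and .iloc express the same selection in a single step.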

.loc Accessor

The .loc accessor indexes and slices a DataFrame by label, which means that we specify rows and columns by their row and column labels. It also accepts a boolean condition.

Row Access

We can access rows with .loc[] by passing row labels (which are 0, 1, 2, ... in our example).

To get the result as a DataFrame rather than a Series, we use double square brackets [[ ]], just as with plain square-bracket access.

To select multiple rows, we either pass a list containing the label of each row we want or use a label slice.
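A short sketch of row access with .loc, assuming a made-up DataFrame with the default integer labels 0 to 3 (df and its values are hypothetical):

```python
import pandas as pd

# Hypothetical Titanic-style data; values are illustrative only.
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol", "Dave"],
    "pclass": [1, 3, 1, 3],
    "age": [29.0, 22.0, 38.0, 26.0],
})

row = df.loc[0]         # one label: the row comes back as a Series
row_df = df.loc[[0]]    # list with one label: a one-row DataFrame
picked = df.loc[[0, 2]] # list of labels: exactly those rows
sliced = df.loc[0:2]    # label slice: labels 0, 1 and 2 (stop included)

print(type(row))
print(row_df)
print(picked)
print(sliced)
```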

Row Column Access

We can select rows and columns simultaneously by separating the row labels and the column labels with a comma (','). The row labels come first, followed by the column labels.
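As an illustration of combined row and column access with .loc, here is a sketch on a made-up DataFrame (df and its values are assumptions):

```python
import pandas as pd

# Hypothetical Titanic-style data; values are illustrative only.
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol", "Dave"],
    "pclass": [1, 3, 1, 3],
    "age": [29.0, 22.0, 38.0, 26.0],
})

# Row label first, column label second: a single value.
cell = df.loc[1, "name"]
print(cell)

# Row label slice and a list of column labels: a smaller DataFrame.
block = df.loc[0:2, ["name", "pclass"]]
print(block)
```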

Boolean Indexing

We can perform boolean indexing with the .loc accessor by specifying a condition inside the brackets.

For example, a condition that tests whether pclass equals 1 returns all the rows whose pclass is 1.
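The pclass condition can be sketched like this, assuming a small made-up DataFrame (df and its values are hypothetical):

```python
import pandas as pd

# Hypothetical Titanic-style data; values are illustrative only.
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol", "Dave"],
    "pclass": [1, 3, 1, 3],
    "age": [29.0, 22.0, 38.0, 26.0],
})

# The comparison builds a boolean Series; .loc keeps the True rows.
first_class = df.loc[df["pclass"] == 1]
print(first_class)
```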

Slicing With Labels

pandas provides a suite of methods for purely label-based indexing. This is a strict inclusion-based protocol: every label asked for must be in the index, or a KeyError will be raised. When slicing, both the start bound and the stop bound are included, if present in the index. Integers are valid labels, but they refer to the label and not the position.
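These rules can be checked on two tiny Series. The sketch below uses made-up data (the names s and t and their values are assumptions for illustration):

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])

# Label slices include both the start and the stop label.
middle = s.loc["b":"c"]
print(middle)

# Asking for a label that is not in the index raises KeyError.
try:
    s.loc["z"]
    missing_raised = False
except KeyError:
    missing_raised = True
print(missing_raised)

# Integer labels are labels, not positions.
t = pd.Series(["x", "y", "z"], index=[2, 0, 1])
by_label = t.loc[0]      # the entry whose label is 0
by_position = t.iloc[0]  # the entry stored first
print(by_label, by_position)
```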

Selection By Callable

We can also use a callable for selection. The callable must be a function with one argument (the calling Series or DataFrame) that returns valid output for indexing.
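A minimal sketch of selection by callable, using lambda functions on a made-up DataFrame (df, its values and the age threshold are assumptions chosen for illustration):

```python
import pandas as pd

# Hypothetical Titanic-style data; values are illustrative only.
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol", "Dave"],
    "pclass": [1, 3, 1, 3],
    "age": [29.0, 22.0, 38.0, 26.0],
})

# The callable receives the DataFrame and returns a boolean mask.
first_class = df.loc[lambda d: d["pclass"] == 1]
print(first_class)

# A callable can be combined with a column label.
older_names = df.loc[lambda d: d["age"] > 25, "name"]
print(older_names)
```

This form is handy in method chains, where the intermediate DataFrame has no variable name to refer to.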

.iloc Accessor

The .iloc accessor selects data by integer position (from 0 to length-1 of the axis). Here we use positions instead of labels. It may also be used with a boolean array. .iloc raises an IndexError if a requested indexer is out of bounds, except for slice indexers, which allow out-of-bounds indexes.
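The bounds behaviour and the boolean-array form can be sketched as follows, assuming a made-up four-row DataFrame (df, mask and the values are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical Titanic-style data; values are illustrative only.
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol", "Dave"],
    "pclass": [1, 3, 1, 3],
})

# A single out-of-bounds position raises IndexError.
try:
    df.iloc[10]
    out_of_bounds_raised = False
except IndexError:
    out_of_bounds_raised = True
print(out_of_bounds_raised)

# An out-of-bounds slice is simply truncated to the rows that exist.
tail_rows = df.iloc[2:10]
print(tail_rows)

# A boolean array with one entry per row keeps the True rows.
mask = np.array([True, False, True, False])
masked = df.iloc[mask]
print(masked)
```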

Row Access

A row can be selected by passing the integer position of the row we want. We can select multiple rows by passing a list of positions or by slicing.
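To make the difference between positions and labels visible, the sketch below gives a made-up DataFrame the row labels 10, 20, 30 and 40 (df, the labels and the values are assumptions for illustration):

```python
import pandas as pd

# Hypothetical data with non-default row labels.
df = pd.DataFrame(
    {"name": ["Alice", "Bob", "Carol", "Dave"], "pclass": [1, 3, 1, 3]},
    index=[10, 20, 30, 40],
)

first = df.iloc[0]          # first row, returned as a Series
picked = df.iloc[[0, 2]]    # first and third rows
first_two = df.iloc[0:2]    # positions 0 and 1 (stop position excluded)

print(first)
print(picked)
print(first_two)
```

Unlike a .loc label slice, a position slice excludes its stop bound, as in ordinary Python slicing.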

Row Column Access

We can select rows and columns simultaneously by separating the row positions and the column positions with a comma (','). The row positions come first, followed by the column positions.
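A sketch of combined row and column access with .iloc, on a made-up DataFrame (df and its values are hypothetical):

```python
import pandas as pd

# Hypothetical Titanic-style data; values are illustrative only.
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol", "Dave"],
    "pclass": [1, 3, 1, 3],
    "age": [29.0, 22.0, 38.0, 26.0],
})

# Row position 1, column position 0: a single value.
cell = df.iloc[1, 0]
print(cell)

# First two rows, first two columns.
block = df.iloc[0:2, [0, 1]]
print(block)
```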

Setting with enlargement

The .loc/[] operations can perform enlargement when setting a non-existent key for that axis.

In the series case, this is effectively an appending operation.
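For the Series case, a minimal sketch with made-up values (the name s and the label 5 are assumptions for illustration):

```python
import pandas as pd

s = pd.Series([1, 2, 3])

# Label 5 does not exist yet, so the assignment adds a new entry.
s.loc[5] = 10
print(s)
```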

If we pass the square brackets a key, or the .loc accessor a label, that is not present in the DataFrame and assign a value to it, a new entry is created in the DataFrame with the assigned value broadcast to it.

For example, assigning 1 to a column named ship that does not yet exist creates a new column ship in the DataFrame, with 1 broadcast to every row as its value.
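The ship example can be sketched like this on a made-up DataFrame (df, its values and the extra row are assumptions for illustration):

```python
import pandas as pd

# Hypothetical Titanic-style data; values are illustrative only.
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol", "Dave"],
    "pclass": [1, 3, 1, 3],
})

# "ship" is not an existing column, so it is created and the
# scalar 1 is broadcast to every row.
df["ship"] = 1
print(df)

# Row label 4 does not exist either, so .loc appends a new row.
df.loc[4] = ["Eve", 2, 1]
print(df)
```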

Head and Tail Methods

The head and tail methods of a DataFrame give a quick look at its contents. By default, head returns the first 5 rows of the DataFrame and tail returns the last 5 rows. We can change the number of rows returned by passing the number of rows we want as an argument.

The same goes for tail, which then returns that many rows from the end of the DataFrame.
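Both methods can be tried on a made-up ten-row DataFrame (df and the passenger_id column are assumptions for illustration):

```python
import pandas as pd

# Hypothetical ten-row frame; values are illustrative only.
df = pd.DataFrame({"passenger_id": range(1, 11)})

default_head = df.head()   # first 5 rows
first_three = df.head(3)   # first 3 rows
default_tail = df.tail()   # last 5 rows
last_two = df.tail(2)      # last 2 rows

print(first_three)
print(last_two)
```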
