DataFrame

DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet or SQL table, or a dict of Series objects. It is generally the most commonly used pandas object. Like Series, DataFrame accepts many different kinds of input:

  • Dict of 1D ndarrays, lists, dicts, or Series
  • 2-D numpy.ndarray
  • Structured or record ndarray
  • A Series
  • Another DataFrame
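The sections below cover dicts and Series. The sketch here shows three of the other inputs listed above: a 2-D ndarray, a structured (record) ndarray, and an existing DataFrame. The column names and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# 2-D ndarray: columns get default integer labels unless `columns` is passed
arr = np.array([[1, 2], [3, 4]])
df_from_array = pd.DataFrame(arr, columns=["a", "b"])

# Structured (record) ndarray: the field names become the column labels
records = np.array(
    [("Alice", 30), ("Bob", 25)],
    dtype=[("name", "U10"), ("age", "i4")],
)
df_from_records = pd.DataFrame(records)

# Another DataFrame: builds a new DataFrame from an existing one
df_copy = pd.DataFrame(df_from_array)

print(df_from_array)
print(df_from_records)
```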

We can also say that a DataFrame is a collection of Series that share the same index. Along with the data, you can optionally pass an index (row labels) and columns (column labels) to a DataFrame.

Why DataFrames?

Data science involves processing, analyzing, and visualizing data. While some tools like Microsoft Excel allow us to perform basic data science tasks, they're limited to the functionality built into the user interface. If you want to work with datasets that aren't structured like a spreadsheet, or create entirely new data visualizations from scratch, you'll need to become proficient in programming. Instead of using a program written by others that solves a narrow set of tasks, you can create your own programs that solve your specific problems.

Programming involves organizing a collection of instructions into a program for a computer to carry out. To express these instructions, we use a programming language like Python, whose pandas library provides the DataFrame data structure for performing a wide range of data analysis tasks.

For data analysis in Python, pandas DataFrames make manipulating your data easy, from selecting or replacing columns and indices to reshaping your data.

Pandas is a popular Python package for data science, and with good reason: it offers powerful, expressive and flexible data structures that make data manipulation and analysis easy, among many other things. The DataFrame is one of these structures.

One of the main advantages of using pandas DataFrames instead of NumPy arrays is that DataFrames allow different columns to have different data types.
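Here is a small illustration of this difference, using made-up columns. A NumPy array holds a single dtype, so mixed values are coerced to a common type. A DataFrame keeps a separate dtype for each column.

```python
import numpy as np
import pandas as pd

# A NumPy array has one dtype: mixing ints and strings turns everything into strings
arr = np.array([[1, "apple"], [2, "banana"]])
print(arr.dtype)  # a Unicode string dtype

# A DataFrame stores one dtype per column
df = pd.DataFrame({"id": [1, 2], "fruit": ["apple", "banana"], "price": [0.5, 0.25]})
print(df.dtypes)  # id is an integer column, price a float column
```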

Basic data analysis tasks can also be performed in Excel spreadsheets, but pandas DataFrames offer many more functions that are easy to use, along with many more types of visualization.

A DataFrame can be created in several ways.

pd.DataFrame()

class pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=False)

Two-dimensional size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). Arithmetic operations align on both row and column labels. Can be thought of as a dict-like container for Series objects. The primary pandas data structure.

data: The input data. It can be a dict, a list-like object, an ndarray, another DataFrame, etc.

index: Index to use for the resulting frame. Defaults to RangeIndex if the input data carries no indexing information and no index is provided.

columns: Column labels to use for the resulting DataFrame. Defaults to RangeIndex (0, 1, 2, …) if no column labels are provided.

dtype: Data type to force. Only a single dtype is allowed.
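This minimal sketch shows data, index, columns, and dtype used together. The labels r1 to r3, x, and y are arbitrary examples.

```python
import pandas as pd

# data: a list of rows; index and columns supply the row and column labels
df = pd.DataFrame(
    [[1, 2], [3, 4], [5, 6]],
    index=["r1", "r2", "r3"],
    columns=["x", "y"],
    dtype="float64",  # force every column to float64
)
print(df)

# Without index/columns, both axes default to a RangeIndex (0, 1, 2, ...)
df_default = pd.DataFrame([[1, 2], [3, 4]])
print(df_default.columns.tolist())  # [0, 1]
```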

Creating DataFrame


From Dict of ndarray

A DataFrame can also be created from a dict. In this case, the keys of the dictionary become the column labels of the DataFrame, and the corresponding values become the entries of those columns.


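If the dict values are themselves dicts (or Series) keyed by row label, any row that has no value for a given column is filled with NaN (Not a Number). Plain arrays and lists, by contrast, must all have the same length, otherwise pandas raises a ValueError.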
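The sketch below uses hypothetical student names and scores to show three cases. The first is a dict of equal-length lists or arrays. The second is a dict of dicts, where a missing entry becomes NaN. The third is a dict of unequal-length lists, which pandas rejects.

```python
import numpy as np
import pandas as pd

# Dict of equal-length lists/arrays: keys -> column labels, values -> column entries
data = {
    "name": ["Asha", "Ben", "Chen"],
    "score": np.array([88, 92, 79]),
}
df = pd.DataFrame(data)  # index defaults to RangeIndex 0..2
print(df)

# Dict of dicts: inner keys are row labels; a missing entry becomes NaN
nested = {
    "math": {"Asha": 88, "Ben": 92},
    "art": {"Asha": 75, "Chen": 81},
}
df_nested = pd.DataFrame(nested)
print(df_nested)  # Ben's art and Chen's math are NaN

# Plain lists of unequal length are rejected with a ValueError
try:
    pd.DataFrame({"a": [1, 2], "b": [1, 2, 3]})
    unequal_ok = True
except ValueError:
    unequal_ok = False
print(unequal_ok)  # False
```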

From Dict of Series

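A dict of Series works the same way: each key becomes a column label. The index of the resulting DataFrame is the union of the indexes of the individual Series, and any row label missing from a Series is filled with NaN.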
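Here is a minimal sketch with two hypothetical Series whose indexes only partly overlap. The result has one row per label in the union of the two indexes.

```python
import pandas as pd

s1 = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
s2 = pd.Series([10.0, 20.0], index=["a", "d"])

# Keys become column labels; the index is the union of the Series indexes
df = pd.DataFrame({"one": s1, "two": s2})
print(df)  # "one" is NaN at d; "two" is NaN at b and c
```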


From Series

When a DataFrame is created from a single Series, the resulting DataFrame has the same index as the Series, and its single column is named after the Series (or labeled 0 if the Series has no name).
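A short sketch with a made-up Series named "age". It also shows the default column label pandas uses when the Series is unnamed.

```python
import pandas as pd

s = pd.Series([25, 32, 41], index=["x", "y", "z"], name="age")
df = pd.DataFrame(s)
print(df)  # one column named "age", same index as the Series

# An unnamed Series gets the default column label 0
df_unnamed = pd.DataFrame(pd.Series([1, 2, 3]))
print(df_unnamed.columns.tolist())  # [0]
```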


Column Selection, Addition, and Deletion

Just like a dict, a DataFrame supports selecting, adding, and deleting columns with almost the same syntax.

To select a column, write the DataFrame name followed by the column name in square brackets; this returns all the rows of that column. Square brackets also support slicing rows and boolean indexing.
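The sketch below uses a small hypothetical table of names and scores. It shows single-column selection, multi-column selection, row slicing, and boolean indexing with square brackets.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Asha", "Ben", "Chen", "Dev"], "score": [88, 92, 79, 65]})

scores = df["score"]             # one column name -> Series with all rows
subset = df[["name", "score"]]   # list of column names -> DataFrame
middle = df[1:3]                 # a slice selects rows by position (rows 1 and 2)
passed = df[df["score"] >= 80]   # boolean indexing keeps rows where the condition is True
print(passed)
```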


To add a column, write the DataFrame name followed by the new column name in square brackets, and assign it a list of values, one per row.

If a single value is assigned instead, it is broadcast to every row.
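This is a minimal sketch of column addition on hypothetical data. It covers assigning a list, assigning a value computed from an existing column, and assigning a scalar (the year 2024 is arbitrary), which pandas broadcasts to every row.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Asha", "Ben", "Chen"], "score": [88, 92, 79]})

df["grade"] = ["B", "A", "C"]      # list: one value per row
df["passed"] = df["score"] >= 80   # computed from an existing column
df["year"] = 2024                  # scalar: broadcast to every row
print(df)
```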


To delete a column, pass the DataFrame name and the column name in square brackets to the del statement.
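Here is a short sketch on a made-up three-column frame. Alongside del, it shows DataFrame.pop, which removes a column and returns it as a Series, much like dict.pop.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

del df["b"]            # removes column "b" in place, like deleting a dict key
c_col = df.pop("c")    # removes column "c" and returns it as a Series
print(df.columns.tolist())  # ['a']
```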


Indexing and Slicing

Indexing and selecting is the process of choosing particular data from a DataFrame. There are three main ways to select data from a DataFrame:

  • Square Brackets
  • .loc accessor
  • .iloc accessor

The .loc accessor indexes and slices a DataFrame by label, which means that rows and columns are specified by their row and column labels. It also accepts a boolean condition. Unlike Python slicing, a label slice with .loc includes both the start and the stop label.

The .iloc accessor selects data by integer position (from 0 to length-1 of each axis), so it uses positions instead of labels. It can also be used with a boolean array. .iloc raises IndexError if a requested indexer is out of bounds, except for slice indexers, which allow out-of-bounds positions.
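This sketch contrasts .loc and .iloc on a small hypothetical table indexed by name. It shows the inclusive label slice, a boolean row condition, the exclusive positional slice, and the out-of-bounds behaviour described above.

```python
import pandas as pd

df = pd.DataFrame(
    {"score": [88, 92, 79, 65], "grade": ["B", "A", "C", "D"]},
    index=["Asha", "Ben", "Chen", "Dev"],
)

# .loc: labels; a label slice includes BOTH endpoints
by_label = df.loc["Ben":"Chen", "score"]
high = df.loc[df["score"] > 80, ["grade"]]  # boolean condition on rows

# .iloc: integer positions; a slice EXCLUDES the stop position, like Python
by_pos = df.iloc[1:3, 0]
first_cell = df.iloc[0, 1]

# Slices may run past the end; a single out-of-range position raises IndexError
tail = df.iloc[2:100]
try:
    df.iloc[10]
    out_of_bounds_ok = True
except IndexError:
    out_of_bounds_ok = False
```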


For further information on indexing and slicing, please visit http://blog.robofied.com/indexing-and-selecting-data/

Transposing

Transposing swaps the rows and columns of a DataFrame and is performed with its T attribute. The column labels become the row labels, and the row labels become the column labels; with the default RangeIndex, the new column labels are therefore 0, 1, 2, ….
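Here is a minimal sketch on a made-up two-row frame that uses the default RangeIndex. df.T is equivalent to calling df.transpose().

```python
import pandas as pd

df = pd.DataFrame({"name": ["Asha", "Ben"], "score": [88, 92]})
t = df.T  # same as df.transpose()
print(t)
# The old column labels ("name", "score") are now the row labels,
# and the old RangeIndex (0, 1) now supplies the column labels.
```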

