While working on data-intensive applications, We have often confronted with Input/Output(I/O) challenges which represent the bottleneck for every performance-critical application. With the increasing volume of stored data, there is a necessity to store the data in disks to compensate for lack of sufficient RAM by loading the data from disk to RAM and vice-versa. I/O operations are thus essentially very important tasks when it comes to processing financial data or any scientific data.
Through this post, We will gain some knowledge of a few libraries and their tricks of the trade. Python has built-in capabilities which can be used to store the object on disk and to read it from disk into RAM. Also, Python is robust when it comes to processing text files and SQL databases. The Pandas library provides a multitude of classes and methods to read and write files in a good range of formats.
Flat Files are basic text files containing Records that is, table data without structural relationship. These files have the following fields.
- Record: Row of fields or attributes.
- Column: Feature or attributes.
- Header: It tells the various attributes of the files or simply it tells the name of the columns of the file.
Flat Files extensions.
- .csv: comma separated values.
- .tsv: Tab separated values.
- .txt: Text File
Comma and tabs are delimiters that are, the file in which the data are separated by characters or sequences of characters other than a comma, such as tabs.
How to Import Flat File
If the file consists entirely of numbers then we can use Numpy. If instead, we want to store data in DataFrame we will use Pandas.
Note: Relational database such as the data formed in MySQL is not Flat Files.
Some points worth noting about Flat Files are:
- Flat File consists of rows and each row is called a record.
- A Record in a Flat File consists of fields or attributes each of which contains at most one item of info.
- Flat Files are pervasive in nature.
Pandas read_csv() Function is used to import a .csv file and read_tsv() function is to import a .tsv file. We can also use read_csv() for importing a tsv file by just passing “\t” as an argument to sep parameter. For importing a .csv file we have to pass the file name as the string to the read_csv() method of pandas.
Same goes for the read_tsv() that is While importing a tsv file we use read_tsv() and pass the file name as a string to it. The Pandas is also great at dealing with many issues we will encounter when importing data as a data sciencetist, such as comments occurring in flat files, empty lines, and missing data values that is NaNs.
df.head() returned the first five rows of the dataframe.
We use pandas for importing excel file because it produces dataframes natively. To import Excel File we use ExcelFile() method of pandas.
To figure out what the sheet names are we use .sheet_names attribute of Excel Files object.
If we have multiple numbers of sheets and we want to load a particular sheet as a dataframe we need to apply method parse to object that is (df in our example) with a single argument which is either a name of sheet as a string or the index of sheets as a float.
Introduction to APIs and JSON
API is an Application Programming Interface. It is a set of protocols and routines used for building and interacting with the software application.
- It is a file format that solves the problem of the real-time server to browser communication.
- They are human readable files, for example, Pickled file.
- JSON consist of name-value pair separated by colons. This is just like dictionaries in python that is why they are natural to store in dictionaries when loaded as they are just keu_value pair in dictionaries.
- The keys in JSON will always be strings.
- The value can be strings, integers, arrays object, or even objects and object can even be a JSON.
Loading JSON in Python
Here we had the JSON file in our working directory if you don’t have a JSON file in your working directory then you have to download it.
Steps to load a JSON a file:
- Import json
- Build a connection with file and load the JSON file with json_load() function.
- When we print the types of JSON data it comes dictionary.
- To print the key and value of data or name and value we can iterate over key value.
Creating a Database Engine in Python
- We use SQL Alchemy package because it works with many Relational Database.
- We use the create_engine function to fire up a SQL engine that will communicate our queries to the database.
- create_engine takes only one argument that is a string which describes the type of database and the name of the database.
- In order to query the database, we need to connect to the database to do so.
Getting Table Names
The workflow of SQL Querying
- Import the packages and functions.
- create the database engine.
- connect to the engine.
- Query the database.
- save the query results to a database.
- close the connection.
Here, first of all, we created the engine then we connected to it by connect() method which returned the object that is assigned to the variable rs. To convert the result into DatFrame we use fetchall() method on rs and save it as DataFrame using DataFrame method of pandas, fetchall() fetches all rows.