Web Parsing

For this blog, I created a sample HTML file to scrape, and I will use it throughout. We will use the Beautiful Soup and requests packages to fetch and parse the page.

If you are on a plain Python installation, you have to install the packages below yourself. If you are on Anaconda, these libraries come preinstalled.

  1. pip install beautifulsoup4 -> It helps in pulling data out of HTML and XML files.
  2. pip install lxml (or pip install html5lib) -> These are parsers for HTML files; different parsers behave in different ways.
  3. pip install requests -> It is used to fetch information from the web.

Beautiful Soup supports the HTML parser included in Python’s standard library, but it also supports a number of third-party Python parsers. One of them is the lxml parser, which we are going to use for this tutorial.
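
As a minimal sketch of choosing a parser, the snippet below parses a small HTML string with lxml. The markup is a hypothetical stand-in, not the sample file used in this blog, and the `make_soup` helper with its fallback to Python's built-in `html.parser` is an assumption added so the example still runs where lxml is not installed.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical markup for illustration only (not the blog's sample file).
SAMPLE_HTML = """
<html>
  <head><title>Sample Page</title></head>
  <body>
    <h1>Hello, scraping</h1>
    <p class="intro">First paragraph.</p>
  </body>
</html>
"""


def make_soup(html, parser="lxml"):
    """Parse html with the requested parser, falling back to html.parser."""
    try:
        return BeautifulSoup(html, parser)
    except FeatureNotFound:
        # The requested parser is not installed; use the standard-library one.
        return BeautifulSoup(html, "html.parser")


soup = make_soup(SAMPLE_HTML)
print(soup.title.string)  # text inside the <title> tag
print(soup.prettify())    # the parsed tree, re-indented for reading
```

The second argument to `BeautifulSoup` names the parser, so switching to html5lib would only mean passing `"html5lib"` instead.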

How to navigate the tree of an HTML file and find a particular tag
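
A rough sketch of tree navigation could look like the following. The HTML string, the `article` class and the `main` id are made up for illustration, and the built-in `html.parser` is used here only so the snippet runs without extra installs; passing `"lxml"` instead should give the same results for this markup.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only.
html = """
<html><body>
  <div id="main">
    <h1>Articles</h1>
    <div class="article"><h2>First post</h2><p>Intro text</p></div>
    <div class="article"><h2>Second post</h2><p>More text</p></div>
  </div>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns the first tag that matches the filters.
first_article = soup.find("div", class_="article")

# Attribute access walks down to the first child tag with that name.
heading = first_article.h2
print(heading.get_text())       # First post

# .parent walks back up the tree; class is returned as a list.
print(heading.parent["class"])  # ['article']

# find_next_sibling() moves sideways to the next matching tag.
next_article = first_article.find_next_sibling("div")
print(next_article.h2.get_text())  # Second post

# find_all() returns every match under the tag it is called on.
main = soup.find(id="main")
titles = [h2.get_text() for h2 in main.find_all("h2")]
print(titles)  # ['First post', 'Second post']
```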

How to fetch all links and scrape a page
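
One way to collect every link on a page is to ask for all `a` tags that carry an `href` attribute. This is only a sketch: the markup, the `BASE_URL` value and the `extract_links` helper are hypothetical, and `urljoin` from the standard library is used to turn relative links into absolute ones.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical markup and base URL for illustration only.
BASE_URL = "https://example.com/blog/"
html = """
<html><body>
  <a href="/about">About</a>
  <a href="post-1.html">Post 1</a>
  <a href="https://www.python.org/">Python</a>
  <a name="top">No href here</a>
</body></html>
"""


def extract_links(html, base_url):
    """Return (link text, absolute URL) pairs for every <a> with an href."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    # href=True skips anchors that have no href attribute at all.
    for a in soup.find_all("a", href=True):
        links.append((a.get_text(strip=True), urljoin(base_url, a["href"])))
    return links


links = extract_links(html, BASE_URL)
for text, url in links:
    print(text, "->", url)
```

The anchor without an `href` is skipped, so three links are printed for this markup.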

If you want to fetch data from a website hosted on some server, use the requests package to download the page, then pass the response text to Beautiful Soup.
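
A minimal sketch of that flow, assuming a placeholder URL, is shown here. The helper names `fetch_html` and `page_title` and the 10-second timeout are choices made for this example, not something prescribed by either library.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url, timeout=10):
    """Download a page and return its HTML as text."""
    response = requests.get(url, timeout=timeout)
    # Raise an exception for 4xx/5xx responses instead of parsing an error page.
    response.raise_for_status()
    return response.text


def page_title(html):
    """Return the text of the <title> tag, or None if the page has none."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.title.get_text(strip=True) if soup.title else None


# Usage (needs network access; the URL is a placeholder):
# html = fetch_html("https://example.com/")
# print(page_title(html))
```

Keeping the download and the parsing in separate functions means the parsing step can be tried on a saved HTML file without touching the network.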

