Web Scraping

Introduction (Part 1)

Hello Techies 👋, welcome to a new series on web scraping using Python.

In this series, you will learn about an interesting topic: web scraping techniques with Python.

Before we start, I assume you are a little familiar with web technologies. In our case, understanding HTML is enough.

Ready to Go. Let's get started.

What is Web Scraping?

Web scraping is a technique where we scrape, or fetch, the HTML content of a website and extract information from it. Note that it is not the same as screen scraping. A screen scraper only captures what is rendered on the screen (the pixels), whereas a web scraper works with the underlying HTML of a website, so the content can be parsed, stored, or replicated elsewhere if needed.

What are the uses of Web Scraping?

Web scraping is mainly used when you want to collect information from a website automatically, for example:

  • To fetch prices

  • To get product descriptions

  • To analyse costs etc.
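As a rough illustration of the price-fetching use case, here is a minimal sketch that uses the BeautifulSoup library covered later in this post. It parses a small hard-coded HTML fragment rather than a live site. The markup, the class names product, name and price, and the helper extract_prices are all made up for this example; a real website will have its own structure.

```python
from bs4 import BeautifulSoup

# Hypothetical product listing; real pages will use different markup.
SAMPLE_HTML = """
<ul>
  <li class="product"><span class="name">Pen</span><span class="price">$1.50</span></li>
  <li class="product"><span class="name">Notebook</span><span class="price">$4.25</span></li>
</ul>
"""


def extract_prices(html):
    """Return a dict mapping each product name to its price as a float."""
    soup = BeautifulSoup(html, "html.parser")
    prices = {}
    for item in soup.find_all("li", class_="product"):
        name = item.find("span", class_="name").get_text(strip=True)
        price_text = item.find("span", class_="price").get_text(strip=True)
        # Drop the leading currency symbol before converting to a number.
        prices[name] = float(price_text.lstrip("$"))
    return prices


print(extract_prices(SAMPLE_HTML))
```

The same pattern (find the repeating element, then read the fields inside it) applies to product descriptions and other structured data.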

Web Scraping with Python:

So, here comes the main topic. Web scraping with Python is easy if you understand HTML tags. Don't worry, you don't need to go deep into HTML. Python has a popular third-party library for web scraping called Beautiful Soup. It offers various methods to parse and navigate the HTML of a website.

Prerequisites:

- beautifulsoup4 (imported in code as bs4)
- requests

Installation:

pip install beautifulsoup4
pip install requests

Time to Code :)

Step 1: Requesting the website

# requests module to fetch the HTML from the URL
import requests

response = requests.get("https://www.example.com", timeout=10)
response.raise_for_status()  # stop here if the server returned an error status
req_url = response.text

Import the requests module to request the website's content. The requests.get() method takes the website URL as its argument and returns a response object. That object carries the status code, the headers, and the body. Its text attribute gives the body (the HTML) decoded as a string, which we store in the req_url variable.
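The request step can also be wrapped in a small function so that a timeout and an HTTP error check are applied every time. This is only a sketch: the function name fetch_html and the default timeout of 10 seconds are assumptions made for illustration, not part of the requests library.

```python
import requests


def fetch_html(url, timeout=10.0):
    """Fetch a page and return its HTML as text.

    fetch_html is a hypothetical helper for this post. It raises
    requests.HTTPError when the server answers with a 4xx or 5xx status.
    """
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text
```

Calling fetch_html("https://www.example.com") should return the page's HTML as a string. Network problems and error statuses surface as subclasses of requests.RequestException, so a caller can catch that one exception type.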

Step 2: Parsing and Reading the Response

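# BeautifulSoup for parsing and reading the HTML content
from bs4 import BeautifulSoup

soup = BeautifulSoup(req_url, "html.parser")

Import BeautifulSoup to parse the HTML held in req_url. The BeautifulSoup constructor takes the response text as its first argument. The second argument names the parser. "html.parser" ships with Python, and stating it explicitly avoids the warning Beautiful Soup emits when it has to guess which parser to use.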
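Once the soup object exists, you can start reading elements from it. The sketch below uses a small hard-coded HTML string as a stand-in for req_url so that it runs without a network call; the markup is invented for the example.

```python
from bs4 import BeautifulSoup

# Stand-in for the downloaded page text (hypothetical markup).
html = """
<html>
  <head><title>Example Domain</title></head>
  <body>
    <h1>Example Domain</h1>
    <p>First paragraph.</p>
    <p>Second paragraph with a <a href="https://www.iana.org/domains/example">link</a>.</p>
  </body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

page_title = soup.title.string            # text inside <title>
heading = soup.find("h1").get_text()      # first <h1> on the page
paragraphs = [p.get_text() for p in soup.find_all("p")]             # every <p>
links = [a["href"] for a in soup.find_all("a", href=True)]          # every link target

print(page_title)
print(heading)
print(paragraphs)
print(links)
```

find() returns the first matching tag (or None if there is no match), while find_all() returns a list of every match.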

Step 3: Saving the result

Now that we have the result, it can be stored in a file if needed. This step is optional. If you want to look at the skeleton of the website you scraped, save the result as an HTML file.

# saving the HTML content into a file
with open("webscrapping.html", "w", encoding="utf-8") as file:
    file.write(str(soup))

The above code creates (or overwrites) a webscrapping.html file in your current working directory and writes the parsed HTML to it. It is good practice to close a file as soon as you are done with it. The with block does this automatically, even if an error occurs while writing.
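To check that the file was saved correctly, you can read it back. The sketch below writes to a temporary directory so it does not leave files behind; the helper name save_html and the sample HTML string are assumptions made for this example.

```python
import tempfile
from pathlib import Path


def save_html(html_text, path):
    """Write html_text to path as UTF-8 and return the number of characters written."""
    with open(path, "w", encoding="utf-8") as file:
        return file.write(html_text)


with tempfile.TemporaryDirectory() as tmp_dir:
    target = Path(tmp_dir) / "webscrapping.html"
    written = save_html("<html><body>Hi</body></html>", target)
    # Read the file back to confirm the content round-trips unchanged.
    saved_text = target.read_text(encoding="utf-8")

print(written, saved_text)
```

In the real script you would pass str(soup) as the first argument and a path in your working directory as the second.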

That's it. You have completed the first part successfully.

You can find the code here
Happy Coding :) Thank you 💜

Support Lakshmi Sowjanya's Blog by becoming a sponsor. Any amount is appreciated!
