In this Python Selenium article we want to learn about web scraping with Python and Selenium. Web scraping has become an essential technique for extracting valuable information from websites, and Python offers several libraries for the job. One of the popular choices is Selenium, a powerful browser automation framework. In this article, we will learn how to scrape a website with Python and Selenium.
To follow the example in this tutorial you need a few requirements: first, Python must be installed on your system; then install the Selenium package, which you can do with pip as shown below; and finally you need the WebDriver for your specific browser.
pip install selenium
Note: You can download the driver for your browser (for example, ChromeDriver for Chrome) from its official download page.
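If you want to confirm that the installation worked, a quick optional check is to import Selenium and print its version. (Selenium 4.6 and newer also ship with Selenium Manager, which can usually download a matching driver for you automatically, but a manual download still works.)

# optional sanity check: confirm Selenium is importable and see which version is installed
import selenium

print(selenium.__version__)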
This is the complete code for this article:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options

# configure Chrome to run headless (no visible browser window)
options = Options()
options.add_argument("--headless")  # replaces options.headless = True, which newer Selenium releases removed
options.add_argument("--disable-dev-shm-usage")
options.add_argument("--no-sandbox")
driver = webdriver.Chrome(options=options)

# open the target page
driver.get("http://quotes.toscrape.com/")

# find every quote container and pull out the quote text and author
quote_elements = driver.find_elements(By.CLASS_NAME, "quote")
for quote_element in quote_elements:
    quote_text = quote_element.find_element(By.CLASS_NAME, "text").text
    author = quote_element.find_element(By.CLASS_NAME, "author").text
    print(f"Quote: {quote_text}")
    print(f"Author: {author}")
    print("-" * 50)

# close the browser and release its resources
driver.quit()
First, we import the required classes from Selenium:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
import time
To start a WebDriver session, we configure the Chrome web driver. Make sure you have already added the ChromeDriver binary to your working directory (or to your PATH). We also set a few options: --headless runs the driver without opening a visible browser window (older examples set options.headless = True, but newer Selenium releases expect the command-line argument instead), while --disable-dev-shm-usage and --no-sandbox help Chrome run reliably in restricted environments such as containers.
options = Options()
options.add_argument("--headless")  # run without a visible browser window
options.add_argument("--disable-dev-shm-usage")
options.add_argument("--no-sandbox")
driver = webdriver.Chrome(options=options)
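If the Chrome driver is not on your PATH, you can point Selenium at the binary explicitly through the Service class we imported above. This is a minimal sketch, and the "./chromedriver" path is only a placeholder; adjust it to wherever you saved the driver.

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--headless")

# "./chromedriver" is a placeholder path; point it at your own driver binary
service = Service(executable_path="./chromedriver")
driver = webdriver.Chrome(service=service, options=options)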
Using the get() method, we instruct the web driver to navigate to the desired URL. In our case, it’s http://quotes.toscrape.com/.
driver.get("http://quotes.toscrape.com/")
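quotes.toscrape.com serves its quotes in the initial HTML, so get() is enough here, but on pages that load content with JavaScript you would normally wait for the elements to appear before scraping. A small sketch using Selenium's WebDriverWait (rather than fixed time.sleep() calls) could look like this:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# wait up to 10 seconds for at least one element with class "quote" to be present
wait = WebDriverWait(driver, 10)
wait.until(EC.presence_of_all_elements_located((By.CLASS_NAME, "quote")))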
Now that we have loaded the target page, we can extract specific information from it. Let's scrape the quotes and their authors. We use Selenium's find_elements() method to locate the quote containers on the page and then iterate over them to extract their text content.
quote_elements = driver.find_elements(By.CLASS_NAME, "quote")
for quote_element in quote_elements:
    quote_text = quote_element.find_element(By.CLASS_NAME, "text").text
    author = quote_element.find_element(By.CLASS_NAME, "author").text
    print(f"Quote: {quote_text}")
    print(f"Author: {author}")
    print("-" * 50)
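Printing is fine for a quick demo, but you will often want the scraped data in a structured form. As a sketch, assuming the same quote_elements from above, you could collect each quote into a dictionary and save everything to a CSV file (the quotes.csv filename is just an example):

import csv

quotes = []
for quote_element in quote_elements:
    quotes.append({
        "quote": quote_element.find_element(By.CLASS_NAME, "text").text,
        "author": quote_element.find_element(By.CLASS_NAME, "author").text,
    })

# write the collected rows to a CSV file
with open("quotes.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["quote", "author"])
    writer.writeheader()
    writer.writerows(quotes)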
After we have finished extracting the data, it is good practice to clean up and close the web driver to release system resources.
driver.quit()
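If an exception is raised while scraping, the line above never runs and a browser process can be left behind. One way to make sure the driver is always closed is to wrap the scraping steps in a try/finally block, roughly like this:

driver = webdriver.Chrome(options=options)
try:
    driver.get("http://quotes.toscrape.com/")
    for quote_element in driver.find_elements(By.CLASS_NAME, "quote"):
        print(quote_element.find_element(By.CLASS_NAME, "text").text)
finally:
    # always release the browser, even if scraping fails
    driver.quit()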
Run the complete code from the beginning of the article and you should see the scraped quotes and their authors printed to the console.