In this article we will learn how to scrape Amazon products with Python and Scrapy. Amazon is one of the largest and most popular online marketplaces. It lists millions of products, so finding specific items can be a difficult task. This lesson walks through a simple technique for scraping Amazon product data with Python, using the Scrapy framework.
What is Scrapy?
Scrapy is an open source web crawling and web scraping framework written in Python. It provides a simple way to extract data from websites, and it lets users define rules for following links, extracting content and storing the scraped data. Scrapy is widely used for web scraping tasks because it scales well, processes requests asynchronously and has built-in support for handling requests and responses.
Start with Scrapy:
First of all we need to install Scrapy, and we can use pip for that.
```shell
pip install scrapy
```
After that open a command prompt and navigate to the directory where you want to create your project.
Run the following command to create a new Scrapy project named amazon_scraper. This creates a directory structure for your project with several files and folders.
```shell
scrapy startproject amazon_scraper
```
Now open the amazon_scraper/amazon_scraper/items.py file and define the structure of the item you want to scrape. For example, to scrape product titles, prices and ratings, you can define an item like this. Here we are not going to scrape every product on Amazon, only kitchen products.
```python
import scrapy


class KitchenProductItem(scrapy.Item):
    title = scrapy.Field()
    price = scrapy.Field()
    rating = scrapy.Field()
```
In Scrapy, a spider defines how to crawl a website and extract data. Create a new Python file named amazon_spider.py in the amazon_scraper/amazon_scraper/spiders directory and add the following code:
```python
import scrapy

from ..items import KitchenProductItem


class AmazonSpider(scrapy.Spider):
    name = 'amazon'
    start_urls = ['https://www.amazon.com/s?k=kitchen']

    # Browser-like User-Agent, sent with every request the spider makes.
    default_headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.82 Safari/537.36',
    }

    def start_requests(self):
        for url in self.start_urls:
            yield scrapy.Request(url, headers=self.default_headers)

    def parse(self, response):
        # Each search result sits in a div that carries a data-asin attribute.
        products = response.xpath('//div[@data-asin]')
        for product in products:
            item = KitchenProductItem()
            item['title'] = product.xpath('.//h2/a/span/text()').get()
            item['price'] = product.xpath('.//span[@class="a-offscreen"]/text()').get()
            item['rating'] = product.xpath('.//span[@class="a-icon-alt"]/text()').get()
            yield item

        # Follow the "next page" link, if there is one, with the same headers.
        next_page_url = response.css('div.a-section.a-text-center.s-pagination-container.s-pagination-dynamic-container a.s-pagination-item.s-pagination-next::attr(href)').get()
        if next_page_url:
            yield response.follow(next_page_url, self.parse, headers=self.default_headers)
```
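The spider stores the price and rating as raw text, exactly as they appear on the page. Below is a minimal sketch of how those strings could be converted to numbers before analysis. It assumes prices look like "$1,299.99" and ratings look like "4.5 out of 5 stars"; the real page text may differ. The helper names `parse_price` and `parse_rating` are hypothetical and are not part of Scrapy.

```python
import re
from typing import Optional


def parse_price(text: Optional[str]) -> Optional[float]:
    """Convert a price string such as '$1,299.99' to a float (assumed format)."""
    if not text:
        return None
    # Grab the first run of digits, allowing thousands separators and decimals.
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None
    return float(match.group().replace(",", ""))


def parse_rating(text: Optional[str]) -> Optional[float]:
    """Convert a rating string such as '4.5 out of 5 stars' to a float (assumed format)."""
    if not text:
        return None
    # The rating is assumed to be the number at the start of the string.
    match = re.match(r"\s*(\d+(?:\.\d+)?)", text)
    return float(match.group(1)) if match else None
```

Both helpers return `None` when the value is missing or does not match the assumed format, which can happen for search results that have no price or no reviews.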
Now open a terminal or command prompt and navigate to the amazon_scraper directory. Run the following command to start the spider:
```shell
scrapy crawl amazon -o products.csv
```
This will start the spider named amazon and save the scraped data to a CSV file named products.csv.
The resulting CSV file has one row per scraped product, with a column for each item field (title, price and rating).
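The exported file can be loaded back into Python for further processing. This is a small sketch using only the standard library. It assumes the file was written by the crawl command and therefore has title, price and rating columns. The function name `load_products` is hypothetical.

```python
import csv
from pathlib import Path


def load_products(csv_path):
    """Return the rows of a CSV export as a list of dictionaries keyed by column name."""
    with Path(csv_path).open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


# Example usage, assuming products.csv exists in the current directory:
#     for product in load_products("products.csv"):
#         print(product["title"], product["price"], product["rating"])
```

Every value comes back as a string, so prices and ratings still need converting before any numeric comparison.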
FAQs about Scraping Amazon Products with Python Scrapy:
Q: Is it legal to scrape Amazon product data?
A: While scraping publicly available data from websites like Amazon is generally considered legal in many cases, it’s important to review and comply with Amazon’s terms of service and robots.txt file. Violating these terms could get you blocked from the website and may lead to legal consequences.
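As an illustration of what checking a robots.txt file involves, here is a minimal sketch using Python's standard `urllib.robotparser` module. The rules in `SAMPLE_ROBOTS_TXT` are hypothetical and are not Amazon's actual robots.txt. The helper name `is_allowed` and the user agent string are also made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules, used only for illustration.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Example usage with the hypothetical rules above:
#     is_allowed(SAMPLE_ROBOTS_TXT, "my-bot", "https://example.com/private/page")
```

Scrapy also has a `ROBOTSTXT_OBEY` setting. When it is enabled in settings.py, the framework performs this kind of check itself before downloading a page.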
Q: Can I scrape large volumes of data from Amazon using Scrapy?
A: Yes, Scrapy is a powerful tool for scraping large volumes of data, including Amazon product data. However, keep in mind that heavy scraping may trigger Amazon’s anti-scraping measures, such as IP blocking or CAPTCHA challenges.
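One common way to reduce the load a crawl puts on a site is to slow the spider down through the project's settings. The excerpt below is a sketch of what that could look like in amazon_scraper/amazon_scraper/settings.py. The setting names are standard Scrapy settings, but the values are illustrative assumptions, not tuned recommendations, and they do not guarantee that a crawl will avoid being blocked.

```python
# settings.py (excerpt): illustrative values only

# Wait about two seconds between requests to the same site.
DOWNLOAD_DELAY = 2

# Send only one request at a time to each domain.
CONCURRENT_REQUESTS_PER_DOMAIN = 1

# Let Scrapy adjust the delay automatically based on server response times.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 2
AUTOTHROTTLE_MAX_DELAY = 30

# Limit how many times a failed request is retried.
RETRY_TIMES = 2
```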
Q: Can I scrape product images and customer reviews from Amazon using Scrapy?
A: Yes, it’s possible to scrape product images, customer reviews and other product related information from Amazon using Scrapy. However, extracting images may require additional processing, and scraping reviews may involve navigating through multiple pages of reviews.
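For images, part of that additional processing can be handled by Scrapy's built-in `ImagesPipeline`, which downloads the URLs listed in an item's `image_urls` field and records the results in an `images` field. The settings excerpt below is a sketch of how it could be enabled. The folder name `product_images` is an assumption, the pipeline requires the Pillow library to be installed, and the item class and spider from this article would need to be extended with those two fields, which is not shown here.

```python
# settings.py (excerpt): enable Scrapy's built-in image pipeline

ITEM_PIPELINES = {
    "scrapy.pipelines.images.ImagesPipeline": 1,
}

# Folder where downloaded images are stored (hypothetical name).
IMAGES_STORE = "product_images"
```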