Data Scraping On Amazon

Code:
import csv
import os
import requests
import datetime
import re
from bs4 import BeautifulSoup
def Check_price():
    # Define the product URL and request headers
    URL = 'https://www.amazon.ae/Trust-engineer-save-time-Mug/dp/B07N6P1XC9/ref=sr_1_1?crid=35TL7DOU0NKX9&dib=eyJ2IjoiMSJ9.jNOkticQrzVZaT1LRLvEiq4ySgEfKW7aZ5uNE2v0LIOu_QH2T6QcgSm14yzc67pODZfTaHq5meUvzzy-IBgF1MF-6bxNufpGdar10NWIndYDM0-6L6FbtT-qIYK69qe3bWuVLCmRlwXQwOWl6WaLzJlc-g2IX9HlSv-xsoarcLjYuM2BIY9_sHp9goTGZQUXLXjigWtnAs8mcrR8N7itB_F1m_mJomMr-20si0U8grgd0twmkr4GOiJEL-cqXhW1lJl5Z2bBpUKKexZHl1O9SKg_aKzDaL4jxy15RRm2bCs.8INO3lJwzDaE4FjeSmvHNu-PNJtcaD7AqPD3sMdVKA8&dib_tag=se&keywords=mug+amazon+data&qid=1709186718&sprefix=mug+amazon+data%2Caps%2C343&sr=8-1'
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36",
        "Accept-Encoding": "gzip, deflate",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "DNT": "1",
        "Connection": "close",
        "Upgrade-Insecure-Requests": "1"
    }
    try:
        # Send a GET request to the URL
        response = requests.get(URL, headers=headers)
        response.raise_for_status()  # Raise an exception for any HTTP errors

        # Parse the HTML content
        soup = BeautifulSoup(response.content, "html.parser")

        # Extract the product title
        title = soup.find(id='productTitle').get_text().strip()

        # Extract the price (numerical value only)
        price_text = soup.find(class_='aok-offscreen').get_text().strip()
        price = re.search(r'\d+\.\d+', price_text).group()  # Pull out the number with a regular expression

        # Data cleaning: remove any remaining leading/trailing whitespace
        title = title.strip()

        # Record the collection date
        collecting_date = datetime.date.today()

        # Row to be written to the CSV file
        data = [title, price, collecting_date]

        # Build the file path for saving the CSV file on the desktop
        desktop_path = os.path.join(os.path.expanduser('~'), 'Desktop')
        file_path = os.path.join(desktop_path, 'AmazonMug.csv')

        # Append the row to the CSV file
        with open(file_path, 'a+', newline='', encoding='utf-8') as f:
            writer = csv.writer(f)
            if f.tell() == 0:  # File is empty (no data written yet)
                writer.writerow(['title', 'price', 'collecting_date'])  # Write headers first
            writer.writerow(data)
        print("CSV file saved on the desktop.")
    except requests.exceptions.RequestException as e:
        print("Failed to connect to the URL:", e)
    except AttributeError as e:
        print("Failed to extract data from the HTML:", e)
    except OSError as e:
        print("Failed to write the CSV file:", e)

# Call the function
Check_price()
Explanation:
The Python function Check_price retrieves the title and price of a product from Amazon.ae, then saves this information, along with the current date, to a CSV file on the user's desktop. Here's a breakdown of what the code does:
- It imports the necessary libraries for web scraping (requests and BeautifulSoup), date handling (datetime), regular expressions (re), and file handling (csv, os).
- The function Check_price defines the URL of the Amazon product page and sets up headers to mimic a browser's request.
- It sends a GET request to the URL, retrieves the HTML content of the page, and creates a BeautifulSoup object to parse the HTML.
- The function extracts the product title and price from the parsed HTML. It uses specific HTML element IDs and classes to locate the desired information.
- Regular expressions are employed to extract the numerical value from the price text. This ensures that only the numerical value of the price is obtained.
- Data cleaning steps are applied to remove any leading or trailing whitespace from the title.
- The current date is obtained using datetime.date.today() to represent when the data was collected.
- Data is formatted into a list containing the title, price, and collecting date.
- The function defines the file path for saving the CSV file on the user's desktop.
- It opens the CSV file in append mode and checks whether the file is empty. If it is, the function writes the header row first; the new data row is then appended, so repeated runs accumulate a price history in the same file.
- The function prints a message indicating that the CSV file has been saved on the desktop.
- Error handling is implemented using try and except blocks to catch potential exceptions during the HTTP request, data extraction, or file writing process.
- Finally, the function is called to execute the scraping and data saving process.
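To illustrate the regular-expression step above in isolation, here is a small standalone sketch. The sample strings are hypothetical; the exact text inside the aok-offscreen element depends on Amazon's current markup:

```python
import re

# Hypothetical price strings as they might appear in the page's offscreen price element
samples = ["AED 39.00", "39.00 AED with fast delivery"]

for text in samples:
    match = re.search(r'\d+\.\d+', text)  # first run of digits containing a decimal point
    print(match.group() if match else "no price found")
```

Both samples print "39.00": the pattern ignores the currency label and any surrounding text, which is why the CSV column ends up holding only the numeric value.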
You can use this function to periodically check the price of the Amazon product and keep track of any changes over time.
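One way to run such periodic checks is a simple scheduling loop. The run_periodically helper below is a hypothetical sketch, not part of the original code; in practice you might instead use a cron job or Task Scheduler entry that invokes the script:

```python
import time

def run_periodically(task, interval_seconds, max_runs=None):
    """Call `task` every `interval_seconds` seconds.

    Stops after `max_runs` calls when given (handy for testing);
    otherwise runs until interrupted with Ctrl+C.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        task()  # e.g. the Check_price function defined above
        runs += 1
        if max_runs is None or runs < max_runs:
            time.sleep(interval_seconds)

# Example usage (assumes Check_price is defined as above):
# run_periodically(Check_price, interval_seconds=6 * 60 * 60)  # check every 6 hours
```

Because Check_price appends a row on every call, each run adds one dated price observation to AmazonMug.csv, which you can later load to chart price changes over time.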