
API, RSS, Crawling and Scraping

An API (Application Programming Interface) is a structured interface that a service exposes so that programs can request data directly. Instead of downloading a page meant for humans, a client calls a documented endpoint and receives machine-readable data, typically JSON or XML over HTTP.

RSS (Really Simple Syndication) is a standardized XML feed format that websites publish so that clients can poll for newly published content, such as news articles or podcast episodes, without visiting the site itself.

Crawling is the process of systematically browsing the web to discover and index pages. It is primarily used by search engines like Google to find new and updated content across the internet. Crawlers, also known as spiders or bots, navigate through web pages by following links. The goal is to create an up-to-date index of websites for search engines, so they can efficiently provide relevant results to users.

Scraping, on the other hand, involves extracting specific information from web pages. Scraping targets the content of a page, often for analysis or to make use of the data in some other context, like building price comparison tools. Web scrapers are designed to access a specific site, parse its HTML, and extract structured information like product details, prices, or reviews.

In summary, crawling is about discovering and indexing web content, while scraping is about extracting targeted data from specific sites.
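
To make the distinction concrete, below is a minimal sketch of a crawler that discovers pages by following links. It assumes the requests and beautifulsoup4 packages are installed and uses https://example.com as a stand-in starting URL.

    import requests
    from bs4 import BeautifulSoup
    from urllib.parse import urljoin, urlparse

    def crawl(start_url, max_pages=10):
        """Breadth-first crawl that stays on the starting domain."""
        domain = urlparse(start_url).netloc
        visited = set()
        queue = [start_url]

        while queue and len(visited) < max_pages:
            url = queue.pop(0)
            if url in visited:
                continue
            visited.add(url)
            print("Crawling:", url)

            response = requests.get(url, timeout=10)
            soup = BeautifulSoup(response.text, "html.parser")

            # Discover new pages by following links on the current page
            for link in soup.find_all("a", href=True):
                next_url = urljoin(url, link["href"])
                if urlparse(next_url).netloc == domain:
                    queue.append(next_url)

        return visited

    crawl("https://example.com")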

API

Stock Data Libraries

  1. yfinance
  2. pandas-datareader
  3. Theta Data
  4. Alpha Vantage
  5. Finnhub
  6. Nasdaq Data Link (formerly Quandl)
  7. Twelve Data
  8. IBApi
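
As a quick illustration, here is a minimal sketch using the first library on the list, yfinance (assuming it is installed with pip install yfinance); the ticker symbol is just an example.

    import yfinance as yf

    # Fetch the last month of daily price history for a ticker
    ticker = yf.Ticker("AAPL")
    history = ticker.history(period="1mo")

    print(history[["Open", "Close", "Volume"]].head())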

Scraping

Static (requests)

How to use the requests library in Python to send an HTTP GET request and work with a JSON response.

    import requests
    
    # Step 1: Define the URL (a public test API that returns JSON)
    url = "https://jsonplaceholder.typicode.com/posts/1"
    
    # Step 2: Send a GET request
    response = requests.get(url)
    
    # Step 3: Check the response status
    if response.status_code == 200:
        print("Request was successful!")
    
        # Step 4: Parse the JSON response (only valid on a successful JSON reply)
        data = response.json()
    
        # Step 5: Display the data
        print("\nResponse JSON Data:")
        print(data)
    
        # Access a specific field from the JSON response
        print("\nTitle of the Post:")
        print(data["title"])
    else:
        print(f"Request failed with status code: {response.status_code}")
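
Since static scraping usually means parsing HTML rather than JSON, here is a minimal companion sketch that uses requests together with BeautifulSoup (assuming beautifulsoup4 is installed; https://example.com is a stand-in URL).

    import requests
    from bs4 import BeautifulSoup

    # Fetch a static HTML page
    response = requests.get("https://example.com")
    response.raise_for_status()  # raises an HTTPError on 4xx/5xx responses

    # Parse the HTML and extract structured pieces of the page
    soup = BeautifulSoup(response.text, "html.parser")
    print("Page title:", soup.title.string)

    # List every link found on the page
    for link in soup.find_all("a"):
        print(link.get("href"))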

Dynamic (selenium)

A simple example demonstrating how to use Python’s selenium library to automate web browsing.

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    
    # Step 1: Set up the WebDriver (Selenium 4.6+ downloads a matching ChromeDriver automatically)
    driver = webdriver.Chrome()
    
    # Step 2: Open a webpage
    url = "https://www.google.com"
    driver.get(url)
    
    # Step 3: Find the search box element
    search_box = driver.find_element(By.NAME, "q")
    
    # Step 4: Interact with the search box (enter a query and press Enter)
    search_box.send_keys("Python selenium tutorial")
    search_box.send_keys(Keys.RETURN)
    
    # Step 5: Explicitly wait for the results page to load, then print its title
    WebDriverWait(driver, 10).until(EC.title_contains("Python selenium tutorial"))
    print("Page title after search:", driver.title)
    
    # Step 6: Close the browser
    driver.quit()
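
When scraping on a server with no display, Chrome can also run in headless mode. A minimal sketch, assuming Selenium 4 and a recent Chrome build:

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    # Run Chrome without opening a visible browser window
    options = Options()
    options.add_argument("--headless=new")

    driver = webdriver.Chrome(options=options)
    driver.get("https://example.com")
    print(driver.title)
    driver.quit()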

RSS feed

Simple example of how to read and parse an RSS feed using Python. The feedparser library is commonly used for this purpose.

    import feedparser
    
    # Step 1: Define the RSS feed URL
    rss_url = "https://rss.nytimes.com/services/xml/rss/nyt/HomePage.xml"  # Example RSS feed
    
    # Step 2: Parse the RSS feed
    feed = feedparser.parse(rss_url)
    
    # Step 3: Display feed information
    print("Feed Title:", feed.feed.title)
    print("Feed Link:", feed.feed.link)
    print("Feed Description:", feed.feed.description)
    
    print("\nTop Articles:")
    # Step 4: Loop through the entries (articles)
    for entry in feed.entries[:5]:  # Limit to the top 5 articles
        print("\nTitle:", entry.title)
        print("Link:", entry.link)
        print("Published:", entry.published)
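
Not every feed provides every field (description and published, for example, are optional). Since feedparser objects behave like dictionaries, .get() is a safe way to read fields that may be missing:

    # Use .get() to avoid errors on fields a feed may not provide
    for entry in feed.entries[:5]:
        print(entry.get("title", "(no title)"))
        print(entry.get("published", "(no publication date)"))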

API

How to perform GET and POST requests with token-based authentication using Python’s requests library.

    import requests
    
    # API Base URL
    base_url = "https://example.com/api"  # Replace with your API base URL
    
    # Your authentication token
    auth_token = "your_token_here"
    
    # Headers with the authorization token
    headers = {
        "Authorization": f"Bearer {auth_token}",
        "Content-Type": "application/json"
    }
    
    # ----------------------------
    # Step 1: Perform a GET Request
    # ----------------------------
    # Example endpoint to fetch data
    get_url = f"{base_url}/data"
    response = requests.get(get_url, headers=headers)
    
    # Check if the GET request was successful
    if response.status_code == 200:
        print("GET Request Successful!")
        print("Response JSON:", response.json())
    else:
        print(f"GET Request Failed with status code: {response.status_code}")
    
    # ----------------------------
    # Step 2: Perform a POST Request
    # ----------------------------
    # Example endpoint to create a resource
    post_url = f"{base_url}/data"
    data = {
        "name": "Test Item",
        "description": "This is a test item created via API.",
        "price": 19.99
    }
    
    post_response = requests.post(post_url, headers=headers, json=data)
    
    # Check if the POST request was successful
    if post_response.status_code == 201:
        print("\nPOST Request Successful!")
        print("Response JSON:", post_response.json())
    else:
        print(f"POST Request Failed with status code: {post_response.status_code}")
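
When making many authenticated calls, a requests.Session avoids repeating the headers on each request and reuses the underlying connection. A minimal sketch with the same placeholder base URL and token:

    import requests

    # A Session applies its headers to every request it sends
    session = requests.Session()
    session.headers.update({
        "Authorization": "Bearer your_token_here",
        "Content-Type": "application/json"
    })

    response = session.get("https://example.com/api/data", timeout=10)
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx
    print(response.json())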