
Web scraping with Requests and BeautifulSoup

  • Writer: Fatine Sefrioui
  • Oct 23, 2025
  • 2 min read

Updated: Oct 24, 2025

A step-by-step guide to turn any web page into a clean dataset

Web scraping is one of the most exciting skills for anyone diving into data analysis. It’s the art of turning web pages into usable datasets, whether you’re collecting recipes, product details, or articles. In this tutorial, I’ll walk you through a simple example I coded in VS Code, showing how to extract data from a web page step by step using Python’s requests and BeautifulSoup libraries.

[Image: web scraping with Python, Requests, and BeautifulSoup in VS Code]

Step 1 - Getting the web page

  • We start by importing our two essential tools:


import requests
from bs4 import BeautifulSoup


  • Then, we define the URL we want to scrape:


url = "https://codeavecjonathan.com/scraping/recette/"


The requests library allows us to send an HTTP request to that page and receive its HTML code as a response:


response = requests.get(url)
response.encoding = response.apparent_encoding


This last line fixes the strange characters (mojibake) that sometimes appear when a site’s declared encoding doesn’t match the encoding of the bytes it actually sends.
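To see why this matters, here is a small standalone demo (no network needed) of what happens when UTF-8 bytes are decoded with the wrong charset:

```python
# The bytes a server might actually send for an accented recipe title:
raw = "Crème brûlée".encode("utf-8")

wrong = raw.decode("latin-1")  # wrong charset: accents turn into "Ã..." noise
right = raw.decode("utf-8")    # correct charset: the text comes back intact

print(wrong)
print(right)  # Crème brûlée
```

Setting response.encoding to response.apparent_encoding tells requests to guess the real charset from the bytes themselves instead of trusting the HTTP headers.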


  • We then check if everything worked well:


if response.status_code == 200:
    html = response.text


If the status code is 200, it means the page loaded successfully.


Step 2 - Saving the HTML file

  • Before analyzing, we save the page content locally so we can explore it if needed:


with open("recette.html", "w", encoding="utf-8") as f:
    f.write(html)


This simple trick creates a file called recette.html containing the entire HTML structure of the web page, which is very handy for debugging or later reference.


Step 3 - Parsing the content with BeautifulSoup

Now comes the fun part!


  • We create a BeautifulSoup object that lets us navigate through the HTML and extract the information we want:


soup = BeautifulSoup(html, "html5lib")


Here, we use the "html5lib" parser because it’s more forgiving and handles messy HTML better. (Note that it’s a separate package: pip install html5lib.)


Step 4 - Extracting the recipe title and description

  • We start by collecting the title (the first <h1> on the page):


titre = soup.find("h1").text
print(titre)


  • Then the description, which is stored inside a <p> element with the class "description":


description = get_text_if_not_none(soup.find("p", class_="description"))
print(description)


The function get_text_if_not_none() is a small custom helper (it’s not part of BeautifulSoup) that ensures we don’t get an error if the element is missing: soup.find() returns None when nothing matches, and calling .text on None would crash the script.

It’s a good practice when scraping... always assume something might not exist!
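The post doesn’t show the helper’s body, so here is a minimal sketch of what it might look like, assuming it falls back to an empty string (it works with any object exposing a .text attribute, which BeautifulSoup tags do):

```python
def get_text_if_not_none(element, default=""):
    """Return the element's text with surrounding whitespace stripped,
    or the default when soup.find() returned None."""
    if element is None:
        return default
    return element.text.strip()
```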


Step 5 - Extracting ingredients

Now let’s scrape the ingredients list!


  • We first locate the <div> that contains all ingredients, then extract each one inside its own <p> tag:


div_ingrédients = soup.find("div", class_="ingredients")
e_ingrédients = div_ingrédients.find_all("p")

for e_ingrédient in e_ingrédients:
    print("INGRÉDIENT", e_ingrédient.text)


This loop goes through every ingredient and prints it cleanly.
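Instead of only printing, the ingredients can be collected into a list, which is the first step toward the clean dataset promised in the intro. The sample strings below stand in for the e_ingrédient.text values; on the real soup the comprehension would read from div_ingrédients.find_all("p"):

```python
# Sample stand-ins for the text of each <p> in the ingredients <div>:
raw_texts = ["  200g de farine \n", " 3 oeufs ", "100g de sucre", "  "]

# strip() removes stray whitespace; the filter drops empty entries.
ingredients = [t.strip() for t in raw_texts if t.strip()]
print(ingredients)  # ['200g de farine', '3 oeufs', '100g de sucre']
```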


Step 6 - Extracting preparation steps

  • Same logic here, but instead of <div>, the preparation steps are inside a <table>:


table_preparation = soup.find("table", class_="preparation")
e_étapes = table_preparation.find_all("td", class_="preparation_etape")

for e_étape in e_étapes:
    print("ÉTAPE", e_étape.text)


Each step is stored in a table cell (<td>), and we simply print them one by one.


Step 7 - Wrapping up

  • At the end, we add a final check:


# This else pairs with the "if response.status_code == 200" from Step 1
else:
    print("ERREUR", response.status_code)

print("FIN")


That’s it, you just scraped your first web page!

You’ve turned unstructured HTML into clean, usable text.
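As a final touch, the scraped fields can be gathered into one structured record and saved as JSON, ready for pandas or any other tool. The sample values below stand in for the variables scraped above (titre, description, and the ingredient and step lists):

```python
import json

# Hypothetical values in place of the scraped titre, description, etc.
recette = {
    "titre": "Tarte aux pommes",
    "description": "Une recette simple et rapide.",
    "ingredients": ["3 pommes", "1 pâte brisée", "50g de sucre"],
    "etapes": ["Préchauffer le four.", "Garnir la pâte de pommes."],
}

# ensure_ascii=False keeps the accented characters readable in the file.
with open("recette.json", "w", encoding="utf-8") as f:
    json.dump(recette, f, ensure_ascii=False, indent=2)
```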


To go further

If you want to explore more about web scraping and best practices, the official Requests and BeautifulSoup documentation are great places to start.
