Web scraping with Requests and BeautifulSoup
- Fatine Sefrioui

- Oct 23, 2025
- 2 min read
Updated: Oct 24, 2025
A step-by-step guide to turn any web page into a clean dataset
Web scraping is one of the most exciting skills for anyone diving into data analysis. It’s the art of turning web pages into usable datasets, whether you’re collecting recipes, product details, or articles. In this tutorial, I’ll walk you through a simple example I coded in VS Code, showing how to extract data from a web page step by step using Python’s requests and BeautifulSoup libraries.

Step 1 - Getting the web page
We start by importing our two essential tools:
import requests
from bs4 import BeautifulSoup
Then, we define the URL we want to scrape:
url = "https://codeavecjonathan.com/scraping/recette/"
The requests library allows us to send an HTTP request to that page and receive its HTML code as a response:
response = requests.get(url)
response.encoding = response.apparent_encoding
This last line helps to clean up strange characters that sometimes appear depending on the website’s encoding.
We then check if everything worked well:
if response.status_code == 200:
    html = response.text
If the status code is 200, it means the page loaded successfully.
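In practice, a plain requests.get() can hang indefinitely or come back with an error page. Here is a hedged sketch of a slightly hardened version of this step; the helper name, header value, and timeout are my own additions, not part of the original tutorial:

```python
import requests

# Hypothetical hardening of Step 1: a timeout and a User-Agent header
# make the request more robust and more polite to the server.
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; recipe-tutorial)"}

def fetch_html(url, timeout=10):
    """Return the decoded HTML of url, or None on any network/HTTP error."""
    try:
        response = requests.get(url, headers=HEADERS, timeout=timeout)
        response.raise_for_status()  # turn 4xx/5xx answers into exceptions
        response.encoding = response.apparent_encoding
        return response.text
    except requests.RequestException as exc:
        print("ERREUR", exc)
        return None
```

raise_for_status() spares us the manual status_code == 200 check: any failed response becomes an exception we catch in one place.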
Step 2 - Saving the HTML file
Before analyzing, we save the page content locally so we can explore it if needed:
with open("recette.html", "w", encoding="utf-8") as f:
    f.write(html)
This simple trick creates a file called recette.html containing the entire HTML structure of the web page, which is very handy for debugging or later reference. The with statement closes the file automatically, and passing encoding="utf-8" avoids write errors on systems whose default encoding can’t represent some of the page’s characters.
Step 3 - Parsing the content with BeautifulSoup
Now comes the fun part!
We create a BeautifulSoup object that lets us navigate through the HTML and extract the information we want:
soup = BeautifulSoup(html, "html5lib")
Here, we use the parser "html5lib" because it’s more forgiving and handles messy HTML better.
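Note that html5lib is a separate package (pip install html5lib). If it isn’t available, BeautifulSoup’s built-in "html.parser" works as a dependency-free fallback, and on well-behaved pages the two give the same result. A tiny self-contained check (the miniature HTML below is mine, not from the page):

```python
from bs4 import BeautifulSoup

# A miniature stand-in document, just to show the same soup API working
# with the built-in parser (no extra package required):
demo = "<html><body><h1>Tarte aux pommes</h1></body></html>"
soup = BeautifulSoup(demo, "html.parser")
print(soup.find("h1").text)  # Tarte aux pommes
```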
Step 4 - Extracting the recipe title and description
We start by collecting the title (the first <h1> on the page):
titre = soup.find("h1").text
print(titre)
Then the description, which is stored inside a <p> element with the class "description":
description = get_text_if_not_none(soup.find("p", class_="description"))
print(description)
This function get_text_if_not_none() ensures we don’t get an error if the element is missing.
It’s a good practice when scraping... always assume something might not exist!
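The tutorial doesn’t show the body of get_text_if_not_none() itself; a minimal implementation consistent with how it is used above could look like this (the whitespace stripping is my assumption):

```python
def get_text_if_not_none(element):
    """Return the element's text (stripped), or None if find() matched nothing."""
    if element is None:
        return None
    return element.text.strip()
```

Because soup.find() returns None when nothing matches, calling .text directly on its result would raise an AttributeError; this small guard turns a missing element into a harmless None instead.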
Step 5 - Extracting ingredients
Now let’s scrape the ingredients list!
We first locate the <div> that contains all ingredients, then extract each one inside its own <p> tag:
div_ingrédients = soup.find("div", class_="ingredients")
e_ingrédients = div_ingrédients.find_all("p")
for e_ingrédient in e_ingrédients:
    print("INGRÉDIENT", e_ingrédient.text)
This loop goes through every ingredient and prints it cleanly.
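The same loop can just as easily fill a Python list instead of only printing. A self-contained sketch on a miniature copy of the page’s structure (the markup below is mine, mimicking the real div/p layout):

```python
from bs4 import BeautifulSoup

# Miniature HTML mimicking the real page's ingredients block:
demo = """
<div class="ingredients">
  <p>3 pommes</p>
  <p>200 g de farine</p>
</div>
"""
soup = BeautifulSoup(demo, "html.parser")
div_ingredients = soup.find("div", class_="ingredients")
ingredients = [p.text.strip() for p in div_ingredients.find_all("p")]
print(ingredients)  # ['3 pommes', '200 g de farine']
```

A list like this is one step closer to a dataset: it can be written to CSV or JSON, or fed straight into pandas.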
Step 6 - Extracting preparation steps
Same logic here, but instead of <div>, the preparation steps are inside a <table>:
table_preparation = soup.find("table", class_="preparation")
e_étapes = table_preparation.find_all("td", class_="preparation_etape")
for e_étape in e_étapes:
    print("ÉTAPES", e_étape.text)
Each step is stored in a table cell (<td>), and we simply print them one by one.
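Putting the pieces together, the title, ingredients, and steps can be gathered into one dictionary and dumped as JSON, the "clean dataset" promised in the intro. A self-contained sketch on made-up markup that mirrors the structure used in Steps 4 to 6:

```python
import json
from bs4 import BeautifulSoup

# Made-up markup mirroring the page structure from Steps 4-6:
demo = """
<h1>Tarte aux pommes</h1>
<div class="ingredients"><p>3 pommes</p></div>
<table class="preparation">
  <tr><td class="preparation_etape">Préchauffer le four.</td></tr>
</table>
"""
soup = BeautifulSoup(demo, "html.parser")
recette = {
    "titre": soup.find("h1").text.strip(),
    "ingredients": [p.text.strip()
                    for p in soup.find("div", class_="ingredients").find_all("p")],
    "etapes": [td.text.strip()
               for td in soup.find_all("td", class_="preparation_etape")],
}
print(json.dumps(recette, ensure_ascii=False, indent=2))
```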
Step 7 - Wrapping up
At the end, we complete the if from Step 1 with an else branch that reports any failure:
else:
    print("ERREUR", response.status_code)
print("FIN")
That’s it, you just scraped your first web page!
You’ve turned unstructured HTML into clean, usable text.
To go further
If you want to explore more about web scraping and best practices, the official documentation for requests and BeautifulSoup is a good place to start.


