
Web scraping with Requests and BeautifulSoup

  • Writer: Fatine Sefrioui
  • Oct 23, 2025
  • 2 min read

Updated: Oct 24, 2025

A step-by-step guide to turn any web page into a clean dataset

Web scraping is one of the most exciting skills for anyone diving into data analysis. It’s the art of turning web pages into usable datasets, whether you’re collecting recipes, product details, or articles. In this tutorial, I’ll walk you through a simple example I coded in VS Code, showing how to extract data from a web page step by step using Python’s requests and BeautifulSoup libraries.

[Image: web scraping with Python, Requests, and BeautifulSoup in VS Code]

Step 1 - Getting the web page

  • We start by importing our two essential tools:


import requests
from bs4 import BeautifulSoup


  • Then, we define the URL we want to scrape:


url = "https://codeavecjonathan.com/scraping/recette/"


The requests library allows us to send an HTTP request to that page and receive its HTML code as a response:


response = requests.get(url)
response.encoding = response.apparent_encoding


This last line fixes the strange characters (mojibake) that sometimes appear when a site’s declared encoding doesn’t match the encoding of the bytes it actually sends.
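To see why this matters, here is a small standalone demo (no network needed) of what happens when UTF-8 bytes are decoded with the wrong charset:

```python
# The bytes a server might actually send for an accented recipe title:
raw = "Crème brûlée".encode("utf-8")

wrong = raw.decode("latin-1")  # wrong charset: accents turn into "Ã..." noise
right = raw.decode("utf-8")    # correct charset: the text comes back intact

print(wrong)
print(right)  # Crème brûlée
```

Setting response.encoding to response.apparent_encoding tells requests to guess the real charset from the bytes themselves instead of trusting the HTTP headers.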


  • We then check if everything worked well:


if response.status_code == 200:
    html = response.text


If the status code is 200, it means the page loaded successfully.


Step 2 - Saving the HTML file

  • Before analyzing, we save the page content locally so we can explore it if needed:


with open("recette.html", "w", encoding="utf-8") as f:
    f.write(html)


This simple trick creates a file called recette.html containing the entire HTML structure of the web page, which is very handy for debugging or later reference.


Step 3 - Parsing the content with BeautifulSoup

Now comes the fun part!


  • We create a BeautifulSoup object that lets us navigate through the HTML and extract the information we want:


soup = BeautifulSoup(html, "html5lib")


Here, we use the "html5lib" parser because it’s more forgiving and handles messy HTML better. (Note that it’s a separate package: pip install html5lib.)


Step 4 - Extracting the recipe title and description

  • We start by collecting the title (the first <h1> on the page):


titre = soup.find("h1").text
print(titre)


  • Then the description, which is stored inside a <p> element with the class "description":


description = get_text_if_not_none(soup.find("p", class_="description"))
print(description)


The function get_text_if_not_none() is a small custom helper (it’s not part of BeautifulSoup) that ensures we don’t get an error if the element is missing: soup.find() returns None when nothing matches, and calling .text on None would crash the script.

It’s a good practice when scraping... always assume something might not exist!
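The post doesn’t show the helper’s body, so here is a minimal sketch of what it might look like, assuming it falls back to an empty string (it works with any object exposing a .text attribute, which BeautifulSoup tags do):

```python
def get_text_if_not_none(element, default=""):
    """Return the element's text with surrounding whitespace stripped,
    or the default when soup.find() returned None."""
    if element is None:
        return default
    return element.text.strip()
```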


Step 5 - Extracting ingredients

Now let’s scrape the ingredients list!


  • We first locate the <div> that contains all ingredients, then extract each one inside its own <p> tag:


div_ingrédients = soup.find("div", class_="ingredients")
e_ingrédients = div_ingrédients.find_all("p")

for e_ingrédient in e_ingrédients:
    print("INGRÉDIENT", e_ingrédient.text)


This loop goes through every ingredient and prints it cleanly.
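Instead of only printing, the ingredients can be collected into a list, which is the first step toward the clean dataset promised in the intro. The sample strings below stand in for the e_ingrédient.text values; on the real soup the comprehension would read from div_ingrédients.find_all("p"):

```python
# Sample stand-ins for the text of each <p> in the ingredients <div>:
raw_texts = ["  200g de farine \n", " 3 oeufs ", "100g de sucre", "  "]

# strip() removes stray whitespace; the filter drops empty entries.
ingredients = [t.strip() for t in raw_texts if t.strip()]
print(ingredients)  # ['200g de farine', '3 oeufs', '100g de sucre']
```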


Step 6 - Extracting preparation steps

  • Same logic here, but instead of <div>, the preparation steps are inside a <table>:


table_preparation = soup.find("table", class_="preparation")
e_étapes = table_preparation.find_all("td", class_="preparation_etape")

for e_étape in e_étapes:
    print("ÉTAPE", e_étape.text)


Each step is stored in a table cell (<td>), and we simply print them one by one.


Step 7 - Wrapping up

  • At the end, we add a final check:


# This else pairs with the "if response.status_code == 200" from Step 1
else:
    print("ERREUR", response.status_code)

print("FIN")


That’s it, you just scraped your first web page!

You’ve turned unstructured HTML into clean, usable text.
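As a final touch, the scraped fields can be gathered into one structured record and saved as JSON, ready for pandas or any other tool. The sample values below stand in for the variables scraped above (titre, description, and the ingredient and step lists):

```python
import json

# Hypothetical values in place of the scraped titre, description, etc.
recette = {
    "titre": "Tarte aux pommes",
    "description": "Une recette simple et rapide.",
    "ingredients": ["3 pommes", "1 pâte brisée", "50g de sucre"],
    "etapes": ["Préchauffer le four.", "Garnir la pâte de pommes."],
}

# ensure_ascii=False keeps the accented characters readable in the file.
with open("recette.json", "w", encoding="utf-8") as f:
    json.dump(recette, f, ensure_ascii=False, indent=2)
```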


To go further

If you want to explore more about web scraping and best practices, the official Requests and BeautifulSoup documentation are great places to start.
