Scraping data from Wikipedia using Python and Beautiful Soup

February 28, 2023

2-minute read

In this tutorial, we will extract from Wikipedia the winner of every season of the Moroccan football league since 1957.

We will use Python's requests library to make HTTP requests, and Beautiful Soup to parse the HTML content of the page.
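Here is a minimal sketch of the download-and-parse step. It assumes the league's winners are listed in the English Wikipedia article at the URL below. The URL, the fetch_soup helper and the User-Agent string are illustrative assumptions, not details taken from the original script. Wikimedia asks automated clients to send a descriptive User-Agent, so the sketch sets a placeholder one. It also sets a timeout so that a stalled connection does not hang the script.

```python
import requests
from bs4 import BeautifulSoup

# Assumed article URL; replace it with the page you are scraping.
WIKI_URL = "https://en.wikipedia.org/wiki/Botola_Pro"


def fetch_soup(url, timeout=10):
    """Download a page with requests and parse it with Beautiful Soup (hypothetical helper)."""
    # Placeholder User-Agent: Wikimedia asks automated clients to identify themselves.
    headers = {"User-Agent": "winners-scraper-example/0.1 (contact: you@example.com)"}
    response = requests.get(url, headers=headers, timeout=timeout)
    # Fail loudly on 4xx/5xx responses instead of parsing an error page.
    response.raise_for_status()
    # "html.parser" ships with Python; lxml is a faster optional alternative.
    return BeautifulSoup(response.text, "html.parser")
```

With this helper, soup = fetch_soup(WIKI_URL) yields the parsed page. For instance, soup.title.string returns the text of its title element.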

The soup variable now holds the parsed HTML of the page, ready to be searched with Beautiful Soup's methods.

If you inspect the page's HTML, you'll notice that all the tables in the Wikipedia article share the same class value, and none has a unique id we could use to select a specific table. They do, however, contain different headers.
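You can confirm which table is which by selecting every table through their shared class and printing each one's header texts. The sketch below does this on a small made-up HTML sample rather than the real article. The class name wikitable is an assumption based on Wikipedia's standard table styling, so check it against the page you inspected.

```python
from bs4 import BeautifulSoup

# A made-up, simplified stand-in for the article's markup: both tables
# share class="wikitable" and neither has an id.
SAMPLE_HTML = """
<table class="wikitable">
  <tbody>
    <tr><th>Club</th><th>Titles</th></tr>
    <tr><td>Club A</td><td>3</td></tr>
  </tbody>
</table>
<table class="wikitable">
  <tbody>
    <tr><th>Season</th><th>Champions</th></tr>
    <tr><td>1956-57</td><td>Club B</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Selecting by class alone matches every table, so it cannot single one out.
tables = soup.find_all("table", class_="wikitable")

# Listing each table's header cells shows which one holds the winners.
for index, table in enumerate(tables):
    headers = [th.get_text(strip=True) for th in table.find_all("th")]
    print(index, headers)
```

On the sample, the second table is the only one whose headers include Season, which is the property the header-based lookup relies on.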

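Next, we use the soup.find() function to locate the th element whose text is "Season". We then access its parent attribute twice, climbing from the header cell to its row and then to the element that holds all the rows. Because Wikipedia's markup usually wraps the rows in a tbody element, that element is typically the tbody rather than the table itself. Either one contains every row of the list of winners.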
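The sketch below shows that header lookup and the two parent hops, again on a made-up sample. The regular expression is an added assumption: it tolerates whitespace around the header text, whereas a plain string passed as Beautiful Soup's string argument must match exactly. find_parent("table") is shown as a more explicit alternative. It climbs until it reaches the table element, whether or not a tbody sits in between.

```python
import re

from bs4 import BeautifulSoup

# Made-up sample markup with an explicit <tbody>, as Wikipedia pages usually have.
SAMPLE_HTML = """<table class="wikitable"><tbody>
<tr><th>Season</th><th>Champions</th></tr>
<tr><td>1956-57</td><td>Club B</td></tr>
</tbody></table>"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Match the header cell by its text; the regex tolerates surrounding
# whitespace such as a trailing newline inside the <th>.
season_header = soup.find("th", string=re.compile(r"^\s*Season\s*$"))

# th -> tr -> (tbody or table): two .parent hops, as in the tutorial.
winners_table = season_header.parent.parent
print(winners_table.name)  # "tbody" here, because the markup has a <tbody>

# find_parent() keeps climbing until it reaches a <table>.
table = season_header.find_parent("table")
print(table.name)  # "table"
```

Both elements expose the same tr rows, so the loop that follows works with either.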

We then loop through each row (tr element) of winners_table and extract the season and the winning team. We store these two values in a dictionary and append it to winners_list.
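Here is a sketch of that loop on a made-up three-row table. It assumes the season sits in the first td cell and the winner in the second, and it uses season and winner as hypothetical dictionary keys. Rows with fewer than two td cells are skipped, which covers the header row because that row is built from th cells.

```python
from bs4 import BeautifulSoup

# Made-up sample table; the real article's columns may differ.
SAMPLE_HTML = """<table class="wikitable"><tbody>
<tr><th>Season</th><th>Champions</th></tr>
<tr><td>1956-57</td><td>Club A</td></tr>
<tr><td>1957-58</td><td>Club B</td></tr>
</tbody></table>"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
winners_table = soup.find("table", class_="wikitable")

winners_list = []
for row in winners_table.find_all("tr"):
    cells = row.find_all("td")
    # The header row holds only <th> cells, so it has no <td> and is skipped.
    if len(cells) < 2:
        continue
    # Assumed layout: season in the first column, winner in the second.
    winners_list.append({
        "season": cells[0].get_text(strip=True),
        "winner": cells[1].get_text(strip=True),
    })

print(winners_list)
```

If the live table marks each season as a row header (a th cell rather than a td), the cell selection would need to change accordingly.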

Finally, we convert winners_list to a JSON string with json.dumps() and write that string to a file.
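Here is a sketch of that final step, using hypothetical records in the shape described above (one dictionary per season). Note that json.dumps() only produces a string, and writing it out is a separate file operation. The indent and ensure_ascii=False arguments are optional. The first makes the file easier to read, and the second keeps accented characters as they are instead of escaping them.

```python
import json

# Hypothetical records: one dictionary per season.
winners_list = [
    {"season": "1956-57", "winner": "Club A"},
    {"season": "1957-58", "winner": "Club B"},
]

# json.dumps() returns a string; writing it to disk is a separate step.
json_text = json.dumps(winners_list, indent=2, ensure_ascii=False)

# Open with an explicit UTF-8 encoding so non-ASCII club names survive.
with open("winners.json", "w", encoding="utf-8") as f:
    f.write(json_text)
```

The standard library's json.dump() combines both steps by writing directly to an open file object.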

Here is the full code:
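Here is a minimal end-to-end sketch that strings the steps together. The URL, User-Agent string, dictionary keys and column positions are assumptions and may need adjusting for the live article. The download is isolated in scrape_winners(), so parse_winners() can be tried on saved HTML without a network connection.

```python
import json
import re

import requests
from bs4 import BeautifulSoup

# Assumed article URL; point this at the page you are scraping.
WIKI_URL = "https://en.wikipedia.org/wiki/Botola_Pro"
# Placeholder User-Agent; Wikimedia asks scripts to identify themselves.
HEADERS = {"User-Agent": "winners-scraper-example/0.1 (contact: you@example.com)"}


def parse_winners(html):
    """Return one {"season", "winner"} dict per data row of the winners table."""
    soup = BeautifulSoup(html, "html.parser")
    # Every table shares the same class, so find the one with a "Season" header.
    season_header = soup.find("th", string=re.compile(r"^\s*Season\s*$"))
    if season_header is None:
        raise ValueError('no <th> with the text "Season" found')
    winners_table = season_header.find_parent("table")

    winners_list = []
    for row in winners_table.find_all("tr"):
        cells = row.find_all("td")
        if len(cells) < 2:  # header row (only <th> cells)
            continue
        # Assumed column order: season first, winner second.
        winners_list.append({
            "season": cells[0].get_text(strip=True),
            "winner": cells[1].get_text(strip=True),
        })
    return winners_list


def scrape_winners(url=WIKI_URL):
    """Download the article and parse its winners table."""
    response = requests.get(url, headers=HEADERS, timeout=10)
    response.raise_for_status()
    return parse_winners(response.text)


def save_winners(winners_list, path="winners.json"):
    """Serialize with json.dumps() and write the resulting string to path."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(json.dumps(winners_list, indent=2, ensure_ascii=False))
```

Calling save_winners(scrape_winners()) downloads the article, extracts the winners and writes them to winners.json.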