python - How to get more than one item with identical HTML tag on BeautifulSoup

I am new to BeautifulSoup and not very familiar with HTML, but I am learning by finding myself small projects to do. For this one, I want to get the football match info from this site, in the form TeamA Date/time TeamB.

Here is my code:

import requests
from bs4 import BeautifulSoup

url = 'https://www.lequipe.fr/Football/ligue-1/page-calendrier-resultats/21e-journee'
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')

all_result = soup.find('div', class_="grid")

all_pairs = all_result.find_all('div', class_='grid__item')

i = 0
for result in all_pairs:
    i = i + 1
    team_name = result.find('span', class_='TeamScore__nameshort')  
    calendrier = result.find('div', class_='TeamScore__data')

    
    
    print(i)
    print(team_name.text.strip())
    print(calendrier.text.strip())
    print()

My problems are:

  1. It only grabs the first team. For Nice vs. Rennes, for example, it only gets "Nice". The HTML tags for TeamA and TeamB look the same to me. I tried find_all, but it did not work either.

  2. For some reason, the dates and times it gets are wrong. It shows completely different dates and times, and I don't know why.

Thank you for your help.

find_all is indeed the function you are after.

Try this:

import requests
from bs4 import BeautifulSoup

url = 'https://www.lequipe.fr/Football/ligue-1/page-calendrier-resultats/21e-journee'
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')

all_result = soup.find('div', class_="grid")

all_pairs = all_result.find_all('div', class_='grid__item')

# enumerate replaces the manual counter; numbering starts at 1
for i, result in enumerate(all_pairs, start=1):
    # find_all returns every matching span in document order
    team_names = result.find_all('span', class_='TeamScore__nameshort')
    first_team_name = team_names[0]
    second_team_name = team_names[1]
    calendrier = result.find('div', class_='TeamScore__data')

    print(i)
    print('{} vs {}'.format(first_team_name.text.strip(), second_team_name.text.strip()))
    print(calendrier.text.strip())
    print()

which should output:

1
Nice vs Rennes
24 janv.
                    20h45

2
Marseille vs Angers
25 janv.
                    17h30

3
Montpellier vs Dijon
25 janv.
                    20h00

4
Monaco vs Strasbourg
25 janv.
                    20h00

5
Reims vs Metz
25 janv.
                    20h00

6
Brest vs Amiens
25 janv.
                    20h00

7
Saint-Étienne vs Nîmes
25 janv.
                    20h00

8
Lyon vs Toulouse
26 janv.
                    15h00

9
Nantes vs Bordeaux
26 janv.
                    17h00

10
Lille vs Paris-SG
26 janv.
                    21h00

find returns only the first matching element, which is why you only got "Nice". find_all returns a list of all matching elements, so you have to use an index to access the one you want (or, alternatively, iterate over the list).
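As a minimal sketch of the iteration alternative, the snippet below parses a small hand-written HTML fragment instead of the live page. The markup is an assumption that only imitates the class names used above; the real page structure may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that only imitates the class names from the question.
html = """
<div class="grid">
  <div class="grid__item">
    <span class="TeamScore__nameshort"> Nice </span>
    <span class="TeamScore__nameshort"> Rennes </span>
  </div>
  <div class="grid__item">
    <span class="TeamScore__nameshort"> Marseille </span>
    <span class="TeamScore__nameshort"> Angers </span>
  </div>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')

matches = []
for item in soup.find_all('div', class_='grid__item'):
    # Iterate over every matching span instead of indexing [0] and [1].
    spans = item.find_all('span', class_='TeamScore__nameshort')
    names = [span.text.strip() for span in spans]
    matches.append(' vs '.join(names))

print(matches)  # ['Nice vs Rennes', 'Marseille vs Angers']
```

Iterating avoids an IndexError when an item contains fewer than two team spans, at the cost of not knowing in advance how many names each line will hold.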

As for the dates being different, I haven't looked into it, but one possible reason is that when you visit the site in your browser, JavaScript rewrites the dates into your local timezone. requests does not run JavaScript, so the HTML that BeautifulSoup parses would contain the site's default dates and times.
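Separately, the date and time in the output above are split by a line break and indentation, because that whitespace is part of the element's text. A minimal sketch of collapsing it, using a string copied from that output rather than the live page:

```python
# Sample string copied from the output above: date, line break, indentation, time.
raw = "24 janv.\n                    20h45"

# str.split() with no arguments splits on any run of whitespace,
# so joining the pieces with a single space collapses the gap.
date_time = " ".join(raw.split())

print(date_time)  # 24 janv. 20h45
```

In the loop, the same expression could be applied to calendrier.text in place of strip().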

You can also use select, which takes a CSS selector and returns a list of matching elements:

element = soup.select('div.grid__item')
firstElement = element[0].get_text()

Another example, this time getting an attribute from the following HTML:

<div class="nextpage">
    <a class="next-story" href="somepage.html">Some Page</a>
    <a class="next-story" href="somepage2.html">Some Page 2</a>
    <a class="next-story" href="somepage3.html">Some Page 3</a>
</div>

The code would be:

link = soup.select('div.nextpage a.next-story')
href = link[0].get('href')

Printing href then shows somepage.html.
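Since select returns every match, the same selector can collect all three links rather than just the first. A short self-contained sketch using the HTML from this example (the variable names are only illustrative):

```python
from bs4 import BeautifulSoup

html = """
<div class="nextpage">
    <a class="next-story" href="somepage.html">Some Page</a>
    <a class="next-story" href="somepage2.html">Some Page 2</a>
    <a class="next-story" href="somepage3.html">Some Page 3</a>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')

# select returns a list of all matching <a> tags; read href from each one.
hrefs = [a.get('href') for a in soup.select('div.nextpage a.next-story')]

print(hrefs)  # ['somepage.html', 'somepage2.html', 'somepage3.html']
```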
