
I want to extract some text from a webpage. The text is inside span tags that have itemprop="name". How do I specify itemprop="name" in a BeautifulSoup call to get the values?

e.g.

<span itemprop="name">Stanford University</span>

1 Answer

Best answer

You can pass itemprop="name" as a keyword argument to find_all() to find every span tag whose itemprop attribute is "name".

Here is an example:

from bs4 import BeautifulSoup
import urllib.request as ur

# Placeholder: replace with the actual URL of the page.
url = "full_url_of_the_webpage"
# Send a browser-like User-Agent, since some sites reject the default one.
req = ur.Request(url, None, headers={'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'})
rs = ur.urlopen(req)

soup = BeautifulSoup(rs, 'html.parser')
# Find every <span> whose itemprop attribute equals "name".
for sp in soup.find_all('span', itemprop="name"):
    print(sp.text)

To run this code, replace "full_url_of_the_webpage" with the actual URL of the page.
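
If you want to try the attribute filter without fetching a page, here is a minimal offline sketch. The HTML string is made-up sample markup (only the Stanford University span comes from the question), and the variable names are just for illustration. It shows three equivalent ways to match the spans: the keyword argument, the attrs dictionary, and a CSS selector via select().

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for a downloaded page.
html = """
<div>
  <span itemprop="name">Stanford University</span>
  <span itemprop="address">Stanford, CA</span>
  <span itemprop="name">Example Institute</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# 1. Keyword argument: any unrecognized keyword is treated as an attribute filter.
by_keyword = [sp.get_text(strip=True) for sp in soup.find_all("span", itemprop="name")]

# 2. attrs dictionary: useful when the attribute name is not a valid
#    Python identifier (for example, data-* attributes).
by_attrs = [sp.get_text(strip=True) for sp in soup.find_all("span", attrs={"itemprop": "name"})]

# 3. CSS attribute selector.
by_css = [sp.get_text(strip=True) for sp in soup.select('span[itemprop="name"]')]

print(by_keyword)
```

All three lists should contain the same two names; the span with itemprop="address" is skipped because its attribute value does not match.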


...