I want to get the text from the alt attribute of img tags using the Python package BeautifulSoup. I am using the find_all() method to find all img tags on a web page, but I am not sure how to get the text from the alt attribute.

For example, given the following HTML, I want the output to be "hello":

<img src="imgfile.png" alt="hello">

1 Answer


Call find_all('img', alt=True) to select only the img tags that have an alt attribute, then read the text from each matching tag with tag['alt'].
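Applied to the HTML from the question, a minimal offline sketch might look like this. It parses a string instead of fetching a page, uses Python's built-in 'html.parser', and adds a hypothetical second image (logo.png) without an alt attribute to show that alt=True filters it out.

```python
from bs4 import BeautifulSoup

# The <img> from the question, plus a hypothetical second image without an alt attribute
html = '<img src="imgfile.png" alt="hello"><img src="logo.png">'

# 'html.parser' ships with Python, so no extra parser package is needed here
soup = BeautifulSoup(html, 'html.parser')

# alt=True keeps only <img> tags that have an alt attribute,
# so reading tag['alt'] cannot raise KeyError for any match
alt_texts = [img['alt'] for img in soup.find_all('img', alt=True)]
print(alt_texts)  # ['hello']

# For a single image, find() returns the first match (or None if nothing matches)
first = soup.find('img', alt=True)
print(first['alt'])  # hello
```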

Here is an example that fetches a web page and prints the alt text of every image on it. The code loops over the tags returned by find_all() and reads each alt text using "alt" as the key:

from bs4 import BeautifulSoup
import urllib.request

# Define a browser-like User-Agent so that the request is not declined
user_agent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7'
headers = {'User-Agent': user_agent}

# Replace this placeholder with the URL of the page containing the images
url = 'your_url_containing_images'

# Request and open the web page
request = urllib.request.Request(url, headers=headers)
with urllib.request.urlopen(request) as response:
    # 'html5lib' needs the separate html5lib package; 'html.parser' works without it
    soup = BeautifulSoup(response, 'html5lib')

# Find all img tags that have an alt attribute and print their alt text
for img in soup.find_all('img', alt=True):
    print(img['alt'])
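Two variations may also be useful, sketched here on a hypothetical sample string rather than a live page. If you want every img tag, including those without alt text, Tag.get() returns None (or a default you supply) instead of raising KeyError. Alternatively, soup.select() with the CSS attribute selector img[alt] matches the same tags as alt=True; select() relies on the soupsieve package, which should be installed alongside recent BeautifulSoup versions.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page: two images with alt text, one without
html = '<img src="a.png" alt="hello"><img src="b.png"><img src="c.png" alt="world">'
soup = BeautifulSoup(html, 'html.parser')

# Option 1: keep every <img> and use .get(), which returns None
# (or the supplied default) instead of raising KeyError when alt is missing
all_alts = [img.get('alt') for img in soup.find_all('img')]
alts_with_default = [img.get('alt', '') for img in soup.find_all('img')]
print(all_alts)  # ['hello', None, 'world']

# Option 2: the CSS attribute selector img[alt] matches only <img> tags
# that have an alt attribute (select() uses the soupsieve package)
selected_alts = [img['alt'] for img in soup.select('img[alt]')]
print(selected_alts)  # ['hello', 'world']
```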

