[Python] Find text in the HTML <a> Tag using BeautifulSoup

Question 1

I am trying to find all <a> tags that have an "href" attribute on a webpage using the following BeautifulSoup code, but it returns many incorrect values. What am I missing in the code?

from bs4 import BeautifulSoup
import urllib.request as ur
req = ur.Request(url, None, headers={'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36' })
rs = ur.urlopen(req)
soup = BeautifulSoup(rs, 'html.parser')
for sp in soup.find_all('a'):
    print(sp.text)

pkumar81 · Answer 1 · 2022-09-15T20:42:44+0000

You need to pass the parameter "href=True" to the find_all() function. Without it, find_all('a') matches every <a> tag, whether or not the tag has an href attribute, which is why you are getting some incorrect values.

Here is the modified code:

from bs4 import BeautifulSoup
import urllib.request as ur

# url must already hold the address of the page to scrape
req = ur.Request(url, None, headers={'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'})
rs = ur.urlopen(req)

soup = BeautifulSoup(rs, 'html.parser')
# href=True keeps only the <a> tags that carry an href attribute
for sp in soup.find_all('a', href=True):
    print(sp.text)
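
To see the difference without fetching a page, here is a minimal sketch that runs both calls on a small inline HTML string. The sample markup and the variable names (SAMPLE_HTML, all_texts, link_texts, links) are made up for illustration and are not from the page in the question.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not taken from the page in the question.
SAMPLE_HTML = """
<div>
  <a href="https://example.com/docs">Docs</a>
  <a name="top">Top anchor</a>
  <a href="/about">About</a>
  <a>No attributes</a>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Every <a> tag, with or without an href attribute.
all_texts = [a.get_text(strip=True) for a in soup.find_all("a")]

# Only <a> tags that carry an href attribute.
link_texts = [a.get_text(strip=True) for a in soup.find_all("a", href=True)]

# Text paired with the href value, to check which link each text belongs to.
links = [(a.get_text(strip=True), a["href"]) for a in soup.find_all("a", href=True)]

print(all_texts)
print(link_texts)
print(links)
```

The first list includes the text of the two anchors that have no href, while the second and third contain only the real links. Printing the href next to the text can help when some of the remaining values still look wrong, since it shows which link each piece of text came from.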
