The find_all() method of the Python package BeautifulSoup returns every tag and string on a webpage that matches the given filters. Is there a way to get only the first 5 or 10 matching tags instead of all of them?

1 Answer

You can pass the "limit" argument to find_all() to set the maximum number of tags you want in the output. This works just like the LIMIT keyword in SQL: it tells BeautifulSoup to stop gathering results once it has found that many matches.

Here is an example:

from bs4 import BeautifulSoup
import urllib.request

# Send a browser-like User-Agent so that the request is not declined
user_agent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7'
headers = {'User-Agent': user_agent}

# Placeholder: replace with the URL of the page you want to parse
url = 'your_url_here'

# Open the page and parse the HTML
request = urllib.request.Request(url, None, headers)
response = urllib.request.urlopen(request)
soup = BeautifulSoup(response, 'html.parser')

# Fetch only the first 10 'a' tags
for link in soup.find_all("a", limit=10):
    print(link)
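
If you want to try "limit" without fetching a real page, here is a minimal sketch that parses a small inline HTML string. The HTML snippet and the variable names are made up for illustration. It also shows that asking for more tags than exist returns all the matches, and that find() returns just the first match.

```python
from bs4 import BeautifulSoup

# Hypothetical inline HTML so the example runs without network access
html = """
<ul>
  <li><a href="/one">One</a></li>
  <li><a href="/two">Two</a></li>
  <li><a href="/three">Three</a></li>
  <li><a href="/four">Four</a></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# limit=2 stops the search after the first two matching tags
first_two = soup.find_all("a", limit=2)
hrefs = [a["href"] for a in first_two]
print(hrefs)  # ['/one', '/two']

# A limit larger than the number of matches returns every match
all_links = soup.find_all("a", limit=10)
print(len(all_links))  # 4

# find() returns only the first match (or None if nothing matches)
first = soup.find("a")
print(first["href"])  # /one
```

Slicing the full result, as in soup.find_all("a")[:2], should give the same tags, but "limit" lets the search stop early instead of scanning the whole document first.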


...