Python code to download all images (Image Scraping) from a webpage

If a webpage has many photos and you want to download all of them, downloading them manually is not a good option, as it can take several minutes. You need a script that can scrape the images from the page. In this post, I have written a simple Python script that downloads the images on a webpage and saves them to your local machine. Some web servers reject scraping requests that do not send a user-agent, so I have included a user-agent in the code that web servers should accept. If a web server still rejects the request, you can find other user-agent strings on the internet and try them.
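As a minimal sketch of the user-agent point, the snippet below builds a request that carries a User-Agent header and reads the response body with urllib only. The helper names build_request and fetch_bytes, and the timeout value, are assumptions made for illustration; they are not part of the script in this post.

```python
import urllib.request


def build_request(link, user_agent):
    """Return a urllib Request that carries the given User-Agent header."""
    return urllib.request.Request(link, headers={'User-Agent': user_agent})


def fetch_bytes(link, user_agent, timeout=30):
    """Fetch a URL with the User-Agent header set and return the raw body.

    The 30-second timeout is an assumed default, not a recommended value.
    """
    request = build_request(link, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

The same header can be sent with image requests too: calling fetch_bytes on an image URL and writing the returned bytes to a file opened in 'wb' mode would download the image with the user-agent attached.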

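The following code saves images with names of the form ‘my-photo-*’ in a folder named photos, which must exist before you run the script. You can modify the code if you want different names. The code uses the Python modules bs4, urllib, and requests for scraping and saving images, and bs4 needs the html5lib package installed for the ‘html5lib’ parser used here.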

from bs4 import BeautifulSoup
import urllib.request
import requests


def save_image_file(ilink, filename):
    """
    Download and save the image file from a URL.
    """
    response = requests.get(ilink)
    if response.status_code == 200:
        with open(filename, 'wb') as f:
            f.write(response.content)
    else:
        print("Bad response code for the link:", ilink)


def read_url_data(link, headers):
    """
    Read the URL and create a BeautifulSoup object.
    """
    request = urllib.request.Request(link, None, headers)
    response = urllib.request.urlopen(request)
    return BeautifulSoup(response, 'html5lib')


if __name__ == "__main__":
    # variables
    user_agent = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.106 Safari/537.36 OPR/38.0.2220.41'
    headers = {'User-Agent': user_agent, }
    url = 'https://www.discoverbits.in/post/matplotlib-python-code-to-plot-bar-charts-with-error-bars/'

    # read the URL
    soup = read_url_data(url, headers)

    # check all img tags and download images
    # note: the photos/ directory must already exist
    photo_name = 'my-photo-'
    i = 0  # file names start at my-photo-0
    for tag in soup.find_all('img'):  # get all img tags
        picurl = tag.get('src')  # None if the tag has no src attribute
        if picurl:
            ext = '.' + picurl.split('.')[-1]  # capture the photo extension
            filename = 'photos/' + photo_name + str(i) + ext
            print('Downloading.....', picurl)
            save_image_file(picurl, filename)
            i += 1
        else:
            print('BAD TAG', tag)
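
The script above passes each src value straight to requests and takes the extension from the text after the last dot. That works when src is an absolute URL without a query string. For pages that use relative paths or URLs such as a.png?w=300, a possible variation is sketched below. The helper names resolve_image_url, image_extension, and build_filename, and the '.jpg' fallback, are assumptions for illustration only.

```python
import os
from urllib.parse import urljoin, urlparse


def resolve_image_url(page_url, src):
    """Turn a possibly relative src value into an absolute URL."""
    return urljoin(page_url, src)


def image_extension(image_url, default='.jpg'):
    """Return the extension from the URL path, ignoring any query string.

    Falls back to an assumed default when the path has no extension.
    """
    path = urlparse(image_url).path
    ext = os.path.splitext(path)[1]
    return ext if ext else default


def build_filename(folder, prefix, index, image_url):
    """Compose a local file name such as photos/my-photo-0.png."""
    return f"{folder}/{prefix}{index}{image_extension(image_url)}"
```

In the loop, picurl could be passed through resolve_image_url(url, picurl) before downloading, and build_filename('photos', photo_name, i, picurl) could replace the string concatenation.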

Post your comments to let me know if this code works for you.
