How can I check the original encoding of a webpage using the Python BeautifulSoup package?

1 Answer

Best answer

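Use the original_encoding attribute of the BeautifulSoup object. It holds the encoding Beautiful Soup detected and used to convert the page to Unicode. It is None when you pass in an already decoded str instead of raw bytes.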

Here is an example:

from bs4 import BeautifulSoup
import urllib.request

# Send a browser-like User-Agent so the request is not declined
user_agent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7'
headers = {'User-Agent': user_agent}

url = 'your_url_here'  # placeholder: replace with the URL of the page to check

# Open the page and hand the undecoded response to BeautifulSoup
request = urllib.request.Request(url, None, headers)
response = urllib.request.urlopen(request)
soup = BeautifulSoup(response, 'html.parser')

# Encoding that Beautiful Soup detected while decoding the page
print(soup.original_encoding)
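
If you want to try this without a network request, here is a minimal offline sketch. The sample page and the detect_encoding helper are made up for illustration and are not part of bs4. It assumes the page declares its charset in a meta tag. It also shows that original_encoding is only filled in when BeautifulSoup receives raw bytes.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page: it declares ISO-8859-1 in a <meta> tag
# and is encoded that way before parsing.
html_text = (
    '<html><head><meta charset="iso-8859-1"></head>'
    '<body><p>caf\u00e9</p></body></html>'
)
raw_bytes = html_text.encode('iso-8859-1')


def detect_encoding(markup):
    """Illustrative helper: return the encoding BeautifulSoup used for markup."""
    soup = BeautifulSoup(markup, 'html.parser')
    return soup.original_encoding


# Raw bytes: BeautifulSoup has to work out the encoding itself.
from_bytes = detect_encoding(raw_bytes)

# Already decoded text: there is nothing to detect, so the attribute is None.
from_text = detect_encoding(html_text)

print(from_bytes)
print(from_text)
```

The exact spelling of the reported name can vary (with or without hyphens), so compare encoding names loosely rather than by exact string match.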


...