Python download webpage

9/12/2023

In Python, the HTML of a web page can be read like this (urlopen comes from urllib.request):

html = urlopen(my_url).read()

However, when I tried to print it on my console, it wasn’t a pleasant sight. To get properly formatted, human-readable HTML source code, I used BeautifulSoup, a Python package for parsing HTML and XML documents (imported here under the name bs):

html_page = bs(html, features="lxml")

Now, I had two main websites from which I occasionally downloaded PDF files. Upon evaluating the HTML code of both, I realized that the content of their meta tags was slightly different. For example, one of the websites had this:

While another website had no og:title and had this instead:

In order to get usable metadata, I added this:

og_url = html_page.find("meta", property="og:url")

And got something like this as a result:

Parse Input URL
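The fetch-and-inspect steps described in this post can be sketched end to end. To keep the example free of third-party dependencies, this version swaps BeautifulSoup for the standard library's html.parser, so it is not the exact code used above. The names MetaTagCollector, read_meta_properties, fetch_html and the sample page are all hypothetical, and the UTF-8 decoding is an assumption about the target site.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class MetaTagCollector(HTMLParser):
    """Collect <meta property="..." content="..."> pairs from an HTML document."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("property")
        content = attributes.get("content")
        if name and content is not None:
            # Keep the first occurrence if a property is repeated.
            self.properties.setdefault(name, content)


def read_meta_properties(html):
    """Return a dict mapping meta property names to their content values."""
    collector = MetaTagCollector()
    collector.feed(html)
    collector.close()
    return collector.properties


def fetch_html(url):
    """Download a page and decode it as text (UTF-8 is an assumption)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


# Hypothetical sample page, standing in for a real download via fetch_html().
sample_html = """
<html><head>
<meta property="og:url" content="https://example.com/report">
<meta name="description" content="Not an Open Graph tag">
</head><body></body></html>
"""

meta = read_meta_properties(sample_html)
og_url = meta.get("og:url")
og_title = meta.get("og:title")  # None, because this sample has no og:title
```

Using .get() on the collected dictionary handles the case mentioned above, where one site provides a given Open Graph tag and another does not. With BeautifulSoup, the matching step would be to call html_page.find("meta", property="og:url"), check the result for None, and then read its "content" attribute.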
