Beautiful Soup and extracting a div and its contents by ID

soup.find("tagName", { "id" : "articlebody" }) Why does this NOT return the <div id="articlebody"> … </div> tags and stuff in between? It returns nothing. And I know for a fact it exists because I'm staring right at it from soup.prettify(). soup.find("div", { "id" : "articlebody" }) also does not work. (EDIT: I found that BeautifulSoup …
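For reference, here is a minimal sketch of the lookup the question describes, assuming well-formed markup and the standard-library html.parser backend. The sample HTML string and the variable names are made up for illustration. It shows the call working on clean input and does not reproduce the asker's page, where the lookup returned nothing.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the asker's real page is not shown in the excerpt.
html = '<html><body><div id="articlebody"><p>Hello</p></div></body></html>'

soup = BeautifulSoup(html, "html.parser")

# Both calls look up the same element: a dict of attributes, or the id keyword.
by_dict = soup.find("div", {"id": "articlebody"})
by_keyword = soup.find(id="articlebody")

print(by_dict)
```

If a lookup like this works on a small sample but not on the real page, comparing the output of different parsers on the same input may help narrow down the cause.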

TypeError: a bytes-like object is required, not 'str' in Python and CSV

TypeError: a bytes-like object is required, not 'str'. I am getting the above error while executing the Python code below, which saves HTML table data to a CSV file. I don't know how to get rid of it. Please help. import csv import requests from bs4 import BeautifulSoup url="http://www.mapsofindia.com/districts-india/" response=requests.get(url) html=response.content soup=BeautifulSoup(html,'html.parser') table=soup.find('table', attrs={'class':'tableizer-table'}) list_of_rows=[] for row in table.findAll('tr')[1:]: list_of_cells=[] for cell …
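The excerpt is cut off before the file is opened, so the following is only a sketch under an assumption: that the rows are written with csv.writer to a target opened in binary mode, which is a common cause of this message in Python 3. The row data is hypothetical, and in-memory buffers stand in for real files.

```python
import csv
import io

# Hypothetical rows standing in for the scraped table cells.
list_of_rows = [["Name", "Value"], ["a", "1"]]

# In Python 3, csv.writer writes str, so it needs a text-mode target.
# A binary target (like a file opened with "wb") raises the TypeError in the title.
binary_target = io.BytesIO()
try:
    csv.writer(binary_target).writerows(list_of_rows)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Text-mode target. With a real file this would be:
#   open(path, "w", newline="", encoding="utf-8")
text_target = io.StringIO(newline="")
csv.writer(text_target).writerows(list_of_rows)
csv_text = text_target.getvalue()

print(error_message)
print(csv_text)
```

Passing newline="" lets the csv module control line endings itself, which avoids blank lines between rows on Windows.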

Scraping: SSL: CERTIFICATE_VERIFY_FAILED error for http://en.wikipedia.org

I'm practicing the code from 'Web Scraping with Python', and I keep having this certificate problem: from urllib.request import urlopen from bs4 import BeautifulSoup import re pages = set() def getLinks(pageUrl): global pages html = urlopen("http://en.wikipedia.org"+pageUrl) bsObj = BeautifulSoup(html) for link in bsObj.findAll("a", href=re.compile("^(/wiki/)")): if 'href' in link.attrs: if link.attrs['href'] not in pages: #We have …
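One common cause of this error is that Python cannot locate a set of trusted CA certificates. A minimal sketch, assuming that is the situation here: build a verifying SSL context explicitly and pass it to urlopen. The helper names make_verified_context and fetch, the optional cafile argument, and the example URL are illustrative and do not come from the question.

```python
import ssl
from urllib.request import urlopen


def make_verified_context(cafile=None):
    """Return an SSL context that verifies server certificates.

    cafile is an optional path to a CA bundle (hypothetical here); when it is
    None, the system's default trusted certificates are loaded.
    """
    return ssl.create_default_context(cafile=cafile)


def fetch(url, context):
    # urlopen accepts an SSLContext through its context parameter.
    with urlopen(url, context=context) as response:
        return response.read()


context = make_verified_context()
# Example call (needs network access):
# html = fetch("https://en.wikipedia.org/wiki/Main_Page", context)
```

This keeps certificate verification switched on. Disabling verification also silences the error, but it removes the protection the check provides.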

BeautifulSoup getting href [duplicate]

I have the following soup: <a href="some_url">next</a> <span class="class">…</span> From this I want to extract the href, "some_url". I can do it if I only have one tag, but here there are two …
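A short sketch of one way to pull the href out of markup like this, using the html.parser backend. The markup mirrors the snippet in the question, with a literal "..." standing in for the elided span content.

```python
from bs4 import BeautifulSoup

html = '<a href="some_url">next</a> <span class="class">...</span>'
soup = BeautifulSoup(html, "html.parser")

# href=True matches only <a> tags that actually carry an href attribute.
link = soup.find("a", href=True)
href = link["href"]

# The same idea for every link in the document.
all_hrefs = [a["href"] for a in soup.find_all("a", href=True)]

print(href)
```

Searching by tag name means the neighbouring <span> is never considered, so the number of sibling tags does not matter.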

bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?

… soup = BeautifulSoup(html, "lxml") File "/Library/Python/2.7/site-packages/bs4/__init__.py", line 152, in __init__ % ",".join(features)) bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library? The above is the output in my Terminal. I am on Mac OS 10.7.x. I have Python 2.7.1, and followed this tutorial to get …
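The message indicates that the lxml parser is not available to the interpreter running the script. Installing it (for example with pip install lxml) is one route. Another, sketched below, is to fall back to the parser that ships with Python. The helper name make_soup and the sample markup are hypothetical.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(html):
    """Parse with lxml when it is installed, else fall back to html.parser."""
    try:
        return BeautifulSoup(html, "lxml")
    except FeatureNotFound:
        return BeautifulSoup(html, "html.parser")


soup = make_soup("<p>hello</p>")
print(soup.p.get_text())
```

The two parsers can build slightly different trees from malformed HTML, so results may differ between machines depending on which branch runs.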

UnicodeEncodeError: 'charmap' codec can't encode characters

I'm trying to scrape a website, but it gives me an error. I'm using the following code: import urllib.request from bs4 import BeautifulSoup get = urllib.request.urlopen("https://www.website.com/") html = get.read() soup = BeautifulSoup(html) print(soup) And I'm getting the following error: File "C:\Python34\lib\encodings\cp1252.py", line 19, in encode return codecs.charmap_encode(input,self.errors,encoding_table)[0] UnicodeEncodeError: 'charmap' codec can't encode characters in position …
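The traceback points at cp1252.py, which suggests that print() is encoding the page for a console that uses the cp1252 code page. A minimal sketch of two ways around that, assuming this is the cause. The sample text is hypothetical, and an in-memory buffer stands in for an output file.

```python
import io

# Hypothetical text containing a character that cp1252 cannot represent.
text = "status: \u2713"

# Encoding to cp1252 (a common Windows console encoding) fails for this text.
try:
    text.encode("cp1252")
    failed = False
except UnicodeEncodeError:
    failed = True

# Option 1: replace unencodable characters instead of raising.
safe_bytes = text.encode("cp1252", errors="replace")

# Option 2: write through a UTF-8 text stream, which can encode any character.
buffer = io.BytesIO()
stream = io.TextIOWrapper(buffer, encoding="utf-8")
stream.write(text)
stream.flush()
utf8_bytes = buffer.getvalue()

print(failed, safe_bytes)
```

With a real file, open(path, "w", encoding="utf-8") plays the role of the wrapped buffer. On Python 3.7 or later, sys.stdout.reconfigure(encoding="utf-8") is another option for console output.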

How to find elements by class

I'm having trouble parsing HTML elements with the "class" attribute using BeautifulSoup. The code looks like this: soup = BeautifulSoup(sdata) mydivs = soup.findAll('div') for div in mydivs: if (div["class"] == "stylelistrow"): print div I get an error on the same line "after" the script finishes. File "./beautifulcoding.py", line 130, in getlanguage if (div["class"] == "stylelistrow"): File …
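The traceback is truncated, so this is a sketch under an assumption: that the loop fails when it reaches a div that has no class attribute. Filtering by class in the search itself sidesteps that case. The sample markup is hypothetical, and the code is written for Python 3 with bs4.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the asker's sdata is not shown in the excerpt.
sdata = (
    '<div class="stylelistrow">one</div>'
    "<div>two</div>"
    '<div class="other stylelistrow">three</div>'
)
soup = BeautifulSoup(sdata, "html.parser")

# class_ (with a trailing underscore) avoids the Python keyword "class".
mydivs = soup.find_all("div", class_="stylelistrow")
texts = [div.get_text() for div in mydivs]

# div.get("class") returns None instead of raising when a div has no class.
safe = [
    div
    for div in soup.find_all("div")
    if "stylelistrow" in (div.get("class") or [])
]

print(texts)
```

Because class is a multi-valued attribute, the class_ filter also matches elements that carry other classes alongside the one requested.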

UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 20: ordinal not in range(128)

I'm having problems dealing with unicode characters from text fetched from different web pages (on different sites). I am using BeautifulSoup. The problem is that the error is not always reproducible; it sometimes works with some pages, and sometimes, it barfs by throwing a UnicodeEncodeError. I have tried just about everything I can think of, …
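The character in the title, u'\xa0', is a non-breaking space, and the error appears when text containing it is encoded as ASCII. That would explain why only some pages trigger it. A minimal Python 3 sketch of two ways to handle such text, assuming the encoding step is under the asker's control. The sample string is hypothetical.

```python
import unicodedata

# Hypothetical scraped text containing a non-breaking space (U+00A0).
text = "price:\xa0low"

# Encoding as ASCII fails on this character.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Encode explicitly as UTF-8 at the output boundary.
utf8_bytes = text.encode("utf-8")

# Or normalize first: NFKC maps the non-breaking space to a plain space.
normalized = unicodedata.normalize("NFKC", text)

print(ascii_ok, utf8_bytes, normalized)
```

Normalization changes the text, so it suits cases where the exact whitespace does not matter. Explicit UTF-8 encoding preserves the original characters.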