Convert Unicode to ASCII without errors in Python

My code just scrapes a web page, then converts it to Unicode.

    html = urllib.urlopen(link).read()
    html.encode("utf8","ignore")
    self.response.out.write(html)

But I get a UnicodeDecodeError:

    Traceback (most recent call last):
      File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/ext/webapp/__init__.py", line 507, in __call__
        handler.get(*groups)
      File "/Users/greg/clounce/main.py", line 55, in get
        html.encode("utf8","ignore")
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 2818: ordinal not in range(128)

…
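In Python 2, calling .encode() on a byte string first decodes it implicitly with the ASCII codec, which is why an encode call can raise a UnicodeDecodeError. The result of .encode() is also a new value, so it has to be assigned to take effect. The following is a minimal sketch in Python 3 syntax of decoding first and then encoding to ASCII. The helper name to_ascii and the sample bytes are hypothetical, and the page encoding (UTF-8 by default here) is an assumption, since the excerpt does not say what the page actually uses.

```python
def to_ascii(raw, encoding="utf-8"):
    """Decode raw bytes to text, then encode to ASCII, dropping other characters.

    `encoding` is an assumption about the source page; bytes that are not
    valid in that encoding are replaced during decoding instead of raising.
    """
    text = raw.decode(encoding, errors="replace")
    # errors="ignore" silently drops every character that ASCII cannot represent.
    return text.encode("ascii", errors="ignore")


# Hypothetical sample: UTF-8 bytes for "café", a no-break space, then "menu".
sample = b"caf\xc3\xa9\xc2\xa0menu"
ascii_bytes = to_ascii(sample)
```

Dropping characters is lossy; whether that is acceptable depends on what the output is used for.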

Replace non-ASCII characters with a single space

I need to replace all non-ASCII characters (anything outside \x00-\x7F) with a space. I’m surprised that this is not dead-easy in Python, unless I’m missing something. The following function simply removes all non-ASCII characters:

    def remove_non_ascii_1(text):
        return ''.join(i for i in text if ord(i)<128)

And this one replaces non-ASCII characters with the number of spaces as per …
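The excerpt is cut off before it shows a solution, so the following is only one possible sketch of what the title asks for: each non-ASCII character becomes exactly one space. The function names are hypothetical, and the input is assumed to be text (a Python 3 str), not raw bytes.

```python
import re

# Matches any single character outside the ASCII range \x00-\x7F.
NON_ASCII = re.compile(r"[^\x00-\x7F]")


def replace_non_ascii(text):
    """Replace each non-ASCII character in `text` with one space."""
    return NON_ASCII.sub(" ", text)


def replace_non_ascii_join(text):
    """Same result without a regular expression."""
    return "".join(ch if ord(ch) < 128 else " " for ch in text)
```

Because the substitution works per character on decoded text, a character that occupies several bytes in UTF-8 still yields a single space.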