I’m pulling data out of a Google Doc, processing it, and writing it to a file (which I will eventually paste into a WordPress page).
The data contains some non-ASCII symbols. How can I safely convert these to something that can be used in HTML source?
Currently I’m converting everything to Unicode on the way in, joining it all together in a Python string, then doing:
import codecs
f = codecs.open('out.txt', mode="w", encoding="iso-8859-1")
f.write(all_html.encode("iso-8859-1", "replace"))
There is an encoding error on the last line:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 12286: ordinal not in range(128)
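A minimal sketch of what seems to be going on, assuming Python 2 semantics: the file returned by codecs.open expects Unicode text and encodes it itself, so handing it an already-encoded byte string makes Python 2 first decode those bytes with the default ASCII codec, which fails on 0xa0. The sample string is hypothetical, and the explicit decode below only imitates that implicit step so the snippet also runs on Python 3.

```python
# Hypothetical sample text; the real all_html comes from the Google Doc.
all_html = u"Qur\u2019an\u00a0text"

# Same call as in the failing code: U+2019 is not in ISO-8859-1 and becomes "?",
# while U+00A0 (no-break space) becomes the single byte 0xa0.
encoded = all_html.encode("iso-8859-1", "replace")

# Python 2 performs the equivalent of this ASCII decode implicitly when a byte
# string is written to a codecs.open() file; 0xa0 is not valid ASCII.
try:
    encoded.decode("ascii")
    error = None
except UnicodeDecodeError as exc:
    error = str(exc)

print(error)
```

If that reading is right, writing all_html directly (without the extra .encode call) to the codecs.open file would avoid the error, although characters outside ISO-8859-1 would still need separate handling.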
Partial solution:
This Python runs without an error:
row = [unicode(x.strip()) if x is not None else u'' for x in row]
all_html = row[0] + "<br/>" + row[1]
f = open('out.txt', 'w')
f.write(all_html.encode("utf-8"))
But then if I open the actual text file, I see lots of symbols like:
Qur’an
Maybe I need to write to something other than a text file?
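One possible way around this, as a rough sketch rather than a definitive fix: the symbols above look like correct UTF-8 bytes being displayed as Windows-1252, so the file itself may be fine and only the viewer's encoding is wrong. To sidestep the encoding question entirely, non-ASCII characters can be turned into numeric character references with the standard "xmlcharrefreplace" error handler. The sample string is hypothetical and stands in for the real all_html.

```python
import io

# Hypothetical sample standing in for the real all_html value.
all_html = u"Qur\u2019an<br/>caf\u00e9"

# Reading the (correct) UTF-8 bytes as Windows-1252 reproduces the stray symbols.
misread = all_html.encode("utf-8").decode("cp1252")

# Replace every non-ASCII character with a numeric character reference,
# giving pure ASCII that is safe to paste into HTML source.
ascii_html = all_html.encode("ascii", "xmlcharrefreplace").decode("ascii")

# io.open encodes on write, so it is given text rather than bytes.
with io.open("out.txt", "w", encoding="ascii") as f:
    f.write(ascii_html)

print(ascii_html)
```

The alternative is to keep writing UTF-8 and make sure both the editor and the WordPress page treat the text as UTF-8; the character-reference route just removes that dependency.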