Convert a Unicode string to a string in Python (containing extra symbols)

How do you convert a Unicode string (containing extra characters like £, $, etc.) into a Python string?
See unicodedata.normalize:
title = u"Klüft skräms inför på fédéral électoral große"
import unicodedata
unicodedata.normalize('NFKD', title).encode('ascii', 'ignore')
'Kluft skrams infor pa federal electoral groe'
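A minimal sketch of what that call does on Python 3, where `encode` returns `bytes` and an extra `decode` is needed to get a `str` back. The helper name `to_ascii` is hypothetical and not part of the original answer. It also shows a caveat worth knowing: characters with no ASCII decomposition, such as £ and ß, are silently dropped rather than converted.

```python
import unicodedata

def to_ascii(text):
    """Decompose accented letters (NFKD), then drop everything that is not ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    # errors="ignore" discards the combining marks and any other non-ASCII character
    return decomposed.encode("ascii", "ignore").decode("ascii")

title = "Klüft skräms inför på fédéral électoral große"
ascii_title = to_ascii(title)   # 'Kluft skrams infor pa federal electoral groe'
price = to_ascii("£5 or $5")    # '5 or $5': the pound sign is lost, the dollar sign is ASCII
```

If losing characters is not acceptable, passing `"strict"` instead of `"ignore"` makes the encode step raise `UnicodeEncodeError` instead.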

Why are emoji characters like 👩‍👩‍👧‍👦 treated so strangely in Swift strings?

The character 👩‍👩‍👧‍👦 (family with two women, one girl, and one boy) is encoded as follows: U+1F469 WOMAN, U+200D ZWJ, U+1F469 WOMAN, U+200D ZWJ, U+1F467 GIRL, U+200D ZWJ, U+1F466 BOY. So it’s very interestingly encoded; the perfect target for a unit test. However, Swift doesn’t seem to know how to treat it. Here’s what I mean: … Read more
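The question is about Swift, but the sequence itself can be inspected from any language. The sketch below uses Python only to list the seven code points and to show how the count changes with the unit of measurement; it says nothing about how Swift's own string API behaves. The variable names are illustrative.

```python
import unicodedata

# Built from escapes so the sequence is unambiguous in any editor
family = "\U0001F469\u200D\U0001F469\u200D\U0001F467\u200D\U0001F466"

code_points = [f"U+{ord(ch):04X}" for ch in family]
names = [unicodedata.name(ch) for ch in family]

# Python's len() counts code points, not user-perceived characters
code_point_count = len(family)                        # 7
utf16_units = len(family.encode("utf-16-le")) // 2    # 11 (each emoji is a surrogate pair)
utf8_bytes = len(family.encode("utf-8"))              # 25 (4 bytes per emoji, 3 per joiner)
```

One visible glyph, seven code points, eleven UTF-16 units, twenty-five UTF-8 bytes: any API that answers "how long is this string?" has to pick one of those.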

Best way to convert text files between character sets?

What is the fastest, easiest tool or method to convert text files between character sets? Specifically, I need to convert from UTF-8 to ISO-8859-15 and vice versa. Everything goes: one-liners in your favorite scripting language, command-line tools or other utilities for the OS, web sites, etc. Best solutions so far: On Linux/UNIX/OS X/cygwin: GNU iconv suggested … Read more
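For the scripting-language route, here is a minimal sketch in Python using only the standard library. The function name `convert_file` and the file names are hypothetical; the codec names `utf-8` and `iso-8859-15` are the ones Python ships with.

```python
import os
import tempfile

def convert_file(src_path, dst_path, src_encoding, dst_encoding):
    """Re-encode a text file; raises UnicodeEncodeError if a character does not fit."""
    # newline="" leaves the original line endings untouched
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        for line in src:
            dst.write(line)

# Demonstration in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    utf8_path = os.path.join(tmp, "in_utf8.txt")
    latin9_path = os.path.join(tmp, "out_latin9.txt")
    with open(utf8_path, "w", encoding="utf-8", newline="") as f:
        f.write("Prix: 10 €\n")
    convert_file(utf8_path, latin9_path, "utf-8", "iso-8859-15")
    with open(latin9_path, "rb") as f:
        latin9_bytes = f.read()   # the euro sign becomes the single byte 0xA4
```

Swapping the two encoding arguments converts in the other direction. Going from UTF-8 to ISO-8859-15 can fail on characters the target set lacks, which is why the sketch lets the error propagate instead of hiding it.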

Twitter image encoding challenge [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for … Read more

UnicodeDecodeError when reading CSV file in Pandas with Python

I’m running a program which is processing 30,000 similar files. A random number of them are stopping and producing this error…
File "C:\Importer\src\dfman\importer.py", line 26, in import_chr
data = pd.read_csv(filepath, names=fields)
File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 400, in parser_f
return _read(filepath_or_buffer, kwds)
File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 205, in _read
return parser.read()
File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 608, in read
ret … Read more
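The excerpt is cut off before the error message, so the cause here is an assumption: a frequent source of `UnicodeDecodeError` in a batch like this is that some files are not in the encoding the reader expects. The sketch below uses only the standard library to try a list of candidate encodings; `decode_with_fallback` and the candidate list are hypothetical. With pandas, the encoding that works can then be passed through the `encoding` parameter of `read_csv`.

```python
def decode_with_fallback(raw, encodings=("utf-8", "cp1252", "latin-1")):
    """Return (text, encoding) for the first candidate that decodes without error."""
    for name in encodings:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings fit")

raw = "café;1\n".encode("cp1252")        # b'caf\xe9;1\n', which is not valid UTF-8
text, used = decode_with_fallback(raw)   # falls through to cp1252
```

A fallback that decodes without error is not proof that the guess is right: `latin-1` accepts any byte sequence, so it belongs last in the list and its output deserves a visual check.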

What is the difference between UTF-8 and Unicode?

I have heard conflicting opinions from people – according to the Wikipedia UTF-8 page. They are the same thing, aren’t they? Can someone clarify?
Let me use an example to illustrate this topic: A Chinese character: 汉. Its Unicode value: U+6C49. Convert 6C49 to binary: 01101100 01001001. Nothing magical so far, it’s … Read more
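The answer is truncated here, so the following is only a sketch of the distinction using Python's built-in encoders: the code point (Unicode) is a number, while UTF-8 is one way of turning that number into bytes.

```python
ch = "汉"

code_point = ord(ch)                       # 0x6C49, the Unicode value
bits = format(code_point, "016b")          # '0110110001001001'

utf8 = ch.encode("utf-8")                  # b'\xe6\xb1\x89', three bytes
utf8_bits = " ".join(format(b, "08b") for b in utf8)
# '11100110 10110001 10001001': the 16 payload bits are spread over
# a 1110xxxx lead byte and two 10xxxxxx continuation bytes

utf16 = ch.encode("utf-16-be")             # b'lI', i.e. bytes 0x6C 0x49
```

The same code point yields different byte sequences under UTF-8 and UTF-16, which is the sense in which Unicode and UTF-8 are not the same thing.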

What is the best way to remove accents (normalize) in a Python unicode string?

I have a Unicode string in Python, and I would like to remove all the accents (diacritics). I found on the web an elegant way to do this (in Java): first, convert the Unicode string to its long normalized form (with a separate character for letters and diacritics); then, remove all the characters whose Unicode type is … Read more
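The two steps described can be sketched in Python with the standard `unicodedata` module. The excerpt is cut off before it names the character type to remove, so the filter used below, `unicodedata.combining`, is an assumption, and `strip_accents` is a hypothetical helper name.

```python
import unicodedata

def strip_accents(text):
    """Split letters from their diacritics (NFD), then drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    # Assumption: a "diacritic" is any character with a nonzero combining class
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

plain = strip_accents("fédéral électoral große")   # 'federal electoral große'
```

Unlike an ASCII-only encode step, this keeps letters such as ß that have no decomposition, so the result may still contain non-ASCII characters.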