Best way to convert text files between character sets?

What is the fastest, easiest tool or method to convert text files between character sets? Specifically, I need to convert from UTF-8 to ISO-8859-15 and vice versa. Anything goes: one-liners in your favorite scripting language, command-line tools or other utilities for the OS, web sites, etc. Best solutions so far: On Linux/UNIX/OS X/cygwin: GNU iconv suggested …
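For the scripting-language route, a minimal Python sketch of the UTF-8 to ISO-8859-15 conversion might look like the following. The helper names `transcode` and `convert_file` are made up for illustration. The sketch assumes the whole file fits in memory and that every character exists in the target character set; otherwise `encode` raises `UnicodeEncodeError`.

```python
from pathlib import Path


def transcode(data: bytes, from_enc: str = "utf-8", to_enc: str = "iso-8859-15") -> bytes:
    """Decode bytes from one character set and re-encode them in another.

    Hypothetical helper: raises UnicodeDecodeError on invalid input and
    UnicodeEncodeError when a character has no mapping in the target set.
    """
    return data.decode(from_enc).encode(to_enc)


def convert_file(src, dst, from_enc: str = "utf-8", to_enc: str = "iso-8859-15") -> None:
    """Convert a whole file; works on bytes so line endings are left untouched."""
    Path(dst).write_bytes(transcode(Path(src).read_bytes(), from_enc, to_enc))
```

Swapping the two encoding arguments converts in the other direction, for example `convert_file("in.txt", "out.txt", "iso-8859-15", "utf-8")` (file names are placeholders).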

What is the difference between UTF-8 and Unicode?

I have heard conflicting opinions from people – according to the Wikipedia UTF-8 page. They are the same thing, aren’t they? Can someone clarify? Let me use an example to illustrate this topic: a Chinese character: 汉; its Unicode value: U+6C49; convert 6C49 to binary: 01101100 01001001. Nothing magical so far, it’s …
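The example can be reproduced in a few lines of Python. This is only a sketch of the distinction: the code point is the number Unicode assigns to the character, while UTF-8 is one way of turning that number into bytes. The variable names are arbitrary.

```python
ch = "汉"

# Unicode: the abstract code point assigned to the character.
code_point = ord(ch)
print(f"U+{code_point:04X}")        # U+6C49
print(format(code_point, "016b"))   # 0110110001001001

# UTF-8: one possible byte encoding of that code point.
utf8_bytes = ch.encode("utf-8")
print(utf8_bytes.hex(" "))                             # e6 b1 89
print(" ".join(format(b, "08b") for b in utf8_bytes))  # 11100110 10110001 10001001
```

The 16 bits of the code point are spread over three bytes, each carrying a UTF-8 marker prefix (`1110`, `10`, `10`), which is why the stored bytes differ from the raw value 6C49.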

Saving utf-8 texts with json.dumps as UTF8, not as \u escape sequence

Sample code:
>>> import json
>>> json_string = json.dumps("ברי צקלה")
>>> print(json_string)
"\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4"
The problem: it’s not human readable. My (smart) users want to verify or even edit text files with JSON dumps (and I’d rather not use XML). Is there a way to serialize objects into UTF-8 JSON strings (instead of \uXXXX)? Best …
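One way to get readable output, sketched here with the standard library only, is the `ensure_ascii=False` argument of `json.dumps`. The variable names are illustrative.

```python
import json

text = "ברי צקלה"

# Default behaviour: non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps(text)

# With ensure_ascii=False the characters are kept as-is in the output string.
readable = json.dumps(text, ensure_ascii=False)
print(readable)

# Both forms are valid JSON and decode back to the same string.
assert json.loads(escaped) == json.loads(readable) == text
```

When writing the result to disk, opening the file with an explicit encoding, for example `open(path, "w", encoding="utf-8")`, keeps the output UTF-8 regardless of the platform default.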

What’s the difference between UTF-8 and UTF-8 without BOM?

What’s the difference between UTF-8 and UTF-8 without a BOM? Which is better? The UTF-8 BOM is a sequence of bytes at the start of a text stream (0xEF, 0xBB, 0xBF) that allows the reader to more reliably guess that a file is encoded in UTF-8. Normally, the BOM is used to signal …
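A short Python sketch shows what those three bytes look like in practice. It relies on the standard `codecs.BOM_UTF8` constant and the `utf-8-sig` codec; the helper name `has_utf8_bom` is made up for illustration.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the byte string starts with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


with_bom = "hello".encode("utf-8-sig")   # the utf-8-sig codec prepends the BOM
without_bom = "hello".encode("utf-8")    # plain UTF-8, no BOM

print(has_utf8_bom(with_bom), has_utf8_bom(without_bom))

# Decoding with utf-8-sig strips a BOM if present and accepts input without one.
# Decoding BOM-prefixed bytes with plain utf-8 leaves U+FEFF at the start.
clean = with_bom.decode("utf-8-sig")
leftover = with_bom.decode("utf-8")
```

Reading files of unknown origin with `encoding="utf-8-sig"` is a common way to tolerate both variants.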

“Unmappable character for encoding UTF-8” error

I’m getting a compile error at the following method.
public static boolean isValidPasswd(String passwd) {
    String reg = "^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[~#;:?/@&!\"'%*=¬.,-])(?=[^\\s]+$).{8,24}$";
    return Pattern.matches(reg, passwd);
}
at Utility.java:[76,74] unmappable character for encoding UTF-8. The 74th character is ' " '. How can I fix this? Thanks.
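The only non-ASCII character in the pattern is `¬` (U+00AC). The following Python sketch rests on an assumption the question does not confirm: that the source file was saved in a single-byte encoding such as ISO-8859-1 while the compiler read it as UTF-8. Under that assumption the `¬` is stored as the lone byte 0xAC, which is not a valid UTF-8 sequence. The helper `is_valid_utf8` is hypothetical.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Assumption: the file was saved as ISO-8859-1, so "¬" became one byte.
latin1_bytes = "¬".encode("iso-8859-1")   # b'\xac'
# The same character saved as UTF-8 takes two bytes.
utf8_bytes = "¬".encode("utf-8")          # b'\xc2\xac'

print(is_valid_utf8(latin1_bytes), is_valid_utf8(utf8_bytes))
```

If that assumption holds, either re-saving the file as UTF-8, passing the file's real encoding to the compiler with `javac -encoding`, or writing the character as the escape `\u00AC` in the Java string would avoid the mismatch.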