Saving utf-8 texts with json.dumps as UTF8, not as \u escape sequence

Sample code:

    >>> import json
    >>> json_string = json.dumps("ברי צקלה")
    >>> print(json_string)
    "\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4"

The problem: it’s not human readable. My (smart) users want to verify or even edit text files with JSON dumps (and I’d rather not use XML). Is there a way to serialize objects into UTF-8 JSON strings (instead of \uXXXX)? Best …
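One common way to get readable output is the `ensure_ascii=False` argument of `json.dumps`. The sketch below reuses the string from the question; the variable names `escaped` and `readable` are only illustrative.

```python
import json

text = "ברי צקלה"

# Default behaviour: every non-ASCII character becomes a \uXXXX escape.
escaped = json.dumps(text)

# ensure_ascii=False leaves non-ASCII characters as they are.
readable = json.dumps(text, ensure_ascii=False)

print(escaped)
print(readable)

# The result is a str; encode it (or open the output file with
# encoding="utf-8") to get actual UTF-8 bytes on disk.
utf8_bytes = readable.encode("utf-8")
```

Both forms decode to the same value with `json.loads`, so the choice only affects how the file looks to a human reader.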

How does Zalgo text work?

I’ve seen weirdly formatted text called Zalgo like below written on various forums. It’s kind of annoying to look at, but it really bothers me because it undermines my notion of what a character is supposed to be. My understanding is that a character is supposed to move horizontally across a line and stay within …
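For context, text of this kind is generally built by stacking Unicode combining marks on ordinary base characters. The following is a minimal sketch of that idea, not a full Zalgo generator; the helper `strip_marks` is a hypothetical name introduced here for illustration.

```python
import unicodedata

base = "a"
# Three combining marks: acute accent, circumflex, dot below.
marks = "\u0301\u0302\u0323"
stacked = base + marks

# One visible "character", but four code points.
print(len(stacked))

# A non-zero combining class means the code point attaches to the
# preceding base character instead of advancing along the line.
print([unicodedata.combining(ch) for ch in stacked])

def strip_marks(s):
    """Drop every code point with a non-zero combining class."""
    return "".join(ch for ch in s if unicodedata.combining(ch) == 0)

print(strip_marks(stacked))
```

Note that stripping this way also removes legitimate accents from text that is stored in decomposed form, so it is a blunt tool.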

How do I see what character set a MySQL database / table / column is?

What is the (default) charset for:

    MySQL database
    MySQL table
    MySQL column

Here’s how I’d do it.

For Schemas (or Databases; they are synonyms):

    SELECT default_character_set_name
    FROM information_schema.SCHEMATA
    WHERE schema_name = "schemaname";

For Tables:

    SELECT CCSA.character_set_name
    FROM information_schema.`TABLES` T,
         information_schema.`COLLATION_CHARACTER_SET_APPLICABILITY` CCSA
    WHERE CCSA.collation_name = T.table_collation
      AND T.table_schema = "schemaname"
      AND …

What exactly do “u” and “r” string flags do, and what are raw string literals?

While asking this question, I realized I didn’t know much about raw strings. For somebody claiming to be a Django trainer, this sucks. I know what an encoding is, and I know what u'' alone does since I understand what Unicode is. But what does r'' do exactly? What kind of string does it result …
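A short sketch of the difference, assuming Python 3 (where the `u` prefix is accepted but has no effect, since every `str` is already Unicode):

```python
import re

normal = "a\nb"   # \n is interpreted as a single newline character
raw = r"a\nb"     # backslash and "n" are kept as two literal characters

print(len(normal), len(raw))

# A raw literal is still an ordinary str; only the parsing of
# backslashes in the source code differs.
same_as_escaped = raw == "a\\nb"

# Raw literals are mostly a convenience for regex patterns and
# Windows paths, where backslashes are frequent.
numbers = re.findall(r"\d+", "10 apples, 20 pears")

# In Python 3 the u prefix yields a plain str.
is_plain_str = type(u"text") is str
```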

UnicodeDecodeError: 'charmap' codec can't decode byte X in position Y: character maps to <undefined>

I’m trying to get a Python 3 program to do some manipulations with a text file filled with information. However, when trying to read the file I get the following error:

    Traceback (most recent call last):
      File "SCRIPT LOCATION", line NUMBER, in <module>
        text = file.read()
      File "C:\Python31\lib\encodings\cp1252.py", line 23, in decode
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]
    UnicodeDecodeError: …
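The traceback shows the file being decoded with cp1252, which is what Python picks up from the Windows locale when `open()` gets no `encoding` argument. The sketch below reproduces the failure and then reads the file with an explicit encoding. It assumes the file is really UTF-8, which the question does not state; the temporary file and its contents are made up for the demonstration.

```python
import os
import tempfile

# "Á" in UTF-8 is b"\xc3\x81"; byte 0x81 has no mapping in cp1252.
data = "Á".encode("utf-8")

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
    path = tmp.name

# Decoding as cp1252 fails on the unmapped byte.
try:
    with open(path, encoding="cp1252") as f:
        f.read()
    failed = False
except UnicodeDecodeError as exc:
    failed = True
    print(exc)

# Naming the real encoding explicitly reads the file correctly.
with open(path, encoding="utf-8") as f:
    text = f.read()

os.remove(path)
print(text)
```

If the real encoding is unknown, guessing is unreliable; finding out how the file was produced is usually the safer route.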

std::wstring VS std::string

I am not able to understand the differences between std::string and std::wstring. I know wstring supports wide characters such as Unicode characters. I have got the following questions: When should I use std::wstring over std::string? Can std::string hold the entire ASCII character set, including the special characters? Is std::wstring supported by all popular C++ compilers? …

What’s the difference between UTF-8 and UTF-8 without BOM?

What’s the difference between UTF-8 and UTF-8 without a BOM? Which is better?

The UTF-8 BOM is a sequence of bytes at the start of a text stream (0xEF, 0xBB, 0xBF) that allows the reader to more reliably guess a file as being encoded in UTF-8. Normally, the BOM is used to signal …
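The three bytes can be seen directly in Python, whose `utf-8-sig` codec writes the BOM on encode and drops it on decode. A minimal sketch; the sample string "hi" is arbitrary:

```python
import codecs

bom = codecs.BOM_UTF8                  # b"\xef\xbb\xbf"

without_bom = "hi".encode("utf-8")
with_bom = "hi".encode("utf-8-sig")    # same bytes, BOM prepended

print(with_bom)

# Decoding BOM-prefixed bytes as plain utf-8 keeps the BOM as U+FEFF.
leaked = with_bom.decode("utf-8")

# utf-8-sig strips a leading BOM if present and otherwise behaves
# like utf-8, so it reads both variants.
clean_a = with_bom.decode("utf-8-sig")
clean_b = without_bom.decode("utf-8-sig")
```

The leaked U+FEFF is invisible when printed, which is why it tends to surface as a mysterious mismatch on the first field or line of a file.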

What characters can be used for up/down triangle (arrow without stem) for display in HTML?

I’m looking for an HTML or ASCII character which is a triangle pointing up or down so that I can use it as a toggle switch. I found ↑ (&uarr;) and ↓ (&darr;), but those have a narrow stem. I’m looking for just the HTML arrow “head”.

Unicode arrow heads: ▲ – …