“for line in…” results in UnicodeDecodeError: ‘utf-8’ codec can’t decode byte

Here is my code: for line in open('u.item'): # Read each line. Whenever I run this code, it gives the following error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 2892: invalid continuation byte. I tried to solve this by adding an extra parameter to open(). The code looks like: for line in open('u.item', …
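A minimal sketch of one common way around this error, assuming the file is really encoded as Latin-1 (ISO 8859-1), where byte 0xe9 is "é". That encoding is a guess and is not confirmed by the question; the helper name read_lines and the sample file are hypothetical.

```python
import os
import tempfile


def read_lines(path, encoding="latin-1"):
    """Return the lines of a text file decoded with an explicit encoding."""
    with open(path, encoding=encoding) as handle:
        return [line.rstrip("\n") for line in handle]


# Build a small sample file containing byte 0xe9, which is not valid UTF-8 here.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.item")
with open(sample_path, "wb") as handle:
    handle.write(b"1|Caf\xe9 Society\n")

# Decoding with the assumed encoding succeeds where the UTF-8 default fails.
lines = read_lines(sample_path)
```

If the real encoding is unknown, passing errors="replace" to open() is another option, at the cost of losing the undecodable characters.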

Detect encoding and make everything UTF-8

I'm reading lots of texts from various RSS feeds and inserting them into my database. Of course, the feeds use several different character encodings, e.g. UTF-8 and ISO 8859-1. Unfortunately, there are sometimes problems with the encodings of the texts. Example: The "ß" in "Fußball" should look like this in my database: …
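One simple approach to this kind of normalization can be sketched as follows, assuming the only two encodings in play are UTF-8 and ISO 8859-1 as in the example above. The function name to_utf8 is hypothetical, and real feeds may need a proper detection library or the encoding declared by the feed itself.

```python
def to_utf8(raw: bytes) -> bytes:
    """Decode bytes as UTF-8 if valid, otherwise as ISO 8859-1, and return UTF-8 bytes."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # ISO 8859-1 maps every byte to a character, so this never raises,
        # but it can silently mis-decode text that is in some other encoding.
        text = raw.decode("iso-8859-1")
    return text.encode("utf-8")


# The same word arriving in two different encodings ends up identical.
from_latin1 = to_utf8("Fußball".encode("iso-8859-1"))
from_utf8 = to_utf8("Fußball".encode("utf-8"))
```

Trying UTF-8 first works because most non-ASCII ISO 8859-1 byte sequences are not valid UTF-8, so the fallback only triggers when it is needed.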

What does “Content-type: application/json; charset=utf-8” really mean?

When I make a POST request with a JSON body to my REST service, I include Content-type: application/json; charset=utf-8 in the message header. Without this header, I get an error from the service. I can also successfully use Content-type: application/json without the ;charset=utf-8 portion. What exactly does charset=utf-8 do? I know it specifies the …

Why do we use Base64?

Wikipedia says Base64 encoding schemes are commonly used when there is a need to encode binary data that needs to be stored and transferred over media that are designed to deal with textual data. This is to ensure that the data remains intact without modification during transport. But isn't data always stored/transmitted …
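The idea in the Wikipedia quote can be illustrated with a short sketch using Python's standard base64 module. The sample bytes are arbitrary and chosen only because they are not printable text.

```python
import base64

# Arbitrary binary data, including bytes that are not printable text.
binary = bytes([0x00, 0xFF, 0x10, 0x80])

# Base64 maps the data onto a small ASCII alphabet that text-only channels preserve.
encoded = base64.b64encode(binary)

# Decoding restores the original bytes exactly.
decoded = base64.b64decode(encoded)
```

The encoded form is roughly a third larger than the input, which is the price paid for surviving a text-only transport.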

Setting the default Java character encoding

How do I properly set the default character encoding used by the JVM (1.5.x) programmatically? I have read that -Dfile.encoding=whatever used to be the way to go for older JVMs. I don't have that luxury for reasons I won't get into. I have tried: System.setProperty("file.encoding", "UTF-8"); And the property gets set, but it doesn't seem …

What is the difference between utf8mb4 and utf8 charsets in MySQL?

What is the difference between utf8mb4 and utf8 charsets in MySQL? I already know about ASCII, UTF-8, UTF-16 and UTF-32 encodings, but I'm curious to know how the utf8mb4 group of encodings differs from the other encoding types defined in MySQL Server. Are there any special benefits/purposes of using utf8mb4 rather than utf8?
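As background for this question, MySQL's utf8 charset (an alias for utf8mb3) stores at most three bytes per character, while utf8mb4 stores up to four. The sketch below uses Python only to show how many UTF-8 bytes a few sample characters need; the helper name utf8_length and the sample characters are illustrative choices.

```python
def utf8_length(char: str) -> int:
    """Return the number of bytes a single character occupies in UTF-8."""
    return len(char.encode("utf-8"))


# Byte lengths for an ASCII letter, a Latin-1 letter, the euro sign, and an emoji.
samples = {char: utf8_length(char) for char in ["A", "ß", "€", "😀"]}

# Characters that need four bytes do not fit in a three-byte-per-character charset.
needs_utf8mb4 = [char for char, size in samples.items() if size > 3]
```

In this sample only the emoji needs a fourth byte, which is the kind of character that requires utf8mb4.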