Why is executing Java code in comments with certain Unicode characters allowed?

The following code produces the output “Hello World!” (no really, try it):

    public static void main(String... args) {
        // The comment below is not a typo.
        // \u000d System.out.println("Hello World!");
    }

The reason is that the Java compiler translates the Unicode escape \u000d into a line terminator before it processes comments, so the code is transformed into: public static … Read more
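As a rough illustration of the translation step described in the excerpt above, the sketch below mimics it in Python. The helper `translate_unicode_escapes` and its regular expression are hypothetical and greatly simplified: they ignore the rule about how many backslashes precede the `u`, and they are not how any real Java compiler is implemented. The point is only that the escape is replaced before comments are recognized.

```python
import re

# Hypothetical helper: mimics only the Unicode-escape translation step,
# not a full Java lexer. The backslash-parity rule is deliberately ignored.
_ESCAPE = re.compile(r"\\u+([0-9a-fA-F]{4})")


def translate_unicode_escapes(source: str) -> str:
    """Replace each \\uXXXX escape with the character it denotes."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), source)


java_source = "\n".join([
    "public static void main(String... args) {",
    "    // The comment below is not a typo.",
    r'    // \u000d System.out.println("Hello World!");',
    "}",
])

translated = translate_unicode_escapes(java_source)

# U+000D (carriage return) ends a line, so the "//" comment stops there
# and the println call lands on a line of its own.
lines = translated.splitlines()
for number, line in enumerate(lines, start=1):
    print(number, repr(line))
```

After translation the four source lines become five, and the `System.out.println` call is no longer inside the comment.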

UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 20: ordinal not in range(128)

I’m having problems dealing with Unicode characters in text fetched from web pages on different sites. I am using BeautifulSoup. The problem is that the error is not always reproducible; it works with some pages, and with others it barfs by throwing a UnicodeEncodeError. I have tried just about everything I can think of, … Read more
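One common way to hit this error is to encode text containing a non-breaking space (U+00A0) with the ASCII codec. The sketch below reproduces that and shows two ways around it. It assumes Python 3, whereas the `u'\xa0'` in the error message suggests Python 2, and the sample string is made up for illustration. It is not necessarily the cause of the problem in the question.

```python
# Made-up sample text containing a non-breaking space (U+00A0),
# which often appears in scraped HTML as &nbsp;.
text = "Price:\xa0100"

error = None
try:
    # ASCII only covers code points 0-127, so U+00A0 cannot be encoded.
    text.encode("ascii")
except UnicodeEncodeError as exc:
    error = exc
    print("failed at index", exc.start, "with codec", exc.encoding)

# Option 1: encode with a codec that can represent every character.
utf8_bytes = text.encode("utf-8")

# Option 2: stay with ASCII but substitute characters it cannot represent.
ascii_lossy = text.encode("ascii", errors="replace")

print(utf8_bytes)
print(ascii_lossy)
```

Encoding explicitly to UTF-8 at the point where text leaves the program is usually the less lossy choice. `errors="replace"` discards information and is only suitable when the output really must be ASCII.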