PHP DOMDocument loadHTML not encoding UTF-8 correctly

I’m trying to parse some HTML using DOMDocument, but when I do, I suddenly lose my encoding (at least that is how it appears to me). $profile = "<div><p>various japanese characters</p></div>"; $dom = new DOMDocument(); $dom->loadHTML($profile); $divs = $dom->getElementsByTagName('div'); foreach ($divs as $div) { echo $dom->saveHTML($div); } The result of this code is that I …

u'\ufeff' in Python string

I got an error with the following exception message: UnicodeEncodeError: 'ascii' codec can't encode character u'\ufeff' in position 155: ordinal not in range(128) I'm not sure what u'\ufeff' is; it shows up when I’m web scraping. How can I remedy the situation? The .replace() string method doesn’t work on it.
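For context, U+FEFF is the Unicode byte order mark (BOM), which some files and web responses carry at the very start. The following is a minimal sketch in Python 3 (the question appears to use Python 2, so this is an assumption about the environment); the sample bytes and the helper name strip_bom are hypothetical and not taken from the question.

```python
# Hypothetical scraped payload that begins with a UTF-8 byte order mark.
raw_bytes = b"\xef\xbb\xbfhello"

# Decoding with "utf-8" keeps the BOM as the first character of the string.
with_bom = raw_bytes.decode("utf-8")

# Decoding with "utf-8-sig" drops a leading BOM if one is present.
without_bom = raw_bytes.decode("utf-8-sig")


def strip_bom(text):
    """Remove every U+FEFF character from text that is already decoded."""
    return text.replace("\ufeff", "")
```

Choosing "utf-8-sig" at decode time avoids the problem at the source, while strip_bom is a fallback for text that has already been decoded.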

UTF-8 byte[] to String

Let’s suppose I have just used a BufferedInputStream to read the bytes of a UTF-8 encoded text file into a byte array. I know that I can use the following routine to convert the bytes to a string, but is there a more efficient/smarter way of doing this than just iterating through the bytes and …

error UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

https://github.com/affinelayer/pix2pix-tensorflow/tree/master/tools An error occurred when running "process.py" from the above site. python tools/process.py --input_dir data --operation resize --output_dir data2/resize data/0.jpg -> data2/resize/0.png Traceback (most recent call last): File "tools/process.py", line 235, in <module> main() File "tools/process.py", line 167, in main src = load(src_path) File "tools/process.py", line 113, in load contents = open(path).read() File "/home/user/anaconda3/envs/tensorflow_2/lib/python3.5/codecs.py", …
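The traceback shows open(path).read() being called on data/0.jpg, and a JPEG file starts with the byte 0xff, so one plausible reading is that an image is being read in text mode. The sketch below illustrates that reading only; the temporary file and the helper name load_bytes are hypothetical, and the truncated excerpt does not confirm that this is the fix for process.py.

```python
import os
import tempfile


def load_bytes(path):
    """Read a file as raw bytes, with no text decoding step."""
    with open(path, "rb") as f:
        return f.read()


# Hypothetical stand-in for data/0.jpg: the first bytes of a JPEG file.
jpeg_header = b"\xff\xd8\xff\xe0"
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "sample.jpg")
with open(sample_path, "wb") as f:
    f.write(jpeg_header)

# Binary mode returns the bytes unchanged.
contents = load_bytes(sample_path)

# Text mode with UTF-8 fails on the first byte, as in the traceback.
try:
    with open(sample_path, encoding="utf-8") as f:
        f.read()
    text_mode_failed = False
except UnicodeDecodeError:
    text_mode_failed = True
```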

“Incorrect string value” when trying to insert UTF-8 into MySQL via JDBC?

This is how my connection is set: Connection conn = DriverManager.getConnection(url + dbName + "?useUnicode=true&characterEncoding=utf-8", userName, password); And I’m getting the following error when trying to add a row to a table: Incorrect string value: '\xF0\x90\x8D\x83\xF0\x90…' for column 'content' at row 1 I’m inserting thousands of records, and I always get this error when the …
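The first four bytes in the error message form a single UTF-8 sequence for a character outside the Basic Multilingual Plane. MySQL's legacy utf8 (utf8mb3) character set stores at most three bytes per character, so such text needs utf8mb4; whether the 'content' column actually uses the legacy character set is an assumption, since the excerpt does not say. The sketch below, in Python rather than the question's Java, decodes the bytes and flags strings that need four-byte storage; the helper name needs_four_bytes is hypothetical.

```python
# The first four bytes quoted in the error message.
payload = b"\xf0\x90\x8d\x83"

# They decode to exactly one character.
char = payload.decode("utf-8")
code_point = ord(char)


def needs_four_bytes(text):
    """Return True if any character lies above U+FFFF (4 bytes in UTF-8)."""
    return any(ord(ch) > 0xFFFF for ch in text)
```

A check like this could be run over the records before insertion to find which rows trigger the error.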

Using PowerShell to write a file in UTF-8 without the BOM

Out-File seems to force the BOM when using UTF-8: $MyFile = Get-Content $MyPath $MyFile | Out-File -Encoding "UTF8" $MyPath How can I write a file in UTF-8 with no BOM using PowerShell? Update 2021: PowerShell has changed a bit since I wrote this question 10 years ago. Check the multiple answers below; they have a lot …