How does WordPress support Unicode?

I’ve been looking through WordPress’s code and it mostly uses PHP’s regular string functions, like strlen, strpos, etc. Yet I know WordPress supports UTF-8, so how does it do that?

Does it overload the regular string functions with multibyte string functions?

If so, is that a good idea in practice? If not, then how do they do it?

2 Answers

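WordPress is written in PHP. As far as I know, PHP’s regular strings have no notion of a character encoding (UTF-8, UTF-16, …). A string is just a sequence of bytes, and functions like strlen and strpos count bytes, not characters. Encoding-aware functions exist in the optional mbstring extension, but they don’t replace the regular ones unless function overloading is explicitly configured, and that feature was deprecated in PHP 7.2 and removed in PHP 8.0.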
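To make the byte-versus-character difference concrete, here is a minimal sketch. It assumes PHP 7.0 or later (for the `\u{...}` escape) and that the mbstring extension is installed; the sample string is only an illustration and is not taken from WordPress.

```php
<?php
// "\u{00E9}" is é (U+00E9), which UTF-8 encodes as the two bytes C3 A9.
$text = "h\u{00E9}llo";

// The regular functions work on bytes.
$byteLength = strlen($text);          // number of bytes
$bytePos    = strpos($text, 'llo');   // byte offset of "llo"

// The mbstring functions work on characters (assumes mbstring is available).
$charLength = mb_strlen($text, 'UTF-8');
$charPos    = mb_strpos($text, 'llo', 0, 'UTF-8');

printf("bytes: %d, characters: %d\n", $byteLength, $charLength); // bytes: 6, characters: 5
printf("byte offset: %d, character offset: %d\n", $bytePos, $charPos); // byte offset: 3, character offset: 2
```

Note that strpos still finds the ASCII substring correctly; only the reported offset is in bytes. In UTF-8, the bytes of a multibyte character never collide with ASCII bytes, which is one reason byte-oriented code often works unchanged on UTF-8 text.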

The actual encoding and decoding is done by your browser. When you write a post, your browser sends the text you just entered as UTF-8 to the server. WordPress just stores it in the database.

The encoding is specified in WordPress’ HTML code:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

This instructs the browser to use “UTF-8” for all text on the page. This includes the actual text on the page as well as all input fields.
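As a rough sketch of this pass-through behaviour, the snippet below builds a page that declares UTF-8 and echoes submitted text back without ever decoding it. The function name `render_page` and the form field `title` are hypothetical, made up for illustration; this is not how WordPress itself is implemented.

```php
<?php
// Hypothetical helper: returns a page fragment that declares UTF-8
// and puts the submitted text back into an input field.
function render_page(string $submitted): string
{
    // Only HTML-special characters (<, >, &, quotes) are escaped.
    // The multibyte UTF-8 sequences pass through byte for byte.
    $safe = htmlspecialchars($submitted, ENT_QUOTES, 'UTF-8');

    return '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />' . "\n"
         . '<form method="post"><input name="title" value="' . $safe . '" /></form>';
}

// Simulated form input: "Café <b>" as UTF-8 bytes.
$html = render_page("Caf\u{00E9} <b>");
echo $html, "\n";
```

The server-side code never needs to know where one character ends and the next begins. It only has to keep the bytes intact and tell the browser which encoding they are in.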

So, WordPress doesn’t handle UTF-8 itself. It lets the browser handle it. (This also means that if you specified different encodings for the backend and the frontend pages, you’d get garbage text on the frontend.)
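The garbage text from an encoding mismatch can be reproduced in a few lines. This sketch assumes the mbstring extension is available and uses ISO-8859-1 as an example of a "wrong" encoding; it simulates a reader that takes UTF-8 bytes for ISO-8859-1.

```php
<?php
// é stored as UTF-8: the two bytes C3 A9.
$utf8 = "\u{00E9}";

// Treat those bytes as ISO-8859-1 (one byte = one character) and
// convert the result to UTF-8 so it can be printed on a UTF-8 terminal.
$garbled = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

echo $garbled, "\n"; // Ã©
```

Each of the two original bytes is taken as a separate character, so one é turns into the two characters Ã and ©.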

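As a note: Unlike PHP’s regular string functions, MySQL is encoding-aware. So, for example, a search for non-ASCII characters yields the correct result because the search is handled by MySQL rather than WordPress (assuming the tables and the database connection are set to a UTF-8 character set).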
