Strange characters – despite everything being UTF-8

Not sure why this thread was closed, but this is the same issue afflicting many people.

All my WP config settings are in order:

//define('DB_CHARSET', 'utf8');
//define('DB_CHARSET', 'utf8_unicode_ci');
//define('DB_COLLATE', '');

I even tried enabling them one by one. None worked.

When I save a post, garbled characters appear in place of apostrophes and spaces. This happens whether I type content manually or paste it in.

I’ve tried a few plugins.

  1. UTF-8 Sanitize
  2. Convert WP to UTF-8

...etc.

None of them work. The problem persists.

I’ve also changed the database’s character set and collation in MySQL. Screenshot:

MySQL tables/columns are all utf-8

This is a screenshot of me entering some content by pasting it:

Text is fine when entering it in the WP UI

But immediately upon saving, the text comes back with garbled characters having replaced it:

Annoying garbled characters

What else?

I went through the rigmarole of dumping the entire MySQL DB, then removing all the older non-UTF-8 characters with the tr command:

tr -cd '\11\12\15\40-\176' < file-with-binary-chars > clean-file

Here, file-with-binary-chars was the MySQL dump. Then I restored the table.
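To make clear what that tr filter does to a dump, here is a minimal Python sketch of the same byte filter. The function name and the sample string are hypothetical, chosen only for illustration. It keeps tab, line feed, carriage return and printable ASCII, and deletes every other byte, which includes all bytes of any multi-byte UTF-8 character.

```python
# Byte values kept by: tr -cd '\11\12\15\40-\176'
# (tab, line feed, carriage return, and printable ASCII 0x20-0x7E).
KEEP = frozenset({0o11, 0o12, 0o15}) | frozenset(range(0o40, 0o176 + 1))


def strip_like_tr(data: bytes) -> bytes:
    """Drop every byte outside KEEP, mirroring what tr -cd does."""
    return bytes(b for b in data if b in KEEP)


# Hypothetical sample: "It’s a café – really", encoded as UTF-8.
sample = "It\u2019s a caf\u00e9 \u2013 really".encode("utf-8")
cleaned = strip_like_tr(sample).decode("ascii")
print(cleaned)
```

In this sample the curly apostrophe, the é and the dash all disappear: valid UTF-8 punctuation and accented letters are deleted rather than repaired, so the filter loses content instead of fixing the encoding.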

My MySQL config is all utf8:

[client]
default-character-set=utf8

[mysqld]
character-set-client=utf8
collation-server=utf8_unicode_ci
character-set-server=utf8

My browser is Chrome. The encoding is UTF-8 (in the VIEW menu).

What else can I do? Do I need to make all the plugin files utf-8 as well?

FYI, this blog is one of several WordPress blogs on the server. Other, newer WordPress installations on the same server use the same MySQL 5.6.17 installation, and they don’t have this issue. My guess is that, because this is an older blog, some of the text entered long ago may be encoded differently, but frankly, after having done all of the above, I really don’t know what else I can do.

Thanks for any input or pointers!


This is typically caused when you are copying/pasting MS Word information into the WordPress content editor. WordPress uses something called “Smart Quotes”, via a function named wptexturize().

Ideal Solution

The ideal solution would be to go back through your content and retype all single and double quotes from the keyboard.

However, if you’re working with massive copy/pastes, this may not be feasible.

Disable wptexturize() Filter

Another option is to stop the wptexturize() filter from running, which can be accomplished by placing the following code in your child theme’s functions.php file:

remove_filter('the_content', 'wptexturize');

You may also wish to remove the filter from comments and/or excerpts:

remove_filter('comment_text', 'wptexturize');
remove_filter('the_excerpt', 'wptexturize');

Or for titles:

remove_filter('single_post_title', 'wptexturize');
remove_filter('the_title', 'wptexturize');
remove_filter('wp_title', 'wptexturize');

Clean Database

For existing content that has already been saved to the database with the “weird” characters, you may need to clean the database by running the following queries from phpMyAdmin (be sure to take a database backup first):

-- Each search string is the garbled form of the character that replaces it.
UPDATE wp_posts SET post_content = REPLACE(post_content, 'â€œ', '“');
UPDATE wp_posts SET post_content = REPLACE(post_content, 'â€™', '’');
UPDATE wp_posts SET post_content = REPLACE(post_content, 'â€˜', '‘');
UPDATE wp_posts SET post_content = REPLACE(post_content, 'â€”', '—');
UPDATE wp_posts SET post_content = REPLACE(post_content, 'â€“', '–');
UPDATE wp_posts SET post_content = REPLACE(post_content, 'â€¢', '-');
UPDATE wp_posts SET post_content = REPLACE(post_content, 'â€¦', '…');
-- Run this one last: its pattern is a prefix of every pattern above,
-- so running it earlier would stop the other queries from matching.
UPDATE wp_posts SET post_content = REPLACE(post_content, 'â€', '”');
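The garbled sequences that cleanup queries like these search for are what you get when the UTF-8 bytes of a punctuation mark are read back as Windows-1252. The Python sketch below shows that mapping; the names garble() and repair() are hypothetical helpers, not part of WordPress or MySQL, and it assumes Windows-1252 is the encoding involved in the misreading.

```python
# Punctuation that word processors commonly substitute for plain ASCII.
MARKS = {
    "left double quote": "\u201c",
    "right double quote": "\u201d",
    "left single quote": "\u2018",
    "right single quote": "\u2019",
    "en dash": "\u2013",
    "em dash": "\u2014",
    "bullet": "\u2022",
    "ellipsis": "\u2026",
}


def garble(text: str) -> str:
    """Encode as UTF-8, then misread the bytes as Windows-1252."""
    # Byte 0x9D has no Windows-1252 character, so it becomes U+FFFD here.
    return text.encode("utf-8").decode("cp1252", errors="replace")


def repair(text: str) -> str:
    """Undo garble() for text whose bytes all exist in Windows-1252."""
    return text.encode("cp1252").decode("utf-8")


for name, mark in MARKS.items():
    print(f"{name}: {mark!r} -> {garble(mark)!r}")

print(repair(garble("It\u2019s")))
```

Every mark in the table garbles to a three-character sequence beginning with â€, which is why the search strings look so similar. The right double quote is the odd one out: its last byte has no printable Windows-1252 character, so it tends to show up as just â€.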

Plugins

Well… it’s WordPress. You can always use a plugin to help manage the wptexturize() filter. Take a look through This List, and see if one is right for you.
