One for the gurus: upgrade to 3.x messed up only filenames with accented chars

After upgrading from 2.8.x (maybe it was 2.9.x) to 3.1.2, all the references inside posts to filenames (usually images) which contain accented chars stopped working.

Before, a file that shows up in the filesystem as “EXPRESSÃƒO.jpg” was referenced in the post content HTML as “EXPRESSÃƒO.jpg”, so the link worked. WP 3.x now converts those references to “EXPRESSÃO.jpg” (which is what the filename looked like when the blog author created/uploaded it), so all the images that have such chars are now broken links.
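
A minimal sketch of how a mangled name like this can arise, assuming the cause is UTF-8 bytes being read back as Windows-1252 (the encoding MySQL calls latin1). That diagnosis is an assumption, not something confirmed here, and `mangle` is a made-up helper name:

```python
def mangle(name: str) -> str:
    """Simulate the suspected bug: UTF-8 bytes decoded as Windows-1252."""
    # errors="replace" guards against the few bytes cp1252 leaves undefined.
    return name.encode("utf-8").decode("cp1252", errors="replace")


if __name__ == "__main__":
    print(mangle("EXPRESSÃO.jpg"))  # prints EXPRESSÃƒO.jpg
```

If the output matches what the filesystem shows, the files were most likely stored under a mis-decoded name and the posts now carry the correctly decoded one.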

ONLY references to filenames are mangled. All other accented text works fine. With the help of a perl script run via SSH, I was able to obtain a list of all the files that have special chars.
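
The perl script itself is not shown. As a rough Python equivalent, a sketch that lists every file whose name contains a non-ASCII character could look like this. The function names and the `wp-content/uploads` path are assumptions for illustration:

```python
import os


def has_non_ascii(name):
    """True if the name contains any character outside 7-bit ASCII."""
    return any(ord(ch) > 127 for ch in name)


def list_non_ascii_files(root):
    """Walk root and return paths whose file name has non-ASCII characters."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if has_non_ascii(filename):
                hits.append(os.path.join(dirpath, filename))
    return sorted(hits)


if __name__ == "__main__":
    # Assumed upload directory; adjust to the actual install.
    for path in list_non_ascii_files("wp-content/uploads"):
        print(path)
```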

Table with all the mangled chars: http://pastebin.com/MMbjJphU
(BTW which table is this?)

wp-config shows (yes, I already tried commenting them out):

define('DB_CHARSET', '');
define('DB_COLLATE', '');

mySQL shows:

DB Collation in effect: latin1_swedish_ci    
default DB Collation: latin1_swedish_ci    
Tables Collation: latin1_swedish_ci    
Columns Collation: latin1_swedish_ci

What's the quickest way to solve this problem? From what I see, I have 2 options:

A) I could rename the files to use the correct spelling (EXPRESSÃƒO.jpg -> EXPRESSÃO.jpg). I am leaning towards this. Maybe someone could help with a php/perl/python script that would rename the files?

B) I could batch update wp_posts in phpMyAdmin to start using all those weird chars (EXPRESSÃO.jpg -> EXPRESSÃƒO.jpg).
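
For option A, a rename script could be sketched in Python roughly as follows. It assumes the file names were mangled by reading UTF-8 bytes as Windows-1252, which is a guess that should be checked against a few real names first. `repair_name` and `rename_mangled` are hypothetical helpers, and the function only reports what it would do unless `dry_run=False` is passed:

```python
import os


def _to_original_bytes(text):
    """Recover the bytes that were wrongly decoded as Windows-1252."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode("cp1252")
        except UnicodeEncodeError:
            # cp1252 leaves a few bytes undefined; they usually survive as
            # Latin-1 control characters. Raises if ch fits neither encoding.
            out += ch.encode("latin-1")
    return bytes(out)


def repair_name(name):
    """Return the repaired name, or the input unchanged if it is not mangled."""
    try:
        return _to_original_bytes(name).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return name


def rename_mangled(directory, dry_run=True):
    """Rename mangled files in one directory; returns (old, new) pairs."""
    changes = []
    for old in sorted(os.listdir(directory)):
        new = repair_name(old)
        target = os.path.join(directory, new)
        if new == old or os.path.exists(target):
            continue  # nothing to fix, or the target name is already taken
        changes.append((old, new))
        if not dry_run:
            os.rename(os.path.join(directory, old), target)
    return changes
```

Names that are already correct do not decode as UTF-8 after the round trip, so they are left alone. Taking a backup of the upload directory before a real run would be prudent.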

2 Answers

Maybe you should consider option C).
Convert all accented characters to their plain ASCII equivalents.
So EXPRESSÃO.jpg -> EXPRESSAO.jpg

I think this would help you a lot, not only when it comes to coding and file systems, but also when storing names / references in databases.

Update

This is a function I use for removing accents. I found this solution somewhere on the web.

  function sanitizeName($name)
  {
      // Map each accented character to an ASCII replacement. strtr() swaps whole
      // substrings, so multi-byte UTF-8 characters are handled safely.
      // ã/õ and their capitals are included so names like EXPRESSÃO.jpg are covered.
      $map = array(
          'é' => 'e', 'è' => 'e', 'ë' => 'e', 'ê' => 'e', 'É' => 'E', 'È' => 'E', 'Ë' => 'E', 'Ê' => 'E',
          'á' => 'a', 'à' => 'a', 'ä' => 'a', 'â' => 'a', 'å' => 'a', 'ã' => 'a', 'Á' => 'A', 'À' => 'A', 'Ä' => 'A', 'Â' => 'A', 'Å' => 'A', 'Ã' => 'A',
          'ó' => 'o', 'ò' => 'o', 'ö' => 'o', 'ô' => 'o', 'õ' => 'o', 'Ó' => 'O', 'Ò' => 'O', 'Ö' => 'O', 'Ô' => 'O', 'Õ' => 'O',
          'í' => 'i', 'ì' => 'i', 'ï' => 'i', 'î' => 'i', 'Í' => 'I', 'Ì' => 'I', 'Ï' => 'I', 'Î' => 'I',
          'ú' => 'u', 'ù' => 'u', 'ü' => 'u', 'û' => 'u', 'Ú' => 'U', 'Ù' => 'U', 'Ü' => 'U', 'Û' => 'U',
          'ý' => 'y', 'ÿ' => 'y', 'Ý' => 'Y', 'ø' => 'o', 'Ø' => 'O', 'œ' => 'oe', 'Œ' => 'OE', 'Æ' => 'AE', 'ç' => 'c', 'Ç' => 'C',
      );
      return strtr($name, $map);
  }

So maybe you can read all files, clean the names using this function, and save the files with new name.
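
Since the question also mentions Python, here is a rough sketch of the same idea without a hand-written table for every letter. It relies on Unicode decomposition from the standard `unicodedata` module; `strip_accents` and the `EXTRA` table are illustrative names, and the table only covers a few letters that do not decompose:

```python
import unicodedata

# Letters that NFKD does not decompose, mapped by hand (not exhaustive).
EXTRA = {"ø": "o", "Ø": "O", "œ": "oe", "Œ": "OE", "æ": "ae", "Æ": "AE", "ß": "ss"}


def strip_accents(name):
    """Replace accented letters with unaccented equivalents."""
    # NFKD splits "Ã" into "A" plus a combining tilde, which is then dropped.
    decomposed = unicodedata.normalize("NFKD", name)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return "".join(EXTRA.get(ch, ch) for ch in without_marks)


if __name__ == "__main__":
    print(strip_accents("EXPRESSÃO.jpg"))  # prints EXPRESSAO.jpg
```

Keep in mind that after renaming the files this way, the references inside the post content would need the same conversion, otherwise the links stay broken.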

In the end, I gave up on accents and I now convert all accented characters to their HTML entity equivalents. So Abaeté is stored as Abaet&eacute; in my DB.
