The LambCutlet Disorganisation

MySQL 4.1.x/5.x + UTF-8 database + WordPress = ???

Posted by Jonathan at 20:50:05 UTC on the 5th of June, 2006

For those of you who, like me, have this strange habit of peppering posts with different languages and/or scripts such as 漢字 (Chinese), العربية (Arabic), Ελληνικά (Greek), Русский (Russian), 한국어 (Korean) or extensions to the Latin alphabet by way of ß, Ŋ/ŋ, Þ/þ and other such silliness, Unicode as the default encoding (though itself not perfect) is a very good idea!

However, it would seem that WordPress (versions 1.5.x, 2.0.x and 2.1-alpha) chokes when the default encoding of the database is set to UTF-8 (via my.cnf using default-character-set=utf8), which in turn makes the browser display “unknown” characters.

After a bit of digging around and finding a number of Chinese and Japanese blogs, each with a slightly different take on a fix, the basic issue turned out to be that WordPress was “mangling” the already UTF-8 data by sending it through a LATIN1 connection.
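
To make the mangling concrete, here is a minimal sketch in PHP of what presumably happens to a single character on its way through such a connection. It assumes the mbstring extension is available and uses ISO-8859-1 as a stand-in for MySQL's latin1; the function names mangle_as_latin1 and unmangle_latin1 are made up for this illustration and are not part of WordPress.

```php
<?php
// Hypothetical helper: pretend the UTF-8 bytes are Latin-1 characters and
// convert each of them to UTF-8, roughly what a latin1 connection talking
// to a utf8 database would do to incoming data.
function mangle_as_latin1($utf8Bytes) {
    return mb_convert_encoding($utf8Bytes, 'UTF-8', 'ISO-8859-1');
}

// Hypothetical helper: undo one round of the mangling above.
function unmangle_latin1($mangledBytes) {
    return mb_convert_encoding($mangledBytes, 'ISO-8859-1', 'UTF-8');
}

$eszett  = "\xC3\x9F";                          // "ß" (U+00DF) as UTF-8 bytes
$mangled = mangle_as_latin1($eszett);

echo bin2hex($eszett), "\n";                    // c39f
echo bin2hex($mangled), "\n";                   // c383c29f
echo bin2hex(unmangle_latin1($mangled)), "\n";  // c39f
```

The two bytes of ß have turned into four, so anything expecting plain UTF-8 ends up showing junk instead of the original character.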

So the fix for all this is to hunt down ./wp-includes/wp-db.php and find the following code (around line 42):


        function wpdb($dbuser, $dbpassword, $dbname, $dbhost) {
                $this->dbh = @mysql_connect($dbhost, $dbuser, $dbpassword);
                if (!$this->dbh) {

… and add an explicit SET NAMES query right after the connection is made:


        function wpdb($dbuser, $dbpassword, $dbname, $dbhost) {
                $this->dbh = @mysql_connect($dbhost, $dbuser, $dbpassword);
                $this->query("SET NAMES 'utf8'");
                if (!$this->dbh) {

Save the modified file and Unicode lovin’ from database to browser and back shall be yours! ;)

Filed under: Meta, Software

9 Comments »

  1. Hehehe, good ole WankPress. :P

    Comment by Fountain of Apples at 21:25:28 UTC on the 5th of June, 2006

  2. You might want to look at the comments related to latest MySQL version:
    http://dev.mysql.com/doc/refman/4.1/en/news-4-1-20.html

    Comment by Markus at 20:03:27 UTC on the 7th of June, 2006

  3. Debian likes to be very conservative with its version numbers and very much not on the bleeding edge. That said… there was an update for MySQL 4.1.x today. :D

    Comment by Jonathan Stanley at 16:14:47 UTC on the 8th of June, 2006

  4. So do you think UTF8 may also be affected by that multi-byte issue?

    Anyway, look at function wpdb::escape: it uses addslashes only, and the MySQL-specific method is commented out.

    Comment by Markus at 19:36:02 UTC on the 8th of June, 2006

  5. Looking at the actual bug report, there is no mention of Unicode, and the bug needs a string that is actually invalid in whatever charset is in use. This doesn’t happen in Unicode, so said bug shouldn’t affect any of the Unicode encodings.

    Comment by Jonathan Stanley at 20:12:55 UTC on the 8th of June, 2006

  6. Hey man, such a simple solution, but I had to search for it and found it on your blogspot ;-)

    Comment by mtg at 16:06:24 UTC on the 12th of July, 2006

  7. Thank you for this post. It is most irritating that this is necessary, however.

    Comment by Andy Wingo at 10:13:28 UTC on the 7th of March, 2007

  8. The svn trunk version (2.2) finally fixes the UTF-8 issue; whenever that’ll be released is anyone’s guess! ;)

    Comment by Jonathan Stanley at 15:47:30 UTC on the 7th of March, 2007

  9. Looks interesting!
    Thanks for letting us know about it.
    Regards,
    Thomasena

    Comment by Thomasena at 09:39:53 UTC on the 25th of November, 2008
