The LambCutlet Disorganisation

MySQL 4.1.x/5.x + UTF-8 database + WordPress = ???

Posted by Jonathan at 20:50:05 UTC on the 5th of June, 2006

For those of you who, like me, have the strange habit of peppering posts with different languages and/or scripts such as 漢字 (Chinese), العربية (Arabic), Ελληνικά (Greek), Русский (Russian), 한국어 (Korean), or extensions to the Latin alphabet by way of ß, Ŋ/ŋ, Þ/þ and other such silliness, Unicode as the default encoding (though itself not perfect) is a very good idea!

However, it would seem that WordPress (versions 1.5.x, 2.0.x and 2.1-alpha) chokes when the default encoding of the database is set to UTF-8 (via my.cnf using default-character-set=utf8), which in turn makes the browser display “unknown” characters.

After a bit of digging around and finding a number of Chinese and Japanese blogs, each with a slightly different take on a fix, the basic issue turned out to be that WordPress was “mangling” the already-UTF-8 data by sending it through a LATIN1 connection.
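To make the “mangling” a bit more concrete, here is a minimal sketch of one way it can arise: bytes that are already UTF-8 get treated as if each byte were a Latin-1 character and are then converted to UTF-8 a second time. The helper latin1_to_utf8() is a hypothetical name made up for this illustration; it is not part of WordPress, PHP or MySQL, and the exact conversion path inside a given MySQL setup may differ.

```php
<?php
// Hypothetical helper (not part of WordPress or MySQL): re-encode a byte
// string as UTF-8 while treating every input byte as one ISO-8859-1 character.
function latin1_to_utf8($bytes) {
    $out = '';
    for ($i = 0; $i < strlen($bytes); $i++) {
        $b = ord($bytes[$i]);
        if ($b < 0x80) {
            $out .= chr($b);                  // ASCII passes through unchanged
        } else {
            $out .= chr(0xC0 | ($b >> 6))     // lead byte of a two-byte sequence
                  . chr(0x80 | ($b & 0x3F));  // continuation byte
        }
    }
    return $out;
}

$original = "\xC3\xBE";                 // "þ" (U+00FE) encoded as UTF-8
$mangled  = latin1_to_utf8($original);  // what a Latin-1 misreading produces

echo bin2hex($original), "\n";  // c3be
echo bin2hex($mangled), "\n";   // c383c2be, which a browser renders as "Ã¾"
```

The two-byte “þ” comes out as four bytes, which is the kind of garbage a browser ends up showing when both ends do not agree on the connection's character set.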

So the fix for all this is to hunt down ./wp-includes/wp-db.php and find the following code (around line 42):


        function wpdb($dbuser, $dbpassword, $dbname, $dbhost) {
                $this->dbh = @mysql_connect($dbhost, $dbuser, $dbpassword);
                if (!$this->dbh) {

… and add an explicit SET NAMES query:


        function wpdb($dbuser, $dbpassword, $dbname, $dbhost) {
                $this->dbh = @mysql_connect($dbhost, $dbuser, $dbpassword);
                // Tell MySQL the connection speaks UTF-8 (only if the connect succeeded)
                if ($this->dbh)
                        $this->query("SET NAMES 'utf8'");
                if (!$this->dbh) {

Save the modified file and Unicode lovin’ from database to browser and back shall be yours! ;)

Filed under: Meta, Software

Comments

  1. Hehehe, good ole WankPress. :P

    Comment by Fountain of Apples at 21:25:28 UTC on the 5th of June, 2006

  2. You might want to look at the comments related to the latest MySQL version:
    http://dev.mysql.com/doc/refman/4.1/en/news-4-1-20.html

    Comment by Markus at 20:03:27 UTC on the 7th of June, 2006

  3. Debian likes to be very conservative with its version numbers and is very much not on the bleeding edge. That said… there was an update for MySQL 4.1.x today. :D

    Comment by Jonathan Stanley at 16:14:47 UTC on the 8th of June, 2006

  4. So do you think UTF-8 may also be affected by that multi-byte issue?

    Anyway, look at the function wpdb::escape: it uses only addslashes, and the MySQL-specific method is commented out.

    Comment by Markus at 19:36:02 UTC on the 8th of June, 2006

  5. Looking at the actual bug report, there is no mention of Unicode, and the bug needs a string that is actually invalid in whatever charset is in use. This doesn’t happen in Unicode, so said bug shouldn’t affect any of the Unicode encodings.

    Comment by Jonathan Stanley at 20:12:55 UTC on the 8th of June, 2006

  6. Hey man, such a simple solution, but I had to search for it and found it on your blogspot ;-)

    Comment by mtg at 16:06:24 UTC on the 12th of July, 2006

  7. Thank you for this post. It is most irritating that this is necessary, however.

    Comment by Andy Wingo at 10:13:28 UTC on the 7th of March, 2007

  8. The svn trunk (version 2.2) finally fixes the UTF-8 issue; when that’ll be released is anyone’s guess! ;)

    Comment by Jonathan Stanley at 15:47:30 UTC on the 7th of March, 2007
