For those of you who, like me, have the strange habit of peppering posts with different languages and/or scripts such as 漢字 (Chinese), العربية (Arabic), Ελληνικά (Greek), Русский (Russian), 한국어 (Korean), or extensions to the Latin alphabet by way of ß, Ŋ/ŋ, Þ/þ and other such silliness, Unicode as the default encoding (though itself not perfect) is a very good idea!
However, it would seem that WordPress (versions 1.5.x, 2.0.x and 2.1-alpha) chokes when the default encoding of the database is set to UTF-8 (via my.cnf using default-character-set=utf8), which in turn makes the browser display “unknown” characters.
After a bit of digging around and finding a number of Chinese and Japanese blogs, each with a slightly different take on a fix, the basic issue turned out to be that WordPress was “mangling” the already UTF-8 data by sending it through a LATIN1 connection.
So the fix for all this is to hunt down ./wp-includes/wp-db.php and find the following code (around line 42):
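To make the “mangling” a little more concrete, here is a minimal sketch of what happens when UTF-8 bytes get treated as single-byte LATIN1 characters and are then converted to UTF-8 a second time. It illustrates the general effect only and is not necessarily the exact path the data takes inside MySQL. The function name mangle_via_latin1 is made up for this example, the mbstring extension is assumed to be available, and ISO-8859-1 stands in for MySQL's latin1 (which is closer to Windows-1252, so a few code points may come out differently in practice).

```php
<?php
// Hypothetical helper for illustration only; not part of WordPress or MySQL.
// Assumes the mbstring extension is loaded.
function mangle_via_latin1(string $utf8): string
{
    // Treat every byte of the input as one ISO-8859-1 character
    // and re-encode the result as UTF-8 ("double encoding").
    return mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
}

$original = "漢字";                      // two characters, 6 bytes in UTF-8
$mangled  = mangle_via_latin1($original);

// Each of the 6 non-ASCII bytes becomes a 2-byte sequence.
echo strlen($original), " bytes before, ", strlen($mangled), " bytes after\n";

// Undoing the conversion restores the original bytes.
$restored = mb_convert_encoding($mangled, 'ISO-8859-1', 'UTF-8');
echo $restored === $original ? "round trip ok\n" : "round trip failed\n";
```

Pure ASCII text passes through such a conversion unchanged, which is why the problem only shows up once non-ASCII characters are involved.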
function wpdb($dbuser, $dbpassword, $dbname, $dbhost) {
$this->dbh = @mysql_connect($dbhost, $dbuser, $dbpassword);
if (!$this->dbh) {
… and add an explicit SET NAMES query right after the connection is made:
function wpdb($dbuser, $dbpassword, $dbname, $dbhost) {
$this->dbh = @mysql_connect($dbhost, $dbuser, $dbpassword);
$this->query("SET NAMES 'utf8'");
if (!$this->dbh) {
Save the modified file and Unicode lovin’ from database to browser and back shall be yours! ;)
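For anyone not on the old mysql_* functions, the same idea can be sketched with the mysqli extension. This is only an outline under a few assumptions: the function name open_utf8_connection is hypothetical, the credentials are whatever your setup uses, and 'utf8' is kept to match the fix above (newer MySQL versions also offer utf8mb4, which covers 4-byte characters).

```php
<?php
// Hypothetical helper for illustration only; not part of WordPress.
function open_utf8_connection(string $host, string $user, string $password, string $database): mysqli
{
    // Have mysqli throw mysqli_sql_exception on errors instead of failing silently.
    mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);

    // Connect first, so the charset is only set on a working connection.
    $dbh = new mysqli($host, $user, $password, $database);

    // Sets the connection charset much like SET NAMES does, and also
    // informs the client library, which real_escape_string() relies on.
    $dbh->set_charset('utf8');

    return $dbh;
}
```

Unlike a bare SET NAMES query, set_charset() keeps the client library and the server in agreement about the encoding in use.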
Hehehe, good ole WankPress. :P
Comment by Fountain of Apples — 21:25:28 UTC on the 5th of June, 2006
You might want to look at the comments related to latest MySQL version:
http://dev.mysql.com/doc/refman/4.1/en/news-4-1-20.html
Comment by Markus — 20:03:27 UTC on the 7th of June, 2006
Debian likes to be very conservative with its version numbers and is very much not on the bleeding edge. That said… there was an update for MySQL 4.1.x today. :D
Comment by Jonathan Stanley — 16:14:47 UTC on the 8th of June, 2006
So do you think UTF8 may also be affected by that multi-byte issue?
Anyway, look at function wpdb::escape, it uses addslashes only, the specific MySQL method is commented.
Comment by Markus — 19:36:02 UTC on the 8th of June, 2006
Looking at the actual bug report, there is no mention of Unicode, and the bug needs a string that is actually invalid in whatever charset is in use. This doesn’t happen in Unicode, so said bug shouldn’t affect any of the Unicode encodings.
Comment by Jonathan Stanley — 20:12:55 UTC on the 8th of June, 2006
Hey man, such a simple solution, but I had to search for it and found it on your blogspot ;-)
Comment by mtg — 16:06:24 UTC on the 12th of July, 2006
Thank you for this post. It is most irritating that this is necessary, however.
Comment by Andy Wingo — 10:13:28 UTC on the 7th of March, 2007
The svn trunk version 2.2 finally fixes the UTF-8 issue, whenever that’ll be released is anyone’s guess! ;)
Comment by Jonathan Stanley — 15:47:30 UTC on the 7th of March, 2007