{"id":159,"date":"2009-06-20T16:14:18","date_gmt":"2009-06-20T03:14:18","guid":{"rendered":"http:\/\/www.thunderguy.com\/semicolon\/?p=159"},"modified":"2009-06-20T16:21:37","modified_gmt":"2009-06-20T03:21:37","slug":"mysql-character-encodings","status":"publish","type":"post","link":"https:\/\/thunderguy.com\/semicolon\/2009\/06\/20\/mysql-character-encodings\/","title":{"rendered":"MySQL character encodings"},"content":{"rendered":"<p>I recently noticed that many of the comments and trackbacks on this website were composed entirely of question marks. At first I thought it might be plain old spam, but it turned out to be a character encoding problem. Here&#8217;s how I fixed it.<\/p>\n<p>Excessive question marks often indicate a problem with character encoding. After a little investigation I realised that the character collation for most of the WordPress database tables was set to <code>latin1_swedish_ci<\/code>, indicating a character set of <code>latin1<\/code>. Clearly this was going to cause problems for non-Western languages; I noticed problems with comments from Israel, Eastern Europe and East Asia, amongst others.<\/p>\n<p>The fix was pretty simple: convert the comments table to the UTF-8 character set, like this.<\/p>\n<pre class=\"code\"><code>ALTER TABLE wp_comments CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;<\/code><\/pre>\n<p>I think this happened when I restored the database from a backup; the collations were reset to <code>latin1_swedish_ci<\/code>, which is the MySQL default. Sadly, the characters were being lost as the data went into the database, so the existing question marks will remain. At least future comments and trackbacks will now appear correctly, I hope.<\/p>\n<p>There is one strange thing though &#8212; the comments table in another blog of mine has its collation also set to <code>latin1_swedish_ci<\/code>, but it can handle Japanese, Hebrew, Russian, and anything else I throw at it. The investigation continues.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I recently noticed that many of the comments and trackbacks on this website were composed entirely of question marks. At first I thought it might be plain old spam, but it turned out to be a character encoding problem. Here&#8217;s how I fixed it. Excessive question marks often indicate a problem with character encoding. After [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[13],"tags":[5,83],"class_list":["post-159","post","type-post","status-publish","format-standard","hentry","category-wordpress","tag-sql","tag-wordpress"],"_links":{"self":[{"href":"https:\/\/thunderguy.com\/semicolon\/wp-json\/wp\/v2\/posts\/159","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/thunderguy.com\/semicolon\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/thunderguy.com\/semicolon\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/thunderguy.com\/semicolon\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/thunderguy.com\/semicolon\/wp-json\/wp\/v2\/comments?post=159"}],"version-history":[{"count":3,"href":"https:\/\/thunderguy.com\/semicolon\/wp-json\/wp\/v2\/posts\/159\/revisions"}],"predecessor-version":[{"id":161,"href":"https:\/\/thunderguy.com\/semicolon\/wp-json\/wp\/v2\/posts\/159\/revisions\/161"}],"wp:attachment":[{"href":"https:\/\/thunderguy.com\/semicolon\/wp-json\/wp\/v2\/media?parent=159"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/thunderguy.com\/semicolon\/wp-json\/wp\/v2\/categories?post=159"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/thunderguy.com\/semicolon\/wp-json\/wp\/v2\/tags?post=159"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}