Getting Unicode, MySql, and Rails to Cooperate
In a post about Ruby and Unicode a while back, I mentioned the page at the Rails wiki called How To Use Unicode Strings in Rails. (Btw, check out Why the Lucky Stiff’s response to my post, some useful code there.)
It turns out that there were a few more mostly MySQL-specific steps involved in getting Unicode to work correctly with Rails. So I thought I’d describe all the steps we went through to get it set up in one place. This has only been tested with MySQL 4.1.
In MySQL: Set the Encoding when you Create Tables
You need to explicity tell MySQL that you want your tables to be encoded in UTF-8. Here’s a sample table with 3 columns, id
, foo
, and bar
:
create table samples ( id int not null auto_increment, foo varchar(100) not null, bar text not null, primary key (id) ) Type=MyISAM CHARACTER SET utf8;
The line Type=MyISAM CHARACTER SET utf8;
is where the action is. The table type has to be MyISAM, not innoDB, because unfortunately innoDB tables don’t support full text searching of UTF-8 encoded content (details here). Apparently innoDB tables are more flexible in general, but if full-text search is crucial for you, you’ll have to go with MyISAM.
Bummer, that.
In any case, then add the CHARACTER SET utf8
directive, as shown.
(I wonder if there is some way to set this as a default, without adding the line to the DDL for every table?)
Set the “charset” and “Content-type” in the Application Controller
This is also described at How To Use Unicode Strings in Rails.
class ApplicationController < ActionController::Base before_filter :set_charset def set_charset @headers["Content-Type"] = "text/html; charset=utf-8" end end
The previous steps were enough to get Unicode showing up in a little test app I generated with scaffold
(the app just consisted of an input field and a textarea).
To test it, I pasted in some sample text in various languages. It worked okay for text containing only the characters found in ASCII or latin1, but among other characters there were weird cases of random characters being removed or added. The problem seemed not to have anything to do with the script (i.e., the Unicode block). For instance, in an Esperanto text, “ĉ” (U+0109 LATIN SMALL LETTER C WITH CIRCUMFLEX) came out fine, but “ĝ” (U+011D LATIN SMALL LETTER G WITH CIRCUMFLEX) was borked. Go figure.
It took some digging to find the next bit — thanks to Ben Jackson of INCOMUM Design & Conceito for getting the straight story on the Rails list. The solution is…
Tell Rails to tell MySQL to Use UTF-8. (Got that?)
It came down to a MySQL configuration option: You have to tell MySQL to SET NAMES UTF8
, as DHH pointed out in the previous link. You can either do it in the source to ActiveRecord, in mysql_adapter.rb
, or you can just make the change in your own application. We chose the latter route.
So, here’s our app/controllers/application.rb
as it stands:
class ApplicationController < ActionController::Base before_filter :set_charset before_filter :configure_charsets def set_charset @headers["Content-Type"] = "text/html; charset=utf-8" end def configure_charsets @response.headers["Content-Type"] = "text/html; charset=utf-8" # Set connection charset. MySQL 4.0 doesn't support this so it # will throw an error, MySQL 4.1 needs this suppress(ActiveRecord::StatementInvalid) do ActiveRecord::Base.connection.execute 'SET NAMES UTF8' end end end
(In case you’re wondering, yes, you can have more than one before_filter
.)
class ApplicationController < ActionController::Base before_filter :configure_charsets def configure_charsets @response.headers["Content-Type"] = "text/html; charset=utf-8" # Set connection charset. MySQL 4.0 doesn't support this so it # will throw an error, MySQL 4.1 needs this suppress(ActiveRecord::StatementInvalid) do ActiveRecord::Base.connection.execute 'SET NAMES UTF8' end end end
UPDATED AGAIN Argh, I forgot to remove the set_charset
definition before. It should be correct now…
Anyway, now our Rails installation seems to handle (almost) any nutty writing system that we throw at it.
I’m not sure these are all the Right Way™, I’m just saying it’s worked for us (so far).
For one thing, it seems like it might make sense to just go ahead and set the default character set and collation for MySQL to UTF-8, independently of any Rails stuff at all. Aren’t all character sets supposed to be slouching toward Unicode, anyway? But that sounds like a rather apocalyptic measure, for some reason… I guess I’ll hold off on the rapture of the character sets.
Collation is it’s own topic — but then we’ve not gotten to sorting anything yet. Here’s a recommendation for building the tables with:
CHARACTER SET utf8 COLLATE utf8_general_ci;
To set the collation order. Wading through the MySQL docs on that is next on the agenda.
rg Said,
July 5, 2005 @ 9:11 am
Thanks, very helpful and worked a charm.
pat Said,
July 5, 2005 @ 2:11 pm
Great, glad to hear it!
By the way, since writing this I ran across:
Using Gettext To Translate Your Rails Application by Sascha Ebach. (I wonder if he has a blog…)
Which explains everything I covered plus how to use Gettext. It’s far more extensive than this little writeup, but hopefully this will remain a good brief intro on MySQL + Rails.
Tim Case Said,
July 10, 2005 @ 1:09 pm
Pat,
Reading this post is better than waking up and finding out you have a super model in your bed (wait a minute I take that back it’s good but not that good), it RULES and you rule for writing it. I’m doing the two-handed sultan salute for helping me with what was turning into one ugly unicode Sunday. Thanks!
suppress(ActiveRecord::StatementInvalid) do
ActiveRecord::Base.connection.execute ‘SET NAMES UTF8′
end
WTF?!? Brilliant.
David Said,
August 28, 2005 @ 1:41 pm
> (I wonder if there is some way to set this as a default, without adding the line to the DDL for every table?)
You can set it up also on database level and server level and also at column level. Good intro at http://www.php-editors.com/mysql_manual/p_manual_Charset.html