Getting Unicode, MySql, and Rails to Cooperate
June 23rd, 2005In a post about Ruby and Unicode a while back, I mentioned the page at the Rails wiki called How To Use Unicode Strings in Rails. (Btw, check out Why the Lucky Stiff’s response to my post, some useful code there.)
It turns out that there were a few more mostly MySQL-specific steps involved in getting Unicode to work correctly with Rails. So I thought I’d describe all the steps we went through to get it set up in one place. This has only been tested with MySQL 4.1.
In MySQL: Set the Encoding when you Create Tables
You need to explicity tell MySQL that you want your tables to be encoded in UTF-8. Here’s a sample table with 3 columns, id, foo, and bar:
create table samples (
id int not null auto_increment,
foo varchar(100) not null,
bar text not null,
primary key (id)
) Type=MyISAM CHARACTER SET utf8;
The line Type=MyISAM CHARACTER SET utf8; is where the action is. The table type has to be MyISAM, not innoDB, because unfortunately innoDB tables don’t support full text searching of UTF-8 encoded content (details here). Apparently innoDB tables are more flexible in general, but if full-text search is crucial for you, you’ll have to go with MyISAM.
Bummer, that.
In any case, then add the CHARACTER SET utf8 directive, as shown.
(I wonder if there is some way to set this as a default, without adding the line to the DDL for every table?)
Set the “charset” and “Content-type” in the Application Controller
This is also described at How To Use Unicode Strings in Rails.
class ApplicationController < ActionController::Base
before_filter :set_charset
def set_charset
@headers["Content-Type"] = "text/html; charset=utf-8"
end
end
The previous steps were enough to get Unicode showing up in a little test app I generated with scaffold (the app just consisted of an input field and a textarea).
To test it, I pasted in some sample text in various languages. It worked okay for text containing only the characters found in ASCII or latin1, but among other characters there were weird cases of random characters being removed or added. The problem seemed not to have anything to do with the script (i.e., the Unicode block). For instance, in an Esperanto text, “ĉ” (U+0109 LATIN SMALL LETTER C WITH CIRCUMFLEX) came out fine, but “ĝ” (U+011D LATIN SMALL LETTER G WITH CIRCUMFLEX) was borked. Go figure.
It took some digging to find the next bit — thanks to Ben Jackson of INCOMUM Design & Conceito for getting the straight story on the Rails list. The solution is…
Tell Rails to tell MySQL to Use UTF-8. (Got that?)
It came down to a MySQL configuration option: You have to tell MySQL to SET NAMES UTF8, as DHH pointed out in the previous link. You can either do it in the source to ActiveRecord, in mysql_adapter.rb, or you can just make the change in your own application. We chose the latter route.
So, here’s our app/controllers/application.rb as it stands:
class ApplicationController < ActionController::Base
before_filter :set_charset
before_filter :configure_charsets
def set_charset
@headers["Content-Type"] = "text/html; charset=utf-8"
end
def configure_charsets
@response.headers["Content-Type"] = "text/html; charset=utf-8"
# Set connection charset. MySQL 4.0 doesn't support this so it
# will throw an error, MySQL 4.1 needs this
suppress(ActiveRecord::StatementInvalid) do
ActiveRecord::Base.connection.execute 'SET NAMES UTF8'
end
end
end
(In case you’re wondering, yes, you can have more than one before_filter.)
class ApplicationController < ActionController::Base
before_filter :configure_charsets
def configure_charsets
@response.headers["Content-Type"] = "text/html; charset=utf-8"
# Set connection charset. MySQL 4.0 doesn't support this so it
# will throw an error, MySQL 4.1 needs this
suppress(ActiveRecord::StatementInvalid) do
ActiveRecord::Base.connection.execute 'SET NAMES UTF8'
end
end
end
UPDATED AGAIN Argh, I forgot to remove the set_charset definition before. It should be correct now…
Anyway, now our Rails installation seems to handle (almost) any nutty writing system that we throw at it.
I’m not sure these are all the Right Way™, I’m just saying it’s worked for us (so far).
For one thing, it seems like it might make sense to just go ahead and set the default character set and collation for MySQL to UTF-8, independently of any Rails stuff at all. Aren’t all character sets supposed to be slouching toward Unicode, anyway? But that sounds like a rather apocalyptic measure, for some reason… I guess I’ll hold off on the rapture of the character sets.
Collation is it’s own topic — but then we’ve not gotten to sorting anything yet. Here’s a recommendation for building the tables with:
CHARACTER SET utf8 COLLATE utf8_general_ci;
To set the collation order. Wading through the MySQL docs on that is next on the agenda.
Thanks, very helpful and worked a charm.
- rg @ 5 July 2005Great, glad to hear it!
By the way, since writing this I ran across:
Using Gettext To Translate Your Rails Application by Sascha Ebach. (I wonder if he has a blog…)
Which explains everything I covered plus how to use Gettext. It’s far more extensive than this little writeup, but hopefully this will remain a good brief intro on MySQL + Rails.
- pat @ 5 July 2005Pat,
Reading this post is better than waking up and finding out you have a super model in your bed (wait a minute I take that back it’s good but not that good), it RULES and you rule for writing it. I’m doing the two-handed sultan salute for helping me with what was turning into one ugly unicode Sunday. Thanks!
suppress(ActiveRecord::StatementInvalid) do
ActiveRecord::Base.connection.execute ‘SET NAMES UTF8′
end
WTF?!? Brilliant.
- Tim Case @ 10 July 2005> (I wonder if there is some way to set this as a default, without adding the line to the DDL for every table?)
- David @ 28 August 2005You can set it up also on database level and server level and also at column level. Good intro at http://www.php-editors.com/mysql_manual/p_manual_Charset.html
Thanks for the article! Very helpful.
The first subsection makes it sound like you can use innoDB, if you’re willing to sacrifice full-text searching. I just tried it on MySQL 4.1.12, and it only works as MyISAM. With innoDB, I got each byte displayed as an ASCII character, with all the other settings (before filter) exactly the same.
- wzph @ 22 September 2005Thanks very much for the configure_charsets code! I want to say that I have tried it with InnoDB tables on MySQL 5.0.x and it seems to work fine! So for those who use foreign keys, no need to panic!
Cheers
- telos @ 21 November 2005telos, do you know if full-text searching works in 5.0?
- pat @ 21 November 2005Actually, if you put this line:
- sebastien @ 9 April 2006encoding: utf8
in your database.yml configuration file, it is
equivalent to: SET NAMES UTF8.
Sebastien’s last suggestion works great here with Mysql 5.0 too. I even removed content-type header and it’s still fine! Thanks for a great write-up!
- Metin Amiroff @ 12 April 2006[…] @headers["Content-Type"] = "text/html; charset=utf-8" end As explained here, MySQL must use MyISAM table type, not innoDB, so make sure […]
- My Other Blog is a Ferrari » Blog Archive » To make apps Japanese compatible @ 23 April 2006Hi,
- EchoOff @ 2 May 2006How about dumping unicode utf8 database from mysql 4.1 and uploading it to mySQL 4.0.
Can I somehow read unicode utf8 records from uploaded tables?
hello there,
- jeff @ 19 September 2006i am trying to reproduce a data server that a company uses to present information to its customers via the web. No problem. What they are using right now is Microsoft Access, connecting via ODBC to a server on line. The cool thing that they can do is have several tables linked together. Like if a certain field is updated in one table, it updates the same info automatically in another table. So, i want to reproduce this in MySQL, but i can’t seem to find out how. Is there a way to pull this off on the server side ? or does that have to be done on the client side ?
thanks for any tips.
@EchoOff: Oops, sorry I missed this question, but I’m afraid I really don’t know the answer. A quick glance at this page:
http://dev.mysql.com/doc/refman/4.1/en/charset-unicode.html
You might be out of luck; it suggests that Unicode (utf-8) support was added in 4.1.
@jeff: Sorry, I have no experience with ODBC or Access.
Good luck, thanks for the comments.
- pat @ 19 September 2006@jeff: what you are describing could be accomplished by a trigger, which is now supported in mysql 5… see http://dev.mysql.com/doc/refman/5.0/en/using-triggers.html for more info.
- tim @ 29 September 2006There’s a simplest way. Just add the following into your config/database.yml:
encoding: utf8
- nullie @ 9 October 2006Great article! I also faced similar issues. My thoughts here:
http://www.gaugeus.com/ramblings/2006/11/15/unicode-support-in-ruby-on-rails-with-mysql-and-microsoft-sql-server
- Andres @ 16 November 2006[…] (参考) Getting Unicode, MySql, and Rails to Cooperate http://ruphus.com/blog/2005/06/23/getting-unicode-mysql-and-rails-to-cooperate/ この記事のタグ:Rails Ruby プログラム 覚書 […]
- Rails UTF-8文字化け対策 » TMJ style @ 10 March 2007On personal opinion, I find this very helpful.
- ocnsss @ 28 March 2007Guys, I have also posted some more relevant info further on this, not sure if you find it useful: http://www.bidmaxhost.com/forum/
[…] Sveiki, štai ir vėl sulaukėm savaitgalio skaitinių rubrikos, po šiokios tokios pertraukos. Šią savaitę buvo publikuotas NePo straipsnis Kompiuterių virusų istorija. Tikriausiai pastebėjot, jog Pixel.lt aktyvumas truputį sumažėjo, tačiau pavasaris sunkus metas, vieniems darbai, kitiems mokslai ir t.t. Linkime sau pasitempti. Šio savaitgalio skaitiniai: Getting Unicode, MySql, and Rails to Cooperate Perl, the first postmodern computer language Top 5 javascript frameworks Susiję straipsniai:Savaitgalio skaitiniai #24Savaitgalio skaitiniai #22Savaitgalio skaitiniai #21 […]
- » Savaitgalio skaitiniai #25 Archyvas » Pixel.lt @ 12 May 2007I do guess that this is executable to see this site, because simply here students can see the fantastic article associated with this topic. Thus, the dissertation service should use this for dissertation form accomplishing.
- Ella28 @ 17 December 2009Thanks so much for your good knowledge associated with this topic. But to find the best essay writing all people should know some facts just about write my essay.
- kuKim @ 31 December 2009You must know that I from time to time buy essaysand buy research paper choosing writing services and some students purchase essay writing about this good post . Thanks for the really hot stuff!
- cfKristin @ 4 January 2010Thank you, it’s very amazing description it might be very helpful for students. For example last year when I had a difficult of time at the end of semester with a constant flow of academic assignments and mission, I had a inspiring idea to buy it somewhere and than use plagiarism checking. I was so dead that I did not care for what can happen when my academic work was written by flipside person. To my adroitest surprise, research paper was estimable the price I paid for it. I was so happy with the quality and now everytime i use this service.
- GWEN19La @ 13 January 2010Hi,
- essay service @ 21 January 2010It must’ve taken you a bit of time, so thanks for taking the time to do so, I appreciate it, and this post is just great.
thank you for share.
- puma boots @ 24 January 2010环球外汇交易 在世界最受欢迎的外汇交易
- 外汇 @ 2 February 2010