[lang]

Present Perfect

Personal
Projects
Packages
Patches
Presents
Linux

Picture Gallery
Present Perfect

mysql and utf-8

Filed under: Hacking — Thomas @ 10:30

2009-07-18
10:30

Someone will probably tell me what knob to turn or what option to twiddle to make this Do The Right Thing, because I can't let myself think MySQL and its tools would still not support utf-8 correctly:


mysql> select * from artist where artistname like '%gur%';
+-----------+------------+-----------------+
| artist_id | artistname | first_letter_id |
+-----------+------------+-----------------+
|       258 | Sigur Rós |               3 | 
|       620 | Guru       |              15 | 
+-----------+------------+-----------------+
2 rows in set (0.00 sec)

12 Comments »

  1. 2 RTFMs for you:
    http://www.phpwact.org/php/i18n/utf-8/mysql
    http://dev.mysql.com/doc/refman/5.0/en/charset-unicode.html

    BONUS:
    I searched “utf8 mysql” in stackoverflow.com and found this one:
    http://stackoverflow.com/questions/346531/utf8-problem-with-mysql-5

    Comment by elcuco — 2009-07-18 @ 11:08

  2. so.. what is the problem ? the wrong width of first row ?

    Comment by locked — 2009-07-18 @ 12:00

  3. Have you tried:

    –default-character-set=utf8

    for the command-line tools? MySQL and character encodings is tricky because of the many points of failure. It is possible to have an application write and read data in what it thinks is UTF-8, and work perfectly, even though MySQL believes you are using US-ASCII. This leads to cases where everything works, except the MySQL CLI tools. Encoding can be set on a per-table basis as well.

    Comment by Jeroen Hoek — 2009-07-18 @ 12:10

  4. The blog-engine made an N-dash out of the two hyphens preceding “default-character-set”, so correct that if you copy/paste that option.

    Comment by Jeroen Hoek — 2009-07-18 @ 12:12

  5. By default MYSQL server store all strings in latin1, and client connection.

    show variables like ‘char%’;

    You probably need to change this.

    For client, try “set names UTF8;”

    For server, this must be change “before” init_db (AFAIK)

    +

    Comment by Remi — 2009-07-18 @ 12:26

  6. “I can’t let myself think MySQL and its tools would still not support utf-8 correctly”

    What’s life as an idealist like?

    Comment by Jo Shields — 2009-07-18 @ 12:37

  7. To do a case sensitive search you need to use a case-sensitive collation, for example, “select * from artist where artistname like ‘%gur%’ collate utf8_bin;”

    Comment by Chaz6 — 2009-07-18 @ 13:05

  8. No, it won’t support UTF-8 properly no matter what you do. Likewise it is very likely to fuck up any attempt to use the date/time related functions (they are not timezone/locale aware at all), and will also suck with more complex join statements (going havoc into eternal loops etc where any other database would just give you answer nicely). Seriously, get a real database application. DB2, MSSQL, Oracle are good. Sybase and Informics if you’re into data warehousing and stuff like that. If you insist on open source, take the Oracle “clone” of open source world: postgresql.

    Comment by meh — 2009-07-18 @ 13:19

  9. I don’t see anything wrong. “like” is case insensitive and you asked for any artistname that contains “gur” and that’s what mysql gave you.

    Comment by Weboide — 2009-07-18 @ 15:06

  10. It’s not that hard to get it right:
    http://uwstopia.nl/blog/2007/01/simple-utf-8-and-mysql-how-to

    If you want case-sensitive matching you should either use a binary collation on the column or a binary collation for your LIKE clause.

    Comment by Wouter — 2009-07-18 @ 17:03

  11. http://dev.mysql.com/doc/refman/5.0/en/case-sensitivity.html

    Comment by Gianni — 2009-07-20 @ 14:38

  12. Comment 8 had it so right, up until “real database application. DB2, MSSQL, Oracle”.

    Seriously, what the heck? postgresql an oracle clone?

    Just use postgresql and be happy.

    (And to answer this particular question for postgresql: just use an appropriate locale when creating the database.)

    Comment by Anonymous — 2009-07-21 @ 08:21

RSS feed for comments on this post. TrackBack URL

Leave a comment

picture