Portable php-mysql connection charset fix by Advies en zo :: Meedenken en -doen

I’m having some interesting problems with charsets at Whizzoo.com, in particular getting the characters that show up as question marks to display as the real characters. Unfortunately, I think a lot of the Iñtërnâtiônàlizætiøn issues have probably been removed from the database by now, only to be replaced by issues with PHP string functions, e.g. removing special characters and converting to lower case. I’m hoping I can remove these issues too and make Whizzoo acceptable to more international audiences.
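The two string operations mentioned above (lower-casing and removing special characters) can be made multibyte-safe roughly as follows. This is only a sketch, assuming the mbstring extension is available and that all input is UTF-8; the function names `utf8_lower` and `strip_special` are hypothetical and not part of Whizzoo's code.

```php
<?php
// Hypothetical helpers; both assume UTF-8 input and the mbstring extension.

/** Lower-case a UTF-8 string without corrupting multibyte characters. */
function utf8_lower(string $text): string
{
    return mb_strtolower($text, 'UTF-8');
}

/** Keep letters, digits and spaces from any script; drop everything else. */
function strip_special(string $text): string
{
    // The /u modifier makes PCRE treat pattern and subject as UTF-8, so
    // \p{L} (any letter) and \p{N} (any number) also match non-ASCII text.
    $clean = preg_replace('/[^\p{L}\p{N} ]+/u', '', $text);
    if ($clean === null) {
        // preg_replace() returns null on error, e.g. malformed UTF-8 input.
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $clean;
}

$sample = 'IÑTËRNÂTIÔNÀLIZÆTIØN, 我是!';
echo utf8_lower($sample), "\n";
echo strip_special($sample), "\n";
```

The byte-oriented `strtolower()` and a character class such as `[^a-z0-9 ]` only understand ASCII, which is why they tend to either ignore or destroy accented and non-Latin characters.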

I have found some interesting resources with Google… unfortunately I still have the problem, so they have not helped yet ;) It seems that even though I am defining utf8 wherever I can, ISO-8859-1 is still being used somewhere. Oddly, it only appears to be the encoding on the popup results shown while you are typing a ‘whizzoo’, and that is the only place where the chosen characters actually appear correctly.

Portable php-mysql connection charset fix by Advies en zo :: Meedenken en -doen

// Make sure any results we retrieve or commands we send use the same
// charset and collation as the database:
$db_charset = $conn->query( "SHOW VARIABLES LIKE 'character_set_database'" );
$charset_row = $db_charset->fetch_assoc();
$conn->query( "SET NAMES '" . $charset_row['Value'] . "'" );

Handling UTF-8 with PHP

Character sets and encoding issues

Update: [29-Mar-2008] I didn’t actually solve the problem above. There wasn’t a problem on my local PC, because I set up my database server to default to UTF-8. For some reason, our web host provides a DB server that defaults to latin-1. So even though each database, then each table, and finally each column is carefully created as UTF-8, the connection still defaults to latin-1. I provide the solution in a later post, and below for your Google searching convenience.

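// Fragment from inside a connection helper function, hence the return.
@$conn = new mysqli(DB_SERVER, DB_USER, DB_PASS, DB_NAME);
if (mysqli_connect_errno())
{
    throw new Exception('Could not connect to database server: ' .
        mysqli_connect_error());
}
// Force the connection charset, because the server default may be latin-1.
// set_charset() has the same effect as SET NAMES and also tells the client
// library which charset to use when escaping strings, so one call is enough.
if (!$conn->set_charset("utf8"))
{
    throw new Exception('Could not set connection charset: ' . $conn->error);
}
return $conn;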
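To confirm what a connection is actually using, it can help to ask the server directly. The sketch below is an assumption-laden illustration rather than part of the original fix: the function names are hypothetical, and it only looks at the three session variables that SET NAMES affects.

```php
<?php
/**
 * Return the charset variables whose value is not a UTF-8 variant.
 * $rows is a list of ['Variable_name' => ..., 'Value' => ...] arrays,
 * which is the shape SHOW VARIABLES rows have when fetched as assoc arrays.
 */
function non_utf8_variables(array $rows): array
{
    $mismatches = [];
    foreach ($rows as $row) {
        // utf8, utf8mb3 and utf8mb4 all start with "utf8".
        if (strpos($row['Value'], 'utf8') !== 0) {
            $mismatches[$row['Variable_name']] = $row['Value'];
        }
    }
    return $mismatches;
}

/** Query the live connection and report any non-UTF-8 settings. */
function connection_charset_mismatches(mysqli $conn): array
{
    $result = $conn->query(
        "SHOW VARIABLES WHERE Variable_name IN " .
        "('character_set_client', 'character_set_connection', 'character_set_results')"
    );
    $rows = [];
    while ($row = $result->fetch_assoc()) {
        $rows[] = $row;
    }
    return non_utf8_variables($rows);
}

// Usage, given an open connection in $conn:
//   $problems = connection_charset_mismatches($conn);
//   An empty array means the connection is talking UTF-8 in both directions.
```

On a server that defaults to latin-1, running this before and after the charset is set on the connection should show the difference.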

It is important to note that the encodings used at every stage should be compatible. As with reading a file, if one part of the puzzle expects a default type, then the process will probably not cause an error, but the data may not be what is expected. As a rule of thumb, test the system with the data you expect to encounter: if it appears incorrectly, you know you have a problem.

Using a string like Iñtërnâtiônàlizætiøn is recommended, though I found this didn’t quite test to the limits that I wanted to support. If you are also interested in supporting multiple languages outside of the basic Latin character range, then I’d suggest also testing those languages specifically. For example:

  • Chinese: 我是这样改用的
  • Russian: Я Б Г Д Ж Й
  • German: Ä ä Ü ü ß
  • Japanese (katakana): アイウエオカキクケコサシスセソタチツテ
  • Japanese (hiragana): てすと
  • Polish: Ł Ą Ż Ę Ć Ń Ś Ź
  • Hebrew, which adds the complexity of being a right-to-left language (and I have no idea what this says): עברית רשמית בבלוגר
  • Arabic, another right-to-left language: عشية قمة

If your application handles this sort of data, then it is likely to handle most. If your data shows up as empty boxes or question marks, then it is likely you haven’t got the right charset or encoding somewhere in your application pipeline. Or you may have to install the fonts or language support on your computer. This is very easy in Windows: just go into your language settings from the Control Panel and add the languages that you’d like to be able to see.
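A round-trip check along these lines can be automated. The sketch below is a minimal illustration, assuming the mbstring extension; `find_mangled_samples` is a hypothetical helper, and the latin-1 conversion is only a stand-in for a real pipeline stage such as a database connection with the wrong charset (a real database may mangle text differently).

```php
<?php
/**
 * Push each sample through $roundTrip (store, then fetch) and return the
 * samples that did not come back byte-for-byte identical.
 */
function find_mangled_samples(array $samples, callable $roundTrip): array
{
    $mangled = [];
    foreach ($samples as $label => $text) {
        $returned = $roundTrip($text);
        if ($returned !== $text) {
            $mangled[$label] = $returned;
        }
    }
    return $mangled;
}

$samples = [
    'Latin'   => 'Iñtërnâtiônàlizætiøn',
    'Chinese' => '我是这样改用的',
    'Russian' => 'Я Б Г Д Ж Й',
    'Polish'  => 'Ł Ą Ż Ę Ć Ń Ś Ź',
];

// Stand-in for a pipeline with a latin-1 stage somewhere in the middle:
// characters that latin-1 cannot represent are replaced on the way through.
$latin1Stage = function (string $text): string {
    $narrowed = mb_convert_encoding($text, 'ISO-8859-1', 'UTF-8');
    return mb_convert_encoding($narrowed, 'UTF-8', 'ISO-8859-1');
};

foreach (find_mangled_samples($samples, $latin1Stage) as $label => $returned) {
    echo "$label came back as: $returned\n";
}
```

Note that the Iñtërnâtiônàlizætiøn sample survives the latin-1 stage untouched, because all of its characters exist in latin-1, while the other samples do not. In a real test, `$roundTrip` would insert the string into the database and select it back out.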

Some of the language specific test cases come from the external webpage Test for UTF-8: Japanese, German, Russian, Polish.

Last 5 posts by James Little

One Comment

  1. Posted January 12, 2008 at 1:53 pm | Permalink

    The answer is in the application… or the blogging, as somehow WordPress has no problem with its Iñtërnâtiônàlizætiøn!

2 Trackbacks

  1. […] nzfusion.com James is the Technical Director of NZfusion, and this is his blog. « Portable php-mysql connection charset fix by Advies en zo :: Meedenken en -doen […]

  2. […] my earlier post php-mysql connection charset fix, I mentioned I was having issues with enforcing an UTF8 charset. I found the […]
