I’m having some interesting problems with charsets at Whizzoo.com, in particular getting characters that show up as question marks to display as the real characters. Unfortunately, I think many of the Iñtërnâtiônàlizætiøn issues have now moved out of the database and into the PHP string functions, i.e. the ones that remove special characters and convert to lower case. I’m hoping I can fix these issues and make Whizzoo usable by more international audiences.
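The string-function problem comes from PHP functions that work on bytes rather than characters. Here is a minimal sketch of the "remove special characters" step, assuming UTF-8 input and PHP's bundled PCRE. An ASCII-only character class throws away every accented or non-Latin letter. The `/u` modifier with `\p{L}` (any letter) and `\p{N}` (any number) keeps them. Both function names are hypothetical. For the lower-case step, the equivalent swap is from `strtolower()` to `mb_strtolower($s, 'UTF-8')`. That function needs the mbstring extension, so it is left out here.

```php
<?php
// Hypothetical helpers for building a search key from user input.

// Byte-oriented: anything outside a-z/0-9 is stripped, including every
// byte of a multi-byte UTF-8 character, so accented letters disappear.
function strip_special_ascii(string $s): string
{
    return preg_replace('/[^a-z0-9]+/i', '', $s);
}

// Unicode-aware: /u makes PCRE read the string as UTF-8, and
// \p{L} / \p{N} keep letters and digits from any script.
function strip_special_unicode(string $s): string
{
    return preg_replace('/[^\p{L}\p{N}]+/u', '', $s);
}

echo strip_special_ascii('Iñtërnâtiônàlizætiøn!'), "\n";   // Itrntinliztin
echo strip_special_unicode('Iñtërnâtiônàlizætiøn!'), "\n"; // Iñtërnâtiônàlizætiøn
```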
I have found some interesting resources with Google, but unfortunately I still have the problem, so they haven’t helped yet.
It seems that even though I am specifying utf8 wherever I can, ISO-8859-1 is still being used somewhere. Oddly, ISO-8859-1 only shows up as the encoding of the popup results that appear while you type a ‘whizzoo’, and that popup is the only place where the characters actually display correctly.
Portable php-mysql connection charset fix by Advies en zo :: Meedenken en -doen
```php
// Make sure any results we retrieve or commands we send use the same
// charset as the database (with that charset's default collation):
$db_charset = $conn->query("SHOW VARIABLES LIKE 'character_set_database'");
$charset_row = $db_charset->fetch_assoc();
// set_charset() is mysqli's preferred equivalent of SET NAMES; it also tells
// real_escape_string() which charset the connection is using.
$conn->set_charset($charset_row['Value']);
```
Character sets and encoding issues
Update: [29-Mar-2008] I hadn’t actually solved the problem above. It never showed up on my local PC, because I had set up my database server to default to UTF-8. For some reason, our web host provides a DB server that defaults to latin-1. So even though each database, then each table and finally each column is carefully created as UTF-8, the connection still defaults to latin-1. I give the solution in a later post, and below for your Google-searching convenience.
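To check whether a connection really is falling back to latin-1, you can ask the server for its `character_set%` variables. In the sketch below, the function names are hypothetical. The checking logic is kept separate from the database call so you can try it without a server. Any value beginning with `utf8` (MySQL's `utf8`, `utf8mb3` and `utf8mb4`) counts as UTF-8.

```php
<?php
// Session variables that should all be UTF-8 for text to survive the round trip.
const CHARSET_VARS = [
    'character_set_client',      // charset of statements the client sends
    'character_set_connection',  // charset the server converts statements into
    'character_set_results',     // charset results are sent back in
    'character_set_database',    // default charset of the current database
];

// Given Variable_name => Value pairs, return the variables that are not UTF-8.
function non_utf8_charset_vars(array $vars): array
{
    $bad = [];
    foreach (CHARSET_VARS as $name) {
        $value = $vars[$name] ?? '(missing)';
        if (strncmp($value, 'utf8', 4) !== 0) {
            $bad[$name] = $value;
        }
    }
    return $bad;
}

// Collects the variables from a live connection (requires a MySQL server).
function fetch_charset_vars(mysqli $conn): array
{
    $vars = [];
    $result = $conn->query("SHOW VARIABLES LIKE 'character_set%'");
    while ($row = $result->fetch_assoc()) {
        $vars[$row['Variable_name']] = $row['Value'];
    }
    return $vars;
}
```

Suppose you connect to a server that defaults to latin1 and never call `set_charset()`. In that case, `print_r(non_utf8_charset_vars(fetch_charset_vars($conn)))` would typically list the client, connection and results variables as `latin1`, even when the database itself reports `utf8`.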
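```php
// The connection code lives in a helper function (hypothetical name
// db_connect(); the original snippet showed only the function body).
function db_connect()
{
    $conn = @new mysqli(DB_SERVER, DB_USER, DB_PASS, DB_NAME);
    if ($conn->connect_errno) {
        throw new Exception('Could not connect to database server: ' .
            $conn->connect_error);
    }
    // Our host's server defaults to latin1, so force the connection to UTF-8.
    // set_charset() does the job of SET NAMES 'utf8' and also updates the
    // charset that real_escape_string() uses, so one call is enough.
    if (!$conn->set_charset('utf8')) {
        throw new Exception('Could not set connection charset: ' . $conn->error);
    }
    return $conn;
}
```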
It is important that every part of the pipeline (database, connection, PHP and the browser) uses a compatible encoding. As with reading a file, if one part of the puzzle silently falls back to a default encoding, the process probably won’t raise an error, but the data may not be what you expect. As a rule of thumb, test the system with the data you expect to encounter: if it appears incorrectly, you know you have a problem.
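The browser is the last link in that chain. If a response does not declare its encoding, the browser has to guess, which may explain a popup labelled ISO-8859-1. This is a minimal sketch, assuming both the pages and the AJAX responses are generated by PHP. It sets PHP's `default_charset` so that the `Content-Type` header PHP sends includes `charset=UTF-8`. Alternatively, it sends that header explicitly.

```php
<?php
// Ask PHP to add "charset=UTF-8" to the Content-Type header it sends
// (UTF-8 is already the default from PHP 5.6 onwards).
ini_set('default_charset', 'UTF-8');

// Or send the header explicitly; it must go out before any output,
// and applies equally to full pages and to AJAX responses.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}
```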
Using a test string like Iñtërnâtiônàlizætiøn is recommended, though I found it didn’t quite test the limits I wanted to support. If you are also interested in supporting languages outside the Latin character range, then I’d suggest testing those languages specifically. For example:
- Chinese: 我是这样改用的
- Russian: Я Б Г Д Ж Й
- German: Ä ä Ü ü ß
- Hankaku: アイウエオカキクケコサシスセソタチツテ
- Hiragana: てすと
- Polish: Ł Ą Ż Ę Ć Ń Ś Ź
- Hebrew, which adds the complexity of being a right-to-left language (and I have no idea what this says): עברית רשמית בבלוגר
- Arabic, another right-to-left language: عشية قمة
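One reason Iñtërnâtiônàlizætiøn doesn’t test the limits is that every one of its letters also exists in ISO-8859-1. A latin-1 step in the chain can therefore convert it without losing anything. The Polish, Russian, Chinese, Japanese, Hebrew and Arabic samples cannot be represented in latin-1 at all. The sketch below checks which of the test strings would actually expose such a fallback. The function name `fits_latin1` is hypothetical.

```php
<?php
// Test strings from the list above, keyed by a label.
$samples = [
    'Iñtërnâtiônàlizætiøn' => 'Iñtërnâtiônàlizætiøn',
    'German'   => 'Ä ä Ü ü ß',
    'Polish'   => 'Ł Ą Ż Ę Ć Ń Ś Ź',
    'Russian'  => 'Я Б Г Д Ж Й',
    'Chinese'  => '我是这样改用的',
    'Hiragana' => 'てすと',
    'Hebrew'   => 'עברית רשמית בבלוגר',
    'Arabic'   => 'عشية قمة',
];

// True if every character is in U+0000..U+00FF, i.e. the string survives a
// trip through ISO-8859-1 and so cannot reveal a latin-1 fallback.
function fits_latin1(string $s): bool
{
    return preg_match('/^[\x{00}-\x{FF}]*$/u', $s) === 1;
}

foreach ($samples as $label => $text) {
    printf("%s: %s\n", $label, fits_latin1($text) ? 'fits in latin-1' : 'needs UTF-8');
}
```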
If your application handles this sort of data, then it is likely to handle most. If your data shows up as empty boxes or question marks, then you probably haven’t got the right charset or encoding somewhere in your application pipeline. Alternatively, you may need to install the fonts or language support on your computer. This is very easy in Windows: just open the language settings from the Control Panel and add the languages you’d like to be able to see.
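When something does look wrong, compare the bytes that came back with the bytes you sent. That separates a display problem (a missing font) from a data problem. This is a rough sketch. The function name and messages are hypothetical, and it only distinguishes three common cases.

```php
<?php
// Hypothetical helper: classify what happened to a test string after a round
// trip through the application (e.g. saved to the database and read back).
function diagnose_round_trip(string $sent, string $received): string
{
    if ($received === $sent) {
        // Same bytes: boxes on screen point to missing fonts, not bad data.
        return 'bytes intact';
    }
    if (substr_count($received, '?') > substr_count($sent, '?')) {
        // Characters went through a charset that could not hold them.
        return 'replaced with ?';
    }
    // Anything else (e.g. double-encoding): show the raw bytes to compare.
    return 'bytes differ: sent ' . bin2hex($sent) . ', received ' . bin2hex($received);
}

echo diagnose_round_trip('我是', '??'), "\n";             // replaced with ?
echo diagnose_round_trip('ñ', "\xC3\x83\xC2\xB1"), "\n";  // bytes differ: sent c3b1, received c383c2b1
```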
Some of the language-specific test cases come from the external web page Test for UTF-8: Japanese, German, Russian, Polish.
Last 5 posts by James Little:
- Drew Ginn resting after the Olympics - September 8th, 2008
- Sneakerplay: The social network for sneakers? - September 8th, 2008
- The Waikato Great Race 2008 - September 7th, 2008
- Facebook about to introduce adult content? - September 4th, 2008
- Gradjobs New Zealand is live - September 2nd, 2008
One Comment
The answer is in the application… or the blogging, as somehow WordPress has no problem with its Iñtërnâtiônàlizætiøn!
2 Trackbacks
[…] nzfusion.com James is the Technical Director of NZfusion, and this is his blog. « Portable php-mysql connection charset fix by Advies en zo :: Meedenken en -doen […]
[…] my earlier post php-mysql connection charset fix, I mentioned I was having issues with enforcing an UTF8 charset. I found the […]