
PHP 5.4.x and UTF-8

According to the PHP 5.4 release notes, the default character encoding has changed from ISO-8859-1 (latin1) to UTF-8. The encoding can be set with default_charset = "iso-8859-1" in php.ini, but that setting doesn't seem to affect functions like htmlspecialchars() and htmlentities(), which have their own default encoding (now UTF-8) and their own argument for the character encoding. If you want to keep using 5.4 with ISO-8859-1 data, the direct fix is to pass the encoding explicitly in every place where these functions are called:


// Assumes this script itself is saved as ISO-8859-1.
$str_to_be_coded = "äÄöÖüÜ";
// The encoding name must be passed as a quoted string.
$str = htmlspecialchars($str_to_be_coded, ENT_COMPAT | ENT_HTML401 | ENT_SUBSTITUTE, "ISO-8859-1");
echo $str;

$str = htmlentities($str_to_be_coded, ENT_COMPAT | ENT_HTML401 | ENT_SUBSTITUTE, "ISO-8859-1");
echo $str;
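
Since every call site needs the same three arguments, it may be easier to route them through one small wrapper. The following is only a sketch: h_latin1() is a hypothetical helper name, not something PHP provides, and the sample string is made up for illustration.

```php
<?php
// Hypothetical helper (not part of PHP): escapes ISO-8859-1 text with the
// same flags and encoding as the calls above.
function h_latin1($text)
{
    return htmlspecialchars(
        $text,
        ENT_COMPAT | ENT_HTML401 | ENT_SUBSTITUTE,
        'ISO-8859-1'
    );
}

// "\xE4" is the ISO-8859-1 byte for "ä", written as an escape so the
// example does not depend on the encoding this file is saved in.
$escaped = h_latin1("<b>B\xE4r & Co</b>");
echo $escaped, "\n";
```

With a wrapper like this, a later move to UTF-8 would only need a change in one place.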

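If you omit ENT_SUBSTITUTE, htmlspecialchars() may return an empty string when the input contains umlauts (äÄöÖüÜ) whose bytes are not valid in the encoding the function assumes, for example ISO-8859-1 bytes treated as UTF-8.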
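
To see the difference, here is a small sketch comparing the three cases. The sample string and variable names are my own, and the encoding is passed explicitly each time so the result does not depend on the PHP version's default.

```php
<?php
// "M\xFCller" is "Müller" in ISO-8859-1, written with an escape so the
// example does not depend on the encoding this file is saved in.
$latin1 = "M\xFCller";

// Treated as UTF-8 (the PHP 5.4 default): the byte 0xFC is not valid
// UTF-8, so the whole result is an empty string.
$as_utf8 = htmlspecialchars($latin1, ENT_COMPAT | ENT_HTML401, 'UTF-8');

// Treated as ISO-8859-1: every byte is valid and the string survives.
$as_latin1 = htmlspecialchars($latin1, ENT_COMPAT | ENT_HTML401, 'ISO-8859-1');

// Treated as UTF-8 with ENT_SUBSTITUTE: the invalid byte is replaced by
// U+FFFD (bytes EF BF BD) instead of discarding everything.
$substituted = htmlspecialchars($latin1, ENT_COMPAT | ENT_HTML401 | ENT_SUBSTITUTE, 'UTF-8');

var_dump($as_utf8 === '');
var_dump($as_latin1 === $latin1);
var_dump(strpos($substituted, "\xEF\xBF\xBD") !== false);
```

Note that ENT_SUBSTITUTE only prevents the empty result. The umlaut itself is still lost unless the correct encoding is passed.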

The alternative is of course to switch to UTF-8 everywhere. If your system-wide setting is ISO-8859-1, this may require configuring the Apache server to use UTF-8 and updating your PHP scripts to use UTF-8 internally. In addition, if you are using MySQL, you should set the character set of the MySQL connection as well:


/* change character set to utf8 */
if (!$mysqli->set_charset("utf8")) {
    printf("Error loading character set utf8: %s\n", $mysqli->error);
} else {
    printf("Current character set: %s\n", $mysqli->character_set_name());
}
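
If some input still arrives as ISO-8859-1 during such a migration, one option is to convert it once at the boundary and work with UTF-8 from there on. This is a minimal sketch, assuming the iconv extension is installed; the sample string is made up.

```php
<?php
// "M\xFCller" is "Müller" in ISO-8859-1, written with an escape so the
// example does not depend on the encoding this file is saved in.
$latin1 = "M\xFCller";

// Convert once at the boundary (assumes the iconv extension is available).
$utf8 = iconv('ISO-8859-1', 'UTF-8', $latin1);

// With UTF-8 data, the default encoding of htmlspecialchars() in PHP 5.4
// fits, so no explicit encoding argument is needed.
$escaped = htmlspecialchars($utf8);

var_dump($utf8 === "M\xC3\xBCller");
var_dump($escaped === $utf8);
```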

Finally, the solution I chose for one piece of third-party software was to install lang/php53 from FreeBSD ports, which provides the latest PHP interpreter from the 5.3 branch.