Cleaning form input with foreign/special characters

Argh - is all I can say.

Working on a site hosted in the US with a multi-page form (8 pages worth) collecting tons of info that’s getting used internationally. I’m having a rough time trying to clean form input, handle foreign characters, and get it properly into MySQL and then back out again.

So, I’m trying to clean on-the-fly by looping through all my post variables.
Latest attempt:

[php]
foreach ($_POST as $key => $value) {
    if ($key != "submit") {
        $value = iconv('UTF-8', 'ASCII//TRANSLIT', $value);
        $value = htmlentities(stripslashes(strip_tags($value)));
        echo "\t<input type=\"hidden\" name=\"$key\" value=\"$value\">\n";
    }
}
[/php]

Still not working properly though.
Entering “Bénjamin” returns “B?njamin” for example.

All form pages are encoded:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

MySQL database is utf-8.

I’ve spent days researching and trying to find a good solution … and end up with hieroglyphics the moment there’s an odd diacritic typed in.

For example, early beta for the same above was generating "B����¯�¿�½���¯���¿���½����¯�¿�½������©njamin "

At least I’ve got less crap now!!!

I’ve just NOT found the magic formula yet. Can somebody point me to a definitive resource that explains exactly what I need to do, cause … ack!
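
One approach that may help here is to leave the value as UTF-8 all the way through and escape it only when it is written into the HTML. The sketch below assumes the incoming bytes really are UTF-8 (which the meta tag above suggests); `hidden_field` is a hypothetical helper name, not something from the original code. Passing the charset explicitly matters on PHP versions before 5.4, where `htmlentities` and `htmlspecialchars` default to ISO-8859-1 and will mangle multi-byte characters.

```php
<?php
// Hypothetical helper: build one hidden input while keeping the value as UTF-8.
function hidden_field(string $name, string $value): string
{
    // ENT_QUOTES escapes both quote styles; the explicit 'UTF-8' stops
    // older PHP versions from treating the bytes as ISO-8859-1.
    $name  = htmlspecialchars($name, ENT_QUOTES, 'UTF-8');
    $value = htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
    return "\t<input type=\"hidden\" name=\"$name\" value=\"$value\">\n";
}

// "B\xC3\xA9njamin" is "Bénjamin" spelled out as UTF-8 bytes.
echo hidden_field('first_name', "B\xC3\xA9njamin");
```

With this, only the characters that are special in HTML (`<`, `>`, `&` and quotes) are converted, so the accented letter passes through untouched. The database connection would also need to use UTF-8 for the round trip to work.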

Here is a function I use:

[php]
// Convert multi-byte UTF-8 characters to numeric HTML entities (ASCII is left as-is)
function htmlallentities($str){
    $res = '';
    $strlen = strlen($str);
    for($i=0; $i<$strlen; $i++){
        $byte = ord($str[$i]);
        if($byte < 128) // 1-byte char
            $res .= $str[$i];
        elseif($byte < 192); // invalid utf8, skipped
        elseif($byte < 224) // 2-byte char
            $res .= '&#'.((63&$byte)*64 + (63&ord($str[++$i]))).';';
        elseif($byte < 240) // 3-byte char
            $res .= '&#'.((15&$byte)*4096 + (63&ord($str[++$i]))*64 + (63&ord($str[++$i]))).';';
        elseif($byte < 248) // 4-byte char
            $res .= '&#'.((15&$byte)*262144 + (63&ord($str[++$i]))*4096 + (63&ord($str[++$i]))*64 + (63&ord($str[++$i]))).';';
    }
    return $res;
}
[/php]
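
To make the bit arithmetic in that function easier to follow, here is a small worked example of the 2-byte and 3-byte branches on their own. The helper names are hypothetical and exist only for illustration; the masks and multipliers are the same ones used above.

```php
<?php
// Hypothetical helpers that isolate the arithmetic of the 2- and 3-byte branches.
function codepoint_from_2_bytes(int $lead, int $cont): int
{
    return (63 & $lead) * 64 + (63 & $cont);
}

function codepoint_from_3_bytes(int $lead, int $cont1, int $cont2): int
{
    return (15 & $lead) * 4096 + (63 & $cont1) * 64 + (63 & $cont2);
}

// "é" is stored as 0xC3 0xA9 in UTF-8: 3 * 64 + 41 = 233 (U+00E9).
echo '&#' . codepoint_from_2_bytes(0xC3, 0xA9) . ";\n";       // &#233;
// "€" is stored as 0xE2 0x82 0xAC: 2 * 4096 + 2 * 64 + 44 = 8364 (U+20AC).
echo '&#' . codepoint_from_3_bytes(0xE2, 0x82, 0xAC) . ";\n"; // &#8364;
```

So "Bénjamin" would come out as `B&#233;njamin`, which is plain ASCII and survives any charset mismatch further down the line.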

Currently, the below is working. The client has accepted replacing foreign characters with the English equivalent, “fiance” for “fiancé”, for example. To get rid of the question marks, I had to use setlocale.

[php]
function clean_data($input) {
    $input = trim(htmlentities(stripslashes(strip_tags($input,","))));
    $input = mysql_real_escape_string($input);
    return $input;
}

setlocale(LC_CTYPE, 'cs_CZ.UTF-8');
foreach ($_POST as $key => $value) {
    if ($key != "submit") {
        $value = trim($value);
        $value = iconv('UTF-8', 'ASCII//TRANSLIT', $value);
        $value = clean_data($value);
        echo "\t<input type=\"hidden\" name=\"$key\" value=\"$value\">\n";
    }
}
[/php]
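
The `setlocale` call matters because, at least with glibc's iconv, `//TRANSLIT` takes its substitution rules from the current `LC_CTYPE` locale, and the default "C" locale tends to produce `?`. A slightly more defensive sketch of the same idea is below. `to_ascii` is a hypothetical name, the locale list is an assumption about what might be installed on the server, and the exact transliteration output can differ between iconv implementations.

```php
<?php
// Hypothetical helper: transliterate UTF-8 text to plain ASCII.
function to_ascii(string $value): string
{
    // Try a few UTF-8 locales; which ones are installed varies by server,
    // so these names are assumptions. setlocale() returns false if none apply.
    setlocale(LC_CTYPE, ['en_US.UTF-8', 'C.UTF-8', 'cs_CZ.UTF-8']);

    $ascii = iconv('UTF-8', 'ASCII//TRANSLIT', $value);
    if ($ascii === false) {
        // iconv gave up (for example on invalid input): drop non-ASCII bytes.
        $ascii = (string) preg_replace('/[^\x00-\x7F]/', '', $value);
    }
    return $ascii;
}

// "fianc\xC3\xA9" is "fiancé" spelled out as UTF-8 bytes.
echo to_ascii("fianc\xC3\xA9"), "\n";
```

Note that `setlocale` changes the locale for the whole process, so on a shared setup it may be better to call it once at startup rather than inside a helper.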
