Input Form and Accented Characters


#1

I have an input form that needs to handle umlauts and other accented, non-English characters (e.g. ä, ß, etc.) and pass them via POST to a PHP file I have created.

In my PHP file I have the following code, which needs to read the input and convert such characters into their English equivalents. For example, the German ‘ß’ would become ‘ss’ in English. Note that the characters in the $umlauts variable are typed literally; they are not HTML, hex or chr codes (e.g. I typed alt-225 for ß in the array). I have also tried substituting their chr codes, e.g. chr(223), with no success either.

// Convert accents and extended characters into English equivalents.
$characters = array(
    "'" => "",
    "&" => "And",
    "(" => "",
    ")" => "",
    "-" => " "
);
$umlauts = array(
    "ä" => "ae",
    "ö" => "oe",
    "ü" => "ue",
    "ß" => "ss"
);
$replacements = array_merge( $characters, $umlauts );
$beer_name = strtr( strip_tags( trim( $_POST['beer_name'] ) ), $replacements );

In my header I have the following code to force utf-8 as the character set:

// Setting the Content-Type header with charset
header('Content-Type: text/html; charset=utf-8');

When I echo the content or use it in a MySQL table, the $characters part works and converts things like & to And, but the umlauts section is ignored: ä etc. is echoed to the screen instead of the converted string, in this case ‘ae’.

This leads me to think it has something to do with how the form handles the POST content (i.e. encoding it somehow) and/or that I’m using an incorrect character set.

I’m at a loss. I’ve also tried HTML entities instead of the alt codes in the array above (e.g. &uuml; etc.), without success. I’m not concerned about case sensitivity yet, as I’m only testing lower case at present; upper case will be added once lower case is working.

Does anyone have any idea what I am doing wrong? I am using PHP 5 on a Linux/CentOS server.
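
One way to test the encoding suspicion above is to dump the raw bytes of the value. This is only a sketch: describe_bytes() is a hypothetical helper, not part of the original code, and the two sample strings stand in for what the form might send. If the posted ‘ß’ arrives as the two bytes c3 9f it is UTF-8; a single byte df would mean ISO-8859-1, which is also what chr(223) produces.

```php
<?php
// Hypothetical helper: report the raw bytes of a string and whether
// they form valid UTF-8.
function describe_bytes($s)
{
    // preg_match() with the /u modifier returns false on invalid UTF-8.
    $is_utf8 = preg_match('//u', $s) === 1;
    return bin2hex($s) . ($is_utf8 ? ' (valid UTF-8)' : ' (not valid UTF-8)');
}

// "ß" as UTF-8 is the two bytes C3 9F; as ISO-8859-1 it is the single byte DF.
$utf8_eszett   = "\xC3\x9F";
$latin1_eszett = "\xDF"; // same byte as chr(223)

echo describe_bytes($utf8_eszett), "\n";   // c39f (valid UTF-8)
echo describe_bytes($latin1_eszett), "\n"; // df (not valid UTF-8)

// In the form handler the same check could be applied to the posted value:
// echo describe_bytes($_POST['beer_name']);
```

Running the same check on one of the keys of $umlauts would show whether the source file itself was saved in the same encoding as the incoming data; strtr() compares bytes, so the two have to match exactly.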


#2

If you put ‘only’ this in a separate PHP file:

<?php
$umlauts = array( "ä" => "ae", "ö" => "oe", "ü" => "ue", "ß" => "ss", "ч" => "ch", "щ" => "sht" ); // added some Cyrillic
echo $umlauts["щ"];
?>

What do you get?
I get ‘sht’ (without quotes).
This board may be replacing the Cyrillic with entities, but when I tried locally with ‘real’ Cyrillic, it works…
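
If the test above works but the form handler still does not, one thing worth trying is to take the editor's file encoding out of the picture by writing the keys as explicit UTF-8 byte escapes. The sketch below assumes the form submits UTF-8 (likely, since the page is served with charset=utf-8, but not confirmed in this thread); the sample string is made up and stands in for $_POST['beer_name'].

```php
<?php
// Keys written as explicit UTF-8 byte sequences, so the encoding this
// source file is saved in cannot change them.
$umlauts = array(
    "\xC3\xA4" => "ae", // ä
    "\xC3\xB6" => "oe", // ö
    "\xC3\xBC" => "ue", // ü
    "\xC3\x9F" => "ss", // ß
);

// Hypothetical sample input standing in for $_POST['beer_name'], assumed UTF-8.
$beer_name = "Wei\xC3\x9Fbier K\xC3\xB6lsch";

// strtr() works on bytes, so multi-byte keys are matched as whole sequences.
echo strtr($beer_name, $umlauts), "\n"; // Weissbier Koelsch
```

If this version converts correctly where the literal characters did not, the PHP file was probably saved in a different encoding (for example ISO-8859-1 or a DOS code page) from the one the browser uses to submit the form.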