
 2 years ago
source link: https://www.codesd.com/item/php-mysql-corrected-text-utf8-corrupted-by-implicit-connection-mysqli-set-charset-latin1.html

PHP / MySQL: Fixing utf8 text corrupted by an implicit mysqli::set_charset('latin1') connection


So, for years and years, my PHP application has been connecting to MySQL using the default latin1 charset. Even though some of my fields are collated as utf8_general_ci, the data actually being stored in them is in some bastardized charset. For example:

Input: ♠ »

is stored as â™  Â»

Now, when that data is retrieved over the same latin1 connection and displayed on a page whose encoding is set to utf8, it displays exactly as it was entered: ♠ ». I'm not 100% sure why, but I'm guessing that whatever charset conversion mangles it on the way in undoes itself on the way out.

I want to fix my data. If I switch my connection charset using mysqli::set_charset('utf8'), the output is displayed exactly as it is stored, i.e. â™  Â»
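The behaviour in the two paragraphs above can be modelled byte by byte. Below is a minimal Python sketch, not MySQL code. The function names are hypothetical. The sketch assumes MySQL's latin1 behaves like Windows-1252 (Python's "cp1252" codec), which is how MySQL documents its latin1. There is one difference: MySQL's latin1 passes the bytes 0x81, 0x8D, 0x8F, 0x90 and 0x9D through, but Python's codec rejects them, so this model raises an error for text such as "Á".

```python
# Python model of how a latin1 connection double-encodes UTF-8 text.
# Assumption: MySQL's latin1 is modelled with Python's cp1252 codec.

def write_over_latin1(text: str) -> bytes:
    """Bytes stored in a utf8 column when PHP sends UTF-8 over a latin1 connection."""
    sent = text.encode("utf-8")         # PHP sends the page's UTF-8 bytes
    as_latin1 = sent.decode("cp1252")   # server reads each byte as one latin1 character
    return as_latin1.encode("utf-8")    # and stores those characters in the utf8 column

def read_over_latin1(stored: bytes) -> str:
    """What a UTF-8 page shows when the column is read back over latin1."""
    chars = stored.decode("utf-8")      # characters held in the column
    sent = chars.encode("cp1252")       # server converts them back to latin1 bytes
    return sent.decode("utf-8")         # browser interprets those bytes as UTF-8

def read_over_utf8(stored: bytes) -> str:
    """What the page shows after switching the connection to utf8."""
    return stored.decode("utf-8")       # stored characters are returned unchanged

stored = write_over_latin1("♠ »")
print(repr(read_over_utf8(stored)))     # 'â™\xa0 Â»'
print(read_over_latin1(stored))         # ♠ »
```

Under this model, the latin1 read path exactly inverts the write path. That would explain why pages served over the latin1 connection look correct while the column itself holds mojibake.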

So, apparently I need to fix my existing data and then switch my connection charset.

How do I fix the existing bastardized data?

EDIT:

I've discovered a way to emulate, in a MySQL query, the corruption that is happening: SELECT CAST(BINARY '♠ »' AS CHAR CHARACTER SET latin1) outputs â™  Â»

Perhaps if I could figure out how to perform the reverse function I could use that query to fix the existing data.

EDIT 2:

I've discovered such a function: SELECT CAST(BINARY CAST('â™  Â»' AS CHAR CHARACTER SET latin1) AS CHAR CHARACTER SET utf8) outputs ♠ »
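The nested CAST reverses the corruption in two steps. First it takes each character as one latin1 byte. Then it rereads those bytes as utf8. Here is a minimal Python sketch of the same transformation. It again assumes MySQL's latin1 can be modelled with Python's cp1252 codec. The name undo_double_encoding is hypothetical.

```python
# Python analogue of
#   CAST(BINARY CAST(x AS CHAR CHARACTER SET latin1) AS CHAR CHARACTER SET utf8)
# Assumption: MySQL's latin1 is modelled with Python's cp1252 codec.

def undo_double_encoding(text: str) -> str:
    raw = text.encode("cp1252")   # inner CAST: one latin1 byte per character
    return raw.decode("utf-8")    # outer CAST: reinterpret those bytes as utf8

print(undo_double_encoding("â™\u00a0 Â»"))  # ♠ »
```

For text that is already proper utf8, such as "♠ »" itself, encode("cp1252") raises UnicodeEncodeError because ♠ has no latin1 byte. That is the Python counterpart of the failed conversion described next.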

My only concern now is what this will do to any data that is already genuine utf8, which, for some reason, I also have in my database. For example, SELECT CAST(BINARY CAST('♠ »' AS CHAR CHARACTER SET latin1) AS CHAR CHARACTER SET utf8) outputs (nothing)


From http://jonisalonen.com/2012/fixing-doubly-utf-8-encoded-text-in-mysql/:

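An auto-detecting function that converts possibly double-encoded (latin1-corrupted) text back to proper utf8, and returns the value unchanged when the conversion produces warnings: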

DELIMITER $$

CREATE FUNCTION maybe_utf8_decode(str text charset utf8)
RETURNS text CHARSET utf8 DETERMINISTIC
BEGIN
declare str_converted text charset utf8;
declare max_error_count int default @@max_error_count;
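-- Keep conversion warnings out of SHOW WARNINGS; @@warning_count still counts them.
set @@max_error_count = 0;
-- Undo one level of double encoding: take the text as latin1 bytes and reread them as utf8.
set str_converted = convert(binary convert(str using latin1) using utf8);
set @@max_error_count = max_error_count;
-- Warnings mean the round trip failed (e.g. invalid utf8 bytes), so keep the original value.
if @@warning_count > 0 then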
    return str;
else
    return str_converted;
end if;
END$$

DELIMITER ;

Usage:

update mytable set mycolumn = maybe_utf8_decode(mycolumn);
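For experimenting outside the database, here is a rough Python analogue of maybe_utf8_decode. It is a sketch, not the blog's code. Instead of checking @@warning_count, it catches Python's conversion errors, and it assumes MySQL's latin1 behaves like Python's cp1252 codec.

```python
def maybe_utf8_decode(text: str) -> str:
    """Undo one level of latin1/UTF-8 double encoding, or return text unchanged.

    Assumption: MySQL's latin1 is modelled with Python's cp1252 codec.
    """
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either a character has no latin1 byte (e.g. ♠), or the bytes are not
        # valid UTF-8 (e.g. a lone é): treat the text as already correct.
        return text

for value in ["â™\u00a0 Â»", "♠ »", "café", "plain ascii"]:
    print(repr(value), "->", repr(maybe_utf8_decode(value)))
```

Like the SQL version, this is a heuristic. A value that was genuinely typed as something like "Ã©" also round-trips cleanly and would be rewritten to "é". A sensible precaution is to run the conversion in a SELECT and review the rows that would change before running the UPDATE, with a backup at hand.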

