I have a MySQL database (not my own). Every encoding in this database is set to utf-8, and I connect with charset utf-8. However, when I try to read from the database, I get this:

×¢×?ק1
×'ית תו×'× ×” העוסק×'מספר שפות ×ª×•×›× ×”
× × ×œ× ×œ×¤× ×•×ª ×חרי 12 ×'לילה ..

What I should get:

עסק 1
בית תוגנה העוסק במספר שפות תוכנה
נא לא לפנות אחרי 12 בלילה ..

I see the same thing when I look in phpMyAdmin (the connection in pma is utf-8). I know the data is supposed to be Hebrew. Does anyone know how to fix this?

1 Answer

You appear to have UTF-8 data that was treated as Windows-1252 and subsequently converted to UTF-8 (sometimes referred to as "double-encoding").
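
The round trip described above can be reproduced outside the database. The following is only a minimal sketch in Python, assuming the corruption really is UTF-8 read as Windows-1252 and then converted to UTF-8; it uses the first word of the expected output ("עסק 1", written with escapes to avoid right-to-left rendering issues), and the variable names are illustrative.

```python
# First word of the expected data: U+05E2 U+05E1 U+05E7, a space, then "1".
original = "\u05e2\u05e1\u05e7 1"

# Step 1: the bytes the application actually sent (correct UTF-8).
utf8_bytes = original.encode("utf-8")

# Step 2: those bytes are misinterpreted as Windows-1252 text.
misread = utf8_bytes.decode("cp1252")

# Step 3: the misread text is converted to UTF-8 for storage.
double_encoded = misread.encode("utf-8")

print(utf8_bytes.hex().upper())      # D7A2D7A1D7A72031
print(double_encoded.hex().upper())  # C397C2A2C397C2A1C397C2A72031
```

Each Hebrew letter starts as two bytes (D7 xx) and ends up as four (C3 97 C2 xx), which is the pattern to look for when inspecting the stored bytes.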

The first thing that you need to determine is at what stage the conversion took place: before the data was saved in the table, or upon your attempts to retrieve it? The easiest way is often to SELECT HEX(the_column) FROM the_table WHERE ... and manually inspect the byte-encoding as it is currently stored:

  • If, for the data above, you see C397C2A2... (that is, "×" and "¢" each encoded as UTF-8), then the data is stored erroneously; an incorrect connection character set at the time of data insertion is the most common culprit. It can be corrected as follows (being careful to use data types of sufficient length in place of TEXT and BLOB as appropriate):

    1. Undo the conversion from Windows-1252 to UTF-8 that caused the data corruption:

      ALTER TABLE the_table MODIFY the_column TEXT CHARACTER SET latin1;
      
    2. Drop the erroneous encoding metadata:

      ALTER TABLE the_table MODIFY the_column BLOB;
      
    3. Add corrected encoding metadata:

      ALTER TABLE the_table MODIFY the_column TEXT CHARACTER SET utf8;
      

    See it on sqlfiddle.

    Take care to insert all future data correctly, or else the table will be encoded partly one way and partly another (which can be a nightmare to fix).

    If you're unable to modify the database schema, the records can be transcoded to the correct encoding on the fly with CONVERT(BINARY CONVERT(the_column USING latin1) USING utf8) (see it on sqlfiddle), but I strongly recommend that you fix the database when possible instead of leaving it containing broken data.

  • However, if you see D7A2D7A1... (the plain UTF-8 encoding of the Hebrew letters), then the data is stored correctly and the corruption is taking place upon data retrieval; you will have to perform further tests to identify the exact cause. See "UTF-8 all the way through" for guidance.
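
If neither the schema nor the queries can be changed, the same check and repair can be approximated on the client side. The sketch below is a rough Python analogue of the HEX() inspection and of CONVERT(BINARY CONVERT(the_column USING latin1) USING utf8), not a replacement for fixing the table. The function names are hypothetical, and the fallback for characters outside Python's cp1252 codec assumes that MySQL's latin1 maps the five bytes cp1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D) to the control characters with the same code points.

```python
def to_mysql_latin1_bytes(text: str) -> bytes:
    """Map each character back to the single byte it was misread from."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode("cp1252")
        except UnicodeEncodeError:
            # Assumption: bytes undefined in cp1252 were stored as
            # U+0081, U+008D, U+008F, U+0090 or U+009D. Anything above
            # U+00FF raises here, meaning the text is not double-encoded.
            out += ch.encode("latin-1")
    return bytes(out)


def repair(text: str) -> str:
    """Undo one UTF-8 -> Windows-1252 -> UTF-8 round trip."""
    return to_mysql_latin1_bytes(text).decode("utf-8")


def looks_double_encoded(stored_hex: str) -> bool:
    """Classify the output of SELECT HEX(the_column) for one value."""
    text = bytes.fromhex(stored_hex).decode("utf-8")
    try:
        return repair(text) != text
    except UnicodeError:
        # Either the characters do not fit in one byte each, or the
        # recovered bytes are not valid UTF-8: treat as stored correctly.
        return False


print(looks_double_encoded("C397C2A2C397C2A1C397C2A7"))  # True
print(looks_double_encoded("D7A2D7A1D7A7"))              # False
```

Running repair() only on values that looks_double_encoded() flags helps avoid damaging rows that were inserted correctly, which matters if the table has ended up with a mixture of both.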

Answered 2013-07-13T13:24:43.773