php - PHP Utf8 decoding problem

Question

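I have the following address line: Praha 5, Staré Město,

I need to run this string through the utf8_decode() function before I can write it to a PDF file (using the domPDF lib).

However, PHP's utf8_decode() does not seem to handle the address line above correctly (or rather, completely).

The following code: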

<?php echo utf8_decode('Praha 5, Staré Město,'); ?>

produces this:

Praha 5, Staré M?sto,

Any idea why the ě isn't decoded?

score 14 · Accepted Answer

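utf8_decode converts a string from UTF-8 to ISO-8859-1, a.k.a. "Latin-1".
The Latin-1 encoding cannot represent the letter "ě". It's that simple.
"Decode" is a complete misnomer; it does essentially the same conversion as iconv('UTF-8', 'ISO-8859-1', $string).

Read What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text.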
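To see which characters are affected, you can list every character whose code point lies above U+00FF, the top of Latin-1; for valid UTF-8 input these are exactly the characters utf8_decode() turns into "?". A minimal sketch, assuming the input is valid UTF-8; the helper name non_latin1_chars is hypothetical and works on raw bytes so it needs no extensions:

```php
<?php
// Hypothetical helper: return the characters of a UTF-8 string that
// ISO-8859-1 (Latin-1) cannot represent, i.e. code points above U+00FF.
function non_latin1_chars(string $utf8): array
{
    // The /u modifier makes PCRE split on characters, not bytes;
    // preg_split() returns false if the input is not valid UTF-8.
    $chars = preg_split('//u', $utf8, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('input is not valid UTF-8');
    }

    $out = [];
    foreach ($chars as $ch) {
        // 1-byte sequences encode U+0000..U+007F; 2-byte sequences whose
        // lead byte is 0xC2 or 0xC3 encode U+0080..U+00FF. Anything else
        // is above U+00FF and has no Latin-1 equivalent.
        $fitsLatin1 = strlen($ch) === 1
            || (strlen($ch) === 2 && ord($ch[0]) <= 0xC3);
        if (!$fitsLatin1) {
            $out[] = $ch;
        }
    }
    return $out;
}

echo implode(', ', non_latin1_chars('Praha 5, Staré Město,')), "\n"; // prints "ě"
```

"é" (U+00E9) survives the conversion; "ě" (U+011B, bytes C4 9B) does not, which matches the "M?sto" output in the question.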

score 1 · Accepted Answer

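The problem is your PHP file's encoding. Save the file as UTF-8 and you won't even need utf8_decode. If you fetch this data ('Praha 5, Staré Město,') from a database, it's best to change the database charset to UTF-8 as well.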
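One quick way to check this answer's premise, that the literal really is UTF-8 once the file is saved that way, is to validate the bytes. A sketch relying on the fact that preg_match() with the /u modifier returns false on malformed UTF-8; the helper name is_valid_utf8 is hypothetical (mb_check_encoding($s, 'UTF-8') does the same job when mbstring is available):

```php
<?php
// Hypothetical helper: true if $bytes is well-formed UTF-8.
// An empty pattern always matches, so with /u the only way to get
// anything other than 1 is a UTF-8 validation failure (false).
function is_valid_utf8(string $bytes): bool
{
    return preg_match('//u', $bytes) === 1;
}

// true only if this source file itself was saved as UTF-8
var_dump(is_valid_utf8('Praha 5, Staré Město,'));
// the same letters as ISO-8859-2 single bytes (é = 0xE9, ě = 0xEC): false
var_dump(is_valid_utf8("Star\xE9 M\xECsto"));
```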

score 0 · Accepted Answer

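I ended up using a homegrown UTF-8 / UTF-16 decoding function (converting to the &#number; representation). I couldn't find any pattern explaining why the UTF-8 wasn't detected; I suspect it's because the "encoded as" sequence isn't always in exactly the same position in the returned string. You may want to add some extra checks for that.

Three-character UTF-8 indicator (the UTF-8 byte order mark, BOM): $startutf8 = chr(0xEF).chr(187).chr(191); (if you see this anywhere, not just in the first three characters, the string is UTF-8 encoded)

Decode according to the UTF-8 rules; this replaces an earlier version that worked byte by byte. Use: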
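The bytes EF BB BF are U+FEFF (the byte order mark) encoded in UTF-8. A sketch of the check this answer describes, searching anywhere in the string, plus stripping a leading BOM; both helper names are hypothetical. Note that a BOM is optional, so its absence does not mean the text is not UTF-8.

```php
<?php
// U+FEFF encoded as UTF-8: the same bytes as chr(0xEF).chr(187).chr(191).
const UTF8_BOM = "\xEF\xBB\xBF";

// Hypothetical helper: the answer's heuristic, i.e. a BOM anywhere
// in the string is taken as a sign that the text is UTF-8.
function contains_utf8_bom(string $s): bool
{
    return strpos($s, UTF8_BOM) !== false;
}

// Hypothetical helper: drop one leading BOM, if present.
function strip_leading_bom(string $s): string
{
    return strncmp($s, UTF8_BOM, 3) === 0 ? substr($s, 3) : $s;
}

$s = UTF8_BOM . 'Praha 5, Staré Město,';
var_dump(contains_utf8_bom($s));   // bool(true)
echo strip_leading_bom($s), "\n";  // Praha 5, Staré Město,
```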

function charset_decode_utf_8($string) {
    /* Only do the slow conversion if there are 8-bit characters.
       ereg() and the /e modifier were removed in PHP 7, so this uses
       preg_match() and preg_replace_callback() instead. */
    if (!preg_match('/[\x80-\xFF]/', $string)) {
        return $string;
    }

    // decode three-byte UTF-8 sequences (U+0800..U+FFFF) into &#N; entities
    $string = preg_replace_callback(
        '/([\xE0-\xEF])([\x80-\xBF])([\x80-\xBF])/',
        function ($m) {
            return '&#' . ((ord($m[1]) - 224) * 4096
                + (ord($m[2]) - 128) * 64 + (ord($m[3]) - 128)) . ';';
        },
        $string
    );

    // decode two-byte UTF-8 sequences (U+0080..U+07FF) into &#N; entities
    $string = preg_replace_callback(
        '/([\xC0-\xDF])([\x80-\xBF])/',
        function ($m) {
            return '&#' . ((ord($m[1]) - 192) * 64 + (ord($m[2]) - 128)) . ';';
        },
        $string
    );

    // four-byte sequences (U+10000 and up) are left untouched, as before
    return $string;
}
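If the mbstring extension is available, the same &#N; conversion can be done with the built-in mb_encode_numericentity() instead of hand-decoding bytes. This is a swapped-in technique, not the answerer's function, and unlike it, it also converts four-byte characters. A sketch; the wrapper name utf8_to_numeric_entities is hypothetical:

```php
<?php
// Hypothetical wrapper around mbstring's built-in.
// Map entries are [start, end, offset, mask]: every code point from
// U+0080 to U+10FFFF (i.e. all non-ASCII) becomes a decimal &#N; entity.
function utf8_to_numeric_entities(string $utf8): string
{
    return mb_encode_numericentity($utf8, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');
}

echo utf8_to_numeric_entities('Praha 5, Staré Město,'), "\n";
// Praha 5, Star&#233; M&#283;sto,
```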

score 0 · Accepted Answer

You don't need that. (@Rajeev: this string is automatically detected as UTF-8 encoded:

echo mb_detect_encoding('Praha 5, Staré Město,');

will always return UTF-8.)

What you should look at instead is: https://code.google.com/p/dompdf/wiki/CPDFUnicode
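The gist of that advice is to keep the text in UTF-8 all the way into the HTML that dompdf renders and to use a font that actually contains glyphs such as "ě". A minimal sketch of the HTML side; the helper name pdf_html is hypothetical, and the assumption that the "DejaVu Sans" font is available to dompdf (recent releases bundle it) should be checked against your dompdf version:

```php
<?php
// Hypothetical helper: wrap UTF-8 text in an HTML document that declares
// its charset and requests a font covering Central European letters.
// No utf8_decode(): the text stays UTF-8 end to end.
function pdf_html(string $text): string
{
    // htmlspecialchars() only escapes & < > " ' and leaves é / ě intact
    $safe = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
    return '<!DOCTYPE html><html><head><meta charset="utf-8">'
        . '<style>body { font-family: "DejaVu Sans", sans-serif; }</style>'
        . '</head><body><p>' . $safe . '</p></body></html>';
}

$html = pdf_html('Praha 5, Staré Město,');
echo $html, "\n";

// Assumption: with a recent dompdf installed via Composer, rendering
// would look roughly like this (not executed here):
//   $dompdf = new \Dompdf\Dompdf();
//   $dompdf->loadHtml($html, 'UTF-8');
//   $dompdf->render();
//   file_put_contents('address.pdf', $dompdf->output());
```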
