php - php: file_get_contents encoding problem

Question

My task is simple: send a POST request to translate.google.com and get the translation. In the following example I translate the word "hello" into Russian.

header('Content-Type: text/plain; charset=utf-8');  // optional
error_reporting(E_ALL | E_STRICT);

$context = stream_context_create(array(
    'http' => array(
        'method' => 'POST',
        'header' => implode("\r\n", array(
            'Content-type: application/x-www-form-urlencoded',
            'Accept-Language: en-us,en;q=0.5', // optional
            'Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7' // optional
        )),
        'content' => http_build_query(array(
            'prev'  =>  '_t',
            'hl'    =>  'en',
            'ie'    =>  'UTF-8',
            'text'  =>  'hello',
            'sl'    =>  'en',
            'tl'    =>  'ru'
        ))
    )
));

$page = file_get_contents('http://translate.google.com/translate_t', false, $context);

require '../simplehtmldom/simple_html_dom.php';
$dom = str_get_html($page);
$translation = $dom->find('#result_box', 0)->plaintext;
echo $translation;

The lines marked as optional are the ones I can remove without changing the output. But I get strange characters...

������

I tried

echo mb_convert_encoding($translation, 'UTF-8');

but I got

ÐÒÉ×ÅÔ

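Does anyone know how to solve this?

Update:

I forgot to mention that all my PHP files are encoded in UTF-8 without a BOM.
When I change the "to" language to "en", that is, translate from English to English, it works fine.
I don't think the library I'm using is at fault, because I also tried outputting the whole $page without passing it to any library function.
I am using PHP 5.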

score 10 · Accepted Answer

See whether this post helps: "CURL import character encoding problem".

You can also try this snippet (taken from php.net):

<?php
function file_get_contents_utf8($fn) {
    $content = file_get_contents($fn);
    return mb_convert_encoding($content, 'UTF-8',
        mb_detect_encoding($content, 'UTF-8, ISO-8859-1', true));
}
?>
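
Note that the detection list in the snippet above only covers UTF-8 and ISO-8859-1. The second output in the question, ÐÒÉ×ÅÔ, is exactly what the KOI8-R bytes for "привет" look like when read as ISO-8859-1, so the response was probably KOI8-R. That is an inference from the bytes shown, not something the question confirms. A minimal sketch that takes the charset from the response headers instead of guessing follows. The helper names charset_from_headers and to_utf8 are hypothetical, and it assumes the server declares a charset in its Content-Type header. After an HTTP call to file_get_contents(), PHP exposes the raw header lines in the local variable $http_response_header.

```php
<?php
// Hypothetical helper: extract the charset declared in the response headers.
// Falls back to $default when no Content-Type header carries a charset.
function charset_from_headers(array $headers, $default = 'UTF-8') {
    $charset = $default;
    foreach ($headers as $line) {
        if (preg_match('/^Content-Type:.*;\s*charset=["\']?([\w.:-]+)/i', $line, $m)) {
            $charset = $m[1]; // keep the last match, e.g. after a redirect
        }
    }
    return $charset;
}

// Hypothetical helper: convert a response body to UTF-8 using the declared charset.
function to_utf8($body, array $headers) {
    return mb_convert_encoding($body, 'UTF-8', charset_from_headers($headers));
}

// Offline demonstration with the six bytes behind the question's second output.
$bytes   = "\xD0\xD2\xC9\xD7\xC5\xD4";
$headers = array('HTTP/1.0 200 OK', 'Content-Type: text/html; charset=KOI8-R');
echo to_utf8($bytes, $headers), "\n"; // prints: привет
```

With a real request this would be called as to_utf8($page, $http_response_header) right after file_get_contents(), in the same scope.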

score 9 · Accepted Answer

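First, is your browser set to UTF-8? In Firefox you can set the text encoding under View->Character Encoding. Make sure "Unicode (UTF-8)" is selected. I also set View->Character Encoding->Auto-Detect to "Universal".

Second, you can try passing the FILE_TEXT flag, like this: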

$page = file_get_contents('http://translate.google.com/translate_t', FILE_TEXT, $context);

score 1 · Accepted Answer

Accept-Charset is not that optional. You should specify UTF-8 there. Russian characters cannot be represented in ISO-8859-1.
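
As a sketch of that suggestion, the request from the question can be rebuilt with only the Accept-Charset line changed. Whether the server honors the header is an assumption, so checking the charset of the response is still worthwhile.

```php
<?php
// Same request options as in the question; only Accept-Charset differs.
$headers = array(
    'Content-type: application/x-www-form-urlencoded',
    'Accept-Language: en-us,en;q=0.5',
    'Accept-Charset: utf-8', // ask for UTF-8 instead of preferring ISO-8859-1
);

$options = array(
    'http' => array(
        'method'  => 'POST',
        'header'  => implode("\r\n", $headers),
        'content' => http_build_query(array(
            'prev' => '_t',
            'hl'   => 'en',
            'ie'   => 'UTF-8',
            'text' => 'hello',
            'sl'   => 'en',
            'tl'   => 'ru',
        )),
    ),
);

$context = stream_context_create($options);

// The actual call, left commented out so the sketch runs without network access:
// $page = file_get_contents('http://translate.google.com/translate_t', false, $context);
```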
