perl - Reading a UTF-8 file with File::Slurp

Question

I am trying to read an HTML file with the Perl module File::Slurp:

binmode STDOUT, ':utf8';
my $htmlcontent = read_file($file, {binmode => ':utf8'});

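But when I print the $htmlcontent variable, some characters come out unreadable wherever the text has French accents or other special characters.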

For example, "Plus d'actualit\u00e9s" should be "Plus d'actualités".

I also checked the file's encoding, and it looks fine:

HTML document, UTF-8 Unicode text, with very long lines, with CRLF, LF line terminators

Is there a problem with this module?

Thanks

score 2 · Accepted Answer

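\u00e9 is not a UTF-8 character; it is the JavaScript escape notation for the Unicode character U+00E9 (é). The file is being read correctly, but the escape sequence is stored in it as plain ASCII text, so you need to decode the file's content, for example with Encode::JavaScript::UCS.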
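
As a rough illustration of what that decoding step does, here is a minimal sketch that uses a plain regex substitution from core Perl instead of Encode::JavaScript::UCS. The helper name decode_js_escapes is hypothetical, and the sketch assumes the content contains only \uXXXX escapes (with surrogate pairs for characters outside the Basic Multilingual Plane) and no other JavaScript escape forms.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of File::Slurp or Encode::JavaScript::UCS):
# replaces JavaScript-style \uXXXX escapes with the characters they name.
sub decode_js_escapes {
    my ($text) = @_;

    # Surrogate pairs first: two escapes that together encode one
    # character above U+FFFF.
    $text =~ s{\\u([dD][89abAB][0-9a-fA-F]{2})\\u([dD][c-fC-F][0-9a-fA-F]{2})}
              {chr(0x10000 + ((hex($1) - 0xD800) << 10) + (hex($2) - 0xDC00))}ge;

    # Then every remaining single \uXXXX escape.
    $text =~ s/\\u([0-9a-fA-F]{4})/chr(hex($1))/ge;

    return $text;
}

binmode STDOUT, ':encoding(UTF-8)';

# Single quotes keep the backslash-u sequence literal, as it is in the file.
my $htmlcontent = 'Plus d\'actualit\u00e9s';
print decode_js_escapes($htmlcontent), "\n";
```

In the question's setting, the string passed to decode_js_escapes would be the value returned by read_file rather than the literal used here.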
