perl - Perl HTML Encoding Named Entities

Question

I want to encode "special characters" as their named entities.

My code:

use HTML::Entities;
print encode_entities('“');

Expected output:

&ldquo;

And not:

&#147;

Does anyone have an idea? Regards

score 4 · Accepted Answer

Without use utf8;, Perl expects the source file to be encoded using iso-8859-1 (or its subset, US-ASCII). «“» is not found in iso-8859-1's charset.

With use utf8;, Perl expects the source file to be encoded using UTF-8. «“» is found in UTF-8's charset, Unicode.

You indicated your file isn't saved as UTF-8, so as far as Perl is concerned, your source file cannot possibly contain «“».

Odds are that you encoded your file using cp1252, an extension of iso-8859-1 that adds «“». That's not a valid choice.
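To make the point above concrete, here is a minimal sketch of what Perl ends up working with in that situation. It assumes the cp1252 guess is right, so the quote arrives as the single byte 0x93; the variable names are only for illustration. The result should be the numeric entity from the question.

```perl
use strict;
use warnings;
use HTML::Entities;

# Assumption: the file was saved as cp1252 without "use utf8;", so the
# literal reaches Perl as the single byte 0x93. It is written as an escape
# here so that this sketch itself stays pure US-ASCII.
my $as_perl_sees_it = "\x93";

# In iso-8859-1, 0x93 is a control character with no named entity,
# so HTML::Entities falls back to a numeric entity.
my $encoded = encode_entities($as_perl_sees_it);

printf "length: %d\n", length $as_perl_sees_it;
print "$encoded\n";
```

Since the string never contained U+201C in the first place, encode_entities has no way to produce &ldquo; from it.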

Options:

[Best option] Save the file as UTF-8 and use the following:

use utf8;
use HTML::Entities;
print encode_entities('“');

Save the file as cp1252, but only use US-ASCII characters.

use charnames ':full';
use HTML::Entities;
print encode_entities("\N{LEFT DOUBLE QUOTATION MARK}");

or

use HTML::Entities;
print encode_entities("\N{U+201C}");

or

use HTML::Entities;
print encode_entities("\x{201C}");

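[Not recommended] Save the file as cp1252 and decode literals explicitly. Note that decode comes from the Encode module, so it has to be loaded.

use Encode qw(decode);
use HTML::Entities;
print encode_entities(decode('cp1252', '“'));

Perl sees:

use Encode qw(decode);
use HTML::Entities;
print encode_entities(decode('cp1252', "\x93"));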
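As a quick check on the options above, the following sketch puts the ASCII-only spellings and the explicit cp1252 decode side by side. The array names are just for illustration; each entry should be the same one-character string, U+201C.

```perl
use strict;
use warnings;
use charnames ':full';
use Encode qw(decode);
use HTML::Entities;

# Four ways to obtain U+201C without a non-ASCII byte in the source file.
my @spellings = (
    "\N{LEFT DOUBLE QUOTATION MARK}",   # by Unicode name
    "\N{U+201C}",                       # by code point, \N notation
    "\x{201C}",                         # by code point, \x notation
    decode('cp1252', "\x93"),           # cp1252 byte decoded explicitly
);

# encode_entities returns a copy when its result is used, so the
# elements of @spellings are left untouched.
my @encoded = map { encode_entities($_) } @spellings;

print "$_\n" for @encoded;
```

If every line of the output reads &ldquo;, the choice between these spellings is purely a matter of readability.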

score 2 · Accepted Answer

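Perl does not know the encoding of your source file. If it contains any special characters, you should always save it with UTF-8 encoding and put

use utf8;

at the top of your code. This ensures that your string literals contain code points rather than just bytes.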
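The difference between code points and bytes can be seen with length. This is only a sketch and it assumes the snippet itself is saved as UTF-8; the do blocks are there to show the same literal with and without the pragma, since use utf8 is lexically scoped.

```perl
use strict;
use warnings;

# Assumption: this file is saved as UTF-8.
# Without the pragma, the literal is the three UTF-8 bytes E2 80 9C.
my $bytes = do { no utf8;  '“' };

# With the pragma, the same literal is one character, U+201C.
my $chars = do { use utf8; '“' };

printf "without utf8: length %d\n", length $bytes;
printf "with utf8:    length %d\n", length $chars;
```

A module such as HTML::Entities only sees the string, not the source file, so it can only name the entity when the string holds the single character.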

score 1 · Accepted Answer

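I had the same problem and applied all of the hints above. It worked within my Perl script (CGI): for example, encode_entities("ä") produced the correct result, &auml;. However, applying encode_entities(param("test")) encoded the individual bytes instead.

Putting it all together, this is the solution that finally worked for me: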

use CGI qw/:standard/;
use utf8;
use HTML::Entities;
use Encode;
print encode_entities(decode_utf8(param("test")));

It is not clear to me why this is needed, but it works. HTH
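A likely explanation is that use utf8; only affects literals in the source file, while form data arrives from outside as raw bytes and has to be decoded separately. The sketch below simulates that without CGI; the hard-coded byte string stands in for what param("test") would return for ä, which is an assumption about the form submission being UTF-8.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);
use HTML::Entities;

# Stand-in for param("test"): the two UTF-8 bytes of U+00E4 (a-umlaut),
# as they would arrive from a UTF-8 form submission.
my $raw = "\xC3\xA4";

# Encoding the raw bytes treats each byte as its own character.
my $raw_encoded = encode_entities($raw);

# Decoding first turns the two bytes into the single character U+00E4.
my $decoded_encoded = encode_entities(decode_utf8($raw));

print "raw:     $raw_encoded\n";
print "decoded: $decoded_encoded\n";
```

The first line should show two entities, one per byte, and the second the single &auml; that was wanted.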
