
I want to encode "special characters" as their named entities.

My code:

use HTML::Entities;
print encode_entities('“');

Expected output:

&ldquo;

Not:

“

Does anyone have an idea? Regards


3 Answers

  • If your code doesn't include use utf8;, the file is expected to be encoded using iso-8859-1 (or its subset, US-ASCII).

    «“» is not found in iso-8859-1's charset.

  • If your code includes use utf8;, the file is expected to be encoded using UTF-8.

    «“» is found in UTF-8's charset, Unicode.

You indicated your file isn't saved as UTF-8, so as far as Perl is concerned, your source file cannot possibly contain «“».

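Odds are that you saved your file as cp1252, a Windows encoding that matches iso-8859-1 except that it assigns bytes 0x80 to 0x9F to extra characters such as «“» (byte 0x93). That's not a valid choice: without use utf8;, Perl reads that byte as the control character U+0093, not as «“».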
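To check which case applies, it can help to dump the code points Perl actually holds. A minimal sketch, assuming the literal came from a file saved as cp1252 (so Perl received the single byte 0x93); the helper show_code_points is hypothetical:

```perl
use strict;
use warnings;
use Encode qw( decode );
use HTML::Entities qw( encode_entities );

# Hypothetical helper: list each character of a string as U+XXXX.
sub show_code_points {
    my ($str) = @_;
    return join ' ', map { sprintf 'U+%04X', ord($_) } split //, $str;
}

my $raw     = "\x93";                    # the byte a cp1252 file stores for the curly quote
my $decoded = decode('cp1252', $raw);    # the character that byte stands for in cp1252

print show_code_points($raw),     "\n";  # U+0093
print show_code_points($decoded), "\n";  # U+201C
print encode_entities($decoded),  "\n";  # &ldquo;
```

If the first line is what your literal shows, Perl never saw a curly quote, which is why encode_entities cannot produce &ldquo; for it.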

Options:

  • [Best option] Save the file as UTF-8 and use the following:

    use utf8;
    use HTML::Entities;
    print encode_entities('“');
    
  • Save the file as cp1252, but only use US-ASCII characters.

    use charnames ':full';
    use HTML::Entities;
    print encode_entities("\N{LEFT DOUBLE QUOTATION MARK}");
    

    or

    use HTML::Entities;
    print encode_entities("\N{U+201C}");
    

    or

    use HTML::Entities;
    print encode_entities("\x{201C}");
    
  • [Unrecommended] Save the file as cp1252 and decode literals explicitly:

    use Encode qw( decode );
    use HTML::Entities;
    print encode_entities(decode('cp1252', '“'));
    

    Perl sees:

    use Encode qw( decode );
    use HTML::Entities;
    print encode_entities(decode('cp1252', "\x93"));
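A quick way to confirm that the ASCII-only spellings above, and the cp1252 decode, all yield the same character. This is a sketch; the \N{U+...} form assumes Perl 5.12 or later:

```perl
use strict;
use warnings;
use charnames ':full';
use Encode qw( decode );
use HTML::Entities qw( encode_entities );

my @spellings = (
    "\N{LEFT DOUBLE QUOTATION MARK}",   # by Unicode character name
    "\N{U+201C}",                       # by code point (Perl 5.12+)
    "\x{201C}",                         # by hex escape
    decode('cp1252', "\x93"),           # by decoding the cp1252 byte
);

for my $s (@spellings) {
    print encode_entities($s), "\n";    # each line prints &ldquo;
}
```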
    
answered 2013-07-05T11:03:52.310

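Perl doesn't know the encoding of your source file. If the file contains any special characters, you should always save it as UTF-8 and put

use utf8;

at the top of your code. This ensures that your string literals contain code points rather than just bytes.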
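Because use utf8; is lexically scoped, its effect on a literal can be shown side by side in one file. A sketch assuming the file itself is saved as UTF-8:

```perl
use strict;
use warnings;
use HTML::Entities qw( encode_entities );

# Assumes this file is saved as UTF-8.
my $bytes;
{
    no utf8;           # literal is taken as raw bytes: E2 80 9C
    $bytes = '“';
}
my $chars;
{
    use utf8;          # literal is decoded from UTF-8: U+201C
    $chars = '“';
}

printf "without use utf8: %d chars, %s\n", length($bytes), encode_entities($bytes);
printf "with use utf8:    %d chars, %s\n", length($chars), encode_entities($chars);
```

Only the second line yields &ldquo;; the first encodes each of the three bytes separately.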

answered 2013-07-05T10:27:01.433

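I had the same problem and applied all of the tips above. They worked in my Perl (CGI) script; for example, encode_entities("ä") correctly produced &auml;. However, encode_entities(param("test")) encoded the individual bytes.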

I found this suggestion: http://blog.endpoint.com/2010/12/character-encoding-in-perl-decodeutf8.html

Putting it all together, this is the solution that finally worked for me:

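use CGI qw/:standard/;
use utf8;                # source literals are UTF-8
use HTML::Entities;
use Encode;              # exports decode_utf8

# Decode the submitted UTF-8 bytes into characters before
# encoding them as entities.
print encode_entities(decode_utf8(param("test")));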

I'm not clear on why this is needed, but it works. HTH
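A likely explanation: CGI's param() hands back the form value as the raw UTF-8 octets the browser sent, so encode_entities sees two characters for "ä" instead of one (CGI.pm also documents a -utf8 import flag that decodes parameters for you; check the documentation for your version). A minimal sketch that simulates the submitted value with encode_utf8 so it runs without a web server, assuming the form page was served as UTF-8:

```perl
use strict;
use warnings;
use Encode qw( encode_utf8 decode_utf8 );
use HTML::Entities qw( encode_entities );

# Stand-in for param("test"): "ä" submitted from a UTF-8 page
# arrives as the two octets C3 A4, not yet decoded.
my $raw = encode_utf8("\x{E4}");

my $as_bytes = encode_entities($raw);               # each octet becomes its own entity
my $as_chars = encode_entities(decode_utf8($raw));  # one character: &auml;

print "$as_bytes\n$as_chars\n";
```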

answered 2016-05-01T20:14:05.723