I want to encode "special chars" to their named entity.
My code:
use HTML::Entities;
print encode_entities('“');
Desired output:
&ldquo;
and not:
“
Does anyone have an idea? Regards
If you don't use "use utf8;", the file is expected to be encoded using iso-8859-1 (or subset US-ASCII).
«“» is not found in iso-8859-1's charset.
If you use "use utf8;", the file is expected to be encoded using UTF-8.
«“» is found in UTF-8's charset, Unicode.
You indicated your file isn't saved as UTF-8, so as far as Perl is concerned, your source file cannot possibly contain «“».
Odds are that you encoded your file using cp1252, an extension of iso-8859-1 that adds «“». That's not a valid choice.
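To make the byte-level picture concrete, here is a minimal sketch using the core Encode module. The byte strings are written with ASCII-only escapes, so the result does not depend on how the script file itself is saved. The variable names are only illustrative.

```perl
use strict;
use warnings;
use Encode qw( decode );

# The same character, U+201C LEFT DOUBLE QUOTATION MARK,
# as it would be stored on disk in two different encodings.
my $cp1252_bytes = "\x93";          # one byte in cp1252
my $utf8_bytes   = "\xE2\x80\x9C";  # three bytes in UTF-8

# Without decoding, Perl only sees that many separate bytes.
printf "cp1252: %d byte(s), UTF-8: %d byte(s)\n",
    length($cp1252_bytes), length($utf8_bytes);

# Decoding with the matching encoding yields the single code point U+201C.
my $from_cp1252 = decode('cp1252', $cp1252_bytes);
my $from_utf8   = decode('UTF-8',  $utf8_bytes);

printf "U+%04X U+%04X\n", ord($from_cp1252), ord($from_utf8);
```

Both decoded strings end up as the same one-character string, which is what encode_entities needs to see in order to pick the named entity.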
Options:
[Best option] Save the file as UTF-8 and use the following:
use utf8;
use HTML::Entities;
print encode_entities('“');
Save the file as cp1252, but only use US-ASCII characters.
use charnames ':full';
use HTML::Entities;
print encode_entities("\N{LEFT DOUBLE QUOTATION MARK}");
or
use HTML::Entities;
print encode_entities("\N{U+201C}");
or
use HTML::Entities;
print encode_entities("\x{201C}");
[Unrecommended] Save the file as cp1252 and decode literals explicitly:
use Encode qw( decode );
use HTML::Entities;
print encode_entities(decode('cp1252', '“'));
Perl sees:
use Encode qw( decode );
use HTML::Entities;
print encode_entities(decode('cp1252', "\x93"));
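As a quick check on the ASCII-only spellings listed above, the following sketch confirms that they all denote the same one-character string and that encode_entities turns it into the named entity. It assumes HTML::Entities is installed; the array name is only illustrative.

```perl
use strict;
use warnings;
use charnames ':full';
use HTML::Entities qw( encode_entities );

# Three ASCII-only ways to write U+201C in a double-quoted string.
my @spellings = (
    "\N{LEFT DOUBLE QUOTATION MARK}",
    "\N{U+201C}",
    "\x{201C}",
);

# Each one should be a single character that encodes to &ldquo;
for my $s (@spellings) {
    printf "length %d, U+%04X, %s\n", length($s), ord($s), encode_entities($s);
}
```

Because none of these literals contains a non-ASCII byte, the script behaves the same whether it is saved as US-ASCII, cp1252 or UTF-8.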
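Perl doesn't know the encoding of your source file. If it contains any special characters, you should always save it as UTF-8 and put
use utf8;
at the top of your code. This ensures that your string literals contain code points rather than just bytes.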
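The difference between code points and bytes can be seen directly by comparing the length of the same literal with and without use utf8. This is a small sketch that assumes the script itself is saved as UTF-8; the variable names are illustrative.

```perl
use strict;
use warnings;

my ($with_utf8, $without_utf8);

{
    use utf8;   # the literal is decoded: one code point
    $with_utf8 = length('“');
}
{
    no utf8;    # the literal stays as its raw UTF-8 bytes
    $without_utf8 = length('“');
}

print "with use utf8:    $with_utf8\n";     # 1
print "without use utf8: $without_utf8\n";  # 3
```

The utf8 pragma is lexically scoped, which is why the two blocks can treat the same literal differently.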
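I had the same problem and applied all of the hints above. It worked for literals in my Perl script (CGI): for example, encode_entities("ä") yielded the correct result, &auml;. However, applying encode_entities(param("test")) encoded the individual bytes instead.
I found this advice: http://blog.endpoint.com/2010/12/character-encoding-in-perl-decodeutf8.html
Putting it together, this is the solution that finally worked for me: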
use CGI qw/:standard/;
use utf8;
use HTML::Entities;
use Encode;
print encode_entities(decode_utf8(param("test")));
It isn't clear to me why this is needed, but it works. HTH
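One likely explanation, offered as a sketch and not a definitive account: by default CGI's param returns the raw bytes the browser sent, and use utf8; only affects literals in the source file, not data that arrives at run time. The snippet below simulates this with a hypothetical byte string, $raw, standing in for param("test") when the form submits «ä» encoded as UTF-8.

```perl
use strict;
use warnings;
use Encode qw( decode_utf8 );
use HTML::Entities qw( encode_entities );

# Hypothetical stand-in for param("test"): «ä» as two UTF-8 bytes.
my $raw = "\xC3\xA4";

# Undecoded, each byte is treated as its own character and encoded separately.
my $wrong = encode_entities($raw);

# Decoded first, the two bytes become the single character U+00E4.
my $right = encode_entities(decode_utf8($raw));

print "$wrong\n";  # &Atilde;&curren;
print "$right\n";  # &auml;
```

CGI.pm also documents a -utf8 import pragma that decodes parameters for you; whether it suits a given script (for example one that handles binary uploads) is something to check against its documentation.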