windows - Windows Perl --> does not work after porting to Unix, possibly an encoding problem

Question

I have a Perl program that was written on Windows. It starts with:

$unused_header = <STDIN>;
my @header_fields = split('\|\^\|', $unused_header, -1);

This should split input from a very large file, whose header line looks like:

The|^|Quick|^|Brown|^|Fox|!|

into:

{The, Quick, Brown, Fox|!|}

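Note: this line handles only the header; there is another, similar line that handles the repeating data lines.

It runs fine on Windows but fails on Linux. However, if I define a string with the same content inside the Perl script and run the split on it, it works correctly.

I think this is a UTF-16 encoding problem, but I'm not sure how to handle it. Does anyone know how I can make Perl understand that UTF-16 is being piped into STDIN?

I found http://www.haboogo.com/matching_patterns/2009/01/utf-16-processing-issue-in-perl.html but I can't work out what to do with it.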

score 5 · Accepted Answer

If STDIN is UTF-16, use one of the following:

binmode(STDIN, ':encoding(UTF-16le)');   # Byte order used by Windows.
binmode(STDIN, ':encoding(UTF-16be)');   # The other byte order.
binmode(STDIN, ':encoding(UTF-16)');     # Use BOM to determine byte order.
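To see why the layer matters, here is a minimal sketch using the sample header from the question. It assumes the input really is UTF-16LE without a BOM; the helper name split_utf16le_header is made up for illustration, and an in-memory file handle stands in for STDIN.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: decode one UTF-16LE header line and split it on |^|.
sub split_utf16le_header {
    my ($raw_bytes) = @_;

    # Read the bytes through an :encoding layer, much as
    # binmode(STDIN, ':encoding(UTF-16le)') would do for STDIN.
    open(my $in, '<:raw:encoding(UTF-16LE)', \$raw_bytes)
        or die "Cannot open in-memory handle: $!";
    my $header = <$in>;
    close($in);

    $header =~ s/\r?\n\z//;                  # drop the trailing line ending
    return split(/\|\^\|/, $header, -1);
}

# Assumption: the file is UTF-16LE with Windows line endings.
my $sample = encode('UTF-16LE', "The|^|Quick|^|Brown|^|Fox|!|\r\n");

# Splitting the undecoded bytes finds no delimiter, because every
# character is followed by a NUL byte in UTF-16LE.
my @raw_fields = split(/\|\^\|/, $sample, -1);
print scalar(@raw_fields), " field(s) without decoding\n";

my @fields = split_utf16le_header($sample);
print scalar(@fields), " field(s) after decoding: ", join(', ', @fields), "\n";
```

In a real script the binmode call goes before the first read from STDIN; the rest of the program can then stay as it is.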

score 3 · Accepted Answer

Tom wrote a lengthy answer about Perl and Unicode. It contains boilerplate code that correctly and fully supports UTF-8, but you can substitute UTF-16 where needed.

score 0 · Accepted Answer

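I doubt this is a UTF-xx encoding problem, because neither Windows Perl nor Unix Perl will try to read data in those encodings unless you tell it to.

If the Unix script is reading exactly the same file as the Windows script but behaves differently, it may be a line-ending problem. The dos2unix command on most Unix-y systems can convert a file's line endings, or you can strip the line ending yourself in the Perl script: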

$unused_header = <STDIN>;
$unused_header =~ s/\r?\n$//;   # chop \r\n (Windows) or \n (Unix)
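As a rough sketch of how that fits with the split from the question: on Windows, Perl's default :crlf layer turns \r\n into \n on read, whereas on Unix the \r stays in the data and ends up attached to the last field. The helper name parse_header is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: strip a trailing \r\n or \n, then split on |^|.
sub parse_header {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;
    return split(/\|\^\|/, $line, -1);
}

# The same header with Windows and with Unix line endings.
for my $line ("The|^|Quick|^|Brown|^|Fox|!|\r\n",
              "The|^|Quick|^|Brown|^|Fox|!|\n") {
    my @fields = parse_header($line);
    print scalar(@fields), " fields, last = '$fields[-1]'\n";
}
```

Both inputs yield the same four fields, so later code that checks for the |!| terminator sees identical data on either platform.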
