windows - How to force codeset cp1252 for an output file in perl >=5.18 on Windows 10?

Question

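I need to make sure that the output file I create with a Perl script has the codeset cp1252 and not UTF-8, because it will be used in a UNIX SQLplus framework that does not handle columns correctly when German umlauts are inserted into the database. (I use Strawberry Perl v5.18 on Windows 10, and I cannot set NLS_LANG or chcp in the UNIX SQL environment.)

With this small test script I can reproduce that the output file "testfile1.txt" is always UTF-8, while "testfile2.txt" is CP1252 as expected. How can I force the output of "testfile1.txt" to be CP1252 as well, even when the text contains no "special" characters?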

#!/usr/bin/env perl -w
use strict;
use Encode;

# the result file under Windows 10 will have UTF-8 codeset
open(OUT,'> testfile1.txt');    
binmode(OUT,"encoding(cp-1252)");
print OUT encode('cp-1252',"this is a test");
close(OUT);

# the result file under Windows 10 will have Windows-cp1252 codeset
open(OUT,'> testfile2.txt');    
binmode(OUT,"encoding(cp-1252)");
print OUT encode('cp-1252',"this is a test with german umlauts <ÄäÜüÖöß>");
close(OUT);

score 5 · Accepted Answer

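I think your question is based on a misunderstanding. testfile1.txt contains the text this is a test. These characters have the same encoding in ASCII, Latin-1, UTF-8, and CP-1252, so testfile1.txt is valid in all of these encodings at once and nothing in its bytes distinguishes one from another.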
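
To make this concrete, here is a minimal sketch (not part of the original answer) that encodes the same string in all four encodings and prints the resulting bytes as hex. The variable names are illustrative, and the umlaut is written as the escape "\x{E4}" so the sketch does not depend on how the script file itself is saved.

```perl
use strict;
use warnings;
use Encode qw(encode);

my @encodings = qw(ascii iso-8859-1 UTF-8 cp1252);

# ASCII-only text: every encoding should produce the same byte sequence.
my $ascii_text = "this is a test";
my %ascii_bytes = map { $_ => encode($_, $ascii_text) } @encodings;
printf "%-10s %s\n", $_, unpack('H*', $ascii_bytes{$_}) for @encodings;

# A single umlaut (U+00E4, "a" with diaeresis): here the encodings differ.
my $umlaut = "\x{E4}";
my $umlaut_cp1252 = encode('cp1252', $umlaut);   # one byte
my $umlaut_utf8   = encode('UTF-8',  $umlaut);   # two bytes
printf "cp1252: %s\n", unpack('H*', $umlaut_cp1252);
printf "UTF-8:  %s\n", unpack('H*', $umlaut_utf8);
```

Only once a non-ASCII character appears do the byte sequences diverge, which is why an editor or the file command can only start to tell the encodings apart for testfile2.txt.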

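To include literal Unicode characters in your source code, as in:

print OUT encode('cp-1252',"this is a test with german umlauts <ÄäÜüÖöß>");

you need

use utf8;

at the top (this assumes the script file itself is saved as UTF-8).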

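Also, don't combine an encoding layer on a filehandle with explicit encode() calls. Either set an encoding layer and print Unicode text to it, or use binmode(OUT) and print raw bytes (as returned by encode()) to it.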
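
The two alternatives can be sketched as follows. This is an illustration rather than the answerer's code: it writes to in-memory filehandles instead of real files so the result can be inspected directly, and the variable names are made up for the example.

```perl
use strict;
use warnings;
use Encode qw(encode);

# "<ÄäÜüÖöß>" written with escapes, i.e. decoded Unicode text.
my $text = "<\x{C4}\x{E4}\x{DC}\x{FC}\x{D6}\x{F6}\x{DF}>";

# Option 1: the encoding layer converts; we print decoded text.
my $via_layer = '';
open(my $out1, '>:encoding(cp1252)', \$via_layer)
    or die "cannot open in-memory handle: $!\n";
print {$out1} $text;
close($out1);

# Option 2: a raw handle; we encode explicitly and print bytes.
my $via_encode = '';
open(my $out2, '>:raw', \$via_encode)
    or die "cannot open in-memory handle: $!\n";
print {$out2} encode('cp1252', $text);
close($out2);

# Both buffers should now hold the same cp1252 bytes.
print unpack('H*', $via_layer),  "\n";
print unpack('H*', $via_encode), "\n";
```

Either option is enough on its own. Doing both at once means the layer re-encodes bytes that encode() has already produced, which is at best redundant.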

By the way, you shouldn't use -w anymore. It has been superseded by the

use warnings;

pragma.

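Similarly, bareword filehandles and two-argument open are pre-5.6 style and should not be used in code written after 2000. (perl 5.005 and earlier didn't support Unicode/encodings anyway.)

A fixed version of your code looks like this: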

#!/usr/bin/env perl
use strict;
use warnings;
use utf8;

{
    open(my $out, '>:encoding(cp-1252)', 'testfile1.txt') or die "$0: testfile1.txt: $!\n";    
    print $out "this is a test\n";
    close($out);
}

{
    open(my $out, '>:encoding(cp-1252)', 'testfile2.txt') or die "$0: testfile2.txt: $!\n";
    print $out "this is a test with german umlauts <ÄäÜüÖöß>\n";
    close($out);
}
