ruby - The encoding that Notepad++ just calls "ANSI": does anyone know what Ruby calls it?

Question

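I have a bunch of .txt files that Notepad++ says (in its drop-down "Encoding" menu) are "ANSI".

They contain German characters [äöüß], which display fine in Notepad++.

But the characters don't show up correctly in irb when I File.read 'this is a German text example.txt'.

So does anyone know what argument I should give to Encoding.default_external=?

(I'm assuming that would be the solution, right?)

When it is set to 'utf-8' or 'cp850', the "ANSI" file containing "äöüß" is read as "\xE4\xF6\xFC\xDF"...

(Please don't hesitate to mention apparently "obvious" things in your answers; I'm about as much of a newbie as you can be and still know enough to ask this question.)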

score 13 · Accepted Answer

They probably mean ISO/IEC 8859-1 (aka Latin-1, ISO-8859-1), ISO/IEC 8859-15 (aka Latin-9) or Windows-1252 (aka CP 1252). All of them have ä at position 0xE4.
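As a minimal sketch of how this plays out in Ruby: the bytes from the question decode to the same German characters under each of these encodings, so any of them can be passed as the `encoding:` option of `File.read`. The file name `ansi_example.txt` is made up for illustration, and the file is written to a temporary directory so the example is self-contained.

```ruby
require 'tmpdir'

# The four bytes irb showed for "äöüß" in the question.
raw = "\xE4\xF6\xFC\xDF".b

# Decode the same bytes under each candidate encoding, then transcode to UTF-8.
decoded = %w[ISO-8859-1 ISO-8859-15 Windows-1252].map do |name|
  [name, raw.dup.force_encoding(name).encode('UTF-8')]
end.to_h

decoded.each { |name, text| puts "#{name}: #{text}" }

# Reading a file works the same way. "external:internal" tells Ruby the
# encoding on disk and the encoding the returned string should have.
# (ansi_example.txt is a hypothetical file name.)
text = Dir.mktmpdir do |dir|
  path = File.join(dir, 'ansi_example.txt')
  File.binwrite(path, raw)
  File.read(path, encoding: 'Windows-1252:UTF-8')
end

puts text
```

Passing the encoding per call avoids changing `Encoding.default_external` for the whole process.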

score 9 · Accepted Answer

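I found the answer to this on the Notepad++ forum, answered in 2010 by CChris, who seems to be authoritative.

Question: Encoding ANSI?

Answer:

That will be the system code page for your computer (code page 0).

More info:

Show your current code page.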

>help chcp
Displays or sets the active code page number.

CHCP [nnn]

  nnn   Specifies a code page number.

Type CHCP without a parameter to display the active code page number.

>chcp
Active code page: 437

Code Page Identifiers

Identifier  .NET Name  Additional information
437         IBM437     OEM United States
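For comparison, Ruby can report which encodings it picked up from the environment. This is only a sketch: the values depend on the machine, and on Windows the console code page that `chcp` prints (an OEM code page such as 437) may differ from the ANSI code page that Notepad++ uses for files.

```ruby
# Special names that Encoding.find resolves against the current environment.
%w[locale external filesystem].each do |name|
  puts "#{name.ljust(10)} -> #{Encoding.find(name)}"
end

# Encoding.default_external is what File.read assumes when no encoding is given.
puts "default_external -> #{Encoding.default_external}"
```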

score 3 · Accepted Answer

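I think it's 'cp1252', alias 'windows-1252'.

After reading Jörg's answer, I went back to the Encoding page on ruby-doc.org to look for references to the specific encodings he mentioned, and that's when I discovered the Encoding.aliases method.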
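A short sketch of what `Encoding.aliases` offers: it returns a Hash that maps each alias to the name it stands for, so it can be filtered for one encoding. The variable names here are illustrative only.

```ruby
# Encoding.aliases returns a Hash of alias => canonical name.
aliases_for_1252 = Encoding.aliases
                           .select { |_alias, name| name.casecmp?('Windows-1252') }
                           .keys
puts aliases_for_1252.inspect

# Encoding.find accepts a name or an alias, case-insensitively.
enc = Encoding.find('cp1252')
puts enc.name          # canonical name
puts enc.names.inspect # canonical name plus aliases
```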

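So I put together the method at the end of this answer.

Then I looked at the output in Notepad++, viewing the file both as "ANSI" and as UTF-8, and compared that to the output in irb...

I could find only two places in the irb output where the UTF-8 file was garbled in exactly the same way as it appeared in Notepad++ when viewed as "ANSI", and those were for cp1252 and cp1254.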
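One plausible reason both cp1252 and cp1254 matched: the two code pages agree on every byte that the UTF-8 form of "äöüß" contains, whereas ISO-8859-1 maps 0x9F to a control character instead of a letter. The sketch below reinterprets the UTF-8 bytes under each encoding to reproduce the garbling.

```ruby
# "äöüß" as raw UTF-8 bytes (C3 A4, C3 B6, C3 BC, C3 9F).
utf8_bytes = "\u00E4\u00F6\u00FC\u00DF".b

# Pretend the bytes are in a single-byte encoding, then transcode back to UTF-8.
garbled = %w[Windows-1252 Windows-1254 ISO-8859-1].map do |name|
  [name, utf8_bytes.dup.force_encoding(name).encode('UTF-8')]
end.to_h

garbled.each { |name, text| puts "#{name}: #{text.inspect}" }
```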

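cp1252 is apparently my "filesystem" encoding, so I'm going with that.

I wrote a script to make copies of all the files converted to UTF-8, trying both 1252 and 1254 as the source encoding.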
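That script is not shown, so the following is only a guess at what such a conversion could look like. The method name `convert_to_utf8` and the directory arguments are hypothetical.

```ruby
require 'fileutils'

# Copies every .txt file in src_dir into dest_dir, re-encoded as UTF-8.
# `from` names the encoding the source bytes are assumed to be in.
def convert_to_utf8(src_dir, dest_dir, from: 'Windows-1252')
  FileUtils.mkdir_p(dest_dir)
  Dir.glob(File.join(src_dir, '*.txt')).map do |path|
    # Read raw bytes, label them with the source encoding, then transcode.
    text = File.binread(path).force_encoding(from).encode('UTF-8')
    target = File.join(dest_dir, File.basename(path))
    File.binwrite(target, text)
    target
  end
end
```

Calling it once with the default and once with `from: 'Windows-1254'` (each with its own destination directory) would give the two sets of copies described above. Binary reads and writes are used so that line endings pass through unchanged.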

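So far, UTF-8 regexes seem to work with both sets of files.

Now I have to try to remember what I was actually trying to accomplish before I ran into all these encoding headaches. xD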

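# Reads both files under every encoding alias Ruby knows about and prints
# the results side by side, so the garbling can be compared by eye.
def compare_encodings file1, file2
    # Encodings for which reading or appending the text raised an exception.
    file1_probs = []
    file2_probs = []

    # Every line printed to the console is also written to this file.
    txt = File.open('encoding_test_output.txt','w')

    Encoding.aliases.sort.each do |k,v|
        # k is the alias, v is the name it stands for.
        Encoding.default_external=k
        ename = [k.downcase, v.downcase].join "  ---  "
        s = ""
        begin
            s << "#{File.read(file1)}"
        rescue
            s << "nope nope nope"
            file1_probs << ename
        end
        s << "\t| #{ename} |\t"
        begin
            s << "#{File.read(file2)}"
        rescue
            s << "nope nope nope"
            file2_probs << ename
        end
        # Restore a known default before writing the line out.
        Encoding.default_external= 'utf-8'
        txt.puts s.center(58)
        puts s.center(58)
    end
    puts
    puts "file1, \"#{file1}\" exceptions from trying to convert to:\n\n"
    puts file1_probs
    puts
    puts "file2, \"#{file2}\" exceptions from trying to convert to:\n\n"
    puts file2_probs
    txt.close
end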

compare_encodings "utf-8.txt", "np++'ANSI'.txt"
