java - Latin characters between strings

Question

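I have a program that reads a file containing Latin words with sequences such as "\xed". These words can appear anywhere on any line, so my program needs to parse these characters. Is there a library that can do this?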

score 0 · Accepted Answer

A simple approach I often use is an InputStreamReader with the "UTF8" encoding. For example:

        // Requires: import java.io.*;
        File fileDir = new File("c:/temp/sample.txt");

        // try-with-resources closes the reader even if reading fails
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(
                        new FileInputStream(fileDir), "UTF8"))) {

            String str;

            while ((str = in.readLine()) != null) {
                System.out.println(str);
            }
        }
        catch (UnsupportedEncodingException e)
        {
            // the charset name is not supported
            System.out.println(e.getMessage());
        }
        catch (IOException e)
        {
            // the file is missing or cannot be read
            System.out.println(e.getMessage());
        }

score 0 · Accepted Answer

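If you mean that the text is stored as bytes and you have a byte with the hex value ED, then how that byte is interpreted depends on your code page.

Java stores all Strings internally as UTF-16. This means a code page conversion is almost always applied when reading and writing files (UTF-16 is not a common file encoding).

By default, Java will use the platform default charset. If that is not the right one, you must specify the Charset to use.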
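To check which charset Java falls back to on a given machine, something like the following sketch may help. The class name DefaultCharsetDemo and the helper names are made up for illustration; Charset.defaultCharset() and the StandardCharsets constants are standard JDK APIs.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original answer.
class DefaultCharsetDemo {
    // Returns the charset Java uses when no charset is given explicitly.
    static Charset platformDefault() {
        return Charset.defaultCharset();
    }

    // Decodes bytes with an explicit charset instead of the platform default.
    static String decodeLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println("Default charset: " + platformDefault());
        // Print the code point rather than the character, so the result
        // does not depend on the console encoding.
        String s = decodeLatin1(new byte[] { (byte) 0xED });
        System.out.printf("Explicit ISO-8859-1 decode: U+%04X%n", (int) s.charAt(0));
    }
}
```

The first line of output varies by machine and Java version, which is exactly why relying on the default is fragile.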

As an example of the problem, the byte ED is:

ISO-8859-1: í (Unicode 00ED), US Windows
Windows-1251: н (Unicode 043D), Russian
Code page 437: φ (Unicode 03C6), US Windows command line (Win 7)
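The three interpretations above can be reproduced with a small sketch. The class name DecodeByteDemo and its decode helper are made up for illustration. It also assumes the JDK in use ships the extended charsets "windows-1251" and "IBM437" (a full JDK normally does, a trimmed runtime image might not, in which case Charset.forName throws UnsupportedCharsetException).

```java
import java.nio.charset.Charset;

// Hypothetical demo class, not part of the original answer.
class DecodeByteDemo {
    // Decodes a single byte using the named charset.
    static String decode(byte b, String charsetName) {
        return new String(new byte[] { b }, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        byte b = (byte) 0xED;
        String[] names = { "ISO-8859-1", "windows-1251", "IBM437" };
        for (String name : names) {
            String s = decode(b, name);
            // Print the code point so the output does not depend on
            // how the console itself renders non-ASCII characters.
            System.out.printf("%s -> U+%04X%n", name, (int) s.charAt(0));
        }
    }
}
```

The same byte yields three different code points, so the charset has to be known from the file's origin; it cannot be read off the byte itself.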

To control the code page, read the file like this:

File file = new File("C:\\path\\to\\file.txt");
try (BufferedReader in = new BufferedReader(new InputStreamReader(new FileInputStream(file), "ISO-8859-1"))) {
    String line;
    while ((line = in.readLine()) != null) {
        // process line here
    }
}

Or use the newer Path API:

Path path = Paths.get("C:\\path\\to\\file.txt");
try (BufferedReader in = Files.newBufferedReader(path, Charset.forName("ISO-8859-1"))) {
    String line;
    while ((line = in.readLine()) != null) {
        // process line here
    }
}
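
If the file instead contains the literal four characters \xed written out as text, rather than the raw byte ED, then choosing a charset for the reader will not convert them. In that case the escapes have to be replaced after reading each line. The following is only a rough sketch under that assumption: the class HexEscapeDemo and the method unescape are made-up names, and it handles one byte per escape, so it suits single-byte charsets such as ISO-8859-1 but not multi-byte sequences in UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original answer.
class HexEscapeDemo {
    // Matches a backslash, the letter x, and exactly two hex digits.
    private static final Pattern HEX_ESCAPE =
            Pattern.compile("\\\\x([0-9A-Fa-f]{2})");

    // Replaces each literal escape with the character that the byte
    // stands for in the given single-byte charset.
    static String unescape(String line, Charset charset) {
        Matcher m = HEX_ESCAPE.matcher(line);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            byte b = (byte) Integer.parseInt(m.group(1), 16);
            String decoded = new String(new byte[] { b }, charset);
            // quoteReplacement keeps '$' and '\' in the result literal
            m.appendReplacement(out, Matcher.quoteReplacement(decoded));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // The Java literal below is the 8-character text: Mar, backslash, xeda
        String line = "Mar\\xeda";
        String fixed = unescape(line, StandardCharsets.ISO_8859_1);
        System.out.println(fixed.equals("Mar\u00EDa"));
    }
}
```

Which charset to pass still depends on where the escapes came from, for the same reason as with raw bytes.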
