
I have a CSV file like the following (control characters shown in bold):

"ID","NAME","CLASS" CRLF 
"1","JOHN X","A" CRLF 
"2","DOE LF 
Y","B" CRLF 
"3","OTHER S", "D " CRLF

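Note that line 3 ends with LF, not CRLF. When I read this CSV file in Java, I get 5 lines instead of 4 (the header plus 3 data rows). Is there a way to replace the LF with a space while keeping the CRLFs, either by massaging the input file or by changing the Java code? I have done a lot of googling, and every solution I found replaces both LF and CRLF.

Thanks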
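A minimal sketch of why the count comes out as 5, assuming the file is read with `BufferedReader.readLine()`. That method treats `\n`, `\r` and `\r\n` all as line terminators, so the bare LF inside record 2 starts a new line. Splitting only on CRLF keeps that record whole. The class name `ReadLineDemo` and the `SAMPLE` string (a reconstruction of the file above) are illustrative assumptions.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class ReadLineDemo {
    // Assumed reconstruction of the sample file: CRLF ends each record,
    // but the NAME field of record 2 contains a bare LF.
    static final String SAMPLE =
            "\"ID\",\"NAME\",\"CLASS\"\r\n"
            + "\"1\",\"JOHN X\",\"A\"\r\n"
            + "\"2\",\"DOE\nY\",\"B\"\r\n"
            + "\"3\",\"OTHER S\", \"D \"\r\n";

    // readLine() breaks on "\n", "\r" and "\r\n", so the embedded LF splits record 2 in two.
    static List<String> readLines(String text) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // Splitting on CRLF only leaves the bare LF inside record 2.
    static String[] splitOnCrlf(String text) {
        return text.split("\r\n");
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readLines(SAMPLE).size());   // 5
        System.out.println(splitOnCrlf(SAMPLE).length); // 4
    }
}
```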


3 Answers


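You can use a Scanner whose delimiter is CRLF (\r\n). Using jlordo's technique to replace the stray LF with a space, you write the content to an OutputStream one line at a time. That way you never hold the whole 2GB+ file in memory.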

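import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

public class FixCsv {
    public static void main(String[] args) throws Exception {
        File file = new File("C:\\Users\\Soto\\Downloads\\person.xml");
        String lineSeparator = "\r\n"; // the file's record separator (CRLF), independent of the platform
        ByteArrayOutputStream out = new ByteArrayOutputStream(); // would be a real OutputStream, like FileOutputStream
        char LF = 0xA;

        // try-with-resources closes the Scanner and the underlying FileInputStream
        try (Scanner scanner = new Scanner(new FileInputStream(file), "UTF-8")) {
            scanner.useDelimiter(lineSeparator); // split on CRLF only, so a bare LF stays inside the record
            while (scanner.hasNext()) { // looks up to the next delimiter
                String line = scanner.next();
                line = line.replace(String.valueOf(LF), " "); // replace the embedded LF with a space
                out.write(line.getBytes(StandardCharsets.UTF_8));
                out.write(lineSeparator.getBytes(StandardCharsets.UTF_8));
            }
        }

        // out now holds the content with CRLF record endings and every embedded LF replaced by a space
    }
}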

LF is hexadecimal A (0x0A); see here.
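To check the Scanner approach without a file on disk, here is a minimal in-memory sketch of the same loop. `ScannerFixDemo` and `fixEmbeddedLineFeeds` are hypothetical names, and the input is assumed to be UTF-8 text whose records end in CRLF. In real use the `ByteArrayOutputStream` would be a `FileOutputStream`, as the answer notes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

public class ScannerFixDemo {
    // Hypothetical helper: tokenizes on CRLF only, turns any LF left inside
    // a record into a space, and writes each record back followed by CRLF.
    static String fixEmbeddedLineFeeds(String input) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Scanner scanner = new Scanner(input)) {
            scanner.useDelimiter("\r\n");
            while (scanner.hasNext()) {
                String record = scanner.next().replace('\n', ' ');
                out.write(record.getBytes(StandardCharsets.UTF_8));
                out.write("\r\n".getBytes(StandardCharsets.UTF_8));
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        String fixed = fixEmbeddedLineFeeds("\"2\",\"DOE\nY\",\"B\"\r\n");
        System.out.print(fixed); // "2","DOE Y","B" followed by CRLF
    }
}
```

One caveat of this sketch: if the input does not end with CRLF, the output gains a trailing CRLF after the last record.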

answered 2013-09-02T22:44:16.933

This should work:

char LF = 0x0A;
char CR = 0x0D;
String content = ... // your lines(s)
// negative lookbehind (?<!CR): match an LF only when the character before it is not a CR
content = content.replaceAll("(?<!" + CR + ")" + LF, " ");

The regex is built so that an LF is replaced with a space only when it is not preceded by a CR.
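A small self-contained sketch of that lookbehind, written with regex escapes (`\\r`, `\\n`) instead of concatenated `char`s; both forms match the same characters. `LookbehindDemo` and `replaceBareLf` are illustrative names, not part of the answer.

```java
public class LookbehindDemo {
    // (?<!\r)\n matches an LF only when the preceding character is not a CR,
    // so the LF of every CRLF pair is left alone.
    static String replaceBareLf(String content) {
        return content.replaceAll("(?<!\\r)\\n", " ");
    }

    public static void main(String[] args) {
        String in = "\"2\",\"DOE\nY\",\"B\"\r\n";
        System.out.print(replaceBareLf(in)); // "2","DOE Y","B" followed by CRLF
    }
}
```

This works on a `String` already in memory, so for a very large file it would need to be applied per chunk or per record rather than to the whole content at once.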

answered 2013-09-02T22:44:58.697

You need to set the correct system property (line.separator) as described here: http://docs.oracle.com/javase/tutorial/essential/environment/sysprop.html

Hope this solves the problem. Cheers

answered 2013-09-02T22:47:39.647