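Oh, you won't be able to fix this kind of input yourself that easily. Regular expressions will just break your data further.
Can you write a small Java program to process it? If so, use uniVocity-parsers to read your CSV input and write it back with proper quote escaping.
To my knowledge it is the only CSV parser that can handle broken quote escaping. Try this example: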
import com.univocity.parsers.csv.*;
import java.io.*;
import java.util.*;
public class Test {
    public static void main(String ... args){
        CsvParserSettings settings = new CsvParserSettings();
        settings.getFormat().setLineSeparator("\r\n");
        settings.setParseUnescapedQuotes(true); // THIS IS IMPORTANT FOR YOU
        CsvParser parser = new CsvParser(settings);
        String line1 = "something,\"a quoted value \"with unescaped quotes\" can be parsed\", something\r\n";
        System.out.println("Input line: " + line1);
        String line2 = "\"after the newline \r\n you will find \" more stuff\r\n";
        System.out.println("Input line: " + line2);
        List<String[]> allInputLines = parser.parseAll(new StringReader(line1 + line2));
        System.out.println("===============\nParsed input values\n===============");
        int count = 0;
        for(String[] line : allInputLines){
            System.out.println("From line " + ++count + ":");
            for(String element : line){
                System.out.println("\t" + element);
            }
            System.out.println();
        }
        //Let's write your output CSV
        StringWriter output = new StringWriter();
        CsvWriterSettings writerSettings = new CsvWriterSettings();
        writerSettings.getFormat().setLineSeparator("\r\n");
        writerSettings.getFormat().setQuoteEscape('\\'); //it seems you are using backslash as quote escape
        writerSettings.getFormat().setCharToEscapeQuoteEscaping('\\'); //when your quote escape character is not the same as the quote character, you might need to escape the escape character as well
        writerSettings.setQuoteAllFields(true); //let's force quotes on all fields so whatever is parsing your input file has more chance of doing it properly
        CsvWriter writer = new CsvWriter(output, writerSettings);
        for(String[] row : allInputLines){
            writer.writeRow(row);
        }
        writer.close();
        System.out.println("===============\nNicely formatted output\n===============");
        System.out.println(output.toString());
    }
}
This code produces the following output (which your data import tool will probably be able to read):
Input line: something,"a quoted value "with unescaped quotes" can be parsed", something
Input line: "after the newline
you will find " more stuff
===============
Parsed input values
===============
From line 1:
something
a quoted value "with unescaped quotes" can be parsed
something
From line 2:
after the newline
you will find " more stuff
===============
Nicely formatted output
===============
"something","a quoted value \"with unescaped quotes\" can be parsed","something"
"after the newline
you will find \" more stuff"
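To make the writer settings above more concrete, here is a rough, hand-rolled sketch of the escaping convention visible in the formatted output: every field is wrapped in quotes, and embedded quotes are prefixed with a backslash. It uses only the JDK and is not how uniVocity-parsers is implemented; the class name `QuoteEscapeSketch` and its helper methods are hypothetical, and escaping every backslash is an assumption about one reasonable convention that may not match the library in every case.

```java
import java.util.StringJoiner;

// Illustration only: a hypothetical helper, NOT part of uniVocity-parsers.
final class QuoteEscapeSketch {
    private static final char QUOTE = '"';
    private static final char ESCAPE = '\\';

    // Wraps one value in quotes and prefixes embedded quotes with a backslash.
    // Assumption: backslashes are escaped too, so the output stays unambiguous.
    static String quoteField(String value) {
        StringBuilder out = new StringBuilder(value.length() + 2);
        out.append(QUOTE);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == QUOTE || c == ESCAPE) {
                out.append(ESCAPE);
            }
            out.append(c);
        }
        return out.append(QUOTE).toString();
    }

    // Joins the quoted fields of one record with commas (no line separator added).
    static String formatRow(String[] row) {
        StringJoiner joiner = new StringJoiner(",");
        for (String field : row) {
            joiner.add(quoteField(field));
        }
        return joiner.toString();
    }

    public static void main(String... args) {
        String[] row = {
            "something",
            "a quoted value \"with unescaped quotes\" can be parsed",
            "something"
        };
        System.out.println(formatRow(row));
    }
}
```

For the first parsed record this yields the same text as the first line of the formatted output above. For real data, the library's writer is the safer choice, since it also handles line separators and the other format settings.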
Disclosure: I am the author of this library. It is open source and free (Apache V2.0 license).
ColdFusion 10+ example:
Load the jar in Application.cfc:
this.javaSettings = { loadPaths: ["C:\path\to\univocity-parsers-1.5.6.jar" ]};
Use createObject to create instances of the parser classes:
filePath = "c:\path\to\yourFile.csv";
settings = createObject("java", "com.univocity.parsers.csv.CsvParserSettings").init();
settings.getFormat().setLineSeparator(chr(13)& chr(10));
settings.getFormat().setQuoteEscape("\");
settings.setParseUnescapedQuotes(true); // THIS IS IMPORTANT FOR YOU
parser = createObject("java", "com.univocity.parsers.csv.CsvParser").init(settings);
reader = createObject("java", "java.io.StringReader").init(fileRead(filePath));
arrayOfLines = parser.parseAll(reader);
// display results
counter = 1;
for (line in arrayOfLines) {
    writeOutput("<br>From line "& (counter++) & ":");
    for (element in line) {
        writeOutput("<br>"& element);
    }
}