Suppose we have a comma-separated (CSV) file like this:
"name of movie","starring","director","release year"
"dark knight rises","christian bale, anna hathaway","christopher nolan","2012"
"the dark knight","christian bale, heath ledger","christopher nolan","2008"
"The "day" when earth stood still","Michael Rennie,the 'strong' man","robert wise","1951"
"the 'gladiator'","russel "the awesome" crowe","ridley scott","2000"
As you can see above, lines 4 and 5 of the file contain quotes inside the quoted fields. The output should look like this:
"name of movie","starring","director","release year"
"dark knight rises","christian bale, anna hathaway","christopher nolan","2012"
"the dark knight","christian bale, heath ledger","christopher nolan","2008"
"The day when earth stood still","Michael Rennie,the strong man","robert wise","1951"
"the gladiator","russel the awesome crowe","ridley scott","2000"
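How can I get rid of quotes like these (both single and double) that appear inside quoted fields in a CSV file? Note that commas inside a single field are fine, since a parser will recognize that they are inside quotes and treat the field as one. This is just a preprocessing step to tidy the CSV file so that it can be fed to various parsers and converted to whatever format we want. Bash, awk, or Python are all fine. Please no Perl, I'm sick of that language :D Thanks in advance!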
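To make the requirement concrete, here is a minimal Python sketch of the kind of preprocessing being asked for. It is only one possible approach and rests on assumptions that hold for the sample above but may not hold in general: every field is wrapped in double quotes, fields are separated by exactly `","` with no spaces, and no field contains that three-character sequence itself. The function name `strip_inner_quotes` is made up for illustration.

```python
def strip_inner_quotes(line: str) -> str:
    """Remove single and double quotes found inside the quoted fields of one CSV line.

    Assumes every field is double-quoted and fields are separated by exactly '","'.
    Lines that do not start and end with a double quote are returned unchanged.
    """
    line = line.rstrip("\r\n")
    if len(line) < 2 or not (line.startswith('"') and line.endswith('"')):
        return line
    # Drop the outermost quotes, then split on the field delimiter '","'.
    fields = line[1:-1].split('","')
    # Whatever quotes remain inside a field are the unwanted inner ones.
    cleaned = [field.replace('"', "").replace("'", "") for field in fields]
    return '"' + '","'.join(cleaned) + '"'


# Small demonstration on rows taken from the sample file.
sample = '''"name of movie","starring","director","release year"
"The "day" when earth stood still","Michael Rennie,the 'strong' man","robert wise","1951"
"the 'gladiator'","russel "the awesome" crowe","ridley scott","2000"'''

for row in sample.splitlines():
    print(strip_inner_quotes(row))
```

Splitting on `","` rather than on a bare comma is what keeps commas inside a field intact. The sketch would misbehave on a field that has an inner double quote immediately followed by `,"`, so it is only suitable when the input matches the layout shown in the sample.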