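I have a .CSV file (call it tab_delimited_file.csv) that I downloaded from a particular vendor's portal. When I moved the file to one of my Linux directories, I noticed that this .CSV file is actually a tab-delimited file that merely carries a .CSV extension. A few sample records from the file are shown below.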
"""column1""" """column2""" """column3""" """column4""" """column5""" """column6""" """column7"""
12 455 string with quotes, and with a comma in between 4432 6787 890 88
4432 6787 another, string with quotes, and with two comma in between 890 88 12 455
11 22 simple string 77 777 333 22
The sample records above are separated by tabs. I know the file's header is odd, but that is the format in which I received the file.
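I tried using the tr command to replace the tabs with commas, but because of the extra commas inside the record values, the file ends up completely messed up. I need the record values that contain commas to be enclosed in double quotes. The command I used is below.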
tr '\t' ',' < tab_delimited_file.csv > comma_separated_file.csv
This converts the file to the following format.
"""column1""","""column2""","""column3""","""column4""","""column5""","""column6""","""column7"""
12,455,string with quotes, and with a comma in between,4432,6787,890,88
4432,6787,another, string with quotes, and with two comma in between,890,88,12,455
11,22,simple string,77,777,333,22
I need help converting the sample file to the following format.
column1,column2,column3,column4,column5,column6,column7
12,455,"string with quotes, and with a comma in between",4432,6787,890,88
4432,6787,"another, string with quotes, and with two comma in between",890,88,12,455
11,22,"simple string",77,777,333,22
Any solution using sed or awk would be very helpful.
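For reference, the conversion described above could be sketched with awk roughly as follows. This is only a sketch under assumptions that the sample data suggests but does not confirm: the first line is the header and its double quotes can simply be dropped, fields never contain a tab, and every field that is not empty or a plain integer should be quoted (which matches "simple string" being quoted in the desired output). The function name tsv_to_csv is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads tab-delimited text on stdin, writes CSV on stdout.
# Usage: tsv_to_csv < tab_delimited_file.csv > comma_separated_file.csv
tsv_to_csv() {
    awk '
    BEGIN { FS = "\t"; OFS = "," }

    # Header line: strip every double quote, then rebuild the record with commas.
    NR == 1 {
        gsub(/"/, "")
        $1 = $1
        print
        next
    }

    # Data lines: leave empty and all-digit fields alone, quote everything else.
    {
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^[0-9]*$/)
                continue
            gsub(/"/, "\"\"", $i)      # double any embedded quote (CSV convention)
            $i = "\"" $i "\""
        }
        $1 = $1                        # force the record to be rebuilt using OFS
        print
    }'
}
```

Splitting on tabs first and quoting afterwards avoids the problem with the tr approach, because the commas inside a value are already safely inside one field by the time the delimiter changes. If the vendor file has Windows line endings, the trailing carriage return would end up in the last field, so it may need to be stripped beforehand.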