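I have a .CSV file with a few records after the header. However, before the end of the file there is a duplicate header, followed by some more records (which I do not need). Is there a way to detect the second occurrence of the header pattern and delete the rest of the file from that duplicate header onward? Below is a sample of the file.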
col0,col1, col2, col3, col4, col5, col6,
1value0,1value1,1value2,1value3,1value4,1value5,1value6,
2value0, 2value1, 2value2, 2value3, 2value4, 2value5, 2value6,
3value, 3value1, 3value2, 3value3, 3value4, 3value5, 3value6,
2value0, 4value1, 4value2, 4value3, 4value4, 4value5, 4value6, 5value0, 5value1, 5value2, 5value3,
5value4, 5value5, 5value6, 6value0, 6value1,
6value2, 6value3, 6value4, 6value5, 6value6,
,, ,,,,,
,,,,,,,
,,,,,,,
(n-1)value0, (n-1)value1, (n-1)value2, (n-1)value3, (n-1)value4, (n-1)value5, (n-1)value6,
(n)value0, (n)value1, (n)value2, (n)value3, (n)value4, (n)value5, (n)value6,
col0,col1, col2, col3, col4, col5, col6,
1,unwanted, records, after, the, duplicate, header
2,unwanted, records, after, the, duplicate, header
3,unwanted, records, after, the, duplicate, header
The output I am expecting is shown below.
col0,col1, col2, col3, col4, col5, col6,
1value0,1value1,1value2,1value3,1value4,1value5,1value6,
2value0, 2value1, 2value2, 2value3, 2value4, 2value5, 2value6,
3value, 3value1, 3value2, 3value3, 3value4, 3value5, 3value6,
2value0, 4value1, 4value2, 4value3, 4value4, 4value5, 4value6, 5value0, 5value1, 5value2, 5value3,
5value4, 5value5, 5value6, 6value0, 6value1,
6value2, 6value3, 6value4, 6value5, 6value6,
,, ,,,,,
,,,,,,,
,,,,,,,
(n-1)value0, (n-1)value1, (n-1)value2, (n-1)value3, (n-1)value4, (n-1)value5, (n-1)value6,
(n)value0, (n)value1, (n)value2, (n)value3, (n)value4, (n)value5, (n)value6,
PS: I have GNU sed version 4.1.5 and GNU Awk 3.1.5.
Any help is much appreciated.
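For illustration, here is a minimal sketch of one possible approach. It assumes the duplicate header is character-for-character identical to the first line of the file, including spacing and the trailing comma. The idea is to remember line 1, print lines until one matches it again, and stop there. The function name trim_after_duplicate_header and the file names in the usage line are made up for this example.

```shell
# Sketch only: assumes the repeated header matches line 1 exactly.
trim_after_duplicate_header() {
    awk '
        NR == 1       { header = $0; print; next }  # remember and print the first header
        $0 == header  { exit }                      # second occurrence: stop, printing nothing more
                      { print }                     # ordinary record: pass through unchanged
    ' "$1"
}
```

It could be used as `trim_after_duplicate_header input.csv > output.csv`. If the two header lines differ in whitespace, the exact string comparison would not match and the lines would need to be normalized before comparing.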