
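I have a .CSV file with a few records after the header. However, before the end of the file there is a duplicate header, followed by some more records that I don't need. Is there a way to check for the second occurrence of the header pattern and delete the rest of the file from that duplicate header onward? Below is a sample of the file.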

col0,col1, col2, col3, col4, col5, col6,
1value0,1value1,1value2,1value3,1value4,1value5,1value6,
2value0, 2value1, 2value2, 2value3, 2value4, 2value5, 2value6,
3value, 3value1, 3value2, 3value3, 3value4, 3value5, 3value6,
2value0, 4value1, 4value2, 4value3, 4value4, 4value5, 4value6, 5value0, 5value1, 5value2, 5value3,
5value4, 5value5, 5value6, 6value0, 6value1,
6value2, 6value3, 6value4, 6value5, 6value6,
,, ,,,,,
,,,,,,,
,,,,,,,
(n-1)value0, (n-1)value1, (n-1)value2, (n-1)value3, (n-1)value4, (n-1)value5, (n-1)value6,
(n)value0, (n)value1, (n)value2, (n)value3, (n)value4, (n)value5, (n)value6,
col0,col1, col2, col3, col4, col5, col6,
1,unwanted, records, after, the, duplicate, header
2,unwanted, records, after, the, duplicate, header
3,unwanted, records, after, the, duplicate, header

The output I expect is shown below.

col0,col1, col2, col3, col4, col5, col6,
1value0,1value1,1value2,1value3,1value4,1value5,1value6,
2value0, 2value1, 2value2, 2value3, 2value4, 2value5, 2value6,
3value, 3value1, 3value2, 3value3, 3value4, 3value5, 3value6,
2value0, 4value1, 4value2, 4value3, 4value4, 4value5, 4value6, 5value0, 5value1, 5value2, 5value3,
5value4, 5value5, 5value6, 6value0, 6value1,
6value2, 6value3, 6value4, 6value5, 6value6,
,, ,,,,,
,,,,,,,
,,,,,,,
(n-1)value0, (n-1)value1, (n-1)value2, (n-1)value3, (n-1)value4, (n-1)value5, (n-1)value6,
(n)value0, (n)value1, (n)value2, (n)value3, (n)value4, (n)value5, (n)value6,

PS: I have GNU sed version 4.1.5 and GNU Awk 3.1.5.

Any help is much appreciated.

4 Answers

2

This might work for you (GNU sed 4.2.1):

sed 's/,/\n/8;T;s/\n.*//;q' file

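This works by trying to replace the 8th `,` on a line with a newline; if that fails, it bails out and prints the line as normal. Most lines (in your example) have only 7 commas and so are left alone, whereas the line containing the duplicate header is shortened and printed as processing quits.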
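To see the mechanism on a small case, here is a minimal sketch. It assumes GNU sed (`T` and `\n` in the replacement are GNU extensions), the function name `cut_at_eighth_comma` is only illustrative, and the input lines are made up rather than taken from the question.

```shell
# Hypothetical wrapper around the command from this answer (GNU sed assumed).
cut_at_eighth_comma() {
  # s/,/\n/8  : replace the 8th comma on the line with a newline
  # T         : if that substitution failed, skip the rest and print the line
  # s/\n.*//  : otherwise drop everything from the inserted newline onward
  # q         : print the shortened line and stop reading input
  sed 's/,/\n/8;T;s/\n.*//;q'
}

# Made-up input: only the second line has more than 7 commas.
printf '%s\n' \
  'a0,a1,a2,a3,a4,a5,a6,' \
  'b0,b1,b2,b3,b4,b5,b6,col0,col1,' \
  'c0,c1,c2,c3,c4,c5,c6,' |
  cut_at_eighth_comma
```

With this input the first line passes through unchanged, the second is cut just before its 8th comma, and the third is never read because `q` ends processing.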

answered 2013-07-31T14:09:02.980
2

Try this:

awk 'a~$0{exit}NR==1{a=$0}1' file
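The one-liner uses `~`, which treats each input line as a regular expression to match against the saved header. A spelled-out variant that compares strings exactly might look like the sketch below, so that records containing regex metacharacters such as `(n-1)value0` are only ever compared literally. The function name `strip_after_dup_header` and the sample data are assumptions made for illustration.

```shell
# Hypothetical helper (name is illustrative): print a file up to, but not
# including, the second occurrence of its first line.
strip_after_dup_header() {
  awk '
    NR == 1      { header = $0; print; next }  # remember and print the header
    $0 == header { exit }                      # exact duplicate: stop reading
                 { print }                     # any other record
  ' "$1"
}

# Made-up sample file for a quick check.
sample=$(mktemp)
printf '%s\n' 'col0,col1,' '(n)value0,(n)value1,' ',,' 'col0,col1,' '1,unwanted,' > "$sample"
strip_after_dup_header "$sample"
rm -f "$sample"
```

This relies on the duplicate header being byte-for-byte identical to the first line, as it is in the sample from the question; a header that differs in spacing would not be detected.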
answered 2013-08-01T06:16:48.200
2

Probably a lot more complicated than it needs to be:

awk 'BEGIN{flag=0} $0==head{flag=1}; NR==1{head=$0}; flag==0{print $0}' file
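Since the question asks to delete the rest of the file, the same comparison can be wrapped so that the file is rewritten in place. This is only a sketch: the function name `truncate_at_dup_header` is made up, and it assumes `mktemp` is available (it is part of GNU coreutils).

```shell
# Hypothetical helper (name is illustrative): rewrite FILE in place, keeping
# only the lines before the second occurrence of its first line.
truncate_at_dup_header() {
  file=$1
  tmp=$(mktemp) || return 1
  # Write the kept lines to a temporary file first; replace the original
  # only if awk succeeded, so a failure leaves the input untouched.
  if awk 'NR == 1 { head = $0 } NR > 1 && $0 == head { exit } { print }' "$file" > "$tmp"
  then
    mv "$tmp" "$file"
  else
    rm -f "$tmp"
    return 1
  fi
}
```

It would be called as `truncate_at_dup_header file`. Note that `mv` replaces the original with the temporary file, so ownership and permissions may change; using `cat "$tmp" > "$file"` followed by `rm -f "$tmp"` is an alternative that keeps them.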
answered 2013-07-31T14:24:03.720
0

Try

awk '/col1, col2, col3, col4, col5, col6/{d++} d<2{print}' file
answered 2013-07-31T14:04:27.157