perl - awk help to convert a semi-CSV file to a new format

Question

I'm stuck on a small problem and can't work it out.

I have a file in which some lines look like this:

fig|1671.3.peg.2935,fig|1671.3.peg.2936,fig|1671.3.peg.29370 operon1

I want something like this:

fig|1671.3.peg.2935    operon1
fig|1671.3.peg.2936    operon1
fig|1671.3.peg.29370    operon1

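The file does not have a fixed number of comma-separated elements. There are 3 in this case, but other lines have anywhere from 1 to 8.

Thanks in advance. CS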

score 3 · Accepted Answer

Use this:

awk -F'[, ]' '{for(i=1;i<NF;i++) {print $i,$NF}}' <filename>

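You can specify a regular expression as the field separator. -F'[, ]' tells awk that either a comma or a space can act as the separator. The loop then prints every field except the last one, each paired with the last field. NF is the number of fields, and $NF is the last field.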
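If tab-separated output is wanted instead of a single space (an assumption; the question does not say which whitespace it needs), the same loop can be combined with OFS. This is only a minimal sketch, and the function name split_operons is made up for illustration.

```shell
# split_operons is a hypothetical helper name, not part of the original answer.
# -F'[, ]' splits each line on a comma or a space; OFS is what print
# places between $i and $NF (a tab here, which is an assumption).
split_operons() {
  awk -F'[, ]' -v OFS='\t' '{for (i = 1; i < NF; i++) print $i, $NF}' "$@"
}

# Sample line copied from the question.
printf '%s\n' 'fig|1671.3.peg.2935,fig|1671.3.peg.2936,fig|1671.3.peg.29370 operon1' | split_operons
```

Called with no arguments, the function reads standard input; called with file names, it passes them on to awk.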

score 2 · Accepted Answer

$ awk '{split($1, a, ","); for (i in a) {print a[i], $2}}' file
fig|1671.3.peg.2935 operon1
fig|1671.3.peg.2936 operon1
fig|1671.3.peg.29370 operon1

Note that it works for any number of comma-separated fields:

$ cat file
hello,how,are,you good!
$ awk '{split($1, a, ","); for (i in a) {print a[i], $2}}' file
hello good!
how good!
are good!
you good!
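One caveat: awk does not specify the order in which for (i in a) visits array elements, so the lines may come out in a different order with some implementations. A possible variant, sketched here under the assumption that input order matters, uses the element count returned by split and a numeric loop. The function name split_in_order is hypothetical.

```shell
# split_in_order is a hypothetical helper name used only for this sketch.
# split() returns the number of pieces, so a counted loop walks them
# in their original left-to-right order.
split_in_order() {
  awk '{n = split($1, a, ","); for (i = 1; i <= n; i++) print a[i], $2}' "$@"
}

# Same sample input as in the answer above.
printf '%s\n' 'hello,how,are,you good!' | split_in_order
```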

score 2 · Accepted Answer

This script should do what you want:

$ awk -F '[, ]+' '{for (i=1;i<NF;i++) print $i, $NF}' file
fig|1671.3.peg.2935 operon1
fig|1671.3.peg.2936 operon1
fig|1671.3.peg.29370 operon1
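The + in the separator matters when the list and the operon name are separated by more than one space. The sketch below compares the two separators on a made-up line with two spaces; the function names and the sample line are assumptions for illustration only.

```shell
# Hypothetical helper names; both run the same loop with a different -F.
split_single() { awk -F'[, ]'  '{for (i = 1; i < NF; i++) print $i, $NF}' "$@"; }
split_runs()   { awk -F'[, ]+' '{for (i = 1; i < NF; i++) print $i, $NF}' "$@"; }

# Made-up input with two spaces before the last column.
line='a,b  op'

# With '[, ]' every single separator ends a field, so the two spaces
# produce an empty field and an extra output line containing only " op".
printf '%s\n' "$line" | split_single

# With '[, ]+' a run of separators counts as one, so only "a op" and "b op" appear.
printf '%s\n' "$line" | split_runs
```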

score 2 · Accepted Answer

This might work for you (GNU sed):

sed -r 's/,(.*\s(\S+))/ \2\n\1/;P;D' file

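On each line, replace the first , with a space and the last string on the line, followed by a newline and the rest of the line. Print up to the introduced newline, then delete up to and including it, and repeat until no more , are found.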
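To make the P;D cycle easier to follow, here is a trace of the pattern space for a short made-up line. It assumes GNU sed (for -r, \s, \S and \n in the replacement); the function name expand_line is hypothetical.

```shell
# expand_line is a hypothetical wrapper around the command from the answer.
expand_line() {
  sed -r 's/,(.*\s(\S+))/ \2\n\1/;P;D' "$@"
}

# Trace for the made-up input line "a,b,c op":
#   "a,b,c op"  s/// gives "a op\nb,c op"; P prints "a op";
#               D deletes through the newline and restarts with "b,c op"
#   "b,c op"    s/// gives "b op\nc op";   P prints "b op";
#               D restarts with "c op"
#   "c op"      no comma, so s/// does nothing; P prints "c op";
#               D finds no newline, clears the pattern space and reads the next line
printf '%s\n' 'a,b,c op' | expand_line
```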

score 1 · Accepted Answer

An awk version with no loop.

awk '{gsub(/,/," "$2"\n")}1' file
fig|1671.3.peg.2935 operon1
fig|1671.3.peg.2936 operon1
fig|1671.3.peg.29370 operon1
