
I have a file: allele_freq.vcf. These are the first 10 lines of the file:

NC_000001.9   144148243  rs2236566     G  T    .  .  .  AN:AC  2806   236
NC_000001.9   146267105  rs1553119693  T  G    .  .  .  AN:AC  33978  26317
NC_000001.10  13832431   rs1553119928  T  C    .  .  .  AN:AC  1220   0
NC_000001.10  74439690   rs1553119957  A  C    .  .  .  AN:AC  1220   0
NC_000001.11  10498      rs1338146081  G  A,T  .  .  .  AN:AC  2072   0      0
NC_000001.11  10509      rs1262211809  G  A    .  .  .  AN:AC  2072   0
NC_000001.11  10527      rs1246002416  C  T    .  .  .  AN:AC  2072   0
NC_000001.11  10531      rs1293328578  C  G    .  .  .  AN:AC  2072   0

In column 12, I want to replace the empty cells with NA.

I tried:

awk -F='' '$12== "" {$12="NA"; print; next} {print}' OFS='' allele_freq.vcf

which gave me this:

NC_000001.9   144148243  rs2236566     G  T    .  .  .  AN:AC  2806   236    NA
NC_000001.9   146267105  rs1553119693  T  G    .  .  .  AN:AC  33978  26317  NA
NC_000001.10  13832431   rs1553119928  T  C    .  .  .  AN:AC  1220   0      NA
NC_000001.10  74439690   rs1553119957  A  C    .  .  .  AN:AC  1220   0      NA
NC_000001.11  10498      rs1338146081  G  A,T  .  .  .  AN:AC  2072   0      0NA
NC_000001.11  10509      rs1262211809  G  A    .  .  .  AN:AC  2072   0      NA
NC_000001.11  10527      rs1246002416  C  T    .  .  .  AN:AC  2072   0      NA
NC_000001.11  10531      rs1293328578  C  G    .  .  .  AN:AC  2072   0      NA
NC_000001.11  10534      rs1486704209  A  G    .  .  .  AN:AC  2072   0      NA
NC_000001.11  10535      rs1184627952  G  C    .  .  .  AN:AC  2072   0      NA

The 0 in row 5 has an "NA" attached to it.

What I want is something like this:

NC_000001.9   144148243  rs2236566     G  T    .  .  .  AN:AC  2806   236    NA
NC_000001.9   146267105  rs1553119693  T  G    .  .  .  AN:AC  33978  26317  NA
NC_000001.10  13832431   rs1553119928  T  C    .  .  .  AN:AC  1220   0      NA
NC_000001.10  74439690   rs1553119957  A  C    .  .  .  AN:AC  1220   0      NA
NC_000001.11  10498      rs1338146081  G  A,T  .  .  .  AN:AC  2072   0      0
NC_000001.11  10509      rs1262211809  G  A    .  .  .  AN:AC  2072   0      NA
NC_000001.11  10527      rs1246002416  C  T    .  .  .  AN:AC  2072   0      NA
NC_000001.11  10531      rs1293328578  C  G    .  .  .  AN:AC  2072   0      NA
NC_000001.11  10534      rs1486704209  A  G    .  .  .  AN:AC  2072   0      NA
NC_000001.11  10535      rs1184627952  G  C    .  .  .  AN:AC  2072   0      NA

Thanks for your help.


3 Answers

1
awk '{print $0 (NF<12 ? OFS "NA" : "")}' file
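
For context on why testing NF works where the original attempt did not: the shell strips the empty quotes, so -F='' reaches awk as -F=, which makes "=" the field separator. None of the lines contain "=", so each line is a single field, $12 is always empty, and assigning to it rebuilds the line with the empty OFS, which glues NA onto the end of every row. With the default whitespace splitting, NF tells the rows apart. A small sketch on two rows copied from the question (the sample function is only a hypothetical stand-in for the file):

```shell
# Two rows from the question: one with 11 fields, one with 12 (the A,T row).
sample() {
  printf '%s\n' \
    'NC_000001.9   144148243  rs2236566     G  T    .  .  .  AN:AC  2806   236' \
    'NC_000001.11  10498      rs1338146081  G  A,T  .  .  .  AN:AC  2072   0      0'
}

# With "=" as the separator every row is a single field, so $12 is always empty.
sample | awk -F= '{ print NF }'

# With default whitespace splitting the field counts differ between the rows.
sample | awk '{ print NF }'

# Append NA only to rows that have fewer than 12 fields.
sample | awk '{ print $0 (NF < 12 ? OFS "NA" : "") }'
```

Only the first row should gain an NA; the row that already has a 12th value is printed unchanged.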
answered 2020-05-04T22:33:49.140
0

Judging by the output of your code and the fixed-width structure of the file, there are spaces after $11 even when $12 is empty. So:

$ awk '{sub(/ +$/,"&NA")}1' file

Partial output:

...
NC_000001.10  74439690   rs1553119957  A  C    .  .  .  AN:AC  1220   0      NA
NC_000001.11  10498      rs1338146081  G  A,T  .  .  .  AN:AC  2072   0      0
NC_000001.11  10509      rs1262211809  G  A    .  .  .  AN:AC  2072   0      NA
...
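
Note that this relies on the rows really being padded with trailing spaces, which is an assumption about the file rather than something the listing proves. A quick sketch with made-up, shortened rows shows how the substitution behaves in each case:

```shell
# Row padded with trailing spaces (fixed-width style): NA is appended after the padding.
printf 'AN:AC  1220   0      \n' | awk '{ sub(/ +$/, "&NA") } 1'

# Row with a 12th value and no trailing spaces: the regex does not match, so it is unchanged.
printf 'AN:AC  2072   0      0\n' | awk '{ sub(/ +$/, "&NA") } 1'

# Row with neither a 12th value nor trailing spaces: also unchanged, so no NA is added.
printf 'AN:AC  1220   0\n' | awk '{ sub(/ +$/, "&NA") } 1'
```

If the file turns out not to be padded, or if rows with a 12th value also carry trailing spaces, a test on NF is probably the safer choice.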
answered 2020-05-05T05:01:28.783
0

This works, except for the amount of space before the NA. Adjust as needed:

awk '{if (NF<12) print $0 " NA"; else print $0}' file
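
One possible way to do that adjustment is to pad each short row to a fixed width with printf before appending NA. This is only a sketch: pad_na is a hypothetical helper name, and the width of 77 characters is an assumption measured from the sample rows in the question, so it would need changing for other data.

```shell
# pad_na: hypothetical helper. w=77 is the width of the first 11 columns
# as measured on the sample rows in the question (an assumption).
pad_na() {
  awk -v w=77 '
    BEGIN { fmt = "%-" w "s%s\n" }  # left-justify the row to w characters, then append the tag
    NF < 12 {
      sub(/[ \t]+$/, "")            # drop trailing whitespace so the padding is exact
      printf fmt, $0, "NA"
      next
    }
    { print }                       # rows that already have a 12th field pass through
  ' "$@"
}

# Usage on one row from the question; pass a file name instead to process a file.
printf '%s\n' 'NC_000001.9   144148243  rs2236566     G  T    .  .  .  AN:AC  2806   236' | pad_na
```

Stripping the trailing whitespace first means the result does not depend on whether the input rows were already padded.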
answered 2020-05-04T22:08:04.273