1

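I am new to Linux and the command line. I am trying to find a command that will let me replace whitespace with a semicolon (in a .csv text file) for all but the first field. Please see the example below. Any help would be greatly appreciated; I have spent a long time looking for a solution. If you do have an answer, please explain the command so that I can try to understand how and why it works. Many thanks.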

Example input text:

0   k__Bacteria  p__Firmicutes   c__Bacilli             
1   k__Bacteria  p__Firmicutes   c__Clostridia      
2   k__Bacteria  p__Bacteroidetes    c__Bacteroidia     
3   k__Bacteria  p__Bacteroidetes    c__Bacteroidia

The output I need is:

0   k__Bacteria;p__Firmicutes;c__Bacilli        
1   k__Bacteria;p__Firmicutes;c__Clostridia    
2   k__Bacteria;p__Bacteroidetes;c__Bacteroidia   
3   k__Bacteria;p__Bacteroidetes;c__Bacteroidia
4 Answers

1
$ cat file
0   k__Bacteria  p__Firmicutes   c__Bacilli     foo     bar
1   k__Bacteria  p__Firmicutes   c__Clostridia  the   quick     brown
2   k__Bacteria  p__Bacteroidetes    c__Bacteroidia     fox jumped      over
3   k__Bacteria  p__Bacteroidetes    c__Bacteroidia     the lazy dogs back

$ awk -v skip=1 '{match($0,"([^[:space:]]+[[:space:]]+){"skip"}"); head=substr($0,1,RSTART+RLENGTH); tail=substr($0,RSTART+RLENGTH+1); gsub(/[[:space:]]+/,";",tail); print head tail}' file
0   k__Bacteria;p__Firmicutes;c__Bacilli;foo;bar
1   k__Bacteria;p__Firmicutes;c__Clostridia;the;quick;brown
2   k__Bacteria;p__Bacteroidetes;c__Bacteroidia;fox;jumped;over
3   k__Bacteria;p__Bacteroidetes;c__Bacteroidia;the;lazy;dogs;back

$ awk -v skip=2 '{match($0,"([^[:space:]]+[[:space:]]+){"skip"}"); head=substr($0,1,RSTART+RLENGTH); tail=substr($0,RSTART+RLENGTH+1); gsub(/[[:space:]]+/,";",tail); print head tail}' file
0   k__Bacteria  p__Firmicutes;c__Bacilli;foo;bar
1   k__Bacteria  p__Firmicutes;c__Clostridia;the;quick;brown
2   k__Bacteria  p__Bacteroidetes;c__Bacteroidia;fox;jumped;over
3   k__Bacteria  p__Bacteroidetes;c__Bacteroidia;the;lazy;dogs;back

$ awk -v skip=3 '{match($0,"([^[:space:]]+[[:space:]]+){"skip"}"); head=substr($0,1,RSTART+RLENGTH); tail=substr($0,RSTART+RLENGTH+1); gsub(/[[:space:]]+/,";",tail); print head tail}' file
0   k__Bacteria  p__Firmicutes   c__Bacilli;foo;bar
1   k__Bacteria  p__Firmicutes   c__Clostridia;the;quick;brown
2   k__Bacteria  p__Bacteroidetes    c__Bacteroidia;fox;jumped;over
3   k__Bacteria  p__Bacteroidetes    c__Bacteroidia;the;lazy;dogs;back
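
For readers who want the idea behind the awk one-liner spelled out, here is a minimal Python sketch of the same approach, not the answerer's code: match the first `skip` fields together with the whitespace that follows them, keep that part exactly as it is, and turn every remaining run of whitespace into a semicolon. The function name `join_tail` is made up for illustration, and, unlike the awk command, the sketch strips trailing whitespace first so that no trailing semicolon appears.

```python
import re

def join_tail(line, skip=1):
    """Keep the first `skip` fields (and their spacing) untouched;
    replace each later run of whitespace with a semicolon."""
    line = line.rstrip()  # assumption: trailing whitespace is not wanted
    # Same pattern as in the awk command, anchored at the start of the line:
    # `skip` repetitions of "non-space characters followed by whitespace".
    m = re.match(r"(?:\S+\s+){%d}" % skip, line)
    if m is None:  # fewer than skip+1 fields: nothing to join
        return line
    head, tail = line[:m.end()], line[m.end():]
    return head + re.sub(r"\s+", ";", tail)

print(join_tail("0   k__Bacteria  p__Firmicutes   c__Bacilli     "))
```

Calling `join_tail(line, skip=2)` leaves the first two fields space-separated, mirroring the `-v skip=2` run above.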
Answered 2013-01-15T16:03:18.413
0

You can do this in Python like so:

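#!/usr/bin/env python3
import sys

if __name__ == '__main__':
    for line in sys.stdin:
        cols = line.split()
        if not cols:  # skip blank lines
            continue
        # first field, one space, then the remaining fields joined by semicolons
        print(' '.join([cols[0], ';'.join(cols[1:])]))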

Just chmod +x the script file and run it as ./script < input

Note that line.split() splits on runs of whitespace, i.e. 'a b\tc' becomes ['a', 'b', 'c'].
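
A quick way to see this behaviour, along with one consequence of it: because every run of whitespace is discarded, the original spacing after the first column is not preserved. The sketch below is only an illustration; `str.split(None, 1)` is the standard way to split off just the first field.

```python
sample = "a b\tc"

# With no arguments, split() treats any run of whitespace as one separator.
print(sample.split())           # ['a', 'b', 'c']

# maxsplit=1 separates only the first field from the rest of the line.
first, rest = "0   k__Bacteria  p__Firmicutes".split(None, 1)
print(first)                    # 0
print(";".join(rest.split()))   # k__Bacteria;p__Firmicutes
```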

Answered 2013-01-15T00:55:16.070
0

Here is a solution using awk. It may be dirty and someone could improve on it, but it works:

awk 'OFS=";"{a=$1;$1="";$0=a";"$0}sub(/;;/," ",$0) ' temp.txt

The output is

0 k_Bacteria;p_Firmicutes;c_Bacilli
1 k_Bacteria;p_Firmicutes;c_Clostridia
2 k_Bacteria;p_Bacteroidetes;c_Bacteroidia
3 k_Bacteria;p_Bacteroidetes;c_Bacteroidia

cat temp.txt
0 k_Bacteria p_Firmicutes c_Bacilli
1 k_Bacteria p_Firmicutes c_Clostridia
2 k_Bacteria p_Bacteroidetes c_Bacteroidia
3 k_Bacteria p_Bacteroidetes c_Bacteroidia

Edit: updated based on the comments

Try this awk script, myawk.sh:

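BEGIN { print "Begin Processing" }
# OFS=";" is an assignment used as a pattern; it is always true, so the block runs for every line
OFS=";" {
    $9 = $9 "%%"            # mark the end of field 9; this also rebuilds $0 with ";" between fields
    split($0, a, "%%")      # a[1] = fields 1-9, a[2] = the remaining fields
    gsub(/;/, " ", a[1])    # restore spaces in the part before the marker
    print a[1] a[2]
}
END { print "Process Complete" }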

Run it with awk -f myawk.sh temp.txt. $9 is the last field up to which you want to keep the spaces.

Answered 2013-01-15T01:17:42.957
0
awk -v OFS=";" '{$1=$1" "$2;$2="";gsub(/;;/,";",$0);print}' your_file

Or perhaps in Perl:

perl -lane 'print join ";",@F' your_file | perl -pe 's/;/ /'
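
The Perl pipeline works in two steps: join every whitespace-separated field with a semicolon, then turn only the first semicolon back into a space. A rough Python equivalent, offered as an illustration rather than a translation of the exact command (the helper name `semicolon_join` is hypothetical), could look like this:

```python
def semicolon_join(line):
    """Join all fields with ';', then restore a space after the first field."""
    joined = ";".join(line.split())      # step 1: every separator becomes ';'
    return joined.replace(";", " ", 1)   # step 2: only the first ';' becomes ' '

print(semicolon_join("1   k__Bacteria  p__Firmicutes   c__Clostridia    "))
# 1 k__Bacteria;p__Firmicutes;c__Clostridia
```

Like the pipeline, this assumes the first field itself contains no semicolon; if it did, the wrong semicolon would be replaced.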
Answered 2013-01-15T07:00:57.173