
I have some text files, and I need to remove the first character of the fourth column, but only when that column has four characters.

File 1 looks like this:

ATOM   5181  N  AMET K 406      12.440   6.552  25.691  0.50  7.37           N 
ATOM   5182  CA AMET K 406      13.685   5.798  25.578  0.50  5.87           C  
ATOM   5183  C  AMET K 406      14.045   5.179  26.909  0.50  5.07           C   
ATOM   5184  O   MET K 406      14.595   4.083  27.003  0.50  7.07           O 
ATOM   5185  CB  MET K 406      14.812   6.674  25.044  0.50  6.80           C  
ATOM   5185  CB  MET K 406      14.812   6.674  25.044  0.50  6.80           C  
ATOM   5202  N  AARG K 408      12.186   3.982  29.147  0.50  6.55           N  

File 2 looks like this:

ATOM     41  CA ATRP A   6     -18.975 -29.894  -7.425  0.50 19.50           C  
ATOM     42  CA BTRP A   6     -18.979 -29.890  -7.428  0.50 19.16           C
ATOM     43  C   HIS A   6     -18.091 -29.845  -8.669  1.00 19.84           C 
ATOM     44  O   HIS A   6     -17.015 -30.452  -8.696  1.00 20.10           O
ATOM     45  CB ASER A   9     -18.499 -28.879  -6.370  0.50 19.73           C  
ATOM     46  CB BSER A   9     -18.565 -28.837  -6.367  0.50 19.13           C 
ATOM     47  CG CHIS A   12    -19.421 -27.711  -6.216  0.50 21.30           C

Expected output:

File 1

ATOM   5181  N   MET K 406      12.440   6.552  25.691  0.50  7.37           N 
ATOM   5182  CA  MET K 406      13.685   5.798  25.578  0.50  5.87           C  
ATOM   5183  C   MET K 406      14.045   5.179  26.909  0.50  5.07           C   
ATOM   5184  O   MET K 406      14.595   4.083  27.003  0.50  7.07           O 
ATOM   5185  CB  MET K 406      14.812   6.674  25.044  0.50  6.80           C  
ATOM   5185  CB  MET K 406      14.812   6.674  25.044  0.50  6.80           C  
ATOM   5202  N   ARG K 408      12.186   3.982  29.147  0.50  6.55           N  

File 2

ATOM     41  CA  TRP A   6     -18.975 -29.894  -7.425  0.50 19.50           C  
ATOM     42  CA  TRP A   6     -18.979 -29.890  -7.428  0.50 19.16           C
ATOM     43  C   HIS A   6     -18.091 -29.845  -8.669  1.00 19.84           C 
ATOM     44  O   HIS A   6     -17.015 -30.452  -8.696  1.00 20.10           O
ATOM     45  CB  SER A   9     -18.499 -28.879  -6.370  0.50 19.73           C  
ATOM     46  CB  SER A   9     -18.565 -28.837  -6.367  0.50 19.13           C 
ATOM     47  CG  HIS A   12    -19.421 -27.711  -6.216  0.50 21.30           C

2 Answers


This might work for you (GNU sed):

sed -r 's/^((\S+\s+){3})\S(\S{3}\s)/\1 \3/' file

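If the fourth column has four non-space characters, this replaces its first character with a space, so the columns that follow keep their positions.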
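
The -r flag and the \S and \s shorthands are GNU sed features that other sed implementations may not accept. As a sketch of the same substitution for those cases, the version below spells the classes out in POSIX form and uses -E, which GNU and BSD sed both accept. The function name blank_field4_prefix is made up for illustration.

```sh
#!/bin/sh
# Hypothetical helper: blank out the first character of field 4 when that
# field is exactly four non-blank characters long. Reads stdin, writes stdout.
blank_field4_prefix() {
    sed -E 's/^(([^[:space:]]+[[:space:]]+){3})[^[:space:]]([^[:space:]]{3}[[:space:]])/\1 \3/'
}

# Example: the first line has a four-character field 4 and is edited,
# the second has a three-character field 4 and is left alone.
printf '%s\n' \
    'ATOM   5181  N  AMET K 406      12.440   6.552  25.691  0.50  7.37           N' \
    'ATOM   5184  O   MET K 406      14.595   4.083  27.003  0.50  7.07           O' |
    blank_field4_prefix
```

The trailing [[:space:]] in the pattern is what limits the match to fields of exactly four characters: a longer field leaves a non-blank character where the blank is required, so the substitution does not apply.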

answered 2013-08-13T09:27:39.833

Use the length() function to get the length of the column and the substr() function to keep the substring you want:

$ awk 'length($4)==4{$4=substr($4,2)}1' file | column -t
ATOM  5181  N   MET  K  406  12.440  6.552  25.691  0.50  7.37  N
ATOM  5182  CA  MET  K  406  13.685  5.798  25.578  0.50  5.87  C
ATOM  5183  C   MET  K  406  14.045  5.179  26.909  0.50  5.07  C
ATOM  5184  O   MET  K  406  14.595  4.083  27.003  0.50  7.07  O
ATOM  5185  CB  MET  K  406  14.812  6.674  25.044  0.50  6.80  C
ATOM  5185  CB  MET  K  406  14.812  6.674  25.044  0.50  6.80  C
ATOM  5202  N   ARG  K  408  12.186  3.982  29.147  0.50  6.55  N

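Piping through column -t rebuilds a tidy table layout. To save the result, redirect the output to a new file (not to the input file itself, which the shell would truncate before awk reads it):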

$ awk 'length($4)==4{$4=substr($4,2)}1' file | column -t > new_file
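
Assigning to $4 makes awk rebuild the line with single spaces between fields, which is why the output above needs column -t. If the original spacing should survive, one possible approach is to overwrite the character inside $0 instead of assigning to the field. The sketch below assumes the fields are separated by spaces or tabs; blank_field4_prefix_awk is a hypothetical name.

```sh
#!/bin/sh
# Hypothetical helper: same condition as the awk one-liner, but the edit is
# made inside the raw line ($0) so the original spacing is kept.
blank_field4_prefix_awk() {
    awk '
    length($4) == 4 {
        # Match leading blanks plus the first three fields and their separators.
        # RLENGTH is then the number of characters in front of field 4.
        if (match($0, /^[ \t]*[^ \t]+[ \t]+[^ \t]+[ \t]+[^ \t]+[ \t]+/)) {
            # Put a space where the first character of field 4 used to be.
            $0 = substr($0, 1, RLENGTH) " " substr($0, RLENGTH + 2)
        }
    }
    { print }
    ' "$@"
}

# Example: reads a sample line from stdin; file names can be passed instead.
printf '%s\n' 'ATOM     47  CG CHIS A   12    -19.421 -27.711  -6.216  0.50 21.30           C' |
    blank_field4_prefix_awk
```

The three field patterns are written out rather than repeated with {3} because some awk implementations do not support interval expressions.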

sed can do this too:

$ sed -r 's/^((\S+\s+){3})\S(\S{3}\s)/\1\3/' file
ATOM   5181  N  MET K 406      12.440   6.552  25.691  0.50  7.37           N
ATOM   5182  CA MET K 406      13.685   5.798  25.578  0.50  5.87           C
ATOM   5183  C  MET K 406      14.045   5.179  26.909  0.50  5.07           C
ATOM   5184  O   MET K 406      14.595   4.083  27.003  0.50  7.07           O
ATOM   5185  CB  MET K 406      14.812   6.674  25.044  0.50  6.80           C
ATOM   5185  CB  MET K 406      14.812   6.674  25.044  0.50  6.80           C
ATOM   5202  N  ARG K 408      12.186   3.982  29.147  0.50  6.55           N

To write the changes back to the original file, use the -i option:

$ sed -ri 's/^((\S+\s+){3})\S(\S{3}\s)/\1\3/' file
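
Because -i overwrites the file, it can be worth keeping a copy of the original. A minimal sketch, assuming a sed that accepts a backup suffix attached directly to -i (GNU and BSD sed both do): the same substitution, written with POSIX character classes, saving each original as name.bak. The helper name strip_field4_prefix_inplace is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: edit each named file in place and keep the original
# next to it with a .bak suffix. The suffix must be attached directly to -i.
strip_field4_prefix_inplace() {
    sed -E -i.bak \
        's/^(([^[:space:]]+[[:space:]]+){3})[^[:space:]]([^[:space:]]{3}[[:space:]])/\1\3/' \
        "$@"
}

# Example run on a throwaway file, so no real data is touched.
tmp=$(mktemp)
printf '%s\n' 'ATOM   5202  N  AARG K 408      12.186   3.982  29.147  0.50  6.55           N' > "$tmp"
strip_field4_prefix_inplace "$tmp"
cat "$tmp"        # edited file
cat "$tmp.bak"    # untouched original
rm -f "$tmp" "$tmp.bak"
```

Several file names can be passed at once; each one gets its own backup.
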
answered 2013-08-13T07:26:34.777