2

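I have a file from which I need to extract segments, based on character ranges given in another file. I would like to do this with awk.

The first file looks like this (a single line):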

AATTGTGAAGGTAGATGGCTCGCTCCGCGGCGGGGCGCGCGCGCGCGCGCGGGCTCGCTATATAGAGATATATGCGCGCGGCGCGCGGCGCGCGCGGCGCGCGCGTATATATATAGGCGCGCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGAAAAAAAAAAAAAAAAAAAAAAAAATTTTTTTTTTTTTTTTTTTTTTTTTTTTTTGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCCCCCCCCCCCCCC

The second file looks like this (start and end positions, 1-based and inclusive):

5 10
13 20
22 24

The output should be:

GTGAAG
AGATGGCT
GCT

2 Answers

3

This one-liner will solve your problem:

awk 'BEGIN{getline sequence < "first_file"} {print substr(sequence, $1, $2 - $1 + 1) }' second_file

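Explanation: the script uses getline to read the string sequence from a file named first_file (change this to your actual file name). Then, for each line of the second file (which contains the ranges to process), it uses the substr function to extract the required substring. substr takes three arguments: the string (sequence), the starting position ($1), and the length ($2 - $1 + 1).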
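As a minimal, self-contained sketch of the arithmetic above: the file names seq.txt and ranges.txt are placeholders created only for this demo, and the short sequence is just the first 24 characters of the example. The getline check is an addition for illustration, not part of the one-liner.

```shell
# Work in a scratch directory so the demo leaves no files behind.
tmpdir=$(mktemp -d)
printf '%s\n' 'AATTGTGAAGGTAGATGGCTCGCT' > "$tmpdir/seq.txt"   # placeholder sequence file
printf '%s\n' '5 10' '13 20' '22 24' > "$tmpdir/ranges.txt"    # placeholder ranges file

result=$(awk -v seqfile="$tmpdir/seq.txt" '
    BEGIN {
        # getline returns 1 on success, 0 at end of file, -1 on error.
        if ((getline sequence < seqfile) <= 0) {
            print "cannot read " seqfile > "/dev/stderr"
            exit 1
        }
    }
    # $1 = start, $2 = end (1-based, inclusive), so length = end - start + 1.
    { print substr(sequence, $1, $2 - $1 + 1) }
' "$tmpdir/ranges.txt")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

For the range 5 10 the length is 10 - 5 + 1 = 6, which yields GTGAAG.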

answered 2012-08-22T19:23:08.317
1

Nya gave you an awk solution; here is one based on coreutils.

string

AATTGTGAAGGTAGATGGCTCGCTCCGCGGCGGGGCGCGCGCGCGCGCGCGGGCTCGCTATATAGAGATATATGCGCGCGGCGCGCGGCGCGCGCGGCGCGCGCGTATATATATAGGCGCGCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGAAAAAAAAAAAAAAAAAAAAAAAAATTTTTTTTTTTTTTTTTTTTTTTTTTTTTTGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCCCCCCCCCCCCCC

offlen

5 10
13 20
22 24

You can get the desired output with:

while read -r start end; do cut -c"${start}-${end}" string; done < offlen

Output:

GTGAAG
AGATGGCT
GCT
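A self-contained sketch of the same loop, assuming scratch copies of the two files (the temporary directory and the blank-line guard are additions for this demo, not part of the answer):

```shell
# Recreate the two input files in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'AATTGTGAAGGTAGATGGCTCGCT' > "$tmpdir/string"
printf '%s\n' '5 10' '' '13 20' '22 24' > "$tmpdir/offlen"   # note the blank line

result=$(
    while read -r start end; do
        # Skip blank or incomplete lines instead of passing a bad list to cut.
        [ -n "$start" ] && [ -n "$end" ] || continue
        # cut -c takes an inclusive, 1-based character range: START-END.
        cut -c "${start}-${end}" "$tmpdir/string"
    done < "$tmpdir/offlen"
)

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

This starts one cut process per range, so for a very long list of ranges the single awk invocation is likely to be faster.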
answered 2012-08-22T20:50:53.673