
Really sorry, I've hit the same problem again: awk and sed.

I want to convert a large text file containing the following:

>hg19_ct_UserTrack_3545_12513 range=chr1:52035541-52035716 5'pad=0 3'pad=0 strand=+ repeatMasking=none
CACACATACTTTTATTCAAGCCTCAGAGCAACCCTGCAAAATGAGTATTA
TCTCCACTTTACAATCAGGAGGCTGAGTCATAAGGAGGTGAGTCACCTGC
CTAGGGCCACATAGCTAGCAAGGAGCCAAGCTGGAATTTTAAGCCACGTT
TGTCTGATTCTTTCTGCATACCATGC
>hg19_ct_UserTrack_3545_13212 range=chr1:186122154-186122314 5'pad=0 3'pad=0 strand=+ repeatMasking=none
ATCTTCAGGGACAAGTTTTTACAAACTCTCTTAATGGTTTTACCACCCTC
CCTATCAGGACCAAGATCAAATACTTGATGTAAGGCATTTGTTTAATTTT
CTTTAGACAAAGAGGATAGTAATTCTTGCATAAACGTTTTTGTGTATCAT
CCATAAAATAT

and so on

into:

>range=chr1:52035541-52035716 5'pad=0 3'pad=0 strand=+ repeatMasking=none
CACACATACTTTTATTCAAGCCTCAGAGCAACCCTGCAAAATGAGTATTA
TCTCCACTTTACAATCAGGAGGCTGAGTCATAAGGAGGTGAGTCACCTGC
CTAGGGCCACATAGCTAGCAAGGAGCCAAGCTGGAATTTTAAGCCACGTT
TGTCTGATTCTTTCTGCATACCATGC
>range=chr1:186122154-186122314 5'pad=0 3'pad=0 strand=+ repeatMasking=none
ATCTTCAGGGACAAGTTTTTACAAACTCTCTTAATGGTTTTACCACCCTC
CCTATCAGGACCAAGATCAAATACTTGATGTAAGGCATTTGTTTAATTTT
CTTTAGACAAAGAGGATAGTAATTCTTGCATAAACGTTTTTGTGTATCAT
CCATAAAATAT

I tried awk 'NR==1{sub(/^[^ ]* /,"")} 1' and sed -i '1s/\w\+ //', but neither of them worked.
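
A small reproduction helps show why the awk attempt falls short. It assumes a temporary file holding a trimmed sample (the two headers from the question, one sequence line each). The NR==1 condition limits the substitution to line 1, so the second header is never touched, and because ^[^ ]* also matches the leading >, the first header loses its > as well. The 1 address in the sed attempt restricts it to line 1 in the same way.

```shell
#!/bin/sh
# Trimmed sample (assumption): two headers from the question, one sequence line each.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
>hg19_ct_UserTrack_3545_12513 range=chr1:52035541-52035716 5'pad=0 3'pad=0 strand=+ repeatMasking=none
CACACATACTTTTATTCAAGCCTCAGAGCAACCCTGCAAAATGAGTATTA
>hg19_ct_UserTrack_3545_13212 range=chr1:186122154-186122314 5'pad=0 3'pad=0 strand=+ repeatMasking=none
ATCTTCAGGGACAAGTTTTTACAAACTCTCTTAATGGTTTTACCACCCTC
EOF

# The original awk attempt: NR==1 means only line 1 is edited,
# and ^[^ ]* consumes the '>' together with the first word.
awk_attempt=$(awk 'NR==1{sub(/^[^ ]* /,"")} 1' "$tmp")
printf '%s\n' "$awk_attempt"

rm -f "$tmp"
```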


3 Answers


I assume you want to remove the first word from every line that starts with a greater-than sign (>). In that case, you can use awk like this:

awk '{sub(/^>[^ ]* /,">")} 1'

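Remove the NR==1 condition: it means the block that follows it is executed only on the first line. Also include > in both the pattern and the replacement, so that only header lines match and the > is kept.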

Output:

>range=chr1:52035541-52035716 5'pad=0 3'pad=0 strand=+ repeatMasking=none
CACACATACTTTTATTCAAGCCTCAGAGCAACCCTGCAAAATGAGTATTA
TCTCCACTTTACAATCAGGAGGCTGAGTCATAAGGAGGTGAGTCACCTGC
CTAGGGCCACATAGCTAGCAAGGAGCCAAGCTGGAATTTTAAGCCACGTT
TGTCTGATTCTTTCTGCATACCATGC
>range=chr1:186122154-186122314 5'pad=0 3'pad=0 strand=+ repeatMasking=none
ATCTTCAGGGACAAGTTTTTACAAACTCTCTTAATGGTTTTACCACCCTC
CCTATCAGGACCAAGATCAAATACTTGATGTAAGGCATTTGTTTAATTTT
CTTTAGACAAAGAGGATAGTAATTCTTGCATAAACGTTTTTGTGTATCAT
CCATAAAATAT
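
A quick check of this command, assuming a trimmed two-record sample taken from the question (one sequence line per record): every header loses its first word but keeps its >, and sequence lines, which do not start with >, pass through unchanged.

```shell
#!/bin/sh
# Trimmed sample (assumption): two records from the question, one sequence line each.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
>hg19_ct_UserTrack_3545_12513 range=chr1:52035541-52035716 5'pad=0 3'pad=0 strand=+ repeatMasking=none
CACACATACTTTTATTCAAGCCTCAGAGCAACCCTGCAAAATGAGTATTA
>hg19_ct_UserTrack_3545_13212 range=chr1:186122154-186122314 5'pad=0 3'pad=0 strand=+ repeatMasking=none
ATCTTCAGGGACAAGTTTTTACAAACTCTCTTAATGGTTTTACCACCCTC
EOF

# No NR condition, so sub() is tried on every line; the ^> anchor
# means only headers match, and '>' is written back in the replacement.
awk_out=$(awk '{sub(/^>[^ ]* /,">")} 1' "$tmp")
printf '%s\n' "$awk_out"

rm -f "$tmp"
```

To change the file itself, write to a new file and rename it, for example awk '{sub(/^>[^ ]* /,">")} 1' file > file.new && mv file.new file.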
answered 2013-02-11T11:04:52.150

Here's one way to do it with sed:

sed '/^>/s/[^ ]* />/' file

Result:

>range=chr1:52035541-52035716 5'pad=0 3'pad=0 strand=+ repeatMasking=none
CACACATACTTTTATTCAAGCCTCAGAGCAACCCTGCAAAATGAGTATTA
TCTCCACTTTACAATCAGGAGGCTGAGTCATAAGGAGGTGAGTCACCTGC
CTAGGGCCACATAGCTAGCAAGGAGCCAAGCTGGAATTTTAAGCCACGTT
TGTCTGATTCTTTCTGCATACCATGC
>range=chr1:186122154-186122314 5'pad=0 3'pad=0 strand=+ repeatMasking=none
ATCTTCAGGGACAAGTTTTTACAAACTCTCTTAATGGTTTTACCACCCTC
CCTATCAGGACCAAGATCAAATACTTGATGTAAGGCATTTGTTTAATTTT
CTTTAGACAAAGAGGATAGTAATTCTTGCATAAACGTTTTTGTGTATCAT
CCATAAAATAT
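
The same check for the sed version, again assuming a trimmed two-record sample from the question. The /^>/ address restricts the s command to header lines; on those lines [^ ]* matches the > and the first word, and the replacement puts the > back.

```shell
#!/bin/sh
# Trimmed sample (assumption): two records from the question, one sequence line each.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
>hg19_ct_UserTrack_3545_12513 range=chr1:52035541-52035716 5'pad=0 3'pad=0 strand=+ repeatMasking=none
CACACATACTTTTATTCAAGCCTCAGAGCAACCCTGCAAAATGAGTATTA
>hg19_ct_UserTrack_3545_13212 range=chr1:186122154-186122314 5'pad=0 3'pad=0 strand=+ repeatMasking=none
ATCTTCAGGGACAAGTTTTTACAAACTCTCTTAATGGTTTTACCACCCTC
EOF

# Only lines matching /^>/ are edited; the first "word + space" becomes ">".
sed_out=$(sed '/^>/s/[^ ]* />/' "$tmp")
printf '%s\n' "$sed_out"

rm -f "$tmp"
```

With GNU sed you can add -i to edit the file in place, as in the question's attempt.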
answered 2013-02-11T12:15:29.917

It seems you just want to remove the first field, up to the first space. You can do that like this:

cut -f2- -d ' '
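
One caveat when trying this on the sample: cut prints lines that contain no delimiter unchanged (unless -s is given), so the sequence lines survive, but on header lines the > is part of the first field and is removed with it. A minimal sketch of one way to restore it, assuming every header's second field begins with range= as in the question, is to pipe the result through sed.

```shell
#!/bin/sh
# Trimmed sample (assumption): two records from the question, one sequence line each.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
>hg19_ct_UserTrack_3545_12513 range=chr1:52035541-52035716 5'pad=0 3'pad=0 strand=+ repeatMasking=none
CACACATACTTTTATTCAAGCCTCAGAGCAACCCTGCAAAATGAGTATTA
>hg19_ct_UserTrack_3545_13212 range=chr1:186122154-186122314 5'pad=0 3'pad=0 strand=+ repeatMasking=none
ATCTTCAGGGACAAGTTTTTACAAACTCTCTTAATGGTTTTACCACCCTC
EOF

# cut alone: the '>' belongs to field 1, so headers come out as "range=...".
cut_only=$(cut -f2- -d ' ' "$tmp")

# Assumption: each header's second field starts with "range=", so re-add '>' there.
cut_fixed=$(cut -f2- -d ' ' "$tmp" | sed 's/^range=/>range=/')
printf '%s\n' "$cut_fixed"

rm -f "$tmp"
```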
answered 2013-02-11T10:54:09.633