regex - How can I read the parts of lines that match a condition in Perl?

Question

Sample data:

603       Some garbage data not related to me, 55, 113 ->

1-ENST0000        This is sample data blh blah blah blahhhh
2-ENSBTAP0        This is also some other sample data
21-ENADT)$        DO NOT WANT TO READ THIS LINE. 
3-ENSGALP0        This is third sample data
node #4           This is 4th sample data
node #5           This is 5th sample data

This is also part of the input file but i dont wish to read this. 
Branch -> 05 13, 
      44, 1,1,4,1

17, 1150

637                   YYYYYY: 2 : %

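Edit: The column widths in the data above are fixed, but there may be some sections I don't want to read. The sample data above has been edited to reflect this.

So in this input file, I want to read the contents of the first section, "1-ENST0000", into an array, the contents of "2-ENSBTAP0" into a separate array, and so on.

I can't come up with a regex that defines the pattern... The first three lines have <someNumber>-ENS<someotherstuff>, and then there may also be node #<some number here>.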

score 1 · Accepted Answer

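Is this really a fixed-column file? If so, don't bother with regular expressions. Just split each line at the column widths, possibly trimming trailing whitespace from column 1.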
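A minimal sketch of splitting at the column widths with Perl's built-in unpack. It assumes column 1 is 18 characters wide, which is inferred from the sample data ("1-ENST0000" followed by 8 spaces) and may not hold for the real file; split_fixed is a hypothetical helper name. The A format strips trailing whitespace when unpacking, which takes care of trimming column 1. Deciding which lines to keep would still need a check on the returned key.

```perl
use strict;
use warnings;

# Assumption: column 1 is 18 characters wide, inferred from the sample
# data ("1-ENST0000" plus 8 spaces). Adjust the template if the file differs.
my $template = 'A18 A*';

# Hypothetical helper: split one fixed-width line into (key, text).
# When unpacking, 'A' strips trailing whitespace, so the padding after
# column 1 is dropped and only the key itself is returned.
sub split_fixed {
    my ($line) = @_;
    return unpack $template, $line;
}

my ($key, $text) = split_fixed('1-ENST0000        This is sample data blh blah blah blahhhh');
print "key=[$key] text=[$text]\n";
```

Because the widths are fixed, the template is the only place that encodes the layout; no pattern has to describe what the keys look like.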

score 0 · Accepted Answer

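OK, based on your later comments, this is somewhat different from your previous question. Also, I now realize that node #54 is a valid entry in the first column.

Update: I also now realize that you don't need the first column.

Update: In general, you neither want nor need to deal with arrays of characters in Perl.

Update: Now that you have clarified what should and should not be skipped, here is a version that handles that. Add patterns to the if condition to taste.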
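To illustrate the point about character arrays, a small sketch comparing split // with Perl's ordinary string functions. The sample string and variable names are illustrative only; length, substr and a global match give the same answers without building an array of single characters.

```perl
use strict;
use warnings;

my $text = 'This is sample data';

# Character-array approach: rarely needed in Perl.
my @chars = split //, $text;

# The same information comes straight from the string itself.
my $len   = length $text;           # same as scalar @chars
my $sixth = substr $text, 5, 1;     # same as $chars[5]
my $count = () = $text =~ /s/g;     # number of 's' characters in the string

print "$len $sixth $count\n";       # prints "19 i 3"
```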

#!/usr/bin/perl

use strict;
use warnings;

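my @data;

while ( <DATA> ) {
    chomp;

    # Keep only lines whose first column is <number>-ENS<5 chars> or
    # node #<number>; $1 captures the text after the padding spaces.
    if ( /^[0-9]+-ENS.{5} +(.+)$/
            or /^node #[0-9]+ +(.+)$/
    ) {
        # Store the captured text as an array of single characters.
        push @data, [ split //, $1 ];
    }
}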

use Data::Dumper;
print Dumper \@data;

__DATA__
603       Some garbage data not related to me, 55, 113 ->

1-ENST0000        This is sample data blh blah blah blahhhh
2-ENSBTAP0        This is also some other sample data
21-ENADT)$        DO NOT WANT TO READ THIS LINE. 
3-ENSGALP0        This is third sample data
node #4           This is 4th sample data
node #5           This is 5th sample data

This is also part of the input file but i dont wish to read this. 
Branch -> 05 13, 
      44, 1,1,4,1

17, 1150

637                   YYYYYY: 2 : %
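The two patterns in the if condition can also be folded into a single alternation. The sketch below assumes the sample lines are supplied through a heredoc and an in-memory filehandle instead of __DATA__, so it runs on its own; $wanted and @kept are illustrative names, and it keeps the matched text as plain strings rather than character arrays.

```perl
use strict;
use warnings;

# A few sample lines from the question, kept in a heredoc so this
# sketch needs no __DATA__ section.
my $input = <<'END';
603       Some garbage data not related to me, 55, 113 ->
1-ENST0000        This is sample data blh blah blah blahhhh
21-ENADT)$        DO NOT WANT TO READ THIS LINE.
node #4           This is 4th sample data
This is also part of the input file but i dont wish to read this.
END

# Both patterns from the answer in one alternation. (?: ... ) groups
# without capturing, so $1 is still the text after the first column.
my $wanted = qr/^(?:[0-9]+-ENS.{5}|node #[0-9]+) +(.+)$/;

# Read the heredoc line by line through an in-memory filehandle.
open my $fh, '<', \$input or die "Cannot open in-memory file: $!";
my @kept;
while (my $line = <$fh>) {
    chomp $line;
    push @kept, $1 if $line =~ $wanted;
}
close $fh;

print "$_\n" for @kept;
```

Only the 1-ENST0000 and node #4 lines survive; the 21-ENADT)$ line fails because ENS must follow the dash.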

As for learning how to fish, I suggest reading everything relevant listed in perldoc perltoc.
