Sample data:
603 Some garbage data not related to me, 55, 113 ->
1-ENST0000 This is sample data blh blah blah blahhhh
2-ENSBTAP0 This is also some other sample data
21-ENADT)$ DO NOT WANT TO READ THIS LINE.
3-ENSGALP0 This is third sample data
node #4 This is 4th sample data
node #5 This is 5th sample data
This is also part of the input file but i dont wish to read this.
Branch -> 05 13,
44, 1,1,4,1
17, 1150
637 YYYYYY: 2 : %
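Edit: In the data above, the column widths of these sections are fixed, but there may be some sections that I do not want to read. The sample data above has been edited to reflect this.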
So, in this input file, I want to read the contents of the first section, "1-ENST0000", into one array, the contents of "2-ENSBTAP0" into a separate array, and so on.
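I cannot come up with a regular expression that defines the pattern. The first three wanted lines start with <someNumber>-ENS<someotherstuff>, and after those there may also be lines that start with node #<some number here>.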
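To make the pattern concrete, here is a minimal sketch in Python. It assumes that every wanted line starts with either `<digits>-ENS<non-space characters>` or `node #<digits>`, followed by whitespace and then the content. The names `HEADER` and `read_sections` are hypothetical. The language is also an assumption, since the question does not name one.

```python
import re
from collections import defaultdict

# Assumed pattern: a wanted line starts with "<digits>-ENS<non-space chars>"
# or "node #<digits>", then whitespace, then the content to keep.
HEADER = re.compile(r"^(?P<key>\d+-ENS\S*|node #\d+)\s+(?P<body>.*)$")


def read_sections(lines):
    """Collect the content of each matching line into a list keyed by its prefix."""
    sections = defaultdict(list)
    for line in lines:
        match = HEADER.match(line.rstrip("\n"))
        if match:  # any line with a different prefix is skipped
            sections[match.group("key")].append(match.group("body"))
    return dict(sections)


sample = [
    "603 Some garbage data not related to me, 55, 113 ->",
    "1-ENST0000 This is sample data blh blah blah blahhhh",
    "2-ENSBTAP0 This is also some other sample data",
    "21-ENADT)$ DO NOT WANT TO READ THIS LINE.",
    "3-ENSGALP0 This is third sample data",
    "node #4 This is 4th sample data",
    "node #5 This is 5th sample data",
    "This is also part of the input file but i dont wish to read this.",
]

for key, contents in read_sections(sample).items():
    print(key, contents)
```

With this pattern, the line starting with "21-ENADT)$" is skipped because "ENA" does not match "ENS", and the lines with no recognised prefix are ignored. Since the column widths are fixed, slicing each line at the known column boundary would be an alternative to capturing the content with the regex.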