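I have 2 files, annotation.txt and motif_list.txt. I'm stuck at the point where, after a pattern matches, I need to print the consecutive lines that follow it until the next pattern appears. The number of lines after each pattern varies. The pattern always ends with "/Homer". I need a little help. Thanks.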
annotation.txt
AT1G10720(BSD)/col-AT1G10720-DAP-Seq(GSE60143)/Homer
gene1
gene2
gene3
ERF3(AP2EREBP)/colamp-ERF3-DAP-Seq(GSE60143)/Homer
gene1
gene5
gene4
gene10
--------------------------------
motif_list.txt
AT1G10720(BSD)/col-AT1G10720-DAP-Seq(GSE60143)/Homer BSD
E2F4(E2F)/K562-E2F4-ChIP-Seq(GSE31477)/Homer ERF
ERF3(AP2EREBP)/colamp-ERF3-DAP-Seq(GSE60143)/Homer ERF
Code:
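import re

with open("annotation.txt", "r") as file1, open("motif_list.txt", "r") as file2:
    annot = file1.readlines()
    motif = file2.readlines()

for i in annot:
    if re.search("/Homer", i):
        for j in motif:
            motif_info = j.rstrip("\n").split("\t")
            # readlines() keeps the trailing newline, so strip it before comparing
            if motif_info[0] == i.rstrip("\n"):
                # Stuck here: this prints only the matching header and its label.
                # The goal is to print each following gene line (until the next
                # motif appears), then "\t", i, "\t", motif_info[1]
                print(i.rstrip("\n"), motif_info[1], sep="\t")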
Expected output:
gene1 AT1G10720(BSD)/col-AT1G10720-DAP-Seq(GSE60143)/Homer BSD
gene2 AT1G10720(BSD)/col-AT1G10720-DAP-Seq(GSE60143)/Homer BSD
gene3 AT1G10720(BSD)/col-AT1G10720-DAP-Seq(GSE60143)/Homer BSD
gene1 ERF3(AP2EREBP)/colamp-ERF3-DAP-Seq(GSE60143)/Homer ERF
gene5 ERF3(AP2EREBP)/colamp-ERF3-DAP-Seq(GSE60143)/Homer ERF
gene4 ERF3(AP2EREBP)/colamp-ERF3-DAP-Seq(GSE60143)/Homer ERF
gene10 ERF3(AP2EREBP)/colamp-ERF3-DAP-Seq(GSE60143)/Homer ERF
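A minimal sketch of one way to get the expected output, assuming motif_list.txt is tab-separated (as the `split("\t")` in the attempt implies) and that every header line in annotation.txt ends with "/Homer". Rather than searching the motif list for each line, it loads motif_list.txt once into a dict, then walks annotation.txt while remembering the most recent header; every non-header line is attached to that header. The names `load_motifs` and `annotate`, and the "NA" fallback for a header missing from motif_list.txt, are illustrative choices, not part of the original question.

```python
import sys


def load_motifs(path):
    """Map motif name -> label, assuming a tab-separated motif_list.txt."""
    motifs = {}
    with open(path) as fh:
        for line in fh:
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 2:  # skip blank or malformed lines
                motifs[fields[0]] = fields[1]
    return motifs


def annotate(annotation_lines, motifs):
    """Yield (gene, motif, label) for every gene line under a '/Homer' header."""
    current = None  # most recent header; None until the first header is seen
    for raw in annotation_lines:
        line = raw.strip()
        if not line:
            continue
        if line.endswith("/Homer"):
            current = line  # a new motif block starts here
        elif current is not None:
            # Gene line: belongs to the block of the most recent header.
            # "NA" is a hypothetical fallback for headers not in motif_list.txt.
            yield line, current, motifs.get(current, "NA")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python annotate.py annotation.txt motif_list.txt
    motif_table = load_motifs(sys.argv[2])
    with open(sys.argv[1]) as fh:
        for gene, motif, label in annotate(fh, motif_table):
            print(gene, motif, label, sep="\t")
```

Run it as `python annotate.py annotation.txt motif_list.txt` (the script name is hypothetical). With the sample files it should print the seven lines of the expected output, tab-separated. Because the header is remembered in `current`, a variable number of gene lines per motif needs no special handling.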