This is my program. It shows the value only when I give the complete name; if I type just a part of it, such as eng, it shows nothing.

import re
sent = "eng"
#sent=raw_input("Enter word")
#regex = re.compile('(^|\W)sent(?=(\W|$))')
for line in open("sir_try.txt").readlines():
    if sent == line.split()[0].strip():
        k = line.rsplit(',',1)[0].strip()
        print k

Contents of sir_try.txt:

gene name        utr length
ensbta                  24
ensg1                   12
ensg24                  30
ensg37                  65
enscat                  22
ensm                    30

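What I actually want is to search the text file for the highest value among the names that match a word, and to delete from the text file every other value for that word that is lower than this maximum. In the data above, 12 and 30 should be deleted for ensg. After that, it should find the minimum value from the utr values and display it with its name. What your answers do I have already done; I mentioned that before showing my program.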

6 Answers

Try replacing if sent == with if sent in (line.split()[0].strip()):

This checks whether the value of sent (e.g. ensg) occurs anywhere in line.split()[0].strip(), rather than requiring an exact match.

If you are still trying to get only the highest value, I would just create a variable named value and then do something like

# value starts at 0; convert to int, because the file stores the numbers as text
if int(line.split()[1]) > value:
    value = int(line.split()[1])

Test it and let us know how it works for you.
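
Putting the two suggestions together, a minimal sketch might look like the following. It assumes Python 3, uses an in-memory list named lines in place of sir_try.txt, and the function name highest_match is made up for illustration. The values go through int() because strings would be compared character by character.

```python
# Sample rows standing in for the contents of sir_try.txt (assumption).
lines = [
    "gene name        utr length",
    "ensbta                  24",
    "ensg1                   12",
    "ensg24                  30",
    "ensg37                  65",
    "enscat                  22",
    "ensm                    30",
]


def highest_match(lines, sent):
    """Return (name, value) with the largest value among names containing sent."""
    best = None
    for line in lines:
        parts = line.split()
        # Skip the header and anything that is not "name number".
        if len(parts) != 2 or not parts[1].isdigit():
            continue
        name, value = parts[0], int(parts[1])
        if sent in name and (best is None or value > best[1]):
            best = (name, value)
    return best


print(highest_match(lines, "ensg"))
```

The function returns None when no name contains sent, so the caller can tell "no match" apart from a real result.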

answered 2013-03-19T20:05:20.467

Please try this

# read the whole file
with open("sir_try.txt", "r") as f:
    list_line = f.readlines()

all_text = ""
dic = {}
sent = "ensg"
for line in list_line:
    all_text = all_text + line
    parts = line.split()
    # skip the header and blank lines; store scores as integers
    if len(parts) == 2 and parts[1].isdigit():
        dic[parts[0]] = int(parts[1])

# names that start with sent, and the highest score among them
matched = [name for name in dic if name.startswith(sent)]
high_score = max(dic[name] for name in matched)

def check(index):
    # True while the match at index is not on a line that starts with sent
    line_start = all_text.rfind("\n", 0, index) + 1
    return not all_text.startswith(sent, line_start)

for name in matched:
    score = str(dic[name])
    if dic[name] != high_score:
        index = all_text.find(score)
        while check(index):
            index = all_text.find(score, index + len(score))
        # cut the lower score out of the text and forget the entry
        all_text = all_text[0:index] + all_text[index + len(score):]
        del dic[name]

# write all text to "sir_try.txt"
with open("sir_try.txt", "w") as f2:
    f2.write(all_text)

min_score = min(dic.values())
for j in dic.keys():
    if min_score == dic[j]:
        print("min score is :" + str(min_score) + " for person " + j)

The check function fixes a bug in the solution. To explain, suppose your file is

gene name        utr length
ali                     12
ali87                   30
ensbta                  24
ensg1                   12
ensg24                  30
ensg37                  65
enscat                  22
ensm                    30

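Without it, the program deletes ali's score, which we do not want.
I solved that by adding the check function; this version is the final answer.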
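
One way to sidestep the problem described above is to work on parsed (name, value) pairs instead of searching the raw text for a number. The sketch below assumes Python 3 and rows that are already parsed into tuples; prune_and_find_min is a hypothetical helper name, and it matches on the start of the name, which is what the check function is after.

```python
def prune_and_find_min(rows, prefix):
    """Keep only the highest-valued row whose name starts with prefix,
    then return (remaining_rows, row_with_minimum_value)."""
    matched = [r for r in rows if r[0].startswith(prefix)]
    if not matched:
        # Nothing to prune; just report the minimum of the unchanged rows.
        return list(rows), min(rows, key=lambda r: r[1])
    best = max(matched, key=lambda r: r[1])
    # Drop every prefix row except the best one; other rows are untouched.
    kept = [r for r in rows if not r[0].startswith(prefix) or r == best]
    return kept, min(kept, key=lambda r: r[1])


rows = [("ali", 12), ("ali87", 30), ("ensbta", 24), ("ensg1", 12),
        ("ensg24", 30), ("ensg37", 65), ("enscat", 22), ("ensm", 30)]
kept, lowest = prune_and_find_min(rows, "ensg")
print(kept)
print(lowest)
```

Because rows are removed by name rather than by value, ali's 12 stays in place even though ensg1 has the same value.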

answered 2013-03-19T20:17:17.580

import operator
f = open('./sir_try.txt', 'r')
f = f.readlines()
del f[0]

gene = {}
matched_gene = {}

for line in f:
    words = line.strip().split(' ')
    words = [word for word in words if not word == '']
    gene[words[0]] = words[1]

# getting user input
user_input = raw_input('Enter gene name: ')
for gene_name, utr_length in gene.iteritems():
    if user_input in gene_name:
        matched_gene[gene_name] = utr_length
# the lengths are stored as strings, so convert to int for a numeric comparison
m = max(matched_gene.iteritems(), key=lambda item: int(item[1]))[0]
print m, matched_gene[m]  # expected answer

# code to remove redundant gene names as per requirement

for key in matched_gene.keys():
    if not key == m:
        matched_gene.pop(key)
for key in gene.keys():
    if user_input in key:
        gene.pop(key)

final_gene = dict(gene.items() + matched_gene.items())
out = open('./output.txt', 'w')
out.write('gene name' + '\t\t' + 'utr length' + '\n\n')
for key, value in final_gene.iteritems():
    out.write(key + '\t\t\t\t' + value + '\n')
out.close()

Output:

Enter gene name: ensg
ensg37 65

answered 2013-03-19T20:17:36.427

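To find the name (first column) associated with the maximum value (second column), you first need to split each line at the whitespace between name and value. Then you can use the built-in max() function to find the maximum, telling it to use the value column as the sort criterion. After that, you can easily get the corresponding name.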

Example:

file_content = """
gene name        utr length
ensbta                  24
ensg1                   12
ensg24                  30
ensg37                  65
enscat                  22
ensm                    30
"""

# split lines at whitespace
l = [line.split() for line in file_content.splitlines()]

# skip headline and empty lines
l = [line for line in l if len(line) == 2]

print l

# find the maximum of second column (compare as numbers, not as strings)
max_utr_length_tuple = max(l, key=lambda x: int(x[1]))

print max_utr_length_tuple

print max_utr_length_tuple[0]

The output is:

$ python test.py
[['ensbta', '24'], ['ensg1', '12'], ['ensg24', '30'], ['ensg37', '65'], ['enscat', '22'], ['ensm', '30']]
['ensg37', '65'] 
ensg37
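
Because split() returns strings, the key passed to max() decides whether the values are compared as text or as numbers. The sample data has only two-digit values, so both orderings agree there; the made-up rows below (the value 9 is an assumption for illustration, written for Python 3) show where they differ.

```python
# Hypothetical rows; "9" is not in the original data.
rows = [["ensg1", "9"], ["ensg24", "30"], ["ensg37", "65"]]

# Comparing the raw strings: "9" wins because "9" > "6" character-wise.
as_text = max(rows, key=lambda x: x[1])

# Comparing the numeric values.
as_number = max(rows, key=lambda x: int(x[1]))

print(as_text)
print(as_number)
```

The first result is the row with 9, the second the row with 65, which is why converting with int() is the safer choice for this kind of file.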

answered 2013-03-19T20:18:15.137

Short and sweet:

In [01]: t=file_content.split()[4:]
In [02]: b=((zip(t[0::2], t[1::2])))
In [03]: max(b, key=lambda x: int(x[1]))
Out[03]: ('ensg37', '65')

answered 2013-03-19T20:34:19.973

Since you have tagged your question regex, here is something you might want to see; it is (at the moment) the only answer that uses regex!

import re

sent = 'ensg' # your sequence
# regex that will "filter" the lines containing the value of sent
# (re.escape keeps any special characters in sent from being read as regex syntax)
my_re = re.compile(r'(.*?%s.*?)\s+?(\d+)' % re.escape(sent))

with open('stack.txt') as f:
    lines = f.read() # get data from file

filtered = my_re.findall(lines) # "filter" your data
print filtered

# get the desired tuple (the one with the maximum "utr length", compared numerically)
max_tuple = max(filtered, key=lambda x: int(x[1]))
print max_tuple

Output:

[('ensg1', '12'), ('ensg24', '30'), ('ensg37', '65')]
('ensg37', '65')
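
A variant of the same idea, sketched under a few assumptions: Python 3, the data held in a string named text instead of stack.txt, and names that start with sent rather than merely containing it. It anchors the pattern at the start of each line with re.MULTILINE and converts the captured digits to int before taking the maximum.

```python
import re

# Sample data standing in for the file contents (assumption).
text = """gene name        utr length
ensbta                  24
ensg1                   12
ensg24                  30
ensg37                  65
enscat                  22
ensm                    30
"""

sent = "ensg"
# ^ and $ match at line boundaries because of re.MULTILINE;
# re.escape guards against regex metacharacters in sent.
pattern = re.compile(
    r"^(%s\S*)[ \t]+(\d+)[ \t]*$" % re.escape(sent), re.MULTILINE
)

# Convert each captured value to int so max() compares numbers.
pairs = [(name, int(value)) for name, value in pattern.findall(text)]
print(pairs)
print(max(pairs, key=lambda p: p[1]))
```

Anchoring at the line start means a name such as xensg1 would not be picked up, which may or may not be what you want.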

answered 2013-03-20T06:25:10.460