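I am working on a project that requires me to search PubMed using inputs from an Excel spreadsheet and print the result counts. I have been using xlrd and Entrez to do this. Here is what I have tried.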

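  1. I need to search PubMed using the author's name, his/her medical school, a range of years, and his/her mentor's name, all of which are in an Excel spreadsheet. I used xlrd to turn each column containing the required information into a list of strings.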

    import xlrd

    # Open the first sheet of the workbook
    sheet = xlrd.open_workbook("HEENT.xlsx").sheet_by_index(0)

    # col_values() returns the cell values themselves rather than Cell objects
    med_name = sheet.col_values(2)
    med_school = sheet.col_values(3)
    mentor = sheet.col_values(9)
    
  2. I have managed to print the counts for my specific query using Entrez.

    from Bio import Entrez
    Entrez.email = "your@email.edu"
    handle = Entrez.egquery(term="Jennifer Runch AND ((2012[Date - Publication] : 2017[Date - Publication])) ")
    handle_1 = Entrez.egquery(term = "Jennifer Runch AND ((2012[Date - Publication] : 2017[Date - Publication])) AND Leoard P. Byk")
    handle_2 = Entrez.egquery(term = "Jennifer Runch AND ((2012[Date - Publication] : 2017[Date - Publication])) AND Southern Illinois University School of Medicine")
    record = Entrez.read(handle)
    record_1 = Entrez.read(handle_1)
    record_2 = Entrez.read(handle_2)
    pubmed_count = []
    for row in record["eGQueryResult"]:
        if row["DbName"] == "pubmed":
            pubmed_count.append(row["Count"])
    
    for row in record_1["eGQueryResult"]:
        if row["DbName"] == "pubmed":
            pubmed_count.append(row["Count"])

    for row in record_2["eGQueryResult"]:
        if row["DbName"] == "pubmed":
            pubmed_count.append(row["Count"])
    print(pubmed_count)
    # prints ['3', '0', '0']
    

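    The problem is that I need to replace the student name ("Jennifer Runch") with the next student name in the list of student names ("med_name"), the medical school with the next school, and the current mentor's name with the next mentor's name in the list.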

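I think I should write a for loop after declaring my email for the PubMed queries, but I am not sure how to link the two blocks of code together. Does anyone know an efficient way to connect the two blocks, or a more efficient way to do this than what I have tried? Thank you!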

1 Answer

You already have most of the code. It only needs a small modification.

Assuming your spreadsheet looks like this:

Jennifer Bunch  |Southern Illinois University School of Medicine|Leonard P. Rybak
Philipp Robinson|Stanford University School of Medicine         |Roger Kornberg

you can use the following code:

import xlrd
from Bio import Entrez

# NCBI asks for a contact address with every request
Entrez.email = "your@email.edu"

sheet = xlrd.open_workbook("HEENT.xlsx").sheet_by_index(0)

# Read each row once: [student name, medical school, mentor]
search_terms = list()
for row in range(0, sheet.nrows):
    search_terms.append([sheet.cell_value(row, 0), sheet.cell_value(row,1), sheet.cell_value(row, 2)])

pubmed_counts = list()

for search_term in search_terms:
    handle = Entrez.egquery(term="{0} AND ((2012[Date - Publication] : 2017[Date - Publication])) ".format(search_term[0]))
    handle_1 = Entrez.egquery(term = "{0} AND ((2012[Date - Publication] : 2017[Date - Publication])) AND {1}".format(search_term[0], search_term[2]))
    handle_2 = Entrez.egquery(term = "{0} AND ((2012[Date - Publication] : 2017[Date - Publication])) AND {1}".format(search_term[0], search_term[1]))
    record = Entrez.read(handle)
    record_1 = Entrez.read(handle_1)
    record_2 = Entrez.read(handle_2)

    pubmed_count = ['', '', '']

    for row in record["eGQueryResult"]:
        if row["DbName"] == "pubmed":
            pubmed_count[0] = row["Count"]
    for row in record_1["eGQueryResult"]:
        if row["DbName"] == "pubmed":
            pubmed_count[1] = row["Count"]
    for row in record_2["eGQueryResult"]:
        if row["DbName"] == "pubmed":
            pubmed_count[2] = row["Count"]

    print(pubmed_count)
    pubmed_counts.append(pubmed_count)

Output:


['3', '0', '0']
['1', '0', '0']

The one required modification is to use format so that the queries become variable.
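
As a small illustration of that substitution, the sketch below builds the three query strings for one spreadsheet row without contacting PubMed. The helper name `build_queries` and the `DATE_FILTER` constant are assumptions made for this example and are not part of the code above.

```python
# Date restriction copied from the queries in the question
DATE_FILTER = "((2012[Date - Publication] : 2017[Date - Publication]))"


def build_queries(name, school, mentor, date_filter=DATE_FILTER):
    """Return the three search terms for one student (hypothetical helper)."""
    base = "{0} AND {1}".format(name, date_filter)
    return [
        base,                                # student only
        "{0} AND {1}".format(base, mentor),  # student and mentor
        "{0} AND {1}".format(base, school),  # student and medical school
    ]


queries = build_queries(
    "Jennifer Bunch",
    "Southern Illinois University School of Medicine",
    "Leonard P. Rybak",
)
for query in queries:
    print(query)
```

Each string can then be passed as the `term` argument in place of the hard-coded query.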

Some other modifications that are not strictly necessary but may help:

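  • Loop over the Excel sheet only once.
  • Store pubmed_count in a predefined list, because if a value comes back empty the size of the output would vary, making it hard to tell which value belongs to which query.
  • Everything could be optimized and tidied further, for example by storing the queries in a list and looping over them, which would reduce code duplication, but for now it does the job.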
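
The last point, keeping the queries in a list and looping over them, might look roughly like the sketch below. The names `QUERY_TEMPLATES`, `count_for_row`, and the `search` parameter are hypothetical. `fake_search` is only a stand-in so the example runs offline; in real use it would be replaced by a function that sends the term to PubMed (for example through Bio.Entrez) and returns the count.

```python
# Date restriction copied from the queries in the question
DATE_FILTER = "((2012[Date - Publication] : 2017[Date - Publication]))"

# One template per query; unused fields are simply ignored by str.format
QUERY_TEMPLATES = [
    "{name} AND {dates}",
    "{name} AND {dates} AND {mentor}",
    "{name} AND {dates} AND {school}",
]


def count_for_row(name, school, mentor, search):
    """Run every template for one row and return the counts in template order."""
    counts = []
    for template in QUERY_TEMPLATES:
        term = template.format(name=name, dates=DATE_FILTER,
                               mentor=mentor, school=school)
        counts.append(search(term))
    return counts


seen_terms = []


def fake_search(term):
    """Offline stand-in (assumption): records the term and reports no hits."""
    seen_terms.append(term)
    return "0"


row_counts = count_for_row(
    "Philipp Robinson",
    "Stanford University School of Medicine",
    "Roger Kornberg",
    fake_search,
)
print(row_counts)
```

Because the result always has one entry per template, its length stays fixed, which keeps the second point above intact.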
Answered 2016-10-11T07:39:08.313