python - Python script for manipulating Excel spreadsheets

Question

I am trying to write a Python script to manipulate an Excel spreadsheet.

Suppose I have the following sample data:

Gene        chrom    strand  TSS        TES         Name
NM_145215   chr5     +       135485168  135488045   Abhd11
NM_1190437  chr5     +       135485021  135488045   Abhd11
NM_1205181  chr14    +       54873803   54888844    Abhd4
NM_134076   chr14    +       54878906   54888844    Abhd4
NM_9594     chr2     +       31615464   31659747    Abl1
NM_1112703  chr2     +       31544075   31659747    Abl1
NM_207624   chr11    +       105829258  105851278   Abl1
NM_9598     chr11    +       105836521  105851278   Ace2
NM_1130513  chrX     +       160577273  160626350   Ace2
NM_27286    chrX     +       160578411  160626350   Ace2

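For rows that share the same name (column 6), I want to retrieve the entire row with the smallest TSS. For example, for the first two rows (name Abhd11), I want to keep the second row in my result because TSS 135485021 < 135485168. The same applies to every other group of rows with the same name.

Any ideas and comments are appreciated.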

score 4 · Accepted Answer

Input

If possible, I would save the Excel file as a CSV file and then load it into Python with the csv module.
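As a rough sketch of that CSV route, the helper below reads an exported file into a list of rows. The function name `load_rows` and the file name `genes.csv` are placeholders chosen for illustration, not something from the question, and the sketch assumes the export is comma-delimited.

```python
import csv


def load_rows(path):
    """Return every row of a CSV file as a list of strings."""
    # newline="" lets the csv module handle line endings itself
    with open(path, newline="") as handle:
        return list(csv.reader(handle))


# Hypothetical usage; "genes.csv" is a placeholder file name:
# linesreadfromfile = load_rows("genes.csv")
```

Note that every value comes back as a string, including the TSS and TES columns, so numeric comparisons need an explicit conversion.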

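Alternatively, you could use the xlrd module to read the Excel file directly, although I haven't used it and don't know much about it.

openpyxl is another option for parsing Excel files (cheers, just another dunce).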

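Manipulation

ernie's idea seems workable, and I would implement it as follows. Assume that linesreadfromfile is a list of lists read with csv.reader, i.e. each element is the list of delimited values from the corresponding row of the file: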

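finaldict = {}
# Skip the header row; drop the [1:] if your file has no header
for row in linesreadfromfile[1:]:
    name = row[5]
    # csv.reader yields strings, so compare the TSS values as integers
    if name not in finaldict or int(row[3]) < int(finaldict[name][3]):
        finaldict[name] = row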
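One detail worth spelling out: because csv.reader returns strings, comparing TSS values directly is a lexicographic comparison, under which "105829258" sorts before "31544075". The sketch below wraps the same dictionary idea in a function (the name `lowest_tss_rows` and its parameters are illustrative, not from the answer) and converts to int before comparing. It assumes the first row is a header unless told otherwise.

```python
def lowest_tss_rows(rows, name_col=5, tss_col=3, has_header=True):
    """Return {name: row}, keeping the row with the smallest TSS per name."""
    best = {}
    data = rows[1:] if has_header else rows
    for row in data:
        name = row[name_col]
        # Convert to int so that 105829258 is not treated as smaller than 31544075
        if name not in best or int(row[tss_col]) < int(best[name][tss_col]):
            best[name] = row
    return best


sample = [
    ["Gene", "chrom", "strand", "TSS", "TES", "Name"],
    ["NM_9594", "chr2", "+", "31615464", "31659747", "Abl1"],
    ["NM_1112703", "chr2", "+", "31544075", "31659747", "Abl1"],
    ["NM_207624", "chr11", "+", "105829258", "105851278", "Abl1"],
]
print(lowest_tss_rows(sample)["Abl1"][0])  # NM_1112703
```

With plain string comparison, the Abl1 group from the question would end up with NM_207624 instead, since "1..." sorts before "3...".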

score 2 · Accepted Answer

I agree with mutzmatron and recommend the xlrd module. Here is a simple example:

import xlrd

# Open the workbook; file_name is the path to your Excel file
file_handle = xlrd.open_workbook(file_name)

# Use the first sheet in the workbook (0-based index)
sheet = file_handle.sheet_by_index(0)

# Dictionary mapping each 'Name' to its smallest 'TSS'
abc = {}

# Loop through every row, skipping the header row at index 0
for i in range(1, sheet.nrows):
  line = sheet.row_values(i)

  # Get your 'Name' and 'TSS' columns
  name = line[5]
  tss = line[3]

  # Add this 'Name' to your dictionary if it's new, or keep the smaller TSS
  if name not in abc:
    abc[name] = tss
  else:
    abc[name] = min(abc[name], tss)

Obviously, change what you save (the whole row, certain values, etc.) to suit your requirements.

-- Edit --

  # If this 'Name' is new, save this line
  if name not in abc:
    abc[name] = {'tss': tss, 'line': line}

  # Else, if the TSS is smaller than the stored one, replace both the TSS and the line
  elif tss < abc[name]['tss']:
    abc[name] = {'tss': tss, 'line': line}
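Since the question asks to save the selected rows, here is one possible way to write them out, assuming `abc` has the shape built in the edit above (each value a dict with a 'line' entry). The function name `write_selected` and the choice of CSV as the output format are assumptions made for illustration.

```python
import csv


def write_selected(abc, path, header=None):
    """Write the stored 'line' of every entry in abc to a CSV file."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        if header is not None:
            writer.writerow(header)
        # Sorted only to make the output order predictable
        for name in sorted(abc):
            writer.writerow(abc[name]["line"])
```

Numeric cells read through xlrd usually come back as floats, so a value such as 135485021 may be written as 135485021.0 unless it is converted with int() first.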

score 0 · Accepted Answer

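You could use IronSpread, which gives you a Python console inside Excel and a way to script this kind of operation in Python. It also supports UDFs that you can use like ordinary Excel functions, which is nice.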

score 0 · Accepted Answer

You could use Pyvot, from the Python Tools for Visual Studio team. It provides a comprehensive API for working with Excel spreadsheets from CPython.

You can get the code from PyPI: http://pypi.python.org/pypi/Pyvot
You can get the documentation from the Pytools site: http://pytools.codeplex.com/wikipage?title=Pyvot
