python - Python: comparing list items to dictionary keys twice in one for loop?

Question

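I am stuck on a script I have to write and cannot find a way out...

I have two files with partially overlapping information. Based on the information in one file, I have to extract information from the other and save it into several new files. The first is just a table with IDs and group information (used for splitting). The other contains the same IDs, but each ID appears twice with slightly different information.

What I am doing: I created a list of lists with the ID and group information, like this: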

table = [[ID, group], [ID, group], [ID, group], ...]

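Then, because the second file is huge and is not sorted the same way as the first, I wanted to create a dictionary as an index. In this index I want to store each ID together with the position where it can be found in the file, so that I can quickly jump there later. The problem, of course, is that every ID appears twice. My simple solution (which I have doubts about) was to append -a or -b to the ID: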

index = {"ID-a": [FPos, length], "ID-b": [FPos, length], "ID-a": [FPos, length], ...}

Code:

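indices = {}  # maps "ID-a" / "ID-b" to [file position, line length]
FPos = 0      # running offset of the current line within the file

for line in file:
    read = (line.split("\t"))[0]
    if (read + "-a") not in indices:
        # first occurrence of this ID
        index = read + "-a"
        length = len(line)
        indices[index] = [FPos, length]
    else:
        # second occurrence of this ID
        index = read + "-b"
        length = len(line)
        indices[index] = [FPos, length]
    FPos += length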

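What I would like to know now is whether the next step actually works (I do not get an error, but I have some doubts about the output files).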

for name in table:
    head = name[0]
    ## first round
    (FPos,length) = indices[head+"-a"]
    file.seek(FPos)
    line = file.read(length)
    line = line.rstrip()
    items = line.split("\t")
    output = ["@" + head +" "+ "1:N:0:" +"\n"+ items[9] +"\n"+ "+" +"\n"+ items[10] +"\n"]
    name.append(output)
    ##second round
    (FPos,length) = indices[head+"-b"]
    file.seek(FPos)
    line = file.read(length)
    line = line.rstrip()
    items = line.split("\t")
    output = ["@" + head +" "+ "2:N:0:" +"\n"+ items[9] +"\n"+ "+" +"\n"+ items[10] +"\n"]
    name.append(output)

Is it OK to use a for loop like this?

Is there a better, cleaner way to do this?

score 2 · Accepted Answer

Use a defaultdict(list) to store all file offsets by ID:

from collections import defaultdict

index = defaultdict(list)

for line in file:
    # ...code that loops through file finding ID lines...
    index[id_value].append((fileposn,length))

On the first occurrence of a given id_value, the defaultdict initializes its entry to an empty list, and the (fileposn, length) tuple is then appended to it.
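A minimal sketch of that behavior, using made-up IDs and offsets purely for illustration (the `records` list is hypothetical and stands in for whatever the file loop produces):

```python
from collections import defaultdict

index = defaultdict(list)

# Hypothetical (ID, file position, length) records; the values are invented.
records = [("read1", 0, 40), ("read2", 40, 38), ("read1", 78, 40)]

for id_value, fileposn, length in records:
    # Accessing a missing key creates an empty list automatically,
    # so no "is the key already there?" check is needed.
    index[id_value].append((fileposn, length))

print(dict(index))
```

Each ID ends up with one list entry per occurrence, so no -a / -b suffixes are required.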

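This accumulates all references to each ID in the index, whether there are 1, 2, or 20 of them. You can then seek to each stored file position to retrieve the associated data.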
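Putting the pieces together, here is one possible sketch of building the index and reading the records back. It assumes the ID is the first tab-separated field of every line, as in the question; the function names `build_index` and `fetch_lines` are made up for this example. The file is opened in binary mode so that the stored offsets and lengths are byte counts that `seek()` and `read()` can use directly.

```python
from collections import defaultdict


def build_index(path):
    """Map each ID (first tab-separated field) to a list of (offset, length)."""
    index = defaultdict(list)
    with open(path, "rb") as handle:  # binary mode: offsets and lengths are bytes
        fileposn = 0
        for line in handle:
            id_value = line.split(b"\t", 1)[0].decode()
            index[id_value].append((fileposn, len(line)))
            fileposn += len(line)
    return index


def fetch_lines(path, index, id_value):
    """Return every line stored for id_value, in file order."""
    lines = []
    with open(path, "rb") as handle:
        # .get() avoids creating an empty entry for an unknown ID.
        for fileposn, length in index.get(id_value, []):
            handle.seek(fileposn)
            lines.append(handle.read(length).decode().rstrip("\r\n"))
    return lines
```

In the loop over `table`, the two lookups with "-a" and "-b" could then become a single call that returns both lines for `head`, with the position in the returned list (first or second) deciding whether the record is written as 1:N:0: or 2:N:0:.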
