python - Python file parsing -> IndexError

Question

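I'm parsing an ISI file containing several hundred records, each of which begins with a "PT J" tag and ends with an "ER" tag. I'm trying to extract the tagged information from each record inside a nested loop, but I keep getting an IndexError. I know why I'm getting it, but does anyone have a better way of identifying the start of a new record than checking the first few characters?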

    while file:
        while line[1] + line[2] + line[3] + line[4] != 'PT J':
            ...
            # search through and record data from tags
            ...

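I'm using the same approach to identify the tags, so I occasionally hit the same problem there; any suggestions on that would be much appreciated too!

As the sample data shows, not every record contains every tag: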

    PT J
    AF Bob Smith
    TI Python For Dummies
    DT July 4, 2012
    ER

    PT J
    TI Django for Dummies
    DT 4/14/2012
    ER

    PT J
    AF Jim Brown
    TI StackOverflow
    ER

score 3 · Accepted Answer

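    with open('data1.txt') as f:
        for line in f:
            if line.strip() == 'PT J':  # start of a record
                for line in f:  # the inner loop continues on the same file iterator
                    stripped = line.strip()
                    if stripped == 'ER':
                        # this record ends here; move on to the next record
                        break
                    elif stripped:
                        # do something with the data (print is only a placeholder)
                        print(stripped)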
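The nested loop works because both `for` statements pull from the same file iterator, so the inner loop resumes on the line right after `PT J`. Below is a minimal sketch of that behavior. It uses `io.StringIO` in place of a real file, and the helper name `read_records` is hypothetical; neither comes from the answer above.

```python
import io

SAMPLE = """PT J
AF Bob Smith
TI Python For Dummies
ER

PT J
TI Django for Dummies
ER
"""

def read_records(f):
    """Collect the non-blank lines between each 'PT J' and 'ER' marker."""
    records = []
    for line in f:
        if line.strip() == 'PT J':
            record = []
            for line in f:  # continues from the line after 'PT J'
                stripped = line.strip()
                if stripped == 'ER':
                    break
                if stripped:
                    record.append(stripped)
            records.append(record)
    return records

records = read_records(io.StringIO(SAMPLE))
print(records)
```

Because every comparison is made on the whole stripped line, no character is ever indexed, so short lines such as `ER` or blank lines cannot raise an IndexError.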

score 2 · Accepted Answer

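Do those 'ER' lines contain only "ER"? If so, that is why you are getting IndexErrors: line[4] doesn't exist.

The first thing to try is:

    while not line.startswith('PT J'):

instead of your existing while loop.

Also, on slicing:

    line[1] + line[2] + line[3] + line[4] == line[1:5]

(The end of a slice is exclusive.)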
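To illustrate the difference: indexing past the end of a string raises an IndexError, whereas slicing and `startswith` just return a shorter string or `False`. Note also that Python indexes from 0, so the characters `PT J` sit at positions 0 to 3, i.e. `line[0:4]`. A short sketch, assuming an end-of-record line of `'ER\n'` as it would come from the file:

```python
line = 'ER\n'  # assumed: an end-of-record line, 3 characters long

try:
    line[1] + line[2] + line[3] + line[4]
    raised = False
except IndexError:
    raised = True  # line[3] is already out of range

sliced = line[1:5]                 # slicing never raises; it returns what is there
starts = line.startswith('PT J')   # safe on lines of any length

header = 'PT J\n'
first_four = header[0:4]           # indices 0..3 hold the tag

print(raised, repr(sliced), starts, repr(first_four))
```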

score 0 · Accepted Answer

You could try something like this to read your file.

    with open('data.txt') as f:
        for line in f:
            # split the line into a list of character sequences
            # separated by whitespace (blanks, tabs)
            line = line.split()
            llen = len(line)
            if llen == 2 and line[0] == 'PT' and line[1] == 'J':
                # found start of record: process it here.
                # Examine line[0] for tags such as "AF", "TI", "DT" and
                # proceed as dictated by your needs, e.g. the "AF" check below.
                pass
            if llen > 1 and line[0] == 'AF':
                # grab first/last name in line[1] and line[2]; the data is on
                # the same line and accessible via the correct index values
                pass
            if llen == 1 and line[0] == 'ER':
                # found end of record
                pass

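This will certainly need more "programming logic" (most likely nested loops or, better yet, calls to functions) to put everything in the proper order, but the basic structure is there, and I hope it gets you started and gives you some ideas.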
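One way the "calls to functions" idea might look is sketched below: each line is split into a tag and its value, and one dictionary is yielded per record. The function name `parse_records` and the dictionary layout are assumptions for illustration, not part of the answer. The sketch also assumes that each tag appears at most once per record and that every value fits on a single line.

```python
import io

def parse_records(lines):
    """Yield one {tag: value} dict per 'PT J' ... 'ER' record."""
    record = None                      # None means we are between records
    for raw in lines:
        parts = raw.split(None, 1)     # [tag] or [tag, rest of line]
        if not parts:
            continue                   # blank line
        tag = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ''
        if tag == 'PT' and value == 'J':
            record = {}                # start of a new record
        elif tag == 'ER' and record is not None:
            yield record               # end of the current record
            record = None
        elif record is not None:
            record[tag] = value

sample = io.StringIO(
    "PT J\nAF Jim Brown\nTI StackOverflow\nER\n\n"
    "PT J\nTI Django for Dummies\nDT 4/14/2012\nER\n"
)
parsed = list(parse_records(sample))
for rec in parsed:
    # .get() copes with records that lack a given tag
    print(rec.get('AF', '(no author)'), '|', rec.get('TI'))
```

Looking tags up with `dict.get` means a record that lacks `AF` or `DT` needs no special handling.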
