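I have a text file that has been broken down into a list of strings in the following format:
['DATE','NAME', 'RT','1A','541','09947','199407','552','09949','BOON','101C','SMITH','00321','1553678','1851243','561','559','004789','1749201','ANDERSON']
I want to create a dictionary whose keys are the items for which item[0:-1].isdigit() and item[-1].isalpha() are both true; in the example above, these would be 1A and 101C. As values I then only want the items that pass item.isdigit() and satisfy int(item) > 100000. The items meeting this condition should be collected into a new list by a for loop (or possibly a while loop) until the loop hits the next key value.
The result would be dct = {'1A': ['199407'], '101C': ['1553678','1851243','1749201']}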
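A minimal sketch of the grouping described above, done in a single pass over the list. The helper names `is_key` and `group_values` and the `threshold` parameter are made up for illustration, and the sketch assumes that items appearing before the first key can simply be ignored:

```python
def is_key(item):
    """True for items like '1A' or '101C': digits followed by one letter."""
    return item[:-1].isdigit() and item[-1:].isalpha()


def group_values(items, threshold=100000):
    """Map each key to the numeric strings above `threshold` that follow it."""
    result = {}
    current = None  # no key seen yet, so leading items are ignored
    for item in items:
        if is_key(item):
            current = item
            # setdefault keeps values already collected if a key repeats
            result.setdefault(current, [])
        elif current is not None and item.isdigit() and int(item) > threshold:
            result[current].append(item)
    return result


lines = ['DATE', 'NAME', 'RT', '1A', '541', '09947', '199407', '552',
         '09949', 'BOON', '101C', 'SMITH', '00321', '1553678', '1851243',
         '561', '559', '004789', '1749201', 'ANDERSON']
dct = group_values(lines)
```

Because the current key is tracked while iterating, no index arithmetic on the list of keys is needed, so there is no lookup that can run past the last key.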
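I am currently getting an index error once the iteration reaches the length of the list of keys, even though the while condition is supposed to break out before that happens. Before getting this error, I was indexing the values differently and got an empty dictionary. Once the index error is fixed, I expect to get another empty dictionary.
Here is my code: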
# create a list of the dictionary keys to find values in 1A format
# in order to avoid key error when building dict, do not add duplicate
# values to list. Needs to be a list and not tuple so it can be indexed
for line in lines:
    if line[0:-1].isdigit() and line[-1].isalpha() and line not in keys:
        keys.append(line)
print str(keys) + " " + str(len(keys))
# build a list of values for each item in keys. Should find the first
# key and check if a converted string to number is > 100000. If it is
# the value is appended to the valLst. If the next key is encountered
# the nested loop breaks and valLst is added to the current key. The
# primary loop moves to the next key while the nested loop should only
# consider items between the current primary iterable and the next.
passes = 0
while passes <= len(keys):  # exit loop before index error
    for key in keys:
        passes += 1
        curKey = keys.index(key)  # current primary iterable position
        nextKey = curKey + 1  # next primary iterable position
        print "Passes: " + str(passes)
        valLst = []  # empty list for dct values--resets after nested loop break
        for line in lines:  # iterate through text
            if line == keys[nextKey]:  # the next key value is encountered in text
                break
            dict[key] = valLst  # valList added to current dict key
            curLine = lines.index(line)  # start at current key value found in text
            if curLine == key:  # find current key in text
                nextLine = curLine + 1  # get index of next value after current key in text
                val = lines[nextLine]  # next text value
                if val.isdigit():  # append value to valLst if it is > 100000
                    num = int(val)
                    if num > 100000:
                        valLst.append(num)
Here is my current error:
Traceback (most recent call last):
  File "C:\Python27\Lib\site-packages\pythonwin\pywin\framework\scriptutils.py", line 323, in RunScript
    debugger.run(codeObject, __main__.__dict__, start_stepping=0)
  File "C:\Python27\Lib\site-packages\pythonwin\pywin\debugger\__init__.py", line 60, in run
    _GetCurrentDebugger().run(cmd, globals,locals, start_stepping)
  File "C:\Python27\Lib\site-packages\pythonwin\pywin\debugger\debugger.py", line 654, in run
    exec cmd in globals, locals
  File "C:\Users\user\Desktop\Scripts\PDF_Extractor.py", line 1, in <module>
    from cStringIO import StringIO
IndexError: list index out of range
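The traceback only points at line 1 of the script, so the following is an assumption about where the error comes from: for the last key, `nextKey` equals `len(keys)`, so `keys[nextKey]` would be out of range, and the `while` condition is only re-checked after the whole `for key in keys` loop has finished. A minimal sketch that reproduces this with the two keys from the example and shows a guarded lookup (the helper name `next_key_or_none` is hypothetical):

```python
keys = ['1A', '101C']


def next_key_or_none(keys, key):
    """Return the key that follows `key`, or None when `key` is the last one."""
    position = keys.index(key)
    if position + 1 < len(keys):
        return keys[position + 1]
    return None


# Looking up the key after the last key reproduces the IndexError.
try:
    keys[keys.index('101C') + 1]
    raised = False
except IndexError:
    raised = True
```

Separately, `curLine` holds an integer from `lines.index(line)` while `key` is a string, so `curLine == key` is never true; that would explain an empty dictionary even after the index error is avoided.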
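I have been looking into list comprehensions, but I don't yet have a good enough grasp of them to apply one in this situation. Am I heading in the right direction with the code above, or is there a list comprehension approach I could take, something like: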
valLst = {key for keys in lines for line in line if line == key and int(line.isdigit()) > 100000 valLst.append(line)}
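The line above is not valid Python as written. A minimal sketch of what a comprehension-based variant could look like, assuming the list contains no empty strings and that each key appears only once (a repeated key would overwrite its earlier entry). The names `is_key` and `positions` are made up for illustration:

```python
def is_key(item):
    """True for items like '1A' or '101C': digits followed by one letter."""
    return item[:-1].isdigit() and item[-1:].isalpha()


lines = ['DATE', 'NAME', 'RT', '1A', '541', '09947', '199407', '552',
         '09949', 'BOON', '101C', 'SMITH', '00321', '1553678', '1851243',
         '561', '559', '004789', '1749201', 'ANDERSON']

# Index of every key item, plus the end of the list as a final boundary.
positions = [i for i, item in enumerate(lines) if is_key(item)] + [len(lines)]

# Each key owns the slice between its own position and the next boundary.
dct = {
    lines[start]: [value for value in lines[start + 1:stop]
                   if value.isdigit() and int(value) > 100000]
    for start, stop in zip(positions, positions[1:])
}
```

Pairing each key position with the next one through `zip(positions, positions[1:])` replaces the manual `nextKey` bookkeeping, and the appended `len(lines)` boundary means the last key needs no special case.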