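I'm trying to tabulate a text file with the following format. The data comes in blocks that repeat many times. The first 5 fields normally appear only once per block, and in the output I want those values filled in on each row of their block (the green values).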
SOME TEXT
SOME TEXT
SOME TEXT
GSHSH = 0 OK:SUCCESS
ABC = 1
TDE = 0
TNLH = WL_CS
TKKJW = ZZR
MBTYIE = PRM
MHGT = 165
MRLL = CTM
TTDDX = 0
ZDTR = FALSE
UEEM = FALSE
KQTY = FALSE
MHGT = 211
MRLL = CTM
TTDDX = 0
ZDTR = FALSE
UEEM = FALSE
KQTY = FALSE
MHGT = 32
MRLL = CTM
TTDDX = 0
ZDTR = FALSE
UEEM = FALSE
KQTY = FALSE
SOME TEXT
SOME TEXT
SOME TEXT
GSHSH = 23 OK:SUCCESS
ABC = 1
TDE = 0
TNLH = WL_PS
KKJW = ZZZN
MBTYIE = PRM
MHGT = 9254
MRLL = PRM
ZDTR = FALSE
UEEM = FALSE
KQTY = FALSE
SOME TEXT
SOME TEXT
SOME TEXT
GSHSH = 0 OK:SUCCESS
ABC = 1
TDE = 1
TNLH = RTC_RMN
TKKJW = ZZR
BTYIE = RTC
MHGT = 1150
MRLL = PRM
ZDTR = FALSE
UEEM = FALSE
KQTY = FALSE
MHGT = 41
MRLL = CTM
TTDDX = 0
ZDTR = FALSE
UEEM = FALSE
KQTY = FALSE
SOME TEXT
SOME TEXT
SOME TEXT
GSHSH = 1 OK:SUCCESS
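One way to get the "filled in" header values is to parse the file record by record instead of key by key. The sketch below relies on two assumptions read off the sample, not on any spec: a `GSHSH` line opens each block, and every repeated record begins with an `MHGT` line. Any field that appears before a block's first `MHGT` is treated as that block's header and copied into each of its records. The names `parse_rows`, `FIELD_RE`, `BLOCK_START` and `ROW_START` are hypothetical, and `SAMPLE` is an abridged copy of the data above.

```python
import re

# Abridged copy of the question's file.txt (first two blocks, some fields dropped).
SAMPLE = """\
SOME TEXT
GSHSH = 0 OK:SUCCESS
ABC = 1
TDE = 0
TNLH = WL_CS
TKKJW = ZZR
MBTYIE = PRM
MHGT = 165
MRLL = CTM
TTDDX = 0
ZDTR = FALSE
MHGT = 211
MRLL = CTM
TTDDX = 0
ZDTR = FALSE
SOME TEXT
GSHSH = 23 OK:SUCCESS
ABC = 1
TDE = 0
TNLH = WL_PS
KKJW = ZZZN
MBTYIE = PRM
MHGT = 9254
MRLL = PRM
ZDTR = FALSE
"""

# "KEY = VALUE" with or without leading whitespace; lines without '=' never match.
FIELD_RE = re.compile(r"^\s*(\w+)\s*=\s*(.*?)\s*$")
BLOCK_START = "GSHSH"  # assumption: a GSHSH line opens each block and is not tabulated
ROW_START = "MHGT"     # assumption: every repeated record begins with MHGT


def parse_rows(text):
    """Return one dict per MHGT record, with its block's header fields copied in."""
    rows = []
    header = {}     # fields seen before the first MHGT of the current block
    current = None  # record being filled; None until the block's first MHGT
    for line in text.splitlines():
        match = FIELD_RE.match(line)
        if not match:
            continue  # "SOME TEXT" and any other line without '='
        key, value = match.groups()
        if key == BLOCK_START:
            header, current = {}, None  # new block: forget the previous header
        elif key == ROW_START:
            current = {**header, key: value}  # new record starts from a copy of the header
            rows.append(current)
        elif current is None:
            header[key] = value  # still in the block's header section
        else:
            current[key] = value  # field belongs to the current record
    return rows


rows = parse_rows(SAMPLE)
for row in rows:
    print(row)
```

Passing the result to `pd.DataFrame(rows)` gives one row per `MHGT` record, with `NaN` wherever a record has no line for a key (for example `TTDDX` in the `MHGT = 9254` record). Copying the header at parse time, rather than forward-filling afterwards, keeps a block that lacks a header field from inheriting the previous block's value. The sample spells two header keys differently in some blocks (`KKJW`, `BTYIE`); the sketch keeps whatever key it reads, so those end up as separate columns.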
The output I want looks like this:
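My current code is below. I can read the data and store the values in a defaultdict, but when I then try to convert it to a pandas DataFrame I get an error. I'm also stuck on how to organize the values so that they print in the correct columns. Thanks for your help.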
import re
from collections import defaultdict
from tabulate import tabulate
import pandas as pd

file = 'file.txt'
with open(file, "r") as fh:
    f = fh.read().splitlines()

lst = []
for line in f:
    # keep only lines that start with a space or tab (the indented field lines)
    if re.match(r'[ \t]', line):
        lst.append(line.replace(' ', '').split('='))
print(lst)

# group every value under its key
d = defaultdict(list)
for k, v in lst:
    d[k].append(v)
>>> d
defaultdict(<class 'list'>, {'ABC': ['1', '1', '1'], 'TDE': ['0', '0', '1'], 'TNLH': ['WL_CS',
'WL_PS', 'RTC_RMN'], 'TKKJW': ['ZZR', 'ZZR'], 'MBTYIE': ['PRM', 'PRM'], 'MHGT': ['165', '211',
'32', '9254', '1150', '41'], 'MRLL': ['CTM', 'CTM', 'CTM', 'PRM', 'PRM', 'CTM'], 'TTDDX':
['0', '0', '0', '0'], 'ZDTR': ['FALSE', 'FALSE', 'FALSE', 'FALSE', 'FALSE', 'FALSE'], 'UEEM':
['FALSE', 'FALSE', 'FALSE', 'FALSE', 'FALSE', 'FALSE'], 'KQTY': ['FALSE', 'FALSE', 'FALSE',
'FALSE', 'FALSE', 'FALSE'], 'KKJW': ['ZZZN'], 'BTYIE': ['RTC']})
df = pd.DataFrame.from_dict(d)
>> ValueError: arrays must all be same length
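The error comes from the shape of the dict itself: `MHGT` has six values, `TTDDX` four, `TKKJW` two, and so on, and `DataFrame.from_dict` needs every list to have the same length. The sketch below uses two lists copied from the defaultdict above to show that padding the short lists makes the error go away but still attaches values to the wrong rows, because a flat list per key no longer records which record a value came from. Building one dict per record (a list of dicts, which `pd.DataFrame` accepts) keeps each value on its own row. The variable names are illustrative only.

```python
import pandas as pd

# Two of the lists from the defaultdict shown above, copied verbatim.
d = {
    "MHGT": ["165", "211", "32", "9254", "1150", "41"],
    "TTDDX": ["0", "0", "0", "0"],
}

# Wrapping each list in a Series sidesteps the ValueError: pandas aligns the
# Series on their index and pads the shorter one with NaN at the end.
padded = pd.DataFrame({key: pd.Series(values) for key, values in d.items()})
print(padded)
# The padding is positional, so MHGT 9254 (index 3) receives TTDDX "0" and
# MHGT 41 (index 5) receives NaN, the reverse of what file.txt says.

# One dict per record keeps each value attached to its own row instead.
records = [
    {"MHGT": "9254"},              # this record has no TTDDX line
    {"MHGT": "1150"},              # this record has no TTDDX line
    {"MHGT": "41", "TTDDX": "0"},
]
aligned = pd.DataFrame(records)
print(aligned)
```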