python - Convert a list containing other lists and dicts into a pandas DataFrame

Question

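I want to convert a list that appears to be a list of dicts (with other lists nested inside) into a pandas DataFrame.

Here is a sample of my data: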

['b"{',
 'n  boxers: [',
 'n    {',
 'n      age: 30,',
 'n      hasBoutScheduled: true,',
 'n      id: 489762,',
 'n      last6: [Array],',
 "n      name: 'Andy Ruiz Jr',",
 'n      points: 754,',
 'n      rating: 100,',
 'n      record: [Object],',
 'n      residence: [Object],',
 "n      stance: 'orthodox'",
 'n    },',
 'n    {',
 'n      age: 34,',
 'n      hasBoutScheduled: true,',
 'n      id: 468841,',
 'n      last6: [Array],',
 "n      name: 'Deontay Wilder',",
 'n      points: 622,',
 'n      rating: 100,',
 'n      record: [Object],',
 'n      residence: [Object],',
 "n      stance: 'orthodox'",
 'n    },',
 'n    {',
 'n      age: 30,',
 'n      hasBoutScheduled: true,',
 'n      id: 659461,',
 'n      last6: [Array],',
 "n      name: 'Anthony Joshua',",
 'n      points: 603,',
 'n      rating: 100,',
 'n      record: [Object],',
 'n      residence: [Object],',
 "n      stance: 'orthodox'",
 'n    },'

Here is what I have tried so far:

pd.DataFrame.from_records(unclean_file)

This produces about 27 columns, presumably one column for every space, comma, and so on.

I also tried using ChainMap:

from collections import ChainMap

pd.DataFrame.from_dict(ChainMap(*unclean_file),orient='index',columns=['age','hasBoutScheduled','id','last6','name','points','rating','record','residence','stance'])

This produces the error message: ValueError: dictionary update sequence element #0 has length 1; 2 is required

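Note: I converted the data to a list when I pulled it. To clarify, I am using the Naked package to run a node.js file that returns JSON output. I save that output to the variable success, where it starts out as a byte string, and then convert it to a list: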

success = muterun_js('index.js')
unclean_file = [str(success.stdout).split('\\')]

score 0 · Accepted Answer

You are reading JSON-formatted data, so use unclean_file = json.loads(success) instead of unclean_file = [str(success.stdout).split('\\')].

This should return a dict object that you can pass straight into a DataFrame.

You may also need to decode your data first.

import json
import pandas as pd

success = success.stdout.decode('utf-8')  # decode the bytes from stdout; may not be necessary
unclean_file = json.loads(success)
data = pd.DataFrame(unclean_file, index=[0])
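As a minimal sketch of the decode-then-parse step, the snippet below uses a hypothetical stand-in for success.stdout. It assumes index.js prints valid JSON (for example via JSON.stringify); the question does not show that script, so treat the payload here as illustrative only.

```python
import json

# Hypothetical stand-in for success.stdout, assuming index.js prints valid JSON.
stdout = (
    b'{"boxers": ['
    b'{"id": 489762, "name": "Andy Ruiz Jr", "age": 30}, '
    b'{"id": 468841, "name": "Deontay Wilder", "age": 34}'
    b']}'
)

payload = json.loads(stdout.decode('utf-8'))  # bytes -> str -> dict
records = payload['boxers']                   # list of dicts, one per boxer

for record in records:
    print(record['name'], record['age'])
```

Passing records to pd.DataFrame should then give one row per boxer, with the dict keys as column names.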

score 0 · Accepted Answer

Splitting the data string does not help; it only makes parsing harder.
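To illustrate why, here is a small sketch of what the str(...).split('\\') step from the question does. The bytes value is a shortened, hypothetical stand-in for success.stdout.

```python
# Hypothetical, shortened stand-in for success.stdout.
stdout = b"{\n  boxers: [\n    {\n      age: 30,\n"

# str() on bytes returns the repr, so the text keeps the b'...' wrapper
# and every newline becomes the two characters backslash + n.
as_text = str(stdout)

# Splitting on the backslash leaves a stray 'n' at the start of each piece.
pieces = as_text.split('\\')

# Wrapping the pieces in another list yields a single record with many fields.
unclean_file = [pieces]

print(pieces)
print(len(unclean_file), len(unclean_file[0]))
```

The result is one outer element holding many string fragments, which would explain why from_records produced a single row with one column per fragment.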

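Error message: JSONDecodeError: Expecting property name enclosed in double quotes: line 2 column 3 (char 4)

This clearly shows that one problem is the unquoted keys. Further problems are the unquoted values true, Array and Object. But correcting all of this is not hard: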

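import re
# stdout is bytes; decode it to a str before applying the regex
unclean_string = success.stdout.decode()
# wrap every identifier that is followed by ], , or : in double quotes
clean_string = re.sub(r'\w+(?=[],:])', r'"\g<0>"', unclean_string)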

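The above puts quotes around every identifier that is followed by :, , or ]. That gives us a well-formed dict representation, which we can evaluate and turn into a DataFrame: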

pd.DataFrame(eval(clean_string)['boxers'])
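The whole approach can be sketched end to end on a shortened copy of the sample data. This version swaps eval for ast.literal_eval, which only accepts Python literals; that is a substitution made here, not part of the answer above. The unclean_string value is a hypothetical stand-in for success.stdout.decode().

```python
import ast
import re

# Hypothetical, shortened stand-in for success.stdout.decode().
unclean_string = """{
  boxers: [
    {
      age: 30,
      hasBoutScheduled: true,
      id: 489762,
      last6: [Array],
      name: 'Andy Ruiz Jr',
      points: 754,
      stance: 'orthodox'
    }
  ]
}"""

# Quote every run of word characters that is followed by ], , or :.
# This also turns numbers such as 30 into the string "30".
clean_string = re.sub(r'\w+(?=[],:])', r'"\g<0>"', unclean_string)

# literal_eval parses the dict literal without executing arbitrary code.
records = ast.literal_eval(clean_string)['boxers']

print(records[0])
```

Because the regex quotes digits as well, columns such as age and points come out as strings. If numeric columns are needed, something like pd.to_numeric on those columns after building the DataFrame from records should work.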
