python - How to parse a jsonlines file with pandas

Question

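I'm new to Python and am trying to parse data from a file that contains millions of lines. I first tried the old-school route of parsing it in Excel, but that failed. How can I parse the information efficiently and export it to an Excel file so that other people can read it?

I tried this code, which someone else gave me, but so far no luck: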

import re
import pandas as pd

def clean_data(filename):
    with open(filename, "r") as inputfile:
        for row in inputfile:
            if re.match("\[", row) is None:
                yield row

with open(clean_file,  'w') as outputfile:
    for row in clean_data(filename):
        outputfile.write(row)

NameError: name 'clean_file' is not defined

score 0 · Accepted Answer

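It looks like clean_file is never defined, which is probably a copy/paste problem in the code.

Did you mean to write to a file named "clean_file"? If so, you need to put the name in quotes: with open("clean_file", 'w'). If clean_file is meant to be a variable, assign it a path string before the with statement. The same applies to filename, which is also used at the top level without being defined.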
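As a minimal sketch of that fix, the snippet from the question can be wrapped so that both paths are passed in explicitly. The helper name write_clean_file and the returned line count are my additions for illustration, not part of the original code:

```python
import re


def clean_data(filename):
    """Yield every line of `filename` that does not start with '['."""
    with open(filename, "r", encoding="utf-8") as inputfile:
        for row in inputfile:
            # Raw string so the backslash reaches the regex engine unchanged.
            if re.match(r"\[", row) is None:
                yield row


def write_clean_file(filename, clean_file):
    """Copy the filtered lines into `clean_file` and return how many were kept.

    Hypothetical helper: both arguments are plain path strings, so neither
    name can be undefined when the file is opened.
    """
    kept = 0
    with open(clean_file, "w", encoding="utf-8") as outputfile:
        for row in clean_data(filename):
            outputfile.write(row)
            kept += 1
    return kept
```

You would call it as write_clean_file("data.jsonl", "data_clean.jsonl"), where both file names are placeholders for your own paths.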

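If you want to work with the JSON itself, I suggest looking at the json package, which has plenty of tools for loading and parsing JSON. Otherwise, if the JSON is flat, you can use the built-in pandas function read_json (pass lines=True for a JSON Lines file).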
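For a file with millions of lines, here is a rough sketch of the read_json route, assuming each line is one flat JSON object. An .xlsx worksheet holds at most 1,048,576 rows, so the sketch reads the file in chunks and writes each chunk to its own CSV file that Excel can open. The function name jsonl_to_csv_parts and the parameters out_prefix and rows_per_part are made up for this example:

```python
import pandas as pd

EXCEL_MAX_ROWS = 1_048_576  # row limit of a single .xlsx worksheet


def jsonl_to_csv_parts(jsonl_path, out_prefix, rows_per_part=500_000):
    """Read a JSON Lines file in chunks and write each chunk to its own CSV.

    Returns the list of files written. `out_prefix` and `rows_per_part`
    are illustrative parameters of this sketch, not pandas options.
    """
    if rows_per_part >= EXCEL_MAX_ROWS:
        raise ValueError("rows_per_part must leave room for the header row")

    written = []
    # lines=True parses one JSON object per line; chunksize makes read_json
    # return an iterator of DataFrames instead of loading everything at once.
    reader = pd.read_json(jsonl_path, lines=True, chunksize=rows_per_part)
    for i, chunk in enumerate(reader):
        part_path = f"{out_prefix}_{i:04d}.csv"
        chunk.to_csv(part_path, index=False)
        written.append(part_path)
    return written
```

If a real .xlsx file is required, chunk.to_excel(...) can replace to_csv, but that needs an Excel writer such as openpyxl installed and is noticeably slower on large data.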
