python - Python: Importing Tweet unicode data into a pandas DataFrame object

Question

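I am trying to import a file with the following structure (a dump of tweets, with unicode strings). The goal is to convert it into a DataFrame using the pandas module. I assume the first step is to load it into a JSON object and then convert that into a DataFrame (per p. 166 of McKinney's Python for Data Analysis), but I'm not sure and could use some pointers on how to manage this.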

import sys, tailer
tweet_sample = tailer.head(open(r'<MyFilePath>\usTweets0.json'), 3)
tweet_sample # returns
['{u\'contributors\': None, u\'truncated\': False, u\'text\': u\'@KREAYSHAWN is...

score 2 · Accepted Answer

Just use the DataFrame constructor...

In [6]: tweet_sample = [{'contributers': None, 'truncated': False, 'text': 'foo'}, {'contributers': None, 'truncated': True, 'text': 'bar'}]

In [7]: df = pd.DataFrame(tweet_sample)

In [8]: df
Out[8]:
  contributers text truncated
0         None  foo     False
1         None  bar      True

If your file is in JSON format, you can open it with json.load:

import json
with open(r'<MyFilePath>\usTweets0.json', 'r') as f:  # raw string, so '\u' is not read as a unicode escape
    tweet_sample = json.load(f)
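
Note that the sample shown in the question looks like the repr of Python dicts (u'...' keys, None, False) rather than strict JSON, in which case json.load would likely reject it. A minimal sketch of one way around that, assuming each line of the file holds exactly one such dict; parse_tweet_lines and sample_lines are hypothetical names used only for illustration:

```python
import ast


def parse_tweet_lines(lines):
    """Turn lines that each hold the repr of a Python dict into a list of dicts."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # literal_eval only evaluates Python literals (dicts, strings, None,
        # True/False), so it does not execute arbitrary code the way eval() can.
        records.append(ast.literal_eval(line))
    return records


# Hypothetical two-line sample in the same shape as the dump in the question.
sample_lines = [
    "{u'contributors': None, u'truncated': False, u'text': u'foo'}\n",
    "{u'contributors': None, u'truncated': True, u'text': u'bar'}\n",
]
records = parse_tweet_lines(sample_lines)
```

The resulting list of dicts can then be passed to pd.DataFrame in the same way as tweet_sample above. For a real file, the open file object can be passed directly as lines.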

A from_json is coming to pandas soon...
