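I need help creating NDJSON objects from the following parsed data, which comes from a leading advertising platform. I intend to upload the data to BigQuery.

I managed to create NDJSON with pandas, but I cannot control the data types, so the load fails with errors (specifically between Int and Float).

Here is my object: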

datadict = {
 'start_time': ['2019-03-26','2019-03-27','2019-03-28'],
 'id': ['campaignid10', 'campaignid10', 'campaignid10'],
 'impression': [100, 200, 0],
 'tweets' : [10, None, None]
}

Expected output (None should also become null):

{'start_time':'2019-03-26', 'id':'campaignid10', 'impression':100, 'tweets':10 }
{'start_time':'2019-03-27', 'id':'campaignid10','impression':200, 'tweets':null}
{'start_time':'2019-03-28', 'id':'campaignid10', 'impression':0, 'tweets':null}
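
One way to keep control of the types without pandas is to cast each value explicitly before serializing. The following is a minimal sketch using only the standard library. The names `SCHEMA` and `to_ndjson` are hypothetical and not part of any library, and the column types are an assumption about what the BigQuery table expects.

```python
import json

# Hypothetical schema (an assumption): column name -> Python type for the target table.
SCHEMA = {"start_time": str, "id": str, "impression": int, "tweets": int}


def to_ndjson(columns, schema):
    """Turn a dict of equal-length columns into NDJSON text, casting each value."""
    names = list(columns)
    lines = []
    # zip(*columns.values()) walks the columns in parallel, one row at a time.
    for row in zip(*columns.values()):
        record = {
            # Keep None as None so json.dumps writes null; cast everything else.
            name: None if value is None else schema[name](value)
            for name, value in zip(names, row)
        }
        lines.append(json.dumps(record))
    return "\n".join(lines)


datadict = {
    "start_time": ["2019-03-26", "2019-03-27", "2019-03-28"],
    "id": ["campaignid10", "campaignid10", "campaignid10"],
    "impression": [100, 200, 0],
    "tweets": [10, None, None],
}

print(to_ndjson(datadict, SCHEMA))
```

Because the cast happens per value, a count that arrives as a float such as `100.0` is written as the integer `100`, and missing values stay `null` instead of forcing the whole column to float.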
1 Answer

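import functools
import operator
import ndjson  # third-party package


def transform(dd, days):
    # Turn a dict of columns into a list of `days` row dicts.
    obs = days
    # Row idx holds the idx-th value of every column.
    data = [[lst[idx] for lst in list(dd.values())] for idx in range(obs)]
    # Repeat each column name once per row, then regroup the names row by row.
    pre_label = [[elm]*obs for elm in list(dd.keys())]
    labels = [[lst[idx] for lst in pre_label] for idx in range(obs)]
    return [dict(zip(labels[i], data[i])) for i in range(obs)]


# Assumption: dd is a list of column dicts such as datadict from the question.
dd = [datadict]
jsonList = [transform(_dd, 3) for _dd in dd]
# Flatten the list of lists into a single list of records.
jsonList = functools.reduce(operator.iconcat, jsonList, [])
output_ndjson = ndjson.dumps(jsonList)
print(output_ndjson)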

I would appreciate it if anyone could help me simplify this solution.
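
One possible simplification, offered as a sketch rather than a definitive rewrite: the row-building step can be done with `zip` alone, which also removes the need to pass the number of days. The name `transform_simple` is hypothetical, and the sketch assumes every column list has the same length.

```python
def transform_simple(dd):
    """Transpose a dict of equal-length columns into a list of row dicts."""
    # zip(*dd.values()) yields one tuple per row; zipping it with the keys
    # of dd pairs each value with its column name.
    return [dict(zip(dd, row)) for row in zip(*dd.values())]


datadict = {
    "start_time": ["2019-03-26", "2019-03-27", "2019-03-28"],
    "id": ["campaignid10", "campaignid10", "campaignid10"],
    "impression": [100, 200, 0],
    "tweets": [10, None, None],
}

rows = transform_simple(datadict)
for row in rows:
    print(row)
```

The resulting list of dicts can be serialized the same way as `jsonList` in the code above.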

Answered 2019-08-23T09:07:19.417