
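I have received raw data from the Mixpanel API. I would like to convert it to a CSV file so that I can work with the data in Excel. I have tried this online tool (http://jsfiddle.net/sturtevant/vUnF9/), but it does not seem to handle nested JSON results. What is the best way to do this?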

Here is a sample of the output:

{"event":"Event.Name","properties":{"time":1376784014,"distinct_id":"distinctID","$app_version":"1.XX","$city":"cityName","$ios_ifa":"iosIfa","$lib_version":"X.Y.Z","$manufacturer":"Apple","$model":"model","$os":"iPhone OS","$os_version":"X.Y.Z","$region":"Region","$screen_height":999,"$screen_width":999,"$wifi":true,"App Version":"1.XX","BattleDuration":"99","BattleNum":"2","Episode Num":"2","PlayerVictory":"1","mp_country_code":"CODE","mp_device_model":"Model","mp_lib":"iphone"}}

2 Answers


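I assume this is just one of many records you need to process. Essentially, you need to convert the nested JSON object into a flat one without losing the keys or the relationships between them.

This...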

{
    "event":"Event.Name",
    "properties":{
        "time":1376784014,
        "distinct_id":"distinctID",
    ....
    ....
}

can be converted to... (you can replace _ with any other separator)

{
    "mixpanel_event":"Event.Name",
    "mixpanel_properties_time":"1376784014",
    "mixpanel_properties_distinct_id":"distinctID",
    ....
    ....
}

You can then write this structure to a CSV file using csv.DictWriter.
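As a rough illustration of that step, here is a minimal sketch of writing flattened records with csv.DictWriter. The `rows` data and the `write_rows` helper are hypothetical and only there for the example. It assumes that different events may carry different properties, so the header is built from the union of all keys and missing cells are left empty via `restval`.

```python
import csv
import io

# Hypothetical flattened records; the two rows deliberately have different keys.
rows = [
    {"mixpanel_event": "Event.Name", "mixpanel_properties_time": "1376784014"},
    {"mixpanel_event": "Event.Name", "mixpanel_properties_$city": "cityName"},
]

def write_rows(rows, out):
    """Write flat dicts to the file-like object out and return the header used."""
    # Build the header as the union of all keys, in first-seen order.
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    # restval fills in cells for keys that a given row does not have.
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return fieldnames

# An in-memory buffer stands in for a real file here.
buffer = io.StringIO()
header = write_rows(rows, buffer)
csv_text = buffer.getvalue()
```

To write to disk instead, pass a file opened with `open("out.csv", "w", newline="")`; the `newline=""` argument is what the csv module documentation recommends to avoid blank lines on Windows.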

You can use a recursive function like this...

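def reduce_item(key, value):
    """Flatten value into the global reduced_item dict, prefixing keys with key."""
    global reduced_item

    # Reduction condition 1: a list, so recurse with the index appended to the key
    if isinstance(value, list):
        for i, sub_item in enumerate(value):
            reduce_item(key + '_' + str(i), sub_item)

    # Reduction condition 2: a dict, so recurse with the sub-key appended to the key
    elif isinstance(value, dict):
        for sub_key, sub_value in value.items():
            reduce_item(key + '_' + str(sub_key), sub_value)

    # Base condition: a scalar, so store it as a string under the accumulated key
    else:
        reduced_item[str(key)] = str(value)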

Then you can call the function like this...

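import json

# your_json_string is a placeholder for the raw JSON text of one record
raw_data = json.loads(your_json_string)
reduced_item = {}  # reduce_item() fills this global dict
reduce_item("mixpanel", raw_data)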

I wrote a script that does this. You can see the complete code on GitHub, and a detailed explanation can be found here.

answered 2013-12-10T16:31:58.280

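You can try the sample code below. It uses recursive functions to collect the keys and the values (you must somehow make sure the order is preserved).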

import sys
import json

def getKeys(newDict):
    """Return the leaf keys of a nested dict, in traversal order."""
    retv = []
    for key in newDict.keys():
        if isinstance(newDict[key], dict):
            # Nested dict: recurse and collect its leaf keys
            retv.extend(getKeys(newDict[key]))
        else:
            retv.append(key)
    return retv

def getValues(newDict):
    """Return the leaf values of a nested dict, in the same order as getKeys."""
    retv = []
    for key in newDict.keys():
        if isinstance(newDict[key], dict):
            # Nested dict: recurse and collect its leaf values
            retv.extend(getValues(newDict[key]))
        else:
            retv.append(newDict[key])
    return retv

def main():
    t = {}
    filename = '' # Add your filename
    with open(filename) as f:
        t = json.load(f)
    keys = getKeys(t)
    result = getValues(t)

    print(keys)
    print(result)
    return

if __name__ == '__main__':
    main()
    sys.exit(0)
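One way to address the ordering concern is to collect keys and values in a single traversal, so that they cannot drift apart. The following is a sketch of that idea rather than a drop-in replacement: `iter_leaves` is a hypothetical helper, and unlike getKeys it joins nested keys into dotted paths (an assumption made here so that same-named keys at different levels stay distinct).

```python
import json

def iter_leaves(node, path=()):
    """Yield (key_path, value) pairs for every non-dict leaf, in one traversal."""
    for key, value in node.items():
        if isinstance(value, dict):
            # Nested dict: recurse with the current key added to the path.
            yield from iter_leaves(value, path + (key,))
        else:
            yield ".".join(path + (key,)), value

# A shortened record in the shape of the question's sample.
data = json.loads(
    '{"event": "Event.Name", "properties": '
    '{"time": 1376784014, "distinct_id": "distinctID"}}'
)
pairs = list(iter_leaves(data))
keys = [k for k, _ in pairs]
values = [v for _, v in pairs]
```

Because each key is yielded together with its value, `keys[i]` always corresponds to `values[i]`.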
answered 2013-09-04T19:24:19.017