
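I am struggling to work with JSON data from the International Monetary Fund (IMF). I have read several related posts, but I still don't know how to proceed.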

What I tried:

import requests
import pandas as pd
import json

# These are the variables I want to have as columns, plus setting a time index
var = ['NGDP_XDC', 'NCP_XDC', 'NCGG_XDC', 'NFI_XDC', 'NINV_XDC', 'NX_XDC', 
       'NM_XDC', 'NSDGDP_XDC', 'NGDP_R_K_IX', 'NGDP_D_IX']

# URL for the IMF JSON Restful Web Service,
# IFS database
base = 'http://dataservices.imf.org/REST/SDMX_JSON.svc/CompactData/IFS/'
period = 'A'
country = 'MX'

# Join the codes into the '+'-separated key used in the URL
var_key = '+'.join(var)

time = '?startPeriod=1970&endPeriod=2019'

# Get data from the above URL using the requests package
url = base + period + '.' + country + '.' + var_key + '.' + time

response = requests.get(url)
dictr = response.json()

So far so good. This is the step I am struggling with:


flat = dictr['CompactData']['DataSet']['Series']

temp = pd.json_normalize(flat)
temp = temp.drop(columns=['@FREQ', '@REF_AREA', '@UNIT_MULT', '@BASE_YEAR'])

I expected a flat table that I could pivot however I like. Instead, this is what I get:


    @INDICATOR @TIME_FORMAT                                                Obs
0     NINV_XDC          P1Y  [{'@TIME_PERIOD': '1970', '@OBS_VALUE': '37.21...
1       NX_XDC          P1Y  [{'@TIME_PERIOD': '1970', '@OBS_VALUE':

I don't know how to turn it into this:

year  variable1  ...  variableN
1970  10         ...  45
1980  20         ...  12
...
2019  15         ...  10
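A minimal sketch of one way to get there, assuming `series` has the structure shown above (a list of dicts, each with an `'@INDICATOR'` string and an `'Obs'` list of observation dicts). `pd.json_normalize` can expand `'Obs'` into one row per observation via `record_path`, copying the indicator code down via `meta`; `DataFrame.pivot` then spreads the indicators into columns. The sample values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample with the same shape as
# dictr['CompactData']['DataSet']['Series']; the numbers are made up.
series = [
    {'@INDICATOR': 'NGDP_XDC', '@TIME_FORMAT': 'P1Y',
     'Obs': [{'@TIME_PERIOD': '1970', '@OBS_VALUE': '10'},
             {'@TIME_PERIOD': '1971', '@OBS_VALUE': '20'}]},
    {'@INDICATOR': 'NX_XDC', '@TIME_FORMAT': 'P1Y',
     'Obs': [{'@TIME_PERIOD': '1970', '@OBS_VALUE': '45'},
             {'@TIME_PERIOD': '1971', '@OBS_VALUE': '12'}]},
]

# Long format: one row per observation, with '@INDICATOR' copied from its parent series
long_df = pd.json_normalize(series, record_path='Obs', meta=['@INDICATOR'])

# The API delivers years and values as strings; convert them to numbers
long_df['@TIME_PERIOD'] = long_df['@TIME_PERIOD'].astype(int)
long_df['@OBS_VALUE'] = pd.to_numeric(long_df['@OBS_VALUE'])

# Wide format: one row per year, one column per indicator
wide = long_df.pivot(index='@TIME_PERIOD', columns='@INDICATOR', values='@OBS_VALUE')
print(wide)
```

Keeping the long frame around is handy, since it can be re-pivoted (for example by year and country) without calling the service again.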

2 Answers


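I implemented your nudge in a less elegant way, because I couldn't work out how to recover the variable codes and the time index from your approach. This also works: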

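url = f"{base}{period}.{country}.{'+'.join(var)}.{time}"
response = requests.get(url).json()
series = response['CompactData']['DataSet']['Series']

nipa = pd.DataFrame(index=range(1970, 2020))

for s in series:
    # Take the column name from the series itself: the API does not
    # necessarily return the series in the order they appear in `var`
    code = s['@INDICATOR']
    temp = pd.DataFrame(s['Obs'])
    # Index by year so that a series with missing years still lines up
    temp.index = temp['@TIME_PERIOD'].astype(int)
    # Keep only the value column, rename it, and convert the strings to floats
    temp = temp[['@OBS_VALUE']].rename(columns={'@OBS_VALUE': code}).astype(float)
    nipa = nipa.join(temp)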
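A loop-free variant of the same idea, sketched under the assumption that each element of `series` carries its own `'@INDICATOR'` and `'Obs'` keys as in the question's output: build one `pd.Series` per indicator keyed by year, and let the `DataFrame` constructor align them. The sample data is hypothetical; it deliberately lists the indicators out of order and leaves one year missing.

```python
import pandas as pd

# Hypothetical sample shaped like response['CompactData']['DataSet']['Series']
series = [
    {'@INDICATOR': 'NX_XDC',
     'Obs': [{'@TIME_PERIOD': '1970', '@OBS_VALUE': '45'},
             {'@TIME_PERIOD': '1971', '@OBS_VALUE': '12'}]},
    {'@INDICATOR': 'NGDP_XDC',
     'Obs': [{'@TIME_PERIOD': '1970', '@OBS_VALUE': '10'}]},  # 1971 missing
]

# One Series per indicator, indexed by year. The DataFrame constructor aligns
# them on the union of all years and fills the gaps with NaN.
nipa = pd.DataFrame({
    s['@INDICATOR']: pd.Series(
        {int(o['@TIME_PERIOD']): float(o['@OBS_VALUE']) for o in s['Obs']}
    )
    for s in series
}).sort_index()
print(nipa)
```

Because every column is labeled with the code returned by the API, the result does not depend on the order of `var`, and no fixed `range(1970, 2020)` index is needed.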

answered 2020-10-11T15:47:12.950

Maybe this will nudge you in the right direction.

The value of ['CompactData']['DataSet']['Series'] is a list of dicts, and the 'Obs' key of each dict holds a list of dicts with the values you are after.

So you have to flatten it:

series = response['CompactData']['DataSet']['Series']
flat = [item for sublist in [i['Obs'] for i in series] for item in sublist]
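For readers unfamiliar with nested comprehensions: the line above just concatenates the inner `'Obs'` lists. A small sketch on toy data (not real IMF output) showing that `itertools.chain.from_iterable` produces the same list:

```python
from itertools import chain

# Toy stand-in for the 'Series' list: each element has an 'Obs' list
series = [
    {'Obs': [{'@TIME_PERIOD': '1970'}, {'@TIME_PERIOD': '1971'}]},
    {'Obs': [{'@TIME_PERIOD': '1970'}]},
]

# The nested comprehension from the answer...
flat = [item for sublist in [i['Obs'] for i in series] for item in sublist]

# ...is equivalent to concatenating the inner lists in order
flat_chain = list(chain.from_iterable(s['Obs'] for s in series))

print(len(flat))  # 3
```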

Putting it all together:

import requests
import pandas as pd

# These are the variables I want to have as columns, plus setting a time index
var = [
    'NGDP_XDC', 'NCP_XDC', 'NCGG_XDC', 'NFI_XDC', 'NINV_XDC', 'NX_XDC',
    'NM_XDC', 'NSDGDP_XDC', 'NGDP_R_K_IX', 'NGDP_D_IX',
]

base = 'http://dataservices.imf.org/REST/SDMX_JSON.svc/CompactData/IFS/'
period = 'A'
country = 'MX'
time = '?startPeriod=1970&endPeriod=2019'

# Get data from the above URL using the requests package
url = f"{base}{period}.{country}.{'+'.join(var)}.{time}"
response = requests.get(url).json()

series = response['CompactData']['DataSet']['Series']
flat = [item for sublist in [i['Obs'] for i in series] for item in sublist]
print(pd.DataFrame(flat))

Output:

    @TIME_PERIOD        @OBS_VALUE @OBS_STATUS
0           1970   37.210816346586         NaN
1           1971  35.6027864361386         NaN
2           1972   36.123021665698         NaN
3           1973  50.9603299629663         NaN
4           1974   80.992068185601         NaN
..           ...               ...         ...
[499 rows x 3 columns]
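Note that @TIME_PERIOD and @OBS_VALUE in this frame are strings (the raw JSON shows them quoted, e.g. '@OBS_VALUE': '37.21...'), so they need converting before any arithmetic or plotting. A minimal sketch using two rows shortened from the output above:

```python
import pandas as pd

# Two rows shortened from the output above; both columns hold strings
df = pd.DataFrame({
    '@TIME_PERIOD': ['1970', '1971'],
    '@OBS_VALUE': ['37.210816346586', '35.6027864361386'],
})

# Years become integers; values become floats.
# errors='coerce' turns anything unparsable into NaN instead of raising.
df['@TIME_PERIOD'] = df['@TIME_PERIOD'].astype(int)
df['@OBS_VALUE'] = pd.to_numeric(df['@OBS_VALUE'], errors='coerce')
print(df.dtypes)
```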
answered 2020-10-10T17:54:31.793