
I'm having problems parsing JSON with Python, and now I'm stuck.
The problem is that the objects in my JSON do not always have the same keys. The JSON looks something like:

{
    "entries": [
        {
            "summary": "here is the summary",
            "extensions": {
                "coordinates": "coords",
                "address": "address",
                "name": "name",
                "telephone": "123123",
                "url": "www.blablablah"
            }
        }
    ]
}

I can iterate over the JSON, for example:

for entrie in entries:
    name = entrie['extensions']['name']
    tel = entrie['extensions']['telephone']

The problem is that the JSON does not always have all the fields. For example, the telephone field is sometimes missing, so the script fails with a KeyError because that entry has no telephone key.
So my question is: how can I run this script and leave a blank value where the telephone is missing? I've tried:

if entrie['extensions']['telephone']:
    tel=entrie['extensions']['telephone']

but I don't think that's right.
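A minimal sketch of why that attempt fails: the subscript is evaluated before the if statement tests its truthiness, so a missing key raises KeyError before the condition is ever checked. The sample entry below is hypothetical, with the telephone field left out.

```python
# Hypothetical entry with no "telephone" key.
entrie = {"extensions": {"name": "name", "address": "address"}}

def lookup_with_if(entry):
    # Mirrors the attempt above: the lookup runs before the truth test.
    tel = ''
    if entry['extensions']['telephone']:
        tel = entry['extensions']['telephone']
    return tel

try:
    lookup_with_if(entrie)
    raised = False
except KeyError as exc:
    raised = True
    print("KeyError:", exc)  # prints: KeyError: 'telephone'
```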


4 Answers


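Use dict.get instead of []:

entrie['extensions'].get('telephone', '')

Or, more simply:

entrie['extensions'].get('telephone')

get returns the second argument (the default value, or None if you omit it) instead of raising a KeyError when the key is not found.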
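A short sketch of both forms on a hypothetical entry that has no telephone key:

```python
# Hypothetical entry without a "telephone" key.
entrie = {"extensions": {"name": "name"}}

# With an explicit default, a missing key yields ''.
tel = entrie['extensions'].get('telephone', '')

# Without a default, get() returns None instead of raising KeyError.
tel_or_none = entrie['extensions'].get('telephone')

# Keys that are present are returned unchanged.
name = entrie['extensions'].get('name', '')

print(repr(tel), repr(tel_or_none), repr(name))  # prints: '' None 'name'
```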

answered 2013-05-10T23:51:38.637

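If data is missing in only one place, you can use dict.get to fill in the missing value:

tel = d['entries'][0]['extensions'].get('telephone', '')

If the problem is more widespread, you can have the JSON parser use a defaultdict or a custom dictionary instead of a regular dict. For example, given the JSON string: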

json_txt = '''{
    "entries": [
        {
            "extensions": {
                "telephone": "123123", 
                "url": "www.blablablah", 
                "name": "name", 
                "coordinates": "coords", 
                "address": "address"
            }, 
            "summary": "here is the summary"
        }
    ]
}'''

Parse it:

>>> import json
>>> class BlankDict(dict):
...     def __missing__(self, key):
...         return ''
...
>>> d = json.loads(json_txt, object_hook=BlankDict)

>>> d['entries'][0]['summary']
'here is the summary'

>>> d['entries'][0]['extensions']['color']
''
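A variant of the same idea, assuming you would rather not define a class: pass a lambda as object_hook that wraps each decoded object in a collections.defaultdict(str, ...). The shortened JSON string below is hypothetical.

```python
import json
from collections import defaultdict

# Hypothetical, shortened input: the entry has no "telephone" key.
json_txt = '{"entries": [{"summary": "s", "extensions": {"name": "name"}}]}'

# object_hook receives each decoded JSON object as a plain dict;
# defaultdict(str, obj) copies it and makes missing keys read as ''.
d = json.loads(json_txt, object_hook=lambda obj: defaultdict(str, obj))

ext = d['entries'][0]['extensions']
print(repr(ext['name']), repr(ext['telephone']))  # prints: 'name' ''
```

One difference from BlankDict: defaultdict stores the '' value under the missing key when you look it up, while BlankDict.__missing__ only returns '' and leaves the dictionary unchanged.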

As a side note, if you want to clean up your dataset and enforce consistency, there is a nice tool called Kwalify that does schema validation for JSON (and YAML).

answered 2013-05-10T23:55:43.493

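There are a couple of useful dictionary features you can use to handle this.

First, you can use in to test whether a key exists in a dictionary: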

if 'telephone' in entrie['extensions']:
    tel = entrie['extensions']['telephone']
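If you want a blank value rather than skipping the assignment, an else branch fills it in. A minimal sketch over a hypothetical two-entry list:

```python
# Hypothetical sample: the second entry lacks "telephone".
entries = [
    {"extensions": {"name": "A", "telephone": "123123"}},
    {"extensions": {"name": "B"}},
]

phones = []
for entrie in entries:
    if 'telephone' in entrie['extensions']:
        tel = entrie['extensions']['telephone']
    else:
        tel = ''  # blank placeholder when the key is absent
    phones.append(tel)

print(phones)  # prints: ['123123', '']
```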

get may also be useful; it lets you specify a default value to return if the key is missing:

tel = entrie['extensions'].get('telephone', '')

Beyond that, you could look at collections.defaultdict in the standard library, but that may be overkill.
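For reference, a minimal sketch of the defaultdict route on a single hypothetical extensions dict. Note the side effect: looking up a missing key inserts it into the defaultdict, though the original dict it was built from is not modified.

```python
from collections import defaultdict

# Hypothetical extensions dict with no "telephone" key.
extensions = {"name": "name", "url": "www.blablablah"}

# defaultdict(str, mapping) copies the mapping; str() returns '',
# which becomes the value for any missing key.
ext = defaultdict(str, extensions)

tel = ext['telephone']
print(repr(tel))                  # prints: ''
print('telephone' in ext)         # prints: True (the lookup inserted the key)
print('telephone' in extensions)  # prints: False (the original is untouched)
```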

answered 2013-05-10T23:53:18.763

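Two ways.

One is to make sure your dictionaries are standardized, so that they have all the fields when you read them. The other is to be careful when accessing the dictionaries.

Here is an example of making sure your dictionaries are standardized: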

import json

__reference_extensions = {
   # fill in with all standard keys
   # use some default value to go with each key
   "coordinates" : '',
   "address" : '',
   "name" : '',
   "telephone" : '',
   "url" : ''
}

# input_string holds the JSON text of a single entry
entrie = json.loads(input_string)
d = entrie["extensions"]
# .items() yields (key, value) pairs; iterating the dict alone yields only keys
for key, value in __reference_extensions.items():
    if key not in d:
        d[key] = value
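An alternative sketch of the same normalization, assuming Python 3.5+ for the {**a, **b} merge syntax: build a new dictionary from the defaults and let the keys actually present in the entry override them. REFERENCE_EXTENSIONS and normalize are hypothetical names.

```python
import json

# Hypothetical defaults, mirroring the reference table above.
REFERENCE_EXTENSIONS = {
    "coordinates": '', "address": '', "name": '', "telephone": '', "url": '',
}

def normalize(extensions):
    # Later unpacking wins, so keys present in `extensions` override defaults.
    return {**REFERENCE_EXTENSIONS, **extensions}

entry = json.loads('{"extensions": {"name": "name", "url": "www.blablablah"}}')
entry["extensions"] = normalize(entry["extensions"])
print(entry["extensions"]["telephone"] == '')  # prints: True
```

Unlike the loop above, this returns a new dict instead of mutating d in place.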

Here is an example of being careful when accessing the dictionaries:

for entrie in entries:
    name = entrie['extensions'].get('name', '')
    tel = entrie['extensions'].get('telephone', '')
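Putting the careful-access approach together end to end, a minimal sketch that parses a hypothetical JSON string and collects (name, telephone) pairs, with '' standing in for any missing field:

```python
import json

# Hypothetical input: the second entry has no "telephone" key.
json_text = '''{"entries": [
  {"summary": "first", "extensions": {"name": "A", "telephone": "123123"}},
  {"summary": "second", "extensions": {"name": "B"}}
]}'''

data = json.loads(json_text)
rows = []
for entrie in data['entries']:
    ext = entrie.get('extensions', {})  # also tolerate a missing "extensions"
    name = ext.get('name', '')
    tel = ext.get('telephone', '')
    rows.append((name, tel))

print(rows)  # prints: [('A', '123123'), ('B', '')]
```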
answered 2013-05-10T23:56:30.633