python - Creating a list of dictionaries results in a list of copies of the same dictionary

Question

I want to get all the iframe elements from a webpage.

Code:

site = "http://" + url
f = urllib2.urlopen(site)
web_content =  f.read()

soup = BeautifulSoup(web_content)
info = {}
content = []
for iframe in soup.find_all('iframe'):
    info['src'] = iframe.get('src')
    info['height'] = iframe.get('height')
    info['width'] = iframe.get('width')
    content.append(info)
    print(info)       

pprint(content)

Result of print(info):

{'src': u'abc.com', 'width': u'0', 'height': u'0'}
{'src': u'xyz.com', 'width': u'0', 'height': u'0'}
{'src': u'http://www.detik.com', 'width': u'1000', 'height': u'600'}

Result of pprint(content):

[{'height': u'600', 'src': u'http://www.detik.com', 'width': u'1000'},
{'height': u'600', 'src': u'http://www.detik.com', 'width': u'1000'},
{'height': u'600', 'src': u'http://www.detik.com', 'width': u'1000'}]

Why is the value of content wrong? It should contain the same values that print(info) showed on each iteration.

score 66 · Accepted Answer

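You are not creating a separate dictionary for each iframe; you keep modifying the same dictionary over and over, and you keep adding additional references to that one dictionary to your list.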

Remember, when you do something like content.append(info), you aren't making a copy of the data; you are only appending a reference to the data.
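A minimal sketch of that point, written for Python 3. The names info and content mirror the question, and the values are made up for illustration. Appending the same dictionary twice stores two references to one object, so a later change is visible through both list slots.

```python
# Illustration only: the names mirror the question, the values are made up.
info = {}
content = []

info["src"] = "abc.com"
content.append(info)   # stores a reference to info, not a copy

info["src"] = "xyz.com"
content.append(info)   # the very same object is appended a second time

# Both list slots refer to the one dictionary, so both show the last value.
same_object = content[0] is content[1]
print(same_object)
print(content)
```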

You need to create a new dictionary for each iframe.

for iframe in soup.find_all('iframe'):
    info = {}
    ...

Even better, you don't need to create an empty dictionary first. Just create it all at once:

for iframe in soup.find_all('iframe'):
    info = {
        "src": iframe.get('src'),
        "height": iframe.get('height'),
        "width": iframe.get('width'),
    }
    content.append(info)

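There are other ways to accomplish this, such as iterating over a list of attributes, or using list or dictionary comprehensions, but it's hard to improve on the clarity of the code above.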
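For comparison, here is a rough sketch of the comprehension variant in Python 3. To keep it runnable without fetching a page, plain dictionaries stand in for the iframe tags (both offer a .get(name) method); the iframes list and the attributes tuple are assumptions made for this example, not part of the original code.

```python
# Stand-in data: plain dicts play the role of the iframe tags here, since both
# expose a .get(name) method. These values are invented for the example.
iframes = [
    {"src": "abc.com", "height": "0", "width": "0"},
    {"src": "http://www.detik.com", "height": "600", "width": "1000"},
]

attributes = ("src", "height", "width")

# The inner dict comprehension builds a brand-new dict for every iframe,
# so no two list entries share an object.
content = [{name: iframe.get(name) for name in attributes} for iframe in iframes]

print(content)
```

With a real page, the iframes list would be replaced by soup.find_all('iframe') as in the question.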

score 42 · Accepted Answer

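You have misunderstood the Python list object. It is similar to a C array of pointers. It does not actually "copy" the object that you append to it. Instead, it just stores a "pointer" to that object.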

Try the following code:

>>> d={}
>>> dlist=[]
>>> for i in xrange(0,3):
    d['data']=i
    dlist.append(d)
    print(d)

{'data': 0}
{'data': 1}
{'data': 2}
>>> print(dlist)
[{'data': 2}, {'data': 2}, {'data': 2}]

So why does print(dlist) not show the same values as print(d)?

The following code shows you the reason:

>>> for i in dlist:
    print "the list item point to object:", id(i)

the list item point to object: 47472232
the list item point to object: 47472232
the list item point to object: 47472232

So you can see that all the items in dlist actually point to the same dict object.

The real answer to this question is to append a "copy" of the target item, using d.copy():

>>> dlist=[]
>>> for i in xrange(0,3):
    d['data']=i
    dlist.append(d.copy())
    print(d)

{'data': 0}
{'data': 1}
{'data': 2}
>>> print dlist
[{'data': 0}, {'data': 1}, {'data': 2}]

Try the id() trick again and you can see that the list items now point to completely different objects.

>>> for i in dlist:
    print "the list item points to object:", id(i)

the list item points to object: 33861576
the list item points to object: 47472520
the list item points to object: 47458120
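One caveat worth sketching: d.copy() makes a shallow copy, so it is enough for flat dictionaries like the ones above, but mutable values nested inside would still be shared. The example below is a Python 3 illustration with an invented "tags" key; copy.deepcopy from the standard library is shown as one possible alternative for nested data.

```python
import copy

# Invented example data: "tags" is a mutable value nested inside the dict.
d = {"data": 0, "tags": []}

shallow = d.copy()        # new outer dict, but "tags" is the same list object
deep = copy.deepcopy(d)   # new outer dict and a new inner list

d["tags"].append("changed")

print(shallow["tags"])    # shares the inner list with d
print(deep["tags"])       # independent of d
```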

score 5 · Accepted Answer

If you want a one-liner:

list_of_dict = [{} for i in range(list_len)]
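The comprehension matters here. As a small Python 3 sketch (list_len is a hypothetical length picked for the example), compare it with list multiplication, which repeats a reference to a single dictionary and so reproduces the problem from the question.

```python
list_len = 3  # hypothetical length, chosen for the example

# Comprehension: the {} expression is evaluated once per iteration,
# giving list_len distinct dictionaries.
separate = [{} for _ in range(list_len)]

# Multiplication: one dict is created, then its reference is repeated.
shared = [{}] * list_len

separate[0]["data"] = 1
shared[0]["data"] = 1

print(separate)  # only the first dict was changed
print(shared)    # every slot shows the change, since all are one object
```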

answered 2015-06-22T14:54:20.827

score 4 · Accepted Answer

info is a pointer to a dictionary - you keep adding the same pointer to your list content.

Insert info = {} inside the loop and it should solve the problem:

...
content = []
for iframe in soup.find_all('iframe'):
    info = {}
    info['src'] = iframe.get('src')
    info['height'] = iframe.get('height')
    info['width'] = iframe.get('width')
...
