python - Create a list of tuples from a list in Python

Question

I am using the urllib2 module in Python to extract some information from the anchor tags of a URL, http://www.google.co.in/. Below is the code:

import urllib2
import urlparse
from BeautifulSoup import BeautifulSoup

url = "http://www.google.co.in/"
page = urllib2.urlopen(url)
html = page.read()
page.close()
soup = BeautifulSoup(html)
for tag in soup.findAll('a', href=True):
   text = tag.text 
   tag['href'] = urlparse.urljoin(url, tag['href'])
   print '       '.join([text,tag['href']])

Result:

Web History       http://www.google.co.in/history/optout?hl=en
Settings       http://www.google.co.in/preferences?hl=en
Sign in       https://accounts.google.com/ServiceLogin?hl=en&continue=http://www.google.co.in/
Advanced search       http://www.google.co.in/advanced_search?hl=en-IN&authuser=0
Language tools       http://www.google.co.in/language_tools?hl=en-IN&authuser=0
.......................

That works fine, but I want to store the information as a list of tuples, like this:

[('Web History','http://www.google.co.in/history/optout?hl=en'),('Settings','http://www.google.co.in/preferences?hl=en'),('Sign in','https://accounts.google.com/ServiceLogin?hl=en&continue=http://www.google.co.in/')................]

Can anyone tell me how to collect the data produced by the for loop into a list of tuples like the one above?

score 2 · Accepted Answer

Try something like this:

[(tag.text, urlparse.urljoin(url, tag['href'])) 
        for tag in soup.findAll('a', href=True)]
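The comprehension above does the same thing as calling append() inside the original for loop instead of printing. Below is a minimal, self-contained sketch of that idea. It assumes Python 3, where urllib.parse.urljoin replaces Python 2's urlparse.urljoin. It also swaps BeautifulSoup for the standard-library html.parser so it runs without network access or third-party packages. The sample HTML and the AnchorCollector class are hypothetical illustrations, not part of the original answer.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AnchorCollector(HTMLParser):
    """Hypothetical helper: collect (text, absolute_href) tuples for <a href=...> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []      # the list of tuples being built
        self._href = None    # href of the <a> tag currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:  # like findAll('a', href=True): skip anchors without href
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = "".join(self._text).strip()
            # Append one (text, absolute URL) tuple instead of printing it
            self.links.append((text, urljoin(self.base_url, self._href)))
            self._href = None


url = "http://www.google.co.in/"
# Hypothetical sample markup standing in for the downloaded page
html = (
    '<a href="/history/optout?hl=en">Web History</a>'
    '<a href="/preferences?hl=en">Settings</a>'
    '<a name="no-href">Skipped</a>'
)

parser = AnchorCollector(url)
parser.feed(html)
parser.close()
links = parser.links
print(links)
```

The key step is `self.links.append((text, ...))`: the double parentheses build a tuple and append it as one element. With the question's BeautifulSoup loop, the equivalent change is to replace the `print` line with `links.append((text, tag['href']))` after initializing `links = []`.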

score 0

You could try building a hash (a dict) and extracting the tuples from it with items(). This is just a hack:

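def __init__(self, *args, **kwargs):
    super(IndicatorForm, self).__init__(*args, **kwargs)
    d = dir(indicators)
    # Map each name to itself in a dict
    b = {}
    for a in d:
        b[a] = a
    # items() yields (key, value) tuples; sorted() returns them as a sorted list
    # (this also works on Python 3, where items() returns a view with no .sort())
    b = sorted(b.items())
    self.fields["choice"].choices = b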

Here, dir(indicators) returns a list (of attribute names).
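For comparison, here is a minimal sketch in plain Python. It uses a hypothetical `names` list in place of `dir(indicators)` and shows that the dict detour produces the same sorted list of `(name, name)` pairs as a direct comprehension. A dict keeps only one entry per key, so the hack would silently drop duplicates if the input contained any. `dir()` output has unique names, so that is not an issue here.

```python
# Hypothetical stand-in for dir(indicators): a list of attribute names
names = ["sma", "ema", "rsi"]

# The answer's approach: map each name to itself in a dict,
# then take its (key, value) pairs and sort them
mapping = {}
for name in names:
    mapping[name] = name
via_dict = sorted(mapping.items())

# The same result without the intermediate dict
direct = sorted((name, name) for name in names)

print(via_dict)            # [('ema', 'ema'), ('rsi', 'rsi'), ('sma', 'sma')]
print(via_dict == direct)  # True
```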
