python - How to apply a function to every item in a list

Question

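I have a sitemap with about 21 URLs on it, and each of those URLs contains about 2000 more URLs. I'm trying to write something that lets me parse each of the original 21 URLs, grab the 2000 URLs it contains, and append them to a list.

I've been banging my head against a wall for days trying to get this to work, but it keeps returning a list of None. I've only been using Python for about 3 weeks, so I may be missing something really obvious. Any help would be great!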

storage = []
storage1 = []

for x in range(21):
    url = 'first part of the url' + str(x) + '.xml'
    storage.append(url)

def parser(any):
    tree = ET.parse(urlopen(any))
    root = tree.getroot()
    for i in range(len(storage)):
        x = (root[i][0]).text
        storage1.append(x)

storage2 = [parser(x) for x in storage]

I also tried using a while loop with a counter, but it always stopped after the first 2000 URLs.

score 1 · Accepted Answer

parser() never returns anything, so it defaults to returning None, hence why storage2 contains a list of Nones. Perhaps you want to look at what's in storage1?
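To see the effect in isolation, here is a minimal sketch that needs no network access. The names append_only, collected and results are made up for illustration and are not from the question.

```python
def append_only(item, bucket):
    # Mutates bucket but has no return statement,
    # so calling it evaluates to None.
    bucket.append(item)

collected = []
results = [append_only(n, collected) for n in range(3)]

print(results)    # the return values of the three calls
print(collected)  # the data that was actually gathered
```

The list comprehension collects return values, so it fills up with None, while the real data sits in the list the function appended to.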

score 1 · Accepted Answer

If you don't declare a return for a function in python, it automatically returns None. Inside parser you're adding elements to storage1, but aren't returning anything. I would give this a shot instead.

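# Assumed imports (Python 3); the question did not show them.
import xml.etree.ElementTree as ET
from urllib.request import urlopen

storage = []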

for x in range(21):
    url = 'first part of the url' + str(x) + '.xml'
    storage.append(url)

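def parser(any):
    # Use a fresh list per call so each file's URLs are returned separately.
    storage1 = []
    tree = ET.parse(urlopen(any))
    root = tree.getroot()
    # Iterate over every child of the root. range(len(storage)) would stop
    # after 21 entries, because storage only holds the 21 sitemap URLs.
    for entry in root:
        # As in the question, the first sub-element is assumed to hold the URL.
        storage1.append(entry[0].text)
    return storage1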

storage2 = [parser(x) for x in storage]

EDIT: As Amber said, you should also see that all your elements were actually being stored in storage1.
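Note that with a parser that returns a list, storage2 ends up as a list of lists, one inner list per sitemap file. If a single flat list is what you're after, something like the following might work. The sample data is invented to stand in for real parser() results, and all_urls is just an illustrative name.

```python
from itertools import chain

# Stand-in for [parser(x) for x in storage]: one inner list per sitemap file.
storage2 = [
    ["http://example.com/a", "http://example.com/b"],
    ["http://example.com/c"],
]

# chain.from_iterable walks the inner lists in order and yields their items.
all_urls = list(chain.from_iterable(storage2))
print(all_urls)
```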

score 1 · Accepted Answer

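If I understand your problem correctly, your program has two stages:

You generate an initial list of 21 URLs.
You fetch the page at each URL and extract additional URLs from the page.

Your first step could look like this: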

initial_urls = [('http://...%s...' % x) for x in range(21)]

Then, to populate a big list of URLs from the pages, you could do something like this:

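# Assumed imports (Python 3); the original snippet did not show them.
import xml.etree.ElementTree as ET
from urllib.request import urlopen

big_list = []

def extract_urls(source):
    # Parse the document at the given URL (the parameter is source, not any).
    tree = ET.parse(urlopen(source))
    for link in get_links(tree):
        big_list.append(link.attrib['href'])

def get_links(tree):
    # Define the logic for link extraction here.
    raise NotImplementedError

for url in initial_urls:
    extract_urls(url)

print(big_list)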

Note that you will have to write the procedure that extracts links from the document yourself.
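As a rough idea of what that extraction could look like, here is a sketch that assumes the files follow the sitemaps.org 0.9 schema, where each URL is the text of a <loc> element rather than an href attribute. That is an assumption about your data, not something the question confirms. The helper name get_locs and the sample XML are made up, and the sample is parsed from a string so the snippet runs without any network access.

```python
import xml.etree.ElementTree as ET

# Assumption: the documents use the sitemaps.org 0.9 namespace.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def get_locs(root):
    """Return the text of every <loc> element found under root."""
    return [
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]

# Invented sample standing in for one downloaded sitemap file.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/page1</loc></url>
  <url><loc>http://example.com/page2</loc></url>
</urlset>"""

root = ET.fromstring(sample)
locs = get_locs(root)
print(locs)
```

For real files you would get the root from ET.parse(urlopen(url)).getroot() instead of ET.fromstring. If your XML uses a different namespace or element name, the search path would need adjusting.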

Hope this helps!

score 0 · Accepted Answer

You have to return storage1 in your parser function

def parser(any):
    tree = ET.parse(urlopen(any))
    root = tree.getroot()
    for i in range(len(storage)):
        x = (root[i][0]).text
        storage1.append(x)
    return storage1

I think this is what you want.
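One thing that may be worth checking with this version: storage1 is a single module-level list, so every call returns the same object, and a list built from those return values holds repeated references to it. A small sketch of that behaviour, with made-up names (shared, collect, results) and no network access:

```python
shared = []

def collect(items):
    # Appends to the module-level list and returns that same list object.
    shared.extend(items)
    return shared

results = [collect(batch) for batch in (["a", "b"], ["c"])]

print(results[0] is results[1])  # both entries are the same list object
print(results)
```

If that turns out to be a problem, reading storage1 directly after the loop, or creating the list inside the function, would probably avoid it.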
