python - Using the Python threading module in a loop with arguments?

Question

I'm trying to write a crawler that fetches the first 100 pages of a website.

My code looks like this:

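import urllib2
from bs4 import BeautifulSoup  # assumes BeautifulSoup 4 is installed

def extractproducts(pagenumber):
    # Build the URL for this page and download it
    contenturl = "http://websiteurl/page/" + str(pagenumber)

    content = BeautifulSoup(urllib2.urlopen(contenturl).read())
    print content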



pagenumberlist = range(1, 101)

for pagenumber in pagenumberlist:
    extractproducts(pagenumber)

In this case, how can I use the threading module so that urllib fetches X URLs at a time using multiple threads?


score 0 · Accepted Answer

Most likely, you want to use multiprocessing. It provides a Pool that you can use to run multiple things in parallel:

from multiprocessing import Pool

# Note: Pool starts worker processes, not threads; this many
# may make your system unresponsive for a while
p = Pool(100)

# First argument is the function to call,
# second argument is a list of arguments
# (the function is called on each item in the list)
p.map(extractproducts, pagenumberlist)

If your function returns anything, Pool.map returns a list of the return values:

def f(x):
    return x + 1

results = Pool().map(f, [1, 4, 5])
print(results) # [2, 5, 6]
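
Since the question asks specifically about threads, and downloading pages is mostly waiting on the network, a thread-backed pool may be a lighter fit than separate processes. Below is a minimal sketch using multiprocessing.pool.ThreadPool, which offers the same map interface as Pool. The names fetch_all and fake_fetch are hypothetical helpers introduced only for illustration; fake_fetch stands in for extractproducts so the example runs without network access, and workers plays the role of "X" from the question.

```python
from multiprocessing.pool import ThreadPool


def fetch_all(fetch, items, workers=10):
    """Call fetch(item) for every item, using at most `workers` threads.

    Results come back in the same order as `items`.
    """
    pool = ThreadPool(workers)
    try:
        return pool.map(fetch, items)
    finally:
        # Stop accepting work and wait for the worker threads to exit
        pool.close()
        pool.join()


def fake_fetch(pagenumber):
    # Hypothetical stand-in for extractproducts: it only builds the URL
    # instead of downloading and parsing the page.
    return "http://websiteurl/page/" + str(pagenumber)


if __name__ == "__main__":
    urls = fetch_all(fake_fetch, range(1, 101), workers=10)
    print(len(urls))
```

To use it with the real crawler, pass extractproducts in place of fake_fetch and have it return the parsed content rather than print it.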
