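I have some Python experience, but because I lack formal training I have never used try/except statements to catch errors.
I am extracting some articles from Wikipedia. For this I have a list of titles, some of which turn out to have no article or search result. I would like the page retrieval to simply skip those names and keep running the script on the rest. Reproducible code is below.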
import wikipedia
# This one works.
links = ["CPython"]
test = [wikipedia.page(link, auto_suggest=False) for link in links]
test = [testitem.content for testitem in test]
print(test)
# This fails with an exception as soon as one title has no Wikipedia page.
links = ["CPython","no page"]
test = [wikipedia.page(link, auto_suggest=False) for link in links]
test = [testitem.content for testitem in test]
print(test)
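The library implements the call with the function below. Editing a library is usually very bad practice, but since this is a one-off data extraction I am willing to change my local copy to make it work. Edit: I have now included the full function.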
def page(title=None, pageid=None, auto_suggest=True, redirect=True, preload=False):
    '''
    Get a WikipediaPage object for the page with title `title` or the pageid
    `pageid` (mutually exclusive).
    Keyword arguments:
    * title - the title of the page to load
    * pageid - the numeric pageid of the page to load
    * auto_suggest - let Wikipedia find a valid page title for the query
    * redirect - allow redirection without raising RedirectError
    * preload - load content, summary, images, references, and links during initialization
    '''
    if title is not None:
        if auto_suggest:
            results, suggestion = search(title, results=1, suggestion=True)
            try:
                title = suggestion or results[0]
            except IndexError:
                # if there is no suggestion or search results, the page doesn't exist
                raise PageError(title)
        return WikipediaPage(title, redirect=redirect, preload=preload)
    elif pageid is not None:
        return WikipediaPage(pageid=pageid, preload=preload)
    else:
        raise ValueError("Either a title or a pageid must be specified")
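What should I do so that only the pages that do not raise an error are retrieved? Perhaps there is a way to filter out every item in the list that produces this error, or any error. Returning "NA" or something similar for pages that do not exist would be fine. Skipping them without any notice would be fine too. Thanks!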
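For reference, here is a minimal sketch of the behaviour described above, written so that the library itself does not need to be edited. The helper name `fetch_all`, its parameters, and the offline stand-in `fake_fetch` are all hypothetical and only there to make the example runnable without network access. With the real package one would presumably pass something like `lambda t: wikipedia.page(t, auto_suggest=False).content` as `fetch` and `wikipedia.exceptions.PageError` as the error to catch; that part is an assumption and is not exercised here.

```python
def fetch_all(titles, fetch, missing="NA", errors=(Exception,)):
    """Call fetch(title) for each title; store `missing` when one of `errors` is raised."""
    results = []
    for title in titles:
        try:
            results.append(fetch(title))
        except errors:
            # The lookup failed for this title: record a placeholder and carry on.
            results.append(missing)
    return results


# Hypothetical stand-in for the real lookup, so the sketch runs offline.
FAKE_PAGES = {"CPython": "CPython is the reference implementation of Python."}


def fake_fetch(title):
    if title not in FAKE_PAGES:
        # Plays the role that PageError plays in the wikipedia package.
        raise LookupError(title)
    return FAKE_PAGES[title]


contents = fetch_all(["CPython", "no page"], fake_fetch, errors=(LookupError,))
print(contents)

# Dropping the placeholders afterwards gives the "skip silently" variant.
found = [text for text in contents if text != "NA"]
print(found)
```

Catching only the specific exception, rather than the default `Exception`, keeps unrelated failures such as network errors visible instead of silently turning them into "NA".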