
I am currently using pywikibot to get the categories of a given Wikipedia page (for example, support-vector machine), as follows.

import pywikibot as pw

print([i.title() for i in list(pw.Page(pw.Site('en'), 'support-vector machine').categories())])

The result I get is:

[
  'Category:All articles with specifically marked weasel-worded phrases',
  'Category:All articles with unsourced statements',
  'Category:Articles with specifically marked weasel-worded phrases from May 2018',
  'Category:Articles with unsourced statements from June 2013',
  'Category:Articles with unsourced statements from March 2017',
  'Category:Articles with unsourced statements from March 2018',
  'Category:CS1 maint: Uses editors parameter',
  'Category:Classification algorithms',
  'Category:Statistical classification',
  'Category:Support vector machines',
  'Category:Wikipedia articles needing clarification from November 2017',
  'Category:Wikipedia articles with BNF identifiers',
  'Category:Wikipedia articles with GND identifiers',
  'Category:Wikipedia articles with LCCN identifiers'
]

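As you can see, the result includes many of Wikipedia's tracking and maintenance categories, for example:

  • Category:All articles with specifically marked weasel-worded phrases
  • Category:All articles with unsourced statements
  • Category:CS1 maint: Uses editors parameter
  • etc.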

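However, the only categories I am interested in are:

  • Category:Classification algorithms
  • Category:Statistical classification
  • Category:Support vector machines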

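I would like to know whether there is a way to get all of Wikipedia's tracking or maintenance categories, so that I can remove them from the results and keep only the informative categories.

Alternatively, if there is any other way to eliminate them from the results, please suggest it.

I am happy to provide more details if needed.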


1 Answer


pywikibot does not currently provide an API feature for filtering out hidden categories. You can do it manually by checking for the hidden key in categoryinfo:

import pywikibot as pw

site = pw.Site('en', 'wikipedia')
print([
    cat.title()
    for cat in pw.Page(site, 'support-vector machine').categories()
    if 'hidden' not in cat.categoryinfo
])

This gives:

['Category:Classification algorithms', 
 'Category:Statistical classification', 
 'Category:Support vector machines']
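
If you need this filter in more than one place, it can be wrapped in a small helper. The following is only a sketch: visible_category_titles and FakeCategory are hypothetical names introduced here, and the categoryinfo dictionaries in the sample are made-up illustrations rather than real API responses. The helper only assumes that each object has a title() method and a categoryinfo mapping, which is what the snippet above relies on.

```python
from typing import Iterable, List


def visible_category_titles(categories: Iterable) -> List[str]:
    """Return the titles of categories whose categoryinfo has no 'hidden' key."""
    return [cat.title() for cat in categories if 'hidden' not in cat.categoryinfo]


class FakeCategory:
    """Hypothetical stand-in for a pywikibot category, for offline testing only."""

    def __init__(self, title, categoryinfo):
        self._title = title
        self.categoryinfo = categoryinfo

    def title(self):
        return self._title


# Illustrative data: the dictionary contents are assumptions, not real responses.
sample = [
    FakeCategory('Category:Classification algorithms', {'size': 1}),
    FakeCategory('Category:All articles with unsourced statements',
                 {'size': 1, 'hidden': ''}),
]

print(visible_category_titles(sample))
```

With a live site you would pass pw.Page(site, 'support-vector machine').categories() instead of the sample list.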

For more information, see https://www.mediawiki.org/wiki/Help:Categories#Hidden_categories and https://en.wikipedia.org/wiki/Wikipedia:Categorization#Hiding_categories
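
If you only have a saved list of titles and cannot query categoryinfo, a rough fallback is to filter by title prefix. This is a heuristic sketch, not an official list: the prefixes below are assumptions taken only from the titles shown in the question, and drop_maintenance_titles is a hypothetical helper name. It can miss other maintenance categories or wrongly drop a real topic category that happens to start the same way, so the hidden check is preferable whenever it is available.

```python
import re

# Assumed prefixes, derived only from the titles in the question's output.
MAINTENANCE_PATTERN = re.compile(
    r'^Category:(All articles |Articles with |CS1 |Wikipedia articles )'
)


def drop_maintenance_titles(titles):
    """Keep only the titles that do not match the assumed maintenance prefixes."""
    return [t for t in titles if not MAINTENANCE_PATTERN.match(t)]


titles = [
    'Category:All articles with unsourced statements',
    'Category:Articles with unsourced statements from June 2013',
    'Category:CS1 maint: Uses editors parameter',
    'Category:Classification algorithms',
    'Category:Statistical classification',
    'Category:Support vector machines',
    'Category:Wikipedia articles with GND identifiers',
]

print(drop_maintenance_titles(titles))
```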

answered 2019-02-05T04:55:27.333