python - How to authenticate against the Wikimedia Commons Query Service using OAuth in Python?

Question

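I am trying to use the Wikimedia Commons Query Service [1] programmatically from Python, but I cannot authenticate via OAuth 1.

Below is a self-contained Python example that does not work as expected. The expected behavior is that a result set is returned, but instead an HTML response containing the login page comes back. You can install the dependencies with pip install --user sparqlwrapper oauthlib certifi. The script should then be given the path to a text file containing the pasted output you receive after applying for an owner-only token [2]. For example: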

Consumer token
    deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
Consumer secret
    deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
Access token
    deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
Access secret
    deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
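
The file layout above alternates a label line with an indented value line. A minimal sketch of reading it into named fields; the function name `parse_owner_only_credentials` and the dictionary keys are illustrative and not part of any library:

```python
def parse_owner_only_credentials(text):
    """Parse the pasted owner-only consumer output into a dict.

    Assumes the layout shown above: a label line followed by a value line,
    repeated four times. Key names are hypothetical, chosen for readability.
    """
    keys = ["consumer_token", "consumer_secret", "access_token", "access_secret"]
    # Drop blank lines, then keep every second line (the values).
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    values = lines[1::2]
    if len(values) != len(keys):
        raise ValueError("expected 4 label/value pairs, got %d values" % len(values))
    return dict(zip(keys, values))


# Placeholder values, not real credentials.
sample = """Consumer token
    aaaa
Consumer secret
    bbbb
Access token
    cccc
Access secret
    dddd
"""
creds = parse_owner_only_credentials(sample)
print(creds["consumer_token"], creds["access_secret"])
```

The four values are in the same order in which the listings in this question pass them positionally to `Client(...)` and `OAuth1(...)`.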

[1] https://wcqs-beta.wmflabs.org/ ; https://diff.wikimedia.org/2020/10/29/sparql-in-the-shadow-of-structured-data-on-commons/

[2] https://www.mediawiki.org/wiki/OAuth/Owner-only_consumers

import sys
from SPARQLWrapper import JSON, SPARQLWrapper
import certifi
from SPARQLWrapper import Wrapper
from functools import partial
from oauthlib.oauth1 import Client
 
 
ENDPOINT = "https://wcqs-beta.wmflabs.org/sparql"
QUERY = """
SELECT ?file WHERE {
  ?file wdt:P180 wd:Q42 .
}
"""
 
 
def monkeypatch_sparqlwrapper():
    # Deal with old system certificates
    if not hasattr(Wrapper.urlopener, "monkeypatched"):
        Wrapper.urlopener = partial(Wrapper.urlopener, cafile=certifi.where())
        setattr(Wrapper.urlopener, "monkeypatched", True)
 
 
def oauth_client(auth_file):
    # Read credential from file
    creds = []
    for idx, line in enumerate(auth_file):
        if idx % 2 == 0:
            continue
        creds.append(line.strip())
    return Client(*creds)
 
 
class OAuth1SPARQLWrapper(SPARQLWrapper):
    # OAuth sign SPARQL requests

    def __init__(self, *args, **kwargs):
        self.client = kwargs.pop("client")
        super().__init__(*args, **kwargs)
 
    def _createRequest(self):
        request = super()._createRequest()
        uri = request.get_full_url()
        method = request.get_method()
        body = request.data
        headers = request.headers
        new_uri, new_headers, new_body = self.client.sign(uri, method, body, headers)
        request.full_url = new_uri
        request.headers = new_headers
        request.data = new_body
        print("Sending request")
        print("Url", request.full_url)
        print("Headers", request.headers)
        print("Data", request.data)
        return request
 
 
monkeypatch_sparqlwrapper()
client = oauth_client(open(sys.argv[1]))
sparql = OAuth1SPARQLWrapper(ENDPOINT, client=client)
sparql.setQuery(QUERY)
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
 
print("Results")
print(results)

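I have also tried without SPARQLWrapper, using only requests + requests_oauthlib. However, I ran into the same problem (the HTML of the login page is returned), so it looks like it may actually be an issue with the Wikimedia Commons Query Service.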

import sys
import requests
from requests_oauthlib import OAuth1


def oauth_client(auth_file):
    creds = []
    for idx, line in enumerate(auth_file):
        if idx % 2 == 0:
            continue
        creds.append(line.strip())
    return OAuth1(*creds)


ENDPOINT = "https://wcqs-beta.wmflabs.org/sparql"
QUERY = """
SELECT ?file WHERE {
  ?file wdt:P180 wd:Q42 .
}
"""


r = requests.get(
    ENDPOINT,
    params={"query": QUERY},
    auth=oauth_client(open(sys.argv[1])),
    headers={"Accept": "application/sparql-results+json"}
)


print(r.text)

score 2 · Accepted Answer

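Disclaimer: I am one of the authors of WCQS (and also the author of the article linked in the question, which is apparently a bit misleading).

This authentication method is meant for applications that authenticate through Wikimedia Commons (or any other Wikimedia application), but it does not apply to WCQS, which is itself an application that authenticates through Wikimedia Commons. In this case OAuth is used strictly for the web app to authenticate users; at the moment you cannot use OAuth to authenticate bots and other applications. Any kind of use requires a user login.

This is a limitation of our current setup and infrastructure, and we plan to overcome it when we go to production (the service is currently released in beta status). Unfortunately, I cannot tell you when that will happen, but it is important to us.

If you want to try out your bot before then, you can always log in via a browser and use the token in your code, but it will certainly expire and the process will need to be repeated. A simple modification of your second listing will do: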

import sys
import requests

ENDPOINT = "https://wcqs-beta.wmflabs.org/sparql"
QUERY = """
SELECT ?file WHERE {
  ?file wdt:P180 wd:Q42 .
}
"""

r = requests.get(
    ENDPOINT,
    params={"query": QUERY},
    headers={"Accept": "application/sparql-results+json", "wcqsSession": "<token retrieved after logging in>"}
)


print(r.text)
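
Because the session token expires, a script may want to notice when the service answers with the login page again. A minimal sketch, assuming (as described in the question) that an unauthenticated request yields an HTML page while a successful one yields `application/sparql-results+json`; the helper name `is_sparql_json` is hypothetical:

```python
import json


def is_sparql_json(content_type, body):
    """Return True if a response looks like SPARQL JSON results.

    content_type: value of the Content-Type response header (may be None).
    body: response text.
    """
    # Ignore parameters such as ";charset=utf-8" when comparing media types.
    media_type = (content_type or "").split(";")[0].strip().lower()
    if media_type != "application/sparql-results+json":
        return False
    try:
        parsed = json.loads(body)
    except ValueError:
        return False
    # SELECT results carry "results"; ASK results carry "boolean".
    return isinstance(parsed, dict) and ("results" in parsed or "boolean" in parsed)


ok = is_sparql_json(
    "application/sparql-results+json;charset=utf-8",
    '{"head": {"vars": ["file"]}, "results": {"bindings": []}}',
)
login_page = is_sparql_json("text/html; charset=UTF-8", "<html>Log in</html>")
print(ok, login_page)
```

With `requests`, this could be called as `is_sparql_json(r.headers.get("Content-Type"), r.text)`; a False result would be the cue to log in again and fetch a fresh token.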

Note that asking on the mailing list, directly on IRC (freenode:#wikimedia-discovery), or creating a Phabricator ticket is the best way to get help with WCQS.

score 1 · Accepted Answer

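Why not check whether you can run the SPARQL query "manually" using requests + OAuth etc.? If that works, you will know there is a bug in SPARQLWrapper, as opposed to a problem in your application code.

The requests code should look something like the following, plus the OAuth parts: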


r = requests.get(
    ENDPOINT,
    params={"query": QUERY},
    auth=auth,
    headers={"Accept": "application/sparql-results+json"}
)

Nick

score 0 · Accepted Answer

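I would try running your code against a different endpoint. Instead of https://wcqs-beta.wmflabs.org/sparql, try https://query.wikidata.org/sparql. When I use the first endpoint, I also get the HTML of the login page that you got; however, when I use the second endpoint, I get a correct response: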

from SPARQLWrapper import SPARQLWrapper, JSON

endpoint = "https://query.wikidata.org/sparql"
sparql = SPARQLWrapper(endpoint)

# Example query to return a list of movies that Christian Bale has acted in:
query = """
SELECT ?film ?filmLabel (MAX(?pubDate) as ?latest_pubdate) WHERE {
   ?film wdt:P31 wd:Q11424 .
   ?film wdt:P577 ?pubDate .
   ?film wdt:P161 wd:Q45772 .
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en" .
  }
 }
GROUP BY ?film ?filmLabel
ORDER BY DESC(?latest_pubdate)
LIMIT 50
"""

sparql.setQuery(query)
sparql.setReturnFormat(JSON)
results = sparql.query().convert()

# Define a quick function to get json into pandas dataframe:
import pandas as pd
from pandas import json_normalize

def df_from_res(j):
    df = json_normalize(j['results']['bindings'])[['filmLabel.value','latest_pubdate.value']]
    df['latest_pubdate.value'] = pd.to_datetime(df['latest_pubdate.value']).dt.date
    return df

df_from_res(results).head(5)


#   filmLabel.value   latest_pubdate.value
# 0 Ford v Ferrari    2019-11-15
# 1 Vice              2019-02-21
# 2 Hostiles          2018-05-31
# 3 The Promise       2017-08-17
# 4 Song to Song      2017-05-25

This endpoint also works with the requests library in a similar way:

import requests

payload = {'query': query, 'format': 'json'}

results = requests.get(endpoint, params=payload).json()
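
If pandas is not needed, the bindings can be flattened with plain Python. A minimal sketch, assuming the standard SPARQL JSON results layout (`results.bindings`, with each variable holding a `value`); the helper name `rows_from_bindings` and the sample data are illustrative:

```python
def rows_from_bindings(results):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    rows = []
    for binding in results["results"]["bindings"]:
        # Each binding maps a variable name to {"type": ..., "value": ...}.
        rows.append({var: cell["value"] for var, cell in binding.items()})
    return rows


# Hand-written sample in the shape of a SPARQL JSON response (not real output).
sample = {
    "head": {"vars": ["film", "filmLabel"]},
    "results": {
        "bindings": [
            {
                "film": {"type": "uri", "value": "http://example.org/film/1"},
                "filmLabel": {"type": "literal", "value": "Example film"},
            }
        ]
    },
}
rows = rows_from_bindings(sample)
print(rows)
```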

score 0 · Accepted Answer

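If you are asking about MediaWiki OAuth v1 authentication

I interpret this as you looking for a way to do OAuth (v1) against a Wikimedia site only, with the rest of your code not really being part of the question. Correct me if I am wrong.

You do not specify which type of application you are developing. For web applications using Flask or Django with the right backend support, there are different ways to authenticate against Wikimedia pages using OAuth.

A more "generic" way is to use the mwoauth library (python-mwoauth) from any application. It is still supported on both Python 3 and Python 2.

I assume the following:

The target server has MediaWiki installed with the OAuth extension.
You want to perform an OAuth handshake with this server in order to authenticate.

Using Wikipedia.org as an example target platform: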

$ pip install mwoauth

# Find a suitable place, depending on your app to include the authorization code:

from mwoauth import ConsumerToken, Handshaker
from six.moves import input # For compatibility between python 2 and 3

# Construct a "consumer" from the key/secret provided by the MediaWiki site
import config
consumer_token = ConsumerToken(config.consumer_key, config.consumer_secret)

# Construct handshaker with wiki URI and consumer
handshaker = Handshaker("https://en.wikipedia.org/w/index.php",
                        consumer_token)

# Step 1: Initialize -- ask MediaWiki for a temporary key/secret for user
redirect, request_token = handshaker.initiate()

# Step 2: Authorize -- send user to MediaWiki to confirm authorization
print("Point your browser to: %s" % redirect) #
response_qs = input("Response query string: ")

# Step 3: Complete -- obtain authorized key/secret for "resource owner"
access_token = handshaker.complete(request_token, response_qs)
print(str(access_token))

# Step 4: Identify -- (optional) get identifying information about the user
identity = handshaker.identify(access_token)
print("Identified as {username}.".format(**identity))

# Fill in the other stuff :)
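
In step 2 the script asks for the response query string. A small sketch of pulling it out of the full URL the browser lands on after authorization, assuming the wiki redirects back with the standard OAuth 1.0a `oauth_verifier` and `oauth_token` parameters; the helper name and the example URL are made up for illustration:

```python
from urllib.parse import parse_qs, urlsplit


def query_string_from_callback(url):
    """Return the raw query string of a callback URL, checking OAuth fields."""
    query = urlsplit(url).query
    params = parse_qs(query)
    # OAuth 1.0a callbacks carry both of these parameters.
    missing = [name for name in ("oauth_verifier", "oauth_token") if name not in params]
    if missing:
        raise ValueError("callback URL lacks: " + ", ".join(missing))
    return query


# Hypothetical callback URL; real values come from the wiki.
callback = "https://example.org/callback?oauth_verifier=abc123&oauth_token=def456"
response_qs = query_string_from_callback(callback)
print(response_qs)
```

The returned string is what the listing above passes to `handshaker.complete(request_token, response_qs)`.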

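I may have completely misunderstood your question; if so, please yell at me through my left ear.

GitHub:

Use the source, Luke

Here is a link to the docs, which include an example using Flask: WikiMedia OAuth - Python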
