php - Using the Wikipedia API with a REST client

Question

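I'm trying to use MediaWiki to fetch Wikipedia pages (from a specific category). To do this, I'm following Listing 3, "List the pages in a category", of this tutorial. My question is: how can I fetch Wikipedia pages without using the Zend Framework? Are there PHP-based REST clients that don't need to be installed? Zend requires installing its packages and doing some configuration first, and I don't want to do all of that.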
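The tutorial's Listing 3 is not reproduced here. Assuming it relies on the MediaWiki `list=categorymembers` query, a minimal plain-PHP sketch (no Zend) of building that request URL might look like this. `categoryMembersUrl()` is a hypothetical helper name, and the endpoint and category name are example values only.

```php
<?php
// Sketch: build a MediaWiki API URL that lists the pages in one category.
// Assumption: list=categorymembers is the query the tutorial's listing uses.
// categoryMembersUrl() is a hypothetical helper, not part of any library.
function categoryMembersUrl(string $endpoint, string $category, int $limit = 50): string
{
    $params = [
        'action'  => 'query',
        'list'    => 'categorymembers',
        'cmtitle' => 'Category:' . $category, // canonical namespace name works on every wiki
        'cmlimit' => $limit,
        'format'  => 'json',
    ];
    // http_build_query() URL-encodes each value (for example ':' becomes %3A).
    return $endpoint . '?' . http_build_query($params);
}

// Example values: German Wikipedia, category "Physik".
echo categoryMembersUrl('https://de.wikipedia.org/w/api.php', 'Physik'), "\n";
```

Building the query from an array keeps the parameters readable and avoids hand-encoding category names that contain spaces or umlauts.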

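After some googling and investigation, I found a tool called cURL, and using cURL from PHP lets you act as a client of a REST service. I'm really new to working with REST services, but I've tried to put something together in PHP: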

<?php
    header('Content-type: application/xml; charset=utf-8');

    function curl($url) {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        $data = curl_exec($ch);
        curl_close($ch);
        return $data;
    }
    $wiki = "http://de.wikipedia.org/w/api.php?action=query&list=allcategories&acprop=size&acprefix=haut&format=xml";
    $result = curl($wiki);
    var_dump($result);
?>

But the result is an error. Can anyone help?

Update:

This page contains the following errors:
error on line 1 at column 1: Document is empty
Below is a rendering of the page up to the first error.

score 0 · Accepted Answer

Sorry for taking so long to reply, but better late than never...

When I run your code on the command line, this is the output I get:

string(120) "Scripts should use an informative User-Agent string with contact information, or they may be IP-blocked without notice.
"

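So the problem seems to be that, because you don't tell cURL to send a custom User-Agent header, you are running into the Wikimedia User-Agent policy. To fix this, follow the advice given at the bottom of that policy page and add the following lines to your script (next to the other curl_setopt() calls):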

$agent = 'ProgramName/1.0 (http://example.com/program; your_email@example.com)';
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
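Put together, the question's curl() helper with that fix applied could look roughly like the sketch below. It uses curl_setopt_array() instead of separate curl_setopt() calls (same effect) and adds an error check for failed requests, which the original code did not have. The agent string is the placeholder from above; replace it with your own program name, URL and contact address.

```php
<?php
// Placeholder agent string from the answer; substitute your own details.
const WIKI_USER_AGENT = 'ProgramName/1.0 (http://example.com/program; your_email@example.com)';

// The question's curl() helper, now sending a User-Agent header.
function curl(string $url, string $agent = WIKI_USER_AGENT): string
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
        CURLOPT_USERAGENT      => $agent, // what the Wikimedia User-Agent policy asks for
    ]);
    $data = curl_exec($ch);
    if ($data === false) {
        // Transport failure (DNS, timeout, TLS, ...): surface it instead of returning false.
        $error = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException('cURL request failed: ' . $error);
    }
    curl_close($ch);
    return $data;
}

// Usage (needs the curl extension and network access):
// header('Content-type: text/plain; charset=utf-8');
// echo curl('http://de.wikipedia.org/w/api.php?action=query&list=allcategories&acprop=size&acprefix=haut&format=xml');
```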

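P.S. You probably also don't want to set the content type to application/xml unless you're sure the content actually is valid XML. In particular, the output of var_dump() will not be valid XML, even if its input is.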

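For testing and development, I'd suggest running PHP from the command line or using the text/plain content type. Alternatively, if you prefer, you can use text/html and escape the output with htmlspecialchars().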
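For the text/html option, one way to escape var_dump() output is to capture it with output buffering first, since var_dump() prints directly rather than returning a string. This is a sketch; dumpForBrowser() is a hypothetical helper name, not a PHP built-in.

```php
<?php
// Hypothetical helper: capture var_dump() output and escape it for an HTML page.
function dumpForBrowser($value): string
{
    ob_start();          // start buffering so var_dump() output can be captured
    var_dump($value);
    $dump = ob_get_clean();
    // Escape <, >, & and quotes so an XML response is shown as text, not parsed.
    return '<pre>' . htmlspecialchars($dump, ENT_QUOTES, 'UTF-8') . '</pre>';
}

// Usage in the browser:
// header('Content-type: text/html; charset=utf-8');
// echo dumpForBrowser($result);
//
// Or simply send plain text and dump as before:
// header('Content-type: text/plain; charset=utf-8');
// var_dump($result);
```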

P.P.S. I'm making this a community wiki answer, since I realized this question has been asked and answered before.
