python - Converting a Grobid curl command to requests in Python

Question

I'm trying to convert a curl script, which sends a PDF file to a grobid server for parsing, into Python using requests.

Basically, if I run the grobid server as follows,

./gradlew run

I can use the following curl command to get the parsed XML output for an academic paper, example.pdf:

curl -v --form input=@example.pdf localhost:8070/api/processHeaderDocument
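The --form input=@example.pdf option tells curl to send a multipart/form-data POST in which the form field named input carries the file's contents. The sketch below builds, but does not send, what should be the equivalent request with requests, so the multipart body can be inspected without a running grobid server. The in-memory bytes are a hypothetical placeholder, not a real PDF.

```python
import requests

# Hypothetical placeholder bytes standing in for example.pdf
pdf_bytes = b"%PDF-1.4 placeholder"

# Roughly equivalent to:
#   curl --form input=@example.pdf localhost:8070/api/processHeaderDocument
# The dict key "input" is the form field name; the tuple is (filename, content).
req = requests.Request(
    "POST",
    "http://localhost:8070/api/processHeaderDocument",
    files={"input": ("example.pdf", pdf_bytes)},
)
prepared = req.prepare()  # builds headers and body, no network I/O

print(prepared.headers["Content-Type"])  # multipart/form-data; boundary=...
print(b'name="input"; filename="example.pdf"' in prepared.body)  # True
```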

However, I don't know how to convert this script to Python. Here is my attempt using requests:

import requests

GROBID_URL = 'http://localhost:8070'
url = '%s/processHeaderDocument' % GROBID_URL
pdf = 'example.pdf'
xml = requests.post(url, files=[pdf]).text
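Two things differ from the curl command in this attempt: the path lacks the /api prefix, and files=[pdf] hands requests a bare filename string instead of a (field name, file) pair, so there is no input field to send. As far as I know, requests accepts files either as a dict or as a list of 2-tuples. The sketch below contrasts the bare-list shape with the list-of-tuples form, only preparing the requests. The exact exception for the bare list may depend on the requests version, so the sketch catches both ValueError and TypeError; the PDF bytes are again a hypothetical placeholder.

```python
import requests

url = "http://localhost:8070/api/processHeaderDocument"
pdf_bytes = b"%PDF-1.4 placeholder"  # hypothetical stand-in for example.pdf

# Shape used in the question: a bare list of filenames, no field name.
try:
    requests.Request("POST", url, files=["example.pdf"]).prepare()
    bare_list_ok = True
except (ValueError, TypeError) as exc:
    bare_list_ok = False
    print("bare list rejected:", exc)

# Valid alternative to a dict: a list of (field_name, (filename, content)) tuples.
ok = requests.Request(
    "POST", url, files=[("input", ("example.pdf", pdf_bytes))]
).prepare()
print(b'name="input"' in ok.body)  # True
```

The list-of-tuples form is mainly useful when the same field name has to be repeated; for a single input field the dict form is simpler.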

score 2 · Accepted Answer

I figured it out. Basically, I was missing api in the URL built from GROBID_URL, and the files argument should be a dictionary rather than a list.

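import requests
GROBID_URL = 'http://localhost:8070'
url = '%s/api/processHeaderDocument' % GROBID_URL
pdf = 'example.pdf'
# 'input' matches curl's --form input=@...; the with-block closes the file afterwards
with open(pdf, 'rb') as f:
    xml = requests.post(url, files={'input': f}).text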
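The text returned by processHeaderDocument is TEI XML. A minimal sketch for pulling the title out of it with the standard library, assuming the response uses the TEI namespace http://www.tei-c.org/ns/1.0 with the title under teiHeader/fileDesc/titleStmt/title, which is how GROBID's TEI output is usually laid out; check this against your server's actual response. extract_title is a hypothetical helper, and the sample string is a trimmed, made-up document, not real GROBID output.

```python
import xml.etree.ElementTree as ET

# Assumed TEI namespace of the GROBID response
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def extract_title(tei_xml: str):
    """Return the title from a TEI string, or None if absent (hypothetical helper)."""
    root = ET.fromstring(tei_xml)
    title = root.find("./tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title", TEI_NS)
    if title is None:
        return None
    # itertext() also collects text from nested child elements, if any
    text = "".join(title.itertext()).strip()
    return text or None


# Trimmed, made-up sample shaped like a TEI header response
sample = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title level="a" type="main">An Example Paper</title></titleStmt>
    </fileDesc>
  </teiHeader>
</TEI>"""

print(extract_title(sample))  # An Example Paper
```

With a live server, extract_title(xml) would be applied to the xml string from the snippet above.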

score 0 · Accepted Answer

Here is an example bash script from http://ceur-ws.bitplan.com/index.php/Grobid. Note that a ready-made Python client is also available; see https://github.com/kermitt2/grobid_client_python

#!/bin/bash
# WF 2020-08-04
# call grobid service with paper from ceur-ws
v=2644
p=44
vol=Vol-$v
pdf=paper$p.pdf
if [ ! -f $pdf ]
then
  wget http://ceur-ws.org/$vol/$pdf
else
  echo "paper $p from volume $v already downloaded" 
fi
curl -v --form input=@./$pdf http://grobid.bitplan.com/api/processFulltextDocument > $p.tei
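For comparison, a rough Python port of the same workflow with requests, assuming the CEUR-WS URL layout and the processFulltextDocument endpoint used in the script above. ceur_paths and fetch_and_parse are hypothetical helper names; nothing touches the network until fetch_and_parse is called.

```python
import os

import requests


def ceur_paths(volume: int, paper: int):
    """Mirror the bash variables: (download URL, local PDF name, TEI output name)."""
    vol = f"Vol-{volume}"
    pdf = f"paper{paper}.pdf"
    return f"http://ceur-ws.org/{vol}/{pdf}", pdf, f"{paper}.tei"


def fetch_and_parse(volume: int, paper: int, grobid_url: str) -> str:
    """Download the paper if missing, send it to GROBID, write the TEI file."""
    url, pdf, tei = ceur_paths(volume, paper)
    if not os.path.isfile(pdf):
        # Equivalent of: wget http://ceur-ws.org/$vol/$pdf
        resp = requests.get(url, timeout=60)
        resp.raise_for_status()
        with open(pdf, "wb") as f:
            f.write(resp.content)
    else:
        print(f"paper {paper} from volume {volume} already downloaded")
    # Equivalent of: curl --form input=@./$pdf .../api/processFulltextDocument > $p.tei
    with open(pdf, "rb") as f:
        resp = requests.post(
            f"{grobid_url}/api/processFulltextDocument",
            files={"input": f},
            timeout=300,
        )
    resp.raise_for_status()
    with open(tei, "w", encoding="utf-8") as out:
        out.write(resp.text)
    return tei
```

Calling fetch_and_parse(2644, 44, "http://grobid.bitplan.com") mirrors the script's hard-coded values. Unlike the curl redirect, which writes whatever body comes back, raise_for_status() stops before an error page ends up in the .tei file.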
