python - Posting a PDF file to SOLR via Python

Question

I am having trouble posting a PDF file to SOLR from Python using urllib2. The code I am trying is below:

import urllib2
with open('key.pdf', 'rb') as data_file:
    my_data = data_file.read()
req = urllib2.Request(url='http://localhost:8983/solr/update/pdf?commit=true', data=my_data)
req.add_header('Content-type', 'application/pdf')
f = urllib2.urlopen(req)

I get an HTTP 404 error.

However, I am able to post successfully with this command:

java -Durl=http://localhost:8983/solr/update/extract?literal._id=doc2 -Dtype=application/pdf -jar post.jar key.pdf

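Could you tell me what I am doing wrong? For the command above, I have configured the SOLR extracting handler.

In addition, I changed the code as follows: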

import urllib2
with open('key.pdf', 'rb') as data_file:
    my_data = data_file.read()
req = urllib2.Request(url='http://localhost:8983/solr/update/extract?commit=true', data=my_data)
req.add_header('Content-type', 'application/pdf')
f = urllib2.urlopen(req)

I now get an HTTP 400 error, and in the SOLR log I can see the error "Document is missing mandatory uniqueKey field: _id".

Can I include _id in the Python code above? If so, how?

Thanks

score 1 · Accepted Answer

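When using the extracting handler, the request parameter literal.fieldname is used to pass a value for the field fieldname to Solr. For the unique key in the question, that means adding literal._id=doc2 to the query string, as the working post.jar command already does.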

http://wiki.apache.org/solr/ExtractingRequestHandler#Literals
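
A minimal sketch of how the literal parameter could be added to the request from the question. It assumes Python 3 (where urllib2 is split into urllib.request and urllib.parse), the handler URL from the question, and a unique key field named _id. The helper names build_extract_url and post_pdf are hypothetical, not part of Solr or urllib.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: the extracting handler URL used in the question.
SOLR_EXTRACT_URL = 'http://localhost:8983/solr/update/extract'


def build_extract_url(doc_id, base_url=SOLR_EXTRACT_URL):
    """Return the handler URL with literal._id and commit as query parameters."""
    # urlencode escapes the values, so an id containing spaces or slashes is safe.
    query = urlencode({'literal._id': doc_id, 'commit': 'true'})
    return base_url + '?' + query


def post_pdf(path, doc_id):
    """Send the raw PDF bytes to Solr and return the HTTP status code."""
    with open(path, 'rb') as data_file:
        pdf_bytes = data_file.read()
    # Passing data makes this a POST request, as in the original urllib2 code.
    req = Request(url=build_extract_url(doc_id), data=pdf_bytes)
    req.add_header('Content-Type', 'application/pdf')
    with urlopen(req) as response:
        return response.status
```

With a running Solr instance and a local key.pdf, calling post_pdf('key.pdf', 'doc2') would send the same document id as the post.jar command in the question.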

Requests can take a params dict such as {'commit': 'true', 'field':'this/ ?text may invalidate your url'} and make it URL-safe for you.
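
To illustrate what "URL-safe" means here, the sketch below first encodes the dict from the answer with the standard library's urlencode, then shows how a similar dict could be handed to the third-party Requests library through its params argument. It assumes Python 3 and, for the second part, that Requests is installed. The function name post_pdf_with_requests is hypothetical, and the Solr URL and the _id field are taken from the question.

```python
from urllib.parse import urlencode

# The dict from the answer: the second value contains characters
# ('/', ' ', '?') that would break a hand-built query string.
params = {'commit': 'true', 'field': 'this/ ?text may invalidate your url'}

# urlencode percent-encodes the unsafe characters and joins the pairs with '&'.
encoded = urlencode(params)


def post_pdf_with_requests(path, doc_id):
    """Post a PDF via Requests, letting it build the query string from a dict."""
    # Third-party import kept local so the encoding demo above runs without it.
    import requests

    with open(path, 'rb') as data_file:
        pdf_bytes = data_file.read()
    response = requests.post(
        'http://localhost:8983/solr/update/extract',  # URL from the question
        params={'literal._id': doc_id, 'commit': 'true'},
        data=pdf_bytes,
        headers={'Content-Type': 'application/pdf'},
    )
    return response.status_code
```

Printing encoded shows the escaped query string. Passing the dict through params avoids concatenating the URL by hand, which is where characters like '?' or a space tend to cause trouble.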
