
How can I use Solarium to index rich data (MS Word and PDF files) from a directory containing many files? My configuration is:

$config = array(
    "endpoint" => array(
        "localhost" => array(
            "host" => "127.0.0.1",
            "port" => "8983",
            "path" => "/solr",
            "core" => "demo",
        )
    )
);
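The `5039 milliseconds` in the error reported below is close to Solarium's default endpoint timeout of 5 seconds. Assuming Solarium 3.x, where `timeout` is an endpoint option given in seconds (later Solarium releases moved this setting, so check the documentation for your version), a sketch of the same configuration with a longer timeout, which may help when extracting large files, could look like this. The value 30 is an arbitrary example:

```php
<?php
// Same endpoint as above, plus an explicit timeout.
// Assumption: Solarium 3.x, where "timeout" is an endpoint option in seconds (default 5).
$config = array(
    "endpoint" => array(
        "localhost" => array(
            "host"    => "127.0.0.1",
            "port"    => "8983",
            "path"    => "/solr",
            "core"    => "demo",
            "timeout" => 30, // arbitrary example value, in seconds
        ),
    ),
);
```

The array would then be passed to the client as before, e.g. `new Solarium\Client($config)`.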

I tried this code:

$dir = new Folder($dossier);
$files = $dir->find('.*\.*');
foreach ($files as $file) {
    $file = new File($dir->pwd() . DS . $file);

    $update = $client->createUpdate();

    // Build an extract (Solr Cell) request for this file
    $query = $client->createExtract();
    $query->setFile($file->pwd());
    $query->setCommit(true);
    $query->setOmitHeader(false);

    // Fields sent to Solr along with the extracted content
    $doc = $query->createDocument();
    $doc->id = $file->pwd();
    $doc->name = $file->name;
    $doc->title = $file->name();
    $query->setDocument($doc);

    $result = $client->extract($query);
}

But after 10 seconds I get this error:

Solr HTTP error: HTTP request failed, Operation timed out after 5039 milliseconds with 0 out of -1 bytes received

Error: An Internal Error Has Occurred.

And this error, with its stack trace, in the Solr log:

 org.apache.solr.common.SolrException: URLDecoder: Invalid character encoding detected after position 79 of query string / form data (while parsing as UTF-8)
    at org.apache.solr.servlet.SolrRequestParsers.decodeChars(SolrRequestParsers.java:388)
    at org.apache.solr.servlet.SolrRequestParsers.decodeBuffer(SolrRequestParsers.java:405)
    at org.apache.solr.servlet.SolrRequestParsers.parseFormDataContent(SolrRequestParsers.java:373)
    at org.apache.solr.servlet.SolrRequestParsers.parseQueryString(SolrRequestParsers.java:273)
    at org.apache.solr.servlet.SolrRequestParsers.parseQueryString(SolrRequestParsers.java:243)
    at org.apache.solr.servlet.HttpSolrCall.<init>(HttpSolrCall.java:172)
    at org.apache.solr.servlet.SolrDispatchFilter.getHttpSolrCall(SolrDispatchFilter.java:236)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:212)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
    at org.eclipse.jetty.server.Server.handle(Server.java:499)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
    at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
    at java.lang.Thread.run(Thread.java:745)
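The Solr log shows that Solr could not decode the request's query string as UTF-8. Solarium's extract request sends document fields such as `id` as `literal.*` request parameters, so one plausible (unverified) cause is a file path that is not valid UTF-8, for example an accented filename stored in a legacy encoding. A minimal sketch of two hypothetical helpers (`toUtf8` and `findRichDocuments` are illustrative names, not part of Solarium or CakePHP) that collect only Word and PDF files and normalize a string to UTF-8; it assumes the mbstring extension and that non-UTF-8 names are Windows-1252:

```php
<?php
// Hypothetical helpers, not part of Solarium or CakePHP.
// Assumes the mbstring extension is available.

// Return $s unchanged if it is valid UTF-8, otherwise convert it
// from $fallback (assumed here to be Windows-1252) to UTF-8.
function toUtf8(string $s, string $fallback = 'Windows-1252'): string
{
    if (mb_check_encoding($s, 'UTF-8')) {
        return $s;
    }
    return mb_convert_encoding($s, 'UTF-8', $fallback);
}

// List regular files in $dir whose extension is in $extensions
// (compared case-insensitively), sorted by path.
function findRichDocuments(string $dir, array $extensions = ['doc', 'docx', 'pdf']): array
{
    $entries = scandir($dir);
    if ($entries === false) {
        return [];
    }
    $paths = [];
    foreach ($entries as $name) {
        $path = $dir . DIRECTORY_SEPARATOR . $name;
        $ext = strtolower(pathinfo($name, PATHINFO_EXTENSION));
        if (is_file($path) && in_array($ext, $extensions, true)) {
            $paths[] = $path;
        }
    }
    sort($paths);
    return $paths;
}
```

In the loop from the question, the raw path would still be passed to `setFile()` so the file can be read, while `toUtf8()` would be applied only to the values assigned to `$doc->id`, `$doc->name` and `$doc->title`.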

1 Answer


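It sounds like a connection timeout: are you sure Solr is running and that your endpoint settings (host, port, path, core) are correct?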
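One quick way to check this independently of Solarium is to test whether anything is listening on the configured host and port. A minimal sketch using PHP's built-in `fsockopen()`; the function name `isReachable` and the 2-second default timeout are illustrative choices, and a successful TCP connection only shows that some server is listening, not that it is a working Solr core:

```php
<?php
// Hypothetical helper: returns true if a TCP connection to $host:$port
// can be opened within $timeout seconds, false otherwise.
function isReachable(string $host, int $port, float $timeout = 2.0): bool
{
    $errno = 0;
    $errstr = '';
    // Suppress the warning fsockopen() emits on failure; we report via the return value.
    $fp = @fsockopen($host, $port, $errno, $errstr, $timeout);
    if ($fp === false) {
        return false;
    }
    fclose($fp);
    return true;
}
```

With the configuration from the question, `isReachable('127.0.0.1', 8983)` returning `false` would suggest that Solr is not listening on that endpoint.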

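Above all,

  • If it is a Solr problem, you should see the exact cause in the Solr log
  • If there is nothing in the Solr log, then Solr is (most likely) not involved (i.e. your application never connected to Solr)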
answered 2015-12-01T10:49:34.140