How can I index rich data (MS Word and PDF files) with Solarium from a directory that contains many files? My configuration is:
$config = array(
    "endpoint" => array(
        "localhost" => array(
            "host" => "127.0.0.1",
            "port" => "8983",
            "path" => "/solr",
            "core" => "demo",
        )
    )
);
I tried this code:
$dir = new Folder($dossier);
$files = $dir->find('.*\.*');
foreach ($files as $file) {
    $file = new File($dir->pwd() . DS . $file);
    $update = $client->createUpdate();
    $query = $client->createExtract();
    $query->setFile($file->pwd());
    $query->setCommit(true);
    $query->setOmitHeader(false);
    $doc = $query->createDocument();
    $doc->id = $file->pwd();
    $doc->name = $file->name;
    $doc->title = $file->name();
    $query->setDocument($doc);
    $result = $client->extract($query);
}
But after about 10 seconds I get this error:
Solr HTTP error: HTTP request failed, Operation timed out after 5039 milliseconds with 0 out of -1 bytes received
Error: An Internal Error Has Occurred.
and this error in the Solr log trace:
org.apache.solr.common.SolrException: URLDecoder: Invalid character encoding detected after position 79 of query string / form data (while parsing as UTF-8)
at org.apache.solr.servlet.SolrRequestParsers.decodeChars(SolrRequestParsers.java:388)
at org.apache.solr.servlet.SolrRequestParsers.decodeBuffer(SolrRequestParsers.java:405)
at org.apache.solr.servlet.SolrRequestParsers.parseFormDataContent(SolrRequestParsers.java:373)
at org.apache.solr.servlet.SolrRequestParsers.parseQueryString(SolrRequestParsers.java:273)
at org.apache.solr.servlet.SolrRequestParsers.parseQueryString(SolrRequestParsers.java:243)
at org.apache.solr.servlet.HttpSolrCall.<init>(HttpSolrCall.java:172)
at org.apache.solr.servlet.SolrDispatchFilter.getHttpSolrCall(SolrDispatchFilter.java:236)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:212)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at org.eclipse.jetty.server.Server.handle(Server.java:499)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
at java.lang.Thread.run(Thread.java:745)
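
The URLDecoder message suggests that some value sent with the request is not valid UTF-8. My guess, and it is only an assumption, is that one of the file names (which I copy into id, name and title) contains bytes in another encoding. Below is a minimal sketch of a helper I could run over those values before assigning them to the document. The name toUtf8 is hypothetical, the ISO-8859-1 fallback is an assumption about my file system, and it needs PHP's mbstring extension:

```php
<?php
// Hypothetical helper: return $value unchanged if it is already valid UTF-8,
// otherwise convert it from an assumed source encoding (ISO-8859-1 here).
function toUtf8(string $value, string $assumedEncoding = 'ISO-8859-1'): string
{
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value;
    }
    return mb_convert_encoding($value, 'UTF-8', $assumedEncoding);
}

// "\xE9" is "é" in ISO-8859-1 and is not valid UTF-8 on its own.
$latin1Name = "r\xE9sum\xE9.pdf";
$utf8Name = toUtf8($latin1Name);
```

In the loop above this would be applied as, for example, $doc->title = toUtf8($file->name()). Separately, the 5039 milliseconds in the client error is close to 5 seconds, which I believe is Solarium's default endpoint timeout, so adding a "timeout" key to the endpoint array may also be worth trying.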