
I'm new to Solr and am trying to index a file system with Solr's DIH. Interestingly, it worked fine for a while. Now the DIH won't initialize, and I keep getting SAXParseException: Content is not allowed in prolog.

Any ideas? I'm using Solr 3.6.0 on Debian. I checked the configuration files with a hex editor but didn't find anything.

Here is my data-config.xml:

<?xml version="1.0" encoding="UTF-8"?>
<dataConfig>
<dataSource type="BinFileDataSource" name="bin"/>
<document>      
    <entity name="files"
        processor="FileListEntityProcessor"
        fileName=".*\.(pdf|doc|docx|ppt|pptx)$"
        baseDir="/mnt/C"
        rootEntity="false"
        dataSource="null"
        recursive="true"
        onError="skip">
        <field name="id" column="fileAbsolutePath"/>
        <field name="lastModified" column="fileLastModified"/>
        <entity name="f"
            processor="TikaEntityProcessor"
            url="${files.fileAbsolutePath}"
            dataSource="bin"
            format="text"
            onError="skip">
            <field name="fileName" column="file"/>
            <field name="author" column="Author" meta="true"/>
            <field name="title" column="title" meta="true"/>
            <field name="text" column="text"/>
        </entity>
    </entity>
</document>
</dataConfig>
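Separately from the prolog error, a `fileName` value written as `.*.(pdf)|(doc)|(docx)|(ppt)|(pptx)` does not mean "one of these extensions": alternation binds loosest, so only `pdf` is tied to the `.*.` prefix, the other groups stand alone, and the unescaped `.` matches any character. A minimal Java check with `java.util.regex`, assuming the filter applies the pattern to the bare file name with `Matcher.find()` (an assumption about FileListEntityProcessor worth verifying against the Solr source), compares it with a grouped, escaped, end-anchored pattern:

```java
import java.util.regex.Pattern;

/** Compares two fileName regexes the way a find()-based file filter would apply them. */
public class FileNamePatternCheck {
    // Parsed as ".*.(pdf)" OR "(doc)" OR "(docx)" OR "(ppt)" OR "(pptx)".
    static final Pattern LOOSE = Pattern.compile(".*.(pdf)|(doc)|(docx)|(ppt)|(pptx)");
    // Escaped dot, one group for all extensions, anchored at the end of the name.
    static final Pattern STRICT = Pattern.compile(".*\\.(pdf|doc|docx|ppt|pptx)$");

    // Assumption: the real filter calls find() on the bare file name.
    static boolean accepts(Pattern p, String fileName) {
        return p.matcher(fileName).find();
    }

    public static void main(String[] args) {
        String[] names = {"report.pdf", "notes.docx", "slides.ppt", "doctor.txt", "xpdf"};
        for (String name : names) {
            System.out.printf("%-12s loose=%-5b strict=%b%n",
                    name, accepts(LOOSE, name), accepts(STRICT, name));
        }
    }
}
```

With `find()`, the loose pattern also accepts names such as `doctor.txt` (it contains `doc`) and `xpdf` (any character before `pdf`), while the strict one accepts only the listed extensions.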

And the error:

org.apache.solr.handler.dataimport.DataImportHandlerException: Exception occurred while initializing context
at org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImporter.java:231)
at org.apache.solr.handler.dataimport.DataImporter.<init>(DataImporter.java:119)
at org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:168)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:365)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:602)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:679)
Caused by: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; Content is not allowed in prolog.
at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:198)
at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:177)
at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:391)
at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1404)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(XMLDocumentScannerImpl.java:1014)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:625)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:488)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:819)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:748)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:123)
at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:239)
at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:288)
at org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImporter.java:216)
... 18 more

1 Answer


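The XML validates for me. The error is reported at line 1, column 1. This is sometimes caused by a byte order mark (BOM), an invisible character at the very beginning of the file.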
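To see how an invisible mark can produce an error at line 1, column 1: in UTF-8 the BOM is the character U+FEFF, stored as the three bytes EF BB BF, and Java's UTF-8 decoder does not drop it. If the config file is decoded into a String before it is parsed (an assumption about how this Solr version hands the file to the XML parser), that String starts with an extra character ahead of `<?xml`. A small JDK-only sketch; the class name and sample content are illustrative:

```java
import java.nio.charset.StandardCharsets;

/** Shows that a UTF-8 BOM survives decoding as the character U+FEFF. */
public class BomDecodeDemo {
    static final String XML = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><dataConfig/>";

    /** Hex of the first n bytes, as a hex editor would display them. */
    static String hexPrefix(byte[] data, int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < Math.min(n, data.length); i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("%02X", data[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Simulate a file saved "with BOM": U+FEFF followed by the XML text.
        byte[] withBom = ("\uFEFF" + XML).getBytes(StandardCharsets.UTF_8);
        System.out.println(hexPrefix(withBom, 5)); // EF BB BF 3C 3F

        // Decoding keeps the BOM as the first char, so the text no longer starts with "<?xml".
        String decoded = new String(withBom, StandardCharsets.UTF_8);
        System.out.println(decoded.charAt(0) == '\uFEFF'); // true
        System.out.println(decoded.startsWith("<?xml"));   // false
    }
}
```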

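Perhaps you edited the file in an overly helpful editor that explicitly added a BOM. Re-check the first few bytes in your hex editor (a UTF-8 BOM shows up as EF BB BF), or try copying the contents into a truly plain-text editor, saving the file from there, and see whether that works.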
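If the hex editor does show EF BB BF at the start, the file can be rewritten without those bytes. A minimal sketch (Java 7+, JDK only); the class name `StripBom` and the `.bak` backup file are illustrative choices, not part of Solr:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

/** Hypothetical helper: reports and removes a leading UTF-8 BOM from a file. */
public class StripBom {
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    /** True if data begins with the three UTF-8 BOM bytes EF BB BF. */
    static boolean hasUtf8Bom(byte[] data) {
        return data.length >= UTF8_BOM.length
                && data[0] == UTF8_BOM[0]
                && data[1] == UTF8_BOM[1]
                && data[2] == UTF8_BOM[2];
    }

    /** Returns data without a leading UTF-8 BOM; returns the same array if there is none. */
    static byte[] stripUtf8Bom(byte[] data) {
        return hasUtf8Bom(data) ? Arrays.copyOfRange(data, UTF8_BOM.length, data.length) : data;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java StripBom <path-to-data-config.xml>");
            System.exit(2);
        }
        Path file = Paths.get(args[0]);
        byte[] data = Files.readAllBytes(file);
        if (!hasUtf8Bom(data)) {
            System.out.println("No UTF-8 BOM found in " + file);
            return;
        }
        // Keep a copy of the original bytes next to the file before rewriting it.
        Path backup = Paths.get(args[0] + ".bak");
        Files.write(backup, data);
        Files.write(file, stripUtf8Bom(data));
        System.out.println("Removed UTF-8 BOM from " + file + " (backup: " + backup + ")");
    }
}
```

Run it as `java StripBom /path/to/data-config.xml` (the path depends on your core layout), then retry the import. The tool only touches the first three bytes, so the rest of the file is written back unchanged.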

answered 2014-10-15T20:32:07.080