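I want to index the Wikipedia XML file into Solr.
But I get an error and it cannot be indexed. Solr expects a specific XML file format, so I changed the schema.xml and data-config.xml files to match the tags in the Wikipedia file.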
schema.xml
data-config.xml
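It still fails to index the file. What I actually want to do is index all of Wikipedia, which is a 30 GB XML file.
How would I index the whole Wikipedia dump into Solr?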
There's an example section in the DataImportHandler documentation for exactly this: indexing Wikipedia.
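Before any of that works, the DataImportHandler has to be exposed as a request handler. A minimal solrconfig.xml sketch, assuming an older Solr release that still bundles DIH (it was deprecated in the 8.x line) and the stock `dist/` directory layout; the `lib` path is an assumption you will likely need to adjust:

```xml
<!-- solrconfig.xml fragment (inside <config>) -->
<!-- Put the DIH jar on the core's classpath; the dir below is an assumed default layout -->
<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-.*\.jar"/>

<!-- Expose DIH at /dataimport and tell it which config file to read -->
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>
```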
Basically, you use the DataImportHandler with a few XPath expressions to pull the metadata you care about out of the Wikipedia XML and map it onto flat Solr fields.
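A sketch of what that data-config.xml can look like, modeled on the Wikipedia example in the DIH documentation. The file path is hypothetical, and the field names are illustrative choices that must match your schema.xml. The attribute that matters most for a 30 GB dump is `stream="true"`, which makes `XPathEntityProcessor` parse the file incrementally rather than building the whole document in memory:

```xml
<dataConfig>
  <!-- Read the dump from the local filesystem -->
  <dataSource type="FileDataSource" encoding="UTF-8"/>
  <document>
    <!-- One Solr document per <page>; stream="true" parses the file incrementally -->
    <!-- url is a hypothetical path: point it at your downloaded dump -->
    <entity name="page"
            processor="XPathEntityProcessor"
            stream="true"
            forEach="/mediawiki/page/"
            url="/path/to/enwiki-pages-articles.xml"
            transformer="RegexTransformer,DateFormatTransformer">
      <field column="id"        xpath="/mediawiki/page/id"/>
      <field column="title"     xpath="/mediawiki/page/title"/>
      <field column="revision"  xpath="/mediawiki/page/revision/id"/>
      <field column="user"      xpath="/mediawiki/page/revision/contributor/username"/>
      <field column="text"      xpath="/mediawiki/page/revision/text"/>
      <!-- Parse the ISO-8601 timestamp into a date (HH = 24-hour clock) -->
      <field column="timestamp" xpath="/mediawiki/page/revision/timestamp"
             dateTimeFormat="yyyy-MM-dd'T'HH:mm:ss'Z'"/>
      <!-- Skip redirect pages: set $skipDoc to true when the text starts with #REDIRECT -->
      <field column="$skipDoc" regex="^#REDIRECT .*" replaceWith="true" sourceColName="text"/>
    </entity>
  </document>
</dataConfig>
```

Each `column` needs a matching field in schema.xml (for example `<field name="title" type="text_general" indexed="true" stored="true"/>`, with `id` as the `uniqueKey`). You then start the import with a request such as `http://localhost:8983/solr/<core>/dataimport?command=full-import` and check progress with `command=status`; `<core>` is a placeholder for your core name.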