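I am using the open-source Apache ManifoldCF project to index documents from Google Drive into my Solr instance. I often see it behave very inconsistently while indexing data. Even a small number of documents takes a long time to show up in Solr. Do you really think it is a good choice for indexing Google Drive?
2 Answers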
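It is currently somewhat slow because of response times and the throttling limits imposed by Google Drive itself. If you purchase additional bandwidth from Google, this limit may be eased. With the current setup, if you want to index a large number of documents in Google Drive, it may not be as fast as you expect.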
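To give a rough sense of why throttling slows a crawl down: a client that gets rate-limited usually waits longer and longer between retries (exponential backoff). The sketch below is only an illustration of that general pattern, not ManifoldCF's actual connector logic; the function name `backoff_delays` and its default values are hypothetical.

```python
import random


def backoff_delays(max_retries=5, base=1.0, cap=64.0, jitter=random.random):
    """Return the wait in seconds before each retry.

    Each wait is base * 2**n plus a small random jitter, capped at `cap`.
    All defaults here are illustrative assumptions, not real quota values.
    """
    return [min(cap, base * (2 ** n) + jitter()) for n in range(max_retries)]


if __name__ == "__main__":
    # Disable jitter so the doubling pattern is easy to see.
    delays = backoff_delays(jitter=lambda: 0.0)
    print("waits per retry (s):", delays)
    print("total time spent waiting (s):", sum(delays))
```

If many requests in a crawl are throttled like this, the waiting time adds up quickly, which would be consistent with documents appearing in Solr only after a long delay.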
ManifoldCF is good for crawling file systems. If you are interested in web crawling, you can go for Apache Nutch.
Yes, ManifoldCF does take a long time to reflect even a small number of documents. It also has very little documentation. However, you can join the mailing list, where you can ask questions of the lead developer, "Karl". He is very helpful and usually answers within a few hours.
P.S.: I worked with ManifoldCF on a project for 10 months.