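I have more than 100 CSV files with 10,000 rows each, which I am indexing and then querying for similar spellings (spell-check style queries). Indexing this way is very slow.

I have found a couple of promising solutions:

  1. Master/slave, using a master for indexing and slaves for querying. See "How to index records in Solr faster (and without affecting the ColdFusion web server)? Two JVMs?"

  2. Using TrieRange: http://www.lucidimagination.com/blog/2009/05/13/exploring-lucene-and-solrs-trierange-capabilities/

I know these two solutions are different, so I would like some comments on which one should take priority. Is the second solution suitable for my problem? And are there other solutions for my spell-check problem?

Thanks in advance.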

2 Answers

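Indexing will generally make queries slower. If you have fast disks, indexing will use 100% of the CPU; otherwise, it will use 100% of the disk bandwidth. Either way, queries will be slow.

A master/slave configuration is the standard solution to this problem. The slave servers are dedicated to search queries. The only time they slow down is right after replication, when a new searcher with new, cold caches is created.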
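
As a rough sketch of how a slave can be kept quiet during a bulk load: Solr's replication handler accepts commands such as `disablepoll`, `fetchindex` and `enablepoll`, so polling can be paused while the master indexes and the finished index pulled once at the end. The slave URL and the core name `mycore` below are placeholders, and `DRY_RUN` is a hypothetical switch added here so the request can be printed instead of sent.

```shell
#!/bin/sh
# Sketch only: SLAVE_URL and the core name "mycore" are placeholders.
SLAVE_URL="${SLAVE_URL:-http://solr-slave:8983/solr/mycore}"
# DRY_RUN=1 prints the request URL instead of calling Solr (illustrative switch).
DRY_RUN="${DRY_RUN:-0}"

# Send one command to the slave's replication handler.
replication_cmd() {
  url="$SLAVE_URL/replication?command=$1"
  if [ "$DRY_RUN" = "1" ]; then
    echo "$url"
  else
    curl -sS "$url"
  fi
}

# Possible sequence around a bulk load on the master:
#   replication_cmd disablepoll   # stop the slave from polling mid-load
#   ... run the bulk index and commit on the master ...
#   replication_cmd fetchindex    # pull the finished index once
#   replication_cmd enablepoll    # resume normal polling
```

With this ordering the slave should open a new searcher once per bulk load rather than after every intermediate commit on the master.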

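A master/slave configuration probably will not make indexing faster, but it will keep query performance from degrading. Work has been done on multi-threaded indexing, so you may want to test running several indexing tasks at once. That will not help if the bottleneck is disk IO; it only helps when indexing is CPU-bound.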
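
One way to try several indexing tasks at once is to post the CSV files concurrently and commit a single time at the end. The sketch below assumes the files sit in one directory and that the collection is reachable at `SOLR_URL`; the URL, the collection name `mycollection` and the `DRY_RUN` switch are illustrative and not taken from the question.

```shell
#!/bin/sh
# Sketch only: SOLR_URL and the collection name are placeholders.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr/mycollection}"
# DRY_RUN=1 prints what would be sent instead of calling Solr (illustrative switch).
DRY_RUN="${DRY_RUN:-0}"

# Post one CSV file to the update handler without committing.
post_csv() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "POST $SOLR_URL/update <- $1"
  else
    curl -sS "$SOLR_URL/update" \
      -H 'Content-Type: application/csv' \
      --data-binary "@$1"
  fi
}

# Post every *.csv in directory $1, running up to $2 uploads at a time.
index_dir() {
  dir="$1"
  workers="${2:-4}"
  running=0
  for f in "$dir"/*.csv; do
    [ -e "$f" ] || continue          # directory without CSV files
    post_csv "$f" &
    running=$((running + 1))
    if [ "$running" -ge "$workers" ]; then
      wait                           # let the current group finish
      running=0
    fi
  done
  wait
}

# Issue one explicit commit after all files have been posted.
commit_once() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "COMMIT $SOLR_URL/update?commit=true"
  else
    curl -sS "$SOLR_URL/update?commit=true"
  fi
}
```

Calling `index_dir ./csv 4` followed by `commit_once` would run four uploads at a time. Comparing timings for 1, 2 and 4 workers should show whether the load is CPU-bound or disk-bound.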

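Trie fields are great for range queries. I doubt they would have much effect on indexing speed.

Finally, you may need to tune the spelling suggestion options. Spelling suggestions can take a lot of work, and you may get good results with different, cheaper options.

answered 2012-04-23T17:36:37.950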

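You can usually get good query performance while bulk indexing without resorting to a blue/green setup.

Here are some tips for achieving this:

  • If you are inserting a large number of documents, use https://github.com/lucidworks/spark-solr where possible. It has a powerful bulk-import mechanism that is relatively easy to use. Do not write your own Solr bulk-import code unless you have to.
  • If you must use SolrJ, make sure you send documents in batches. See the add(Collection<SolrInputDocument>) method. If you overwhelm Solr with insert HTTP requests, query speed will drop significantly.
  • Do not commit too often. Commits are expensive!
  • Use autowarming. This Solr feature pre-warms the caches (and with them the filesystem cache) every time a new searcher is created, that is, after a commit. It is essential for getting fast queries right after a commit, because searches keep being served from valid caches.
  • If you have many shards (for example, more than 20), consider a "staggered commit" approach that commits one shard at a time. Here is a shell script example for a deployment with 16 Solr servers and 64 shards in total. By committing only one shard at a time, it prevents the whole collection from committing simultaneously, which reduces the impact of commits.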
while :
do
  curl -v 'http://solr-0.solr:8983/solr/MyCollection_shard8_0_0_replica_n127/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-0.solr:8983/solr/MyCollection_shard8_0_1_replica_n128/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-0.solr:8983/solr/MyCollection_shard8_1_0_replica_n131/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-0.solr:8983/solr/MyCollection_shard8_1_1_replica_n132/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  
  curl -v 'http://solr-1.solr:8983/solr/MyCollection_shard4_0_0_replica_n95/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-1.solr:8983/solr/MyCollection_shard4_0_1_replica_n96/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-1.solr:8983/solr/MyCollection_shard4_1_0_replica_n99/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-1.solr:8983/solr/MyCollection_shard4_1_1_replica_n100/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'

  curl -v 'http://solr-2.solr:8983/solr/MyCollection_shard3_0_0_replica_n87/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-2.solr:8983/solr/MyCollection_shard3_0_1_replica_n88/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-2.solr:8983/solr/MyCollection_shard3_1_0_replica_n91/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-2.solr:8983/solr/MyCollection_shard3_1_1_replica_n92/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'

  curl -v 'http://solr-3.solr:8983/solr/MyCollection_shard7_0_0_replica_n119/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-3.solr:8983/solr/MyCollection_shard7_0_1_replica_n120/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-3.solr:8983/solr/MyCollection_shard7_1_0_replica_n123/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-3.solr:8983/solr/MyCollection_shard7_1_1_replica_n124/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'

  curl -v 'http://solr-4.solr:8983/solr/MyCollection_shard2_0_0_replica_n79/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-4.solr:8983/solr/MyCollection_shard2_0_1_replica_n80/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-4.solr:8983/solr/MyCollection_shard2_1_0_replica_n83/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-4.solr:8983/solr/MyCollection_shard2_1_1_replica_n84/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'

  curl -v 'http://solr-5.solr:8983/solr/MyCollection_shard1_0_0_replica_n71/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-5.solr:8983/solr/MyCollection_shard1_0_1_replica_n72/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-5.solr:8983/solr/MyCollection_shard1_1_0_replica_n75/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-5.solr:8983/solr/MyCollection_shard1_1_1_replica_n76/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'

  curl -v 'http://solr-6.solr:8983/solr/MyCollection_shard5_0_0_replica_n159/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-6.solr:8983/solr/MyCollection_shard5_0_1_replica_n161/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-6.solr:8983/solr/MyCollection_shard5_1_0_replica_n163/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-6.solr:8983/solr/MyCollection_shard5_1_1_replica_n165/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'

  curl -v 'http://solr-7.solr:8983/solr/MyCollection_shard6_0_0_replica_n151/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-7.solr:8983/solr/MyCollection_shard6_0_1_replica_n153/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-7.solr:8983/solr/MyCollection_shard6_1_0_replica_n155/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-7.solr:8983/solr/MyCollection_shard6_1_1_replica_n157/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'

  curl -v 'http://solr-8.solr:8983/solr/MyCollection_shard8_0_0_replica_n135/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-8.solr:8983/solr/MyCollection_shard8_0_1_replica_n137/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-8.solr:8983/solr/MyCollection_shard8_1_0_replica_n139/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-8.solr:8983/solr/MyCollection_shard8_1_1_replica_n141/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'

  curl -v 'http://solr-9.solr:8983/solr/MyCollection_shard4_0_0_replica_n143/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-9.solr:8983/solr/MyCollection_shard4_0_1_replica_n145/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-9.solr:8983/solr/MyCollection_shard4_1_0_replica_n147/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-9.solr:8983/solr/MyCollection_shard4_1_1_replica_n149/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'

  curl -v 'http://solr-10.solr:8983/solr/MyCollection_shard6_0_0_replica_n111/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-10.solr:8983/solr/MyCollection_shard6_0_1_replica_n112/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-10.solr:8983/solr/MyCollection_shard6_1_0_replica_n115/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-10.solr:8983/solr/MyCollection_shard6_1_1_replica_n116/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'

  curl -v 'http://solr-11.solr:8983/solr/MyCollection_shard5_0_0_replica_n103/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-11.solr:8983/solr/MyCollection_shard5_0_1_replica_n104/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-11.solr:8983/solr/MyCollection_shard5_1_0_replica_n107/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-11.solr:8983/solr/MyCollection_shard5_1_1_replica_n108/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'

  curl -v 'http://solr-12.solr:8983/solr/MyCollection_shard2_0_0_replica_n167/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-12.solr:8983/solr/MyCollection_shard2_0_1_replica_n169/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-12.solr:8983/solr/MyCollection_shard2_1_0_replica_n171/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-12.solr:8983/solr/MyCollection_shard2_1_1_replica_n173/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'

  curl -v 'http://solr-13.solr:8983/solr/MyCollection_shard1_0_0_replica_n175/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-13.solr:8983/solr/MyCollection_shard1_0_1_replica_n177/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-13.solr:8983/solr/MyCollection_shard1_1_0_replica_n179/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-13.solr:8983/solr/MyCollection_shard1_1_1_replica_n181/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'

  curl -v 'http://solr-14.solr:8983/solr/MyCollection_shard3_0_0_replica_n183/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-14.solr:8983/solr/MyCollection_shard3_0_1_replica_n185/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-14.solr:8983/solr/MyCollection_shard3_1_0_replica_n187/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-14.solr:8983/solr/MyCollection_shard3_1_1_replica_n189/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'

  curl -v 'http://solr-15.solr:8983/solr/MyCollection_shard7_0_0_replica_n191/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-15.solr:8983/solr/MyCollection_shard7_0_1_replica_n193/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-15.solr:8983/solr/MyCollection_shard7_1_0_replica_n195/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
  curl -v 'http://solr-15.solr:8983/solr/MyCollection_shard7_1_1_replica_n197/update?update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'

  # 15 minute commit timer
  sleep 900
done
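  • Make sure your filter cache is large enough. This keeps filtering from eating CPU and disk IO, so you will notice much less impact while indexing.
  • Consider running a Solr optimize periodically. After an optimize, more of the index may fit in memory, which reduces disk IO. With less disk IO from queries, writes of new documents compete less for the disk, which improves disk performance.
  • If Solr is on shared hardware, make sure other running applications are not hogging CPU and disk resources. For example, if you use Apache Tika to parse documents, make sure Tika runs on another host. Solr is best left on its own.
  • Make sure the VM has enough memory for the filesystem cache. For example, on a machine with 32G of RAM, avoid making the maximum heap size too large. With -Xmx28G there would not be enough memory left for the operating system's filesystem cache. Use empirical analysis to determine how much heap you actually need. For example, -Xmx12G leaves more than half of the memory for the FS cache. The more your queries are served from cache, the less indexing will affect you.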
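
The staggered commit loop above can also be written more compactly by keeping the host/core pairs in a list. This is only a sketch: the query string is copied from the script above, while the file name `cores.txt`, the function names and the `DRY_RUN` switch are made up for illustration.

```shell
#!/bin/sh
# Query string copied verbatim from the per-core commit URLs above.
COMMIT_QUERY='update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=blah&commit_end_point=true&version=2&expungeDeletes=false'
# DRY_RUN=1 prints each URL instead of calling Solr (illustrative switch).
DRY_RUN="${DRY_RUN:-0}"

# Commit a single core: $1 is the host, $2 is the core name.
commit_core() {
  url="http://$1:8983/solr/$2/update?$COMMIT_QUERY"
  if [ "$DRY_RUN" = "1" ]; then
    echo "$url"
  else
    # </dev/null keeps curl from consuming the list being read by commit_all
    curl -v "$url" </dev/null
  fi
}

# Read "host core" pairs from stdin and commit them one after another.
commit_all() {
  while read -r host core; do
    [ -n "$host" ] || continue       # skip blank lines
    commit_core "$host" "$core"
  done
}

# Possible usage, with cores.txt holding one "host core" pair per line:
#   while :; do
#     commit_all < cores.txt
#     sleep 900                      # 15 minute commit timer, as above
#   done
```

Because each request uses waitSearcher=true, a core should finish opening its new searcher before the next core starts committing.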
answered 2021-06-02T22:19:32.780