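I have a search system built on Lucene. By default it is not distributed, so I am thinking about moving to something like HBase or Hadoop.
Do solutions such as HBase or Hypertable have built-in search capabilities, or do I need to implement Lucene on top of them?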
Lucene is very different from BigTable clones like HBase or Hypertable. If you are simply looking for a distributed Lucene, then you should look at projects such as Elastic Search or Katta.
Solr/Lucene can also operate over a cluster, but the partitioning is not automatic. You have to create shards and replicas manually to match how you want the data distributed. If your underlying data is stored in something like HBase, this is much easier to set up, modify, and update.
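To make "manual sharding" concrete, here is a minimal sketch, assuming a toy in-memory inverted index stands in for each shard. The class names `Shard` and `ShardedIndex` are hypothetical, replicas are not modeled, and this is not how Solr actually routes documents; it only shows the general idea: each document goes to one shard chosen by a stable hash of its id, and every query is fanned out to all shards with the hits merged.

```python
from collections import defaultdict
import zlib


class Shard:
    """Toy in-memory inverted index standing in for one Lucene/Solr shard (hypothetical)."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of doc ids

    def add(self, doc_id, text):
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, term):
        return set(self.postings.get(term.lower(), set()))


class ShardedIndex:
    """Hypothetical manual sharding: hash-route documents, scatter-gather queries."""

    def __init__(self, num_shards):
        self.shards = [Shard() for _ in range(num_shards)]

    def shard_for(self, doc_id):
        # crc32 is stable across processes, unlike Python's built-in hash() for str
        return zlib.crc32(doc_id.encode("utf-8")) % len(self.shards)

    def add(self, doc_id, text):
        self.shards[self.shard_for(doc_id)].add(doc_id, text)

    def search(self, term):
        # Scatter the query to every shard, then gather (union) the hits
        hits = set()
        for shard in self.shards:
            hits |= shard.search(term)
        return sorted(hits)


index = ShardedIndex(num_shards=3)
index.add("doc-1", "distributed search with Lucene")
index.add("doc-2", "HBase stores rows")
index.add("doc-3", "Lucene shards and replicas")
print(index.search("lucene"))  # ['doc-1', 'doc-3']
```

Because the shard count is baked into the routing function, changing it means re-indexing, which is part of why manual sharding is more work to maintain than an automatically partitioned system.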
Fundamentally, HBase and Lucene solve different problems. Lucene is an index that lets keyword and other types of searches return quickly. HBase is a data repository that can serve individual rows in real time; however, HBase does not have an online query capability. For best results, you have to combine them. One example in this area is Lily (http://outerthought.org/site/products/lily.html).
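A rough sketch of the combination described above, assuming a plain dict stands in for HBase (serving rows by key) and a toy inverted index stands in for Lucene (mapping terms to row keys). `RowStore`, `KeywordIndex`, and `CombinedStore` are hypothetical names for illustration; this is not Lily's design or either project's API. Writes go to both sides, and a query hits the index first to find row keys, then fetches the full rows from the store.

```python
from collections import defaultdict


class RowStore:
    """Stand-in for HBase: serves whole rows by row key, but offers no keyword search."""

    def __init__(self):
        self.rows = {}

    def put(self, row_key, columns):
        self.rows[row_key] = dict(columns)

    def get(self, row_key):
        return self.rows.get(row_key)


class KeywordIndex:
    """Stand-in for Lucene: maps each term to the row keys that contain it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def index(self, row_key, text):
        for term in text.lower().split():
            self.postings[term].add(row_key)

    def search(self, *terms):
        # AND semantics: keep only row keys that contain every term
        sets = [self.postings.get(t.lower(), set()) for t in terms]
        return sorted(set.intersection(*sets)) if sets else []


class CombinedStore:
    """Hypothetical glue: write to both systems, query the index, then fetch rows."""

    def __init__(self):
        self.store = RowStore()
        self.index = KeywordIndex()

    def put(self, row_key, columns, searchable_column):
        self.store.put(row_key, columns)
        self.index.index(row_key, columns[searchable_column])

    def query(self, *terms):
        return [self.store.get(key) for key in self.index.search(*terms)]


db = CombinedStore()
db.put("user#1", {"name": "Ann", "bio": "Likes Lucene and HBase"}, "bio")
db.put("user#2", {"name": "Bob", "bio": "Likes Cassandra"}, "bio")
print(db.query("likes", "hbase"))  # [{'name': 'Ann', 'bio': 'Likes Lucene and HBase'}]
```

The index only needs to hold row keys, not whole documents, so the row store stays the source of truth and the index can be rebuilt from it if needed.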
You might also want to look at Lucandra, a Lucene with a Cassandra backend.
Another technology worth looking at is Katta or Distributed Lucene, which can run on HDFS.
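Lucene provides two main features: structured search and full-text search. HBase provides neither. Structured search can be done with HBase in a relatively simple way; that is the idea behind Lily. Rebuilding full-text search, however, would be much harder. To scale Lucene, you can still try to partition the index by looking for attributes that let you split the data into separate regions (you will not be able to search across regions). Each region can then have its own cluster.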
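The attribute-based partitioning idea can be sketched as follows, assuming each document carries a field (here called `region`, a hypothetical name) that splits the data cleanly, and a toy inverted index stands in for the per-region Lucene cluster. `PartitionedSearch` is illustrative only. Every query must name its region and is routed to exactly one partition, which mirrors the caveat that cross-region search is not possible.

```python
from collections import defaultdict


class PartitionedSearch:
    """Hypothetical attribute-based partitioning: one independent index per value
    of `partition_attr` (in production, one Lucene cluster per region)."""

    def __init__(self, partition_attr):
        self.partition_attr = partition_attr
        # region -> (term -> set of doc ids)
        self.partitions = defaultdict(lambda: defaultdict(set))

    def add(self, doc_id, doc):
        region = doc[self.partition_attr]
        postings = self.partitions[region]
        for term in doc["text"].lower().split():
            postings[term].add(doc_id)

    def search(self, region, term):
        # A query is routed to exactly one partition; there is no cross-region search
        if region not in self.partitions:
            raise KeyError(f"unknown region: {region}")
        return sorted(self.partitions[region].get(term.lower(), set()))


idx = PartitionedSearch(partition_attr="region")
idx.add("a", {"region": "eu", "text": "Lucene cluster in Europe"})
idx.add("b", {"region": "us", "text": "Lucene cluster in America"})
print(idx.search("eu", "lucene"))  # ['a']
```

This only works well when almost every query naturally targets a single partition; if users often need results from several regions, you are back to fanning queries out and merging results.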