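I'm new to Lucene and I want to implement auto-suggest like Google's: when I type a character such as 'G', it gives me a list of suggestions (you can try it yourself).
I have searched all over the web and found no one who has done this. Lucene does give us some new tools in its suggest package,
but I need an example that shows me how to use them.
Can anyone help?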
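I'll give you a fairly complete example that shows how to use AnalyzingInfixSuggester. In this example we'll pretend that we're Amazon and we want to autocomplete a product search field. We'll take advantage of features of the Lucene suggestion system to rank suggestions by popularity, restrict them to the customer's region, and return product details such as an image URL along with each suggestion.
First I'll define a simple class, in Product.java, to hold information about a product: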
class Product implements java.io.Serializable
{
    String name;
    String image;
    String[] regions;
    int numberSold;

    public Product(String name, String image, String[] regions,
                   int numberSold) {
        this.name = name;
        this.image = image;
        this.regions = regions;
        this.numberSold = numberSold;
    }
}
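To index records using AnalyzingInfixSuggester's build method, you need to pass it an object that implements the org.apache.lucene.search.suggest.InputIterator interface. An InputIterator gives access to the key, contexts, payload, and weight of each record.
The key is the text you actually want to search on and autocomplete against. In our example, it is the name of the product.
The contexts are a set of additional, arbitrary data that you can use to filter records. In our example, the contexts are the ISO codes of the countries we will ship a particular product to.
The payload is additional arbitrary data that you want to store in the index for the record. In this example, we serialize each Product instance and store the resulting bytes as the payload. When we do lookups later, we can deserialize the payload and access information in the product instance, such as the image URL.
The weight is used to order suggestion results; results with a higher weight are returned first. We'll use the number of sales of a given product as its weight.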
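The payload idea can be tried out without Lucene. The following is a minimal sketch using only the JDK; the Item class and the PayloadRoundTrip helper are hypothetical names introduced for illustration, not part of Lucene or of the example in this answer. It serializes an object to bytes and reads it back from a slice of a larger buffer, which is how a BytesRef exposes its data (a byte array plus an offset and a length).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for the Product class, trimmed to two fields.
class Item implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;
    final String image;

    Item(String name, String image) {
        this.name = name;
        this.image = image;
    }
}

class PayloadRoundTrip {
    // Serialize an object to the bytes that would be stored as the payload.
    static byte[] encode(Serializable value) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(value);
        }
        return bos.toByteArray();
    }

    // Decode from a slice of a buffer: only bytes[offset, offset + length)
    // belong to the payload.
    static Object decode(byte[] bytes, int offset, int length)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes, offset, length))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] payload = encode(new Item("Electric Guitar",
                "http://images.example/electric-guitar.jpg"));
        // Simulate a payload that sits inside a larger shared buffer.
        byte[] buffer = new byte[payload.length + 8];
        System.arraycopy(payload, 0, buffer, 3, payload.length);
        Item back = (Item) decode(buffer, 3, payload.length);
        System.out.println(back.name + " -> " + back.image);
    }
}
```

Java serialization keeps the example short; for a real index, a more compact or versioned encoding may be a better fit.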
Here are the contents of ProductIterator.java:
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UnsupportedEncodingException;
import java.util.Comparator;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;
import org.apache.lucene.search.suggest.InputIterator;
import org.apache.lucene.util.BytesRef;

class ProductIterator implements InputIterator
{
    private Iterator<Product> productIterator;
    private Product currentProduct;

    ProductIterator(Iterator<Product> productIterator) {
        this.productIterator = productIterator;
    }

    public boolean hasContexts() {
        return true;
    }

    public boolean hasPayloads() {
        return true;
    }

    public Comparator<BytesRef> getComparator() {
        return null;
    }

    // This method needs to return the key for the record; this is the
    // text we'll be autocompleting against.
    public BytesRef next() {
        if (productIterator.hasNext()) {
            currentProduct = productIterator.next();
            try {
                return new BytesRef(currentProduct.name.getBytes("UTF8"));
            } catch (UnsupportedEncodingException e) {
                throw new Error("Couldn't convert to UTF-8");
            }
        } else {
            return null;
        }
    }

    // This method returns the payload for the record, which is
    // additional data that can be associated with a record and
    // returned when we do suggestion lookups. In this example the
    // payload is a serialized Java object representing our product.
    public BytesRef payload() {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bos);
            out.writeObject(currentProduct);
            out.close();
            return new BytesRef(bos.toByteArray());
        } catch (IOException e) {
            throw new Error("Well that's unfortunate.");
        }
    }

    // This method returns the contexts for the record, which we can
    // use to restrict suggestions. In this example we use the
    // regions in which a product is sold.
    public Set<BytesRef> contexts() {
        try {
            Set<BytesRef> regions = new HashSet<BytesRef>();
            for (String region : currentProduct.regions) {
                regions.add(new BytesRef(region.getBytes("UTF8")));
            }
            return regions;
        } catch (UnsupportedEncodingException e) {
            throw new Error("Couldn't convert to UTF-8");
        }
    }

    // This method helps us order our suggestions. In this example we
    // use the number of products of this type that we've sold.
    public long weight() {
        return currentProduct.numberSold;
    }
}
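In our driver program, we will do the following things:
1. Create an index directory in RAM.
2. Create a StandardAnalyzer.
3. Create an AnalyzingInfixSuggester using the RAM directory and the analyzer.
4. Index a number of products using ProductIterator.
5. Print the results of some sample lookups.
Here's the driver program, SuggestProducts.java: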
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.UnsupportedEncodingException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester;
import org.apache.lucene.search.suggest.Lookup;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.Version;

public class SuggestProducts
{
    // Get suggestions given a prefix and a region.
    private static void lookup(AnalyzingInfixSuggester suggester, String name,
                               String region) {
        try {
            List<Lookup.LookupResult> results;
            HashSet<BytesRef> contexts = new HashSet<BytesRef>();
            contexts.add(new BytesRef(region.getBytes("UTF8")));
            // Do the actual lookup. We ask for the top 2 results.
            results = suggester.lookup(name, contexts, 2, true, false);
            System.out.println("-- \"" + name + "\" (" + region + "):");
            for (Lookup.LookupResult result : results) {
                System.out.println(result.key);
                Product p = getProduct(result);
                if (p != null) {
                    System.out.println(" image: " + p.image);
                    System.out.println(" # sold: " + p.numberSold);
                }
            }
        } catch (IOException e) {
            System.err.println("Error");
        }
    }

    // Deserialize a Product from a LookupResult payload.
    private static Product getProduct(Lookup.LookupResult result)
    {
        try {
            BytesRef payload = result.payload;
            if (payload != null) {
                // A BytesRef may be a slice of a larger array, so read
                // only the bytes in [offset, offset + length).
                ByteArrayInputStream bis = new ByteArrayInputStream(
                    payload.bytes, payload.offset, payload.length);
                ObjectInputStream in = new ObjectInputStream(bis);
                Product p = (Product) in.readObject();
                return p;
            } else {
                return null;
            }
        } catch (IOException|ClassNotFoundException e) {
            throw new Error("Could not decode payload :(");
        }
    }

    public static void main(String[] args) {
        try {
            RAMDirectory index_dir = new RAMDirectory();
            StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_48);
            AnalyzingInfixSuggester suggester = new AnalyzingInfixSuggester(
                Version.LUCENE_48, index_dir, analyzer);

            // Create our list of products.
            ArrayList<Product> products = new ArrayList<Product>();
            products.add(
                new Product(
                    "Electric Guitar",
                    "http://images.example/electric-guitar.jpg",
                    new String[]{"US", "CA"},
                    100));
            products.add(
                new Product(
                    "Electric Train",
                    "http://images.example/train.jpg",
                    new String[]{"US", "CA"},
                    100));
            products.add(
                new Product(
                    "Acoustic Guitar",
                    "http://images.example/acoustic-guitar.jpg",
                    new String[]{"US", "ZA"},
                    80));
            products.add(
                new Product(
                    "Guarana Soda",
                    "http://images.example/soda.jpg",
                    new String[]{"ZA", "IE"},
                    130));

            // Index the products with the suggester.
            suggester.build(new ProductIterator(products.iterator()));

            // Do some example lookups.
            lookup(suggester, "Gu", "US");
            lookup(suggester, "Gu", "ZA");
            lookup(suggester, "Gui", "CA");
            lookup(suggester, "Electric guit", "US");
        } catch (IOException e) {
            System.err.println("Error!");
        }
    }
}
Here is the output from the driver program:
-- "Gu" (US):
Electric Guitar
image: http://images.example/electric-guitar.jpg
# sold: 100
Acoustic Guitar
image: http://images.example/acoustic-guitar.jpg
# sold: 80
-- "Gu" (ZA):
Guarana Soda
image: http://images.example/soda.jpg
# sold: 130
Acoustic Guitar
image: http://images.example/acoustic-guitar.jpg
# sold: 80
-- "Gui" (CA):
Electric Guitar
image: http://images.example/electric-guitar.jpg
# sold: 100
-- "Electric guit" (US):
Electric Guitar
image: http://images.example/electric-guitar.jpg
# sold: 100
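To make the output above easier to reason about, here is a simplified plain-Java model of the lookup semantics used in this example: keep only entries that carry the requested context, require every query word to match a word of the key (the last query word only as a prefix), and return the highest weights first. This is an illustrative assumption-laden sketch, not how Lucene implements it; SuggestModel and Entry are hypothetical names, and lowercasing plus whitespace splitting stands in for what StandardAnalyzer does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class SuggestModel {
    // Hypothetical record: the key, contexts and weight of one entry.
    static final class Entry {
        final String key;
        final long weight;
        final Set<String> contexts;

        Entry(String key, long weight, String... contexts) {
            this.key = key;
            this.weight = weight;
            this.contexts = new HashSet<>(Arrays.asList(contexts));
        }
    }

    // True if every query word matches some word of the key. The last
    // query word only has to be a prefix; earlier words must match whole.
    static boolean matches(String key, String query) {
        String[] keyWords = key.toLowerCase(Locale.ROOT).split("\\s+");
        String[] queryWords = query.trim().toLowerCase(Locale.ROOT).split("\\s+");
        for (int i = 0; i < queryWords.length; i++) {
            boolean last = (i == queryWords.length - 1);
            boolean found = false;
            for (String word : keyWords) {
                if (last ? word.startsWith(queryWords[i]) : word.equals(queryWords[i])) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                return false;
            }
        }
        return true;
    }

    // Filter by context and query, sort by weight descending, keep the top num.
    static List<String> lookup(List<Entry> entries, String query,
                               String context, int num) {
        List<Entry> hits = new ArrayList<>();
        for (Entry e : entries) {
            if (e.contexts.contains(context) && matches(e.key, query)) {
                hits.add(e);
            }
        }
        hits.sort(Comparator.comparingLong((Entry e) -> e.weight).reversed());
        List<String> keys = new ArrayList<>();
        for (Entry e : hits.subList(0, Math.min(num, hits.size()))) {
            keys.add(e.key);
        }
        return keys;
    }

    // The same four products as in the driver program.
    static List<Entry> sample() {
        return Arrays.asList(
            new Entry("Electric Guitar", 100, "US", "CA"),
            new Entry("Electric Train", 100, "US", "CA"),
            new Entry("Acoustic Guitar", 80, "US", "ZA"),
            new Entry("Guarana Soda", 130, "ZA", "IE"));
    }

    public static void main(String[] args) {
        System.out.println(lookup(sample(), "Gu", "US", 2));
        System.out.println(lookup(sample(), "Gu", "ZA", 2));
    }
}
```

With the sample data, the model returns the same keys in the same order as the driver output, which shows why "Guarana Soda" never appears for US and why it outranks "Acoustic Guitar" for ZA.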
There is a way to avoid writing a full InputIterator that you may find easier. You can write a stub InputIterator that returns null from its next, payload, and contexts methods. Pass an instance of it to AnalyzingInfixSuggester's build method:
suggester.build(new ProductIterator(new ArrayList<Product>().iterator()));
Then, for each item you want to index, call AnalyzingInfixSuggester's add method:
suggester.add(text, contexts, weight, payload)
After you've indexed everything, call refresh:
suggester.refresh();
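If you're indexing large amounts of data, you can speed up indexing significantly by using this approach with multiple threads: call build, then use multiple threads to add items, and finally call refresh.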
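The threading pattern might look roughly like the sketch below. To keep it runnable without Lucene, SuggestIndex is a hypothetical interface that stands in for the suggester's add and refresh calls (with a simplified add signature), and CountingIndex is a dummy implementation; in real code the lambda would call suggester.add(text, contexts, weight, payload) instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

class ParallelIndexing {
    // Hypothetical stand-in for the suggester's add/refresh calls.
    interface SuggestIndex {
        void add(String text, long weight) throws Exception;
        void refresh() throws Exception;
    }

    // Dummy index that only counts what it receives.
    static final class CountingIndex implements SuggestIndex {
        final AtomicInteger added = new AtomicInteger();
        volatile boolean refreshed = false;

        public void add(String text, long weight) {
            added.incrementAndGet();
        }

        public void refresh() {
            refreshed = true;
        }
    }

    // Add all items from a fixed-size thread pool, wait for every add to
    // finish, and only then refresh once.
    static void indexAll(SuggestIndex index, List<String> texts, int threads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<?>> pending = new ArrayList<>();
        for (String text : texts) {
            // Returning null makes this a Callable, so add may throw.
            pending.add(pool.submit(() -> {
                index.add(text, 1L);
                return null;
            }));
        }
        pool.shutdown();
        for (Future<?> f : pending) {
            f.get(); // rethrows any failure from a worker thread
        }
        index.refresh();
    }

    public static void main(String[] args) throws Exception {
        CountingIndex index = new CountingIndex();
        indexAll(index, Arrays.asList("Electric Guitar", "Electric Train",
                "Acoustic Guitar", "Guarana Soda"), 4);
        System.out.println(index.added.get() + " items added, refreshed="
                + index.refreshed);
    }
}
```

Waiting on every Future before calling refresh ensures that a failed add surfaces as an exception instead of being silently dropped.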
[Edited 2015-04-23 to demonstrate deserializing information from the LookupResult payload.]