java - How to implement auto suggest using Lucene's new AnalyzingInfixSuggester API?

Question

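I am new to Lucene and I want to implement auto-suggest like Google's: when I type a character such as 'G', it shows me a list of suggestions. You can try it yourself.

I have searched all over the web and could not find anyone who has done this. Lucene does provide some new tools in its suggest package, but I need an example that shows me how to use them.

Can anyone help?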

score 53 · Accepted Answer

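I'll give you a fairly complete example that shows how to use AnalyzingInfixSuggester. In this example we'll pretend that we are Amazon and that we want to autocomplete a product search field. We'll take advantage of features of the Lucene suggestion system to implement the following:

Ranked results: we will suggest the most popular matching products first.
Region-restricted results: we will only suggest products that we sell in the customer's country.
Product photos: we will store product photo URLs in the suggestion index so that we can display them in the search results without an additional database lookup.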

First, I'll define a simple class in Product.java to hold information about a product:

import java.io.Serializable;

// Serializable so that each instance can be stored as a suggestion payload.
class Product implements Serializable
{
    String name;
    String image;
    String[] regions;
    int numberSold;

    public Product(String name, String image, String[] regions,
                   int numberSold) {
        this.name = name;
        this.image = image;
        this.regions = regions;
        this.numberSold = numberSold;
    }
}

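To index records with AnalyzingInfixSuggester's build method, you need to pass it an object that implements the org.apache.lucene.search.suggest.InputIterator interface. An InputIterator gives access to the key, contexts, payload, and weight of each record.

The key is the text you actually want to search on and autocomplete against. In our example, it is the name of the product.

The contexts are a set of additional, arbitrary data that you can use to filter records. In our example, the contexts are the ISO codes of the countries we will ship a particular product to.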

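The payload is additional arbitrary data that you want to store in the index for the record. In this example, we serialize each Product instance and store the resulting bytes as the payload. When we later do lookups, we can deserialize the payload and access information in the product instance, such as the image URL.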
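The payload round trip can be sketched with nothing but the standard library. This is a minimal sketch, not Lucene code: DemoItem, PayloadCodec, toBytes, and fromBytes are hypothetical names used only for illustration. The decoder takes an offset and a length because a BytesRef exposes its data as bytes, offset, and length, so decoding should honor all three.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical stand-in for Product, reduced to two fields.
class DemoItem implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;
    final String image;

    DemoItem(String name, String image) {
        this.name = name;
        this.image = image;
    }
}

class PayloadCodec {
    // Serialize any Serializable value to a byte array.
    static byte[] toBytes(Serializable value) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Deserialize from the slice [offset, offset + length) of a buffer.
    static Object fromBytes(byte[] bytes, int offset, int length) {
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes, offset, length))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Unknown payload class", e);
        }
    }
}
```

With a real suggester, the bytes from toBytes would be wrapped in a BytesRef when indexing, and fromBytes would be called with the bytes, offset, and length fields of the payload that a lookup returns.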

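The weight is used to order suggestion results; results with a higher weight are returned first. We'll use the number of sales of a given product as its weight.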
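To make the roles of key, contexts, and weight concrete, here is a toy in-memory model of a lookup. It is only a sketch of the semantics described above and says nothing about how Lucene implements them: SuggestEntry and ToySuggester are hypothetical names, and the matching rule (every query token must be a prefix of some word in the key, ignoring case) is a deliberate simplification of analyzer-based matching.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical record type: one suggestion entry.
class SuggestEntry {
    final String key;            // text to match against
    final Set<String> contexts;  // e.g. ISO country codes
    final long weight;           // higher weight ranks first

    SuggestEntry(String key, long weight, String... contexts) {
        this.key = key;
        this.weight = weight;
        this.contexts = new HashSet<>(Arrays.asList(contexts));
    }
}

class ToySuggester {
    // True if every query token is a prefix of some word in the key.
    static boolean matches(String key, String query) {
        String[] words = key.toLowerCase(Locale.ROOT).split("\\s+");
        for (String token : query.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            boolean found = false;
            for (String word : words) {
                if (word.startsWith(token)) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                return false;
            }
        }
        return true;
    }

    // Filter by context, match on the key, sort by weight (descending),
    // and return at most max keys.
    static List<String> lookup(List<SuggestEntry> entries, String query,
                               String context, int max) {
        List<SuggestEntry> hits = new ArrayList<>();
        for (SuggestEntry e : entries) {
            if (e.contexts.contains(context) && matches(e.key, query)) {
                hits.add(e);
            }
        }
        hits.sort(Comparator.comparingLong((SuggestEntry e) -> e.weight).reversed());
        List<String> keys = new ArrayList<>();
        for (SuggestEntry e : hits.subList(0, Math.min(max, hits.size()))) {
            keys.add(e.key);
        }
        return keys;
    }
}
```

For entries "Electric Guitar" (weight 100, US and CA), "Acoustic Guitar" (80, US and ZA), and "Guarana Soda" (130, ZA and IE), looking up "Gu" in context "US" with max 2 returns Electric Guitar followed by Acoustic Guitar: the soda is filtered out by context, and the rest are ordered by weight.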

Here are the contents of ProductIterator.java:

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UnsupportedEncodingException;
import java.util.Comparator;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;
import org.apache.lucene.search.suggest.InputIterator;
import org.apache.lucene.util.BytesRef;


class ProductIterator implements InputIterator
{
    private Iterator<Product> productIterator;
    private Product currentProduct;

    ProductIterator(Iterator<Product> productIterator) {
        this.productIterator = productIterator;
    }

    public boolean hasContexts() {
        return true;
    }

    public boolean hasPayloads() {
        return true;
    }

    public Comparator<BytesRef> getComparator() {
        return null;
    }

    // This method needs to return the key for the record; this is the
    // text we'll be autocompleting against.
    public BytesRef next() {
        if (productIterator.hasNext()) {
            currentProduct = productIterator.next();
            // BytesRef(CharSequence) encodes the text as UTF-8.
            return new BytesRef(currentProduct.name);
        } else {
            return null;
        }
    }

    // This method returns the payload for the record, which is
    // additional data that can be associated with a record and
    // returned when we do suggestion lookups.  In this example the
    // payload is a serialized Java object representing our product.
    public BytesRef payload() {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bos);
            out.writeObject(currentProduct);
            out.close();
            return new BytesRef(bos.toByteArray());
        } catch (IOException e) {
            throw new Error("Well that's unfortunate.");
        }
    }

    // This method returns the contexts for the record, which we can
    // use to restrict suggestions.  In this example we use the
    // regions in which a product is sold.
    public Set<BytesRef> contexts() {
        Set<BytesRef> regions = new HashSet<BytesRef>();
        for (String region : currentProduct.regions) {
            // BytesRef(CharSequence) encodes the text as UTF-8.
            regions.add(new BytesRef(region));
        }
        return regions;
    }

    // This method helps us order our suggestions.  In this example we
    // use the number of products of this type that we've sold.
    public long weight() {
        return currentProduct.numberSold;
    }
}

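In our driver program, we will do the following:

Create an index directory in RAM.
Create a StandardAnalyzer.
Create an AnalyzingInfixSuggester using the RAM directory and the analyzer.
Index a number of products using ProductIterator.
Print the results of some example lookups.

Here is the driver program, SuggestProducts.java: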

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.UnsupportedEncodingException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester;
import org.apache.lucene.search.suggest.Lookup;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.Version;

public class SuggestProducts
{
    // Get suggestions given a prefix and a region.
    private static void lookup(AnalyzingInfixSuggester suggester, String name,
                               String region) {
        try {
            List<Lookup.LookupResult> results;
            HashSet<BytesRef> contexts = new HashSet<BytesRef>();
            contexts.add(new BytesRef(region.getBytes("UTF8")));
            // Do the actual lookup.  We ask for the top 2 results; the two
            // booleans are allTermsRequired (true) and doHighlight (false).
            results = suggester.lookup(name, contexts, 2, true, false);
            System.out.println("-- \"" + name + "\" (" + region + "):");
            for (Lookup.LookupResult result : results) {
                System.out.println(result.key);
                Product p = getProduct(result);
                if (p != null) {
                    System.out.println("  image: " + p.image);
                    System.out.println("  # sold: " + p.numberSold);
                }
            }
        } catch (IOException e) {
            System.err.println("Lookup failed: " + e);
        }
    }

    // Deserialize a Product from a LookupResult payload.
    private static Product getProduct(Lookup.LookupResult result)
    {
        try {
            BytesRef payload = result.payload;
            if (payload != null) {
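                // Honor offset and length: the payload may be a slice of a
                // larger backing array.
                ByteArrayInputStream bis = new ByteArrayInputStream(
                    payload.bytes, payload.offset, payload.length);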
                ObjectInputStream in = new ObjectInputStream(bis);
                Product p = (Product) in.readObject();
                return p;
            } else {
                return null;
            }
        } catch (IOException|ClassNotFoundException e) {
            throw new Error("Could not decode payload :(");
        }
    }

    public static void main(String[] args) {
        try {
            RAMDirectory index_dir = new RAMDirectory();
            StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_48);
            AnalyzingInfixSuggester suggester = new AnalyzingInfixSuggester(
                Version.LUCENE_48, index_dir, analyzer);

            // Create our list of products.
            ArrayList<Product> products = new ArrayList<Product>();
            products.add(
                new Product(
                    "Electric Guitar",
                    "http://images.example/electric-guitar.jpg",
                    new String[]{"US", "CA"},
                    100));
            products.add(
                new Product(
                    "Electric Train",
                    "http://images.example/train.jpg",
                    new String[]{"US", "CA"},
                    100));
            products.add(
                new Product(
                    "Acoustic Guitar",
                    "http://images.example/acoustic-guitar.jpg",
                    new String[]{"US", "ZA"},
                    80));
            products.add(
                new Product(
                    "Guarana Soda",
                    "http://images.example/soda.jpg",
                    new String[]{"ZA", "IE"},
                    130));

            // Index the products with the suggester.
            suggester.build(new ProductIterator(products.iterator()));

            // Do some example lookups.
            lookup(suggester, "Gu", "US");
            lookup(suggester, "Gu", "ZA");
            lookup(suggester, "Gui", "CA");
            lookup(suggester, "Electric guit", "US");
        } catch (IOException e) {
            System.err.println("Indexing failed: " + e);
        }
    }
}

Here is the output of the driver program:

-- "Gu" (US):
Electric Guitar
  image: http://images.example/electric-guitar.jpg
  # sold: 100
Acoustic Guitar
  image: http://images.example/acoustic-guitar.jpg
  # sold: 80
-- "Gu" (ZA):
Guarana Soda
  image: http://images.example/soda.jpg
  # sold: 130
Acoustic Guitar
  image: http://images.example/acoustic-guitar.jpg
  # sold: 80
-- "Gui" (CA):
Electric Guitar
  image: http://images.example/electric-guitar.jpg
  # sold: 100
-- "Electric guit" (US):
Electric Guitar
  image: http://images.example/electric-guitar.jpg
  # sold: 100

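Appendix

There is a way to avoid writing a full InputIterator, which you may find easier. You can write a stub InputIterator that returns null from its next, payload, and contexts methods, and pass an instance of it to AnalyzingInfixSuggester's build method: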

suggester.build(new ProductIterator(new ArrayList<Product>().iterator()));

Then, for each item you want to index, call AnalyzingInfixSuggester's add method:

suggester.add(text, contexts, weight, payload)

After you have indexed everything, call refresh:

suggester.refresh();

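If you are indexing a large amount of data, you can speed up indexing significantly by using this approach with multiple threads: call build, then use multiple threads to add items, and finally call refresh.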
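The threading pattern might look roughly like the sketch below. To keep it runnable without Lucene, it talks to a hypothetical SuggestSink interface instead of the suggester; RecordingSink and ParallelIndexer are likewise made-up names. A real adapter would forward add and refresh to AnalyzingInfixSuggester, call build first, and pass contexts and a payload as well. The sketch assumes, as stated above, that add may be called from several threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the two suggester calls used in this pattern.
interface SuggestSink {
    void add(String text, long weight);
    void refresh();
}

// In-memory sink that only records calls, so the sketch runs without Lucene.
class RecordingSink implements SuggestSink {
    final Queue<String> added = new ConcurrentLinkedQueue<>();
    final AtomicInteger refreshCount = new AtomicInteger();

    public void add(String text, long weight) {
        added.add(text + "=" + weight);
    }

    public void refresh() {
        refreshCount.incrementAndGet();
    }
}

class ParallelIndexer {
    // Add every item from a thread pool, wait for all adds, then refresh once.
    static void indexAll(SuggestSink sink, Map<String, Long> items, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (Map.Entry<String, Long> item : items.entrySet()) {
                pending.add(pool.submit(
                    () -> sink.add(item.getKey(), item.getValue())));
            }
            for (Future<?> f : pending) {
                f.get(); // surfaces any failure from a worker thread
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while indexing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("An add call failed", e.getCause());
        } finally {
            pool.shutdown();
        }
        sink.refresh();
    }
}
```

Waiting on every Future before calling refresh matters here: refresh is what makes the added items visible to lookups, so it should run only after the last add has finished.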

[Edited 2015-04-23 to demonstrate deserializing information from the LookupResult payload.]
