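I'm using Lucene to query a website's database, but I've run into some problems. I don't actually know whether the problem comes from the indexing or from the search (more precisely, from how the query is built). As far as I know, when searching across several SQL database tables it is best to use several documents, one per table (I followed these tutorials:
http://kalanir.blogspot.pt/2008/06/indexing-database-using-apache-lucene.html
http://www.lucenetutorial.com/techniques/indexing-databases.html
), which is close to what I want to do. In my case I have to search across 3 related tables, because each one specifies the level above it (e.g. product -> type -> color). So my indexing looks like this: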
String sql = "select c.idConteudo as ID, c.designacao as DESIGNACAO, cd.texto as DESCRICAO, ctf.webTag as TAG from Conteudo c, ConteudoDetalhe cd, ConteudoTipoFormato ctf where c.idConteudo = cd.idConteudo AND cd.idConteudoTipoFormato = ctf.idConteudoTipoFormato;";
Statement stmt = connection.createStatement();
ResultSet rs = stmt.executeQuery(sql);
Document document;
while (rs.next())
{
    String S = new String();
    S += IndexerCounter;
    document = new Document();
    document.add(new Field("ID_ID",S, Field.Store.YES, Field.Index.NO));
    document.add(new Field("ID CONTEUDO", rs.getString("ID"), Field.Store.YES, Field.Index.NO));
    document.add(new Field("DESIGNACAO", rs.getString("DESIGNACAO"), Field.Store.NO, Field.Index.TOKENIZED));
    document.add(new Field("DESCRICAO", rs.getString("DESCRICAO"), Field.Store.NO, Field.Index.TOKENIZED));
    document.add(new Field("TAG", rs.getString("TAG"), Field.Store.NO, Field.Index.TOKENIZED));
    try{
        writer.addDocument(document);
    }catch(CorruptIndexException e){
    }catch(IOException e){
    }catch(Exception e){ } //just for knowing if something is wrong
    IndexerCounter++;
}
If I print the results, they look like this:
ID: idConteudo: designacao: texto: webTag
1:1:Xor:xor 1 Descricao:x or
2:1:Xor:xor 2 Descricao:xis Or
3:1:Xor:xor 3 Descricao:exor
4:2:And:and 1 Descricao:and
5:2:And:and 2 Descricao:&
6:2:And:and 3 Descricao:ande
7:2:And:and 4 Descricao:a n d
8:2:And:and 5 Descricao:and,
9:3:Nor:nor 1 Descricao:nor
10:3:Nor:nor 2 Descricao:not or
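What I really want is to take a query (for example, Xor) and search for it in the documents that were created. So my search code looks like this:
Constructor: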
public Spider(String Query, String Pathh) {
    String[] Q;
    QueryFromUser = new String();
    QueryFromUser = Query;
    QueryToSearch1 = new String();
    QueryToSearch2 = new String();
    Path = Pathh;
    try {
        try {
            Class.forName("com.mysql.jdbc.Driver");
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
            return;
        }
        try {
            connection = DriverManager.getConnection("jdbc:mysql://localhost:3306/mydb", "root", "");
        } catch (SQLException e) {
            e.printStackTrace();
            return;
        }
        Q = Query.split(" ");
        //NOTE: the AND word enables the search engine to search by the various words in a query
        for (int i = 0; i < Q.length; i++) {
            if ((Q.length - i) > 1) //prevents the last one to take a AND
            {
                QueryToSearch1 += Q[i] + " AND ";
            } else {
                QueryToSearch1 += Q[i];
            }
        }
        for (int i = 0; i < Q.length; i++) {
            QueryToSearch2 += "+" + Q[i];
        }
        try {
            SEARCHING_CONTENT();
        } catch (ClassNotFoundException ex) {
            Logger.getLogger(Spider.class.getName()).log(Level.SEVERE, null, ex);
        } catch (InstantiationException ex) {
            Logger.getLogger(Spider.class.getName()).log(Level.SEVERE, null, ex);
        } catch (IllegalAccessException ex) {
            Logger.getLogger(Spider.class.getName()).log(Level.SEVERE, null, ex);
        } catch (SQLException ex) {
            Logger.getLogger(Spider.class.getName()).log(Level.SEVERE, null, ex);
        } catch (ParseException ex) {
            Logger.getLogger(Spider.class.getName()).log(Level.SEVERE, null, ex);
        }
        SEARCHING_WEB(); //not for using now
    } catch (CorruptIndexException ex) {
        Logger.getLogger(Spider.class.getName()).log(Level.SEVERE, null, ex);
    } catch (IOException ex) {
        Logger.getLogger(Spider.class.getName()).log(Level.SEVERE, null, ex);
    }
}
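To make it easier to see exactly which strings end up being passed to the parser, the two loops above can be condensed into a minimal standalone sketch. The class name QueryStrings and its method names are hypothetical and are not part of my Spider class; the sketch only reproduces the string building and does not touch Lucene at all.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only; not part of the Spider class.
final class QueryStrings {
    // Joins the words with " AND ", mirroring how QueryToSearch1 is built.
    static String andQuery(String userQuery) {
        return String.join(" AND ", userQuery.split(" "));
    }

    // Prefixes every word with "+" and concatenates them without spaces,
    // mirroring how QueryToSearch2 is built.
    static String plusQuery(String userQuery) {
        return Arrays.stream(userQuery.split(" "))
                .map(word -> "+" + word)
                .collect(Collectors.joining());
    }

    public static void main(String[] args) {
        System.out.println(andQuery("not or"));  // prints: not AND or
        System.out.println(plusQuery("not or")); // prints: +not+or
    }
}
```

Printing these two strings next to the output of query.toString() would show whether the parser interprets them the way I expect.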
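The idea is that QueryToSearch1 and QueryToSearch2 carry the operators AND and + (I saw this in an online tutorial, but I don't remember exactly where). So for the user query "not or", the strings actually searched are "not AND or", to search for both words together, and "+not+or", to search for the two words separately. This is one of my doubts: I really don't know whether this is how Lucene queries are supposed to be constructed. In any case, here is the Querying method: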
private void SEARCHING_CONTENT() throws CorruptIndexException, IOException, ClassNotFoundException, InstantiationException, IllegalAccessException, SQLException, ParseException {
    Querying(QueryToSearch1); // search for the whole phrase
    Querying(QueryToSearch2); //search by individual words
    //Querying(QueryFromUser); //search by individual words
}
private void Querying(String QueryS) throws CorruptIndexException, IOException, ClassNotFoundException, InstantiationException, IllegalAccessException, SQLException, ParseException {
    searcher = new IndexSearcher(IndexReader.open(Path + "/INDEX_CONTENTS"));
    query = new QueryParser("TAG", new StopWords()).parse(QueryS);
    query.toString();
    hits = searcher.search(query);
    pstmt = connection.prepareStatement(sql);
    for (int i = 0; i < hits.length(); i++) {
        id = hits.doc(i).get("TAG");
        pstmt.setString(1, id);
        displayResults(pstmt);
    }
}
The query produces no hits on the documents. It is worth pointing out that in the following line:
query = new QueryParser("TAG", new StopWords()).parse(QueryS);
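StopWords is a class I created that extends StandardAnalyzer, but with a stop-word list that I specified myself, so that important parts of my search terms such as "or" or "and" are not removed (in this case, those words may matter).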
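To illustrate why the stop-word list matters for my data, here is a minimal sketch in plain Java. It is not Lucene code and not my actual StopWords class: the name StopWordSketch, the analyze method, and both word lists are assumptions made up for this illustration. It only shows that a list containing "and", "or" and "not" would leave nothing to match for a query like "not or", while a custom list keeps those words.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Simplified stand-in for an analyzer's stop-word filtering; NOT Lucene code.
final class StopWordSketch {
    // Lowercases the text, splits it on whitespace, and drops every stop word.
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!token.isEmpty() && !stopWords.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Assumed example lists; the real lists depend on the analyzer in use.
        Set<String> withOperators = new HashSet<>(Arrays.asList("a", "the", "and", "or", "not"));
        Set<String> custom = new HashSet<>(Arrays.asList("a", "the"));
        System.out.println(analyze("not or", withOperators)); // prints []
        System.out.println(analyze("not or", custom));        // prints [not, or]
    }
}
```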
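The problem, as I said, is that the search returns no hits. I'm not sure whether this is caused by the indexing or by how the search query is constructed (if the query is built incorrectly, there will be no hits).
I would appreciate any help. I'm happy to provide more information if needed.
Thank you very much.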