
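I am building a search engine in Java (Eclipse). I have a table named tbl_index in which I store all the keywords: one column, keyWord, holds the keyword, and another column, url, holds the URL.

Now, if a search term contains more than one word, how do I write a query that finds all the URLs that contain all of the words?

Table info:

Column 1: keyWord (nvarchar(50))

Column 2: url (varchar(800))

Together, these two columns form the table's primary key. Please suggest an approach that does not require changing my table structure, although pointing out any mistakes in my current schema would also be helpful.

Please also suggest some good indexing techniques for the keywords (the keyWord column) that I extract from the websites' HTML.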


2 Answers


Try this:

select distinct
  url 
from 
  tbl_index a 
where 
  (select count(*) from tbl_index b where a.url=b.url and b.keyword in ('word 1', 'word 2' . . .)) = n

Here n is the number of distinct keywords you are searching for, and 'word 1', 'word 2', etc. are the keywords.
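Since the question is about Java, here is a rough sketch of how the query above could be generated for any number of keywords, with one `?` placeholder per keyword and a last one for n. The names `AllKeywordsQuery`, `buildSql` and `findUrls` are made up for illustration, and it assumes you already have a JDBC `Connection` and a list of distinct keywords:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of any library.
final class AllKeywordsQuery {

    /** Builds the query text with one '?' per keyword plus a final '?' for n. */
    public static String buildSql(int keywordCount) {
        if (keywordCount < 1) {
            throw new IllegalArgumentException("need at least one keyword");
        }
        String placeholders = String.join(", ", Collections.nCopies(keywordCount, "?"));
        return "select distinct url from tbl_index a where "
             + "(select count(*) from tbl_index b "
             + "where a.url = b.url and b.keyword in (" + placeholders + ")) = ?";
    }

    /** Runs the query. Keywords must be distinct, otherwise the count can never reach n. */
    public static List<String> findUrls(Connection conn, List<String> keywords) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(buildSql(keywords.size()))) {
            int i = 1;
            for (String k : keywords) {
                ps.setString(i++, k);          // bind each keyword to its placeholder
            }
            ps.setInt(i, keywords.size());     // bind n, the number of keywords
            List<String> urls = new ArrayList<>();
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    urls.add(rs.getString("url"));
                }
            }
            return urls;
        }
    }
}
```

Binding the words as parameters instead of concatenating them into the SQL string also keeps quotes in the user's search term from breaking the query.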

I suggest you create three tables: one with a row for each unique URL (a numeric id and the URL), a second with a row for each unique keyword (a numeric id and the keyword), and a cross-reference table holding every (url id, keyword id) pair:

create table urls (
  url_id int identity,
  url varchar(800),
  primary key (url_id)
)

create table keywords (
  keyword_id int identity,
  keyword nvarchar(50),
  primary key (keyword_id)
)

create table urlkeys (
  url_id int,
  keyword_id int,
  primary key (url_id, keyword_id)
)

In this way you should reduce the size of the data. The query above becomes something like this:

select 
  url
from
  urls
where 
  (select count(*)
   from urlkeys
   join keywords on urlkeys.keyword_id = keywords.keyword_id
   where urlkeys.url_id = urls.url_id
     and keywords.keyword in ('word 1', 'word 2' . . .)) = n

It would be a good idea to have an index on the keyword column.

P.S. This is the outline of a simplistic SQL solution, but as various people have already pointed out in the comments, this problem is best solved with a full-text search solution. As soon as you try to do something like stemming, proximity search, partial-word search, wildcards, etc., any SQL-based solution will fall short.

answered 2012-06-01T21:17:45.517

This is basically a two-step process.

A. First, break your search term into individual words, like this:

String[] words = searchTerm.split("\\W+");

B. Then build your query by iterating over the words array, producing a query like this:

Select url from tbl_index where keyword in ('word1', 'word2', 'word3');

Here word1, word2, word3 are simply words[0], words[1], words[2], and so on.
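A couple of details in step A are easy to trip over, so here is a minimal sketch of the splitting step. `split("\\W+")` returns an empty first element when the input starts with a separator, and the plain `\W` treats non-ASCII letters as separators, which may matter for an nvarchar column. The class name `SearchTerms` and the method `tokenize` are made up for illustration, and lower-casing assumes the keywords are stored in lower case or compared case-insensitively:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of any library.
final class SearchTerms {

    /** Splits a search term into distinct, lower-cased words, keeping first-seen order. */
    public static List<String> tokenize(String searchTerm) {
        Set<String> words = new LinkedHashSet<>();
        // (?U) makes \W Unicode-aware, so non-ASCII letters count as word characters.
        for (String w : searchTerm.split("(?U)\\W+")) {
            if (!w.isEmpty()) {   // split() yields "" when the input starts with a separator
                // Assumption: keywords are stored lower-cased or matched case-insensitively.
                words.add(w.toLowerCase(Locale.ROOT));
            }
        }
        return new ArrayList<>(words);
    }
}
```

Keep in mind that the `in (...)` query on its own returns URLs that match any one of the words. To require all of them, the size of this de-duplicated list would be the n that a per-URL match count is compared against, as in the other answer.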

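P.S. You may not want to match the keywords in the table exactly; in that case, I suggest using the rlike clause in your MySQL query to get regular-expression matching.

answered 2012-06-01T21:04:06.430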