I'm building a 'smart' search engine that ranks database rows by relevance. My system counts how many words of your search sentence appear in the database field 'tag_clean', which contains text, and tries to return the best match (one result per search).

For example, say one entry has 'search youpla boom' in its tag_clean field and another has 'search youpla bim'. If you type 'search bim', it will show the second entry.

My system gives one point per matching word and returns the highest-scoring entry as the result. Everything works, but my big problem is that it completely ignores word order!

If you have 'google image test' and 'google test' and you search for 'google test image', my system ranks the first one as most relevant (three matching words against two), but the second one is the right answer.

I'd like a system that understands the importance of word order, but I have no idea how to do that in SQL.

A sample of my SQL query (the important part is the CASE WHEN expressions at the end):

SELECT *
FROM keywords
WHERE
    (
        tag_clean LIKE 'google%'
        AND static = 0
        AND
        (
            tag_clean LIKE '%google%'
            OR tag_clean LIKE '%test%'
            OR tag_clean LIKE '%image%'
        )
    )
    OR
    (
        tag_clean = 'google test image'
        AND static = 1
    )
ORDER BY
    -- one point per search word found in tag_clean
    ((CASE WHEN tag_clean LIKE '%google%' THEN 1 ELSE 0 END)
        + (CASE WHEN tag_clean LIKE '%test%' THEN 1 ELSE 0 END)
        + (CASE WHEN tag_clean LIKE '%image%' THEN 1 ELSE 0 END))
    DESC
LIMIT 0, 1;

Thank you people :)

1 Answer

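First, I'm not sure raw SQL is the best tool for this. You should look into the full-text features of whatever engine you are using. Searching text is a fairly well-solved problem, and databases support it (through extensions to the base language).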

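Assuming you want to continue, the problem is your structure. You could start adding extra clauses on tag_clean, such as LIKE '%google test%' and all the other two-word combinations. That would probably work as a quick and dirty solution.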
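A minimal sketch of that quick and dirty idea, written in Python with the standard sqlite3 module so it can be run as-is. Two things here are assumptions and not part of the question: the phrase weight of 2 (with a weight of 1, 'google image test' and 'google test' would tie at 3 points each) and the helper name best_match. Only adjacent word pairs in the typed order are added, not every combination.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE keywords (tag_clean TEXT)")
conn.executemany(
    "INSERT INTO keywords (tag_clean) VALUES (?)",
    [("google image test",), ("google test",)],
)

PHRASE_WEIGHT = 2  # assumption: an in-order word pair outweighs a single word


def best_match(conn, query):
    """Return (tag_clean, score) for the highest-scoring row, or None."""
    words = query.split()
    if not words:
        return None
    # Adjacent word pairs, in the order the user typed them.
    phrases = [f"{a} {b}" for a, b in zip(words, words[1:])]
    weighted = [(w, 1) for w in words] + [(p, PHRASE_WEIGHT) for p in phrases]
    # Weights are integers generated here; user text is only bound via "?".
    # Note: "%" or "_" inside a search word still acts as a LIKE wildcard.
    score = " + ".join(
        f"(CASE WHEN tag_clean LIKE ? THEN {weight} ELSE 0 END)"
        for _, weight in weighted
    )
    params = [f"%{term}%" for term, _ in weighted]
    sql = (
        f"SELECT tag_clean, {score} AS score "
        "FROM keywords ORDER BY score DESC LIMIT 1"
    )
    return conn.execute(sql, params).fetchone()


print(best_match(conn, "google test image"))
```

For the search 'google test image', 'google test' now scores 4 (two words plus the in-order pair) against 3 for 'google image test'. The number of clauses grows with the length of the search, which is part of why this stays a stopgap.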

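Your real problem is that you are storing relational data in a single field. There should be a keywords table with a separate row for each keyword on each document. It would have the following columns: documentID, KeyWord, and KeyWordPosition. With KeyWordPosition, you can start doing the proximity searches you want.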
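One way this layout could be queried, again as a rough sketch in Python with sqlite3. The column names come from the paragraph above; the table names document_keywords and query_words, the helper functions, and the scoring rule (one point per matching word, two extra points for each pair of consecutive search words that sit in adjacent positions in the document) are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE document_keywords (
        documentID      INTEGER NOT NULL,
        KeyWord         TEXT    NOT NULL,
        KeyWordPosition INTEGER NOT NULL
    );
    -- Hypothetical scratch table holding the words of the current search.
    CREATE TABLE query_words (word TEXT NOT NULL, pos INTEGER NOT NULL);
""")

SCORE_SQL = """
WITH word_hits AS (      -- one point per search word present in the document
    SELECT k.documentID, COUNT(*) AS n
    FROM document_keywords k
    JOIN query_words q ON q.word = k.KeyWord
    GROUP BY k.documentID
),
pair_hits AS (           -- adjacent document words that are adjacent in the search
    SELECT a.documentID, COUNT(*) AS n
    FROM document_keywords a
    JOIN document_keywords b
      ON b.documentID = a.documentID
     AND b.KeyWordPosition = a.KeyWordPosition + 1
    JOIN query_words qa ON qa.word = a.KeyWord
    JOIN query_words qb ON qb.word = b.KeyWord AND qb.pos = qa.pos + 1
    GROUP BY a.documentID
)
SELECT w.documentID, w.n + 2 * COALESCE(p.n, 0) AS score
FROM word_hits w
LEFT JOIN pair_hits p ON p.documentID = w.documentID
ORDER BY score DESC, w.documentID
LIMIT 1
"""


def add_document(conn, document_id, text):
    """Split text on whitespace and store one row per word, positions from 1."""
    rows = [(document_id, w, i) for i, w in enumerate(text.split(), start=1)]
    conn.executemany("INSERT INTO document_keywords VALUES (?, ?, ?)", rows)


def best_document(conn, query):
    """Return (documentID, score) for the best match, or None if nothing matches."""
    conn.execute("DELETE FROM query_words")
    rows = [(w, i) for i, w in enumerate(query.split(), start=1)]
    conn.executemany("INSERT INTO query_words VALUES (?, ?)", rows)
    return conn.execute(SCORE_SQL).fetchone()


add_document(conn, 1, "google image test")
add_document(conn, 2, "google test")
print(best_document(conn, "google test image"))
```

With the two entries from the question, the search 'google test image' picks document 2 ('google test'), because its in-order pair outweighs the extra word in document 1. Repeated words in a search or a document would be counted more than once here, so a real version would need to decide how to handle duplicates.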

However, you would be better off investigating the full-text functionality in existing software.

Answered 2012-09-14T15:51:58.400