oracle - Closest match with Oracle Text search, including short strings

Question

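I want to find the string in a database column that most closely matches a given string. After some searching, I came up with the following table and query: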

CREATE TABLE docs (id NUMBER PRIMARY KEY, text VARCHAR2(200));
INSERT INTO docs VALUES(1, 'California is a state in the US.');
INSERT INTO docs VALUES(2, 'Paris is a city in France.');
INSERT INTO docs VALUES(3, 'France is in Europe.');
INSERT INTO docs VALUES(4, 'Paris');

CREATE INDEX idx_docs ON docs(text)
     INDEXTYPE IS CTXSYS.CONTEXT PARAMETERS
     ('DATASTORE CTXSYS.DEFAULT_DATASTORE');

SELECT SCORE(1), id, text 
  FROM docs 
 WHERE CONTAINS(text, 'fuzzy(Parsi,1,1)', 1) > 0;

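I have set the similarity score to its minimum value, 1. This works for search strings such as "Parsi" or "Parse" and gives me the results I want. But if the search string is too short, such as "par" or "pa", it returns no results.

What should I do to get the closest match even when searching with very short strings?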

score 1 · Accepted Answer

You're basically hitting a limit of the fuzzy operator. The Oracle Text documentation says:

"Unlike stem expansion, the number of words generated by a fuzzy expansion depends on what is in the index. Results can vary significantly according to the contents of the index."

And Oracle doesn't index shorter strings unless you change the default wordlist settings:

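begin
  -- Wordlist preference that turns on prefix and substring indexing
  ctx_ddl.create_preference('mywordlist', 'BASIC_WORDLIST');
  -- Index word prefixes between 3 and 4 characters long
  ctx_ddl.set_attribute('mywordlist','PREFIX_INDEX','TRUE');
  ctx_ddl.set_attribute('mywordlist','PREFIX_MIN_LENGTH', '3');
  ctx_ddl.set_attribute('mywordlist','PREFIX_MAX_LENGTH', '4');
  -- Also index substrings, which helps left- and double-truncated wildcards
  ctx_ddl.set_attribute('mywordlist','SUBSTRING_INDEX', 'YES');
end;
/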
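
The preference only takes effect once an index is built with it. As a minimal sketch, assuming the `idx_docs` index from the question already exists and can simply be dropped and recreated, the wordlist could be attached like this:

```sql
-- Assumption: idx_docs was created earlier without a wordlist preference,
-- and the preference 'mywordlist' has already been created.
DROP INDEX idx_docs;

-- Recreate the CONTEXT index, this time naming the wordlist preference
CREATE INDEX idx_docs ON docs(text)
     INDEXTYPE IS CTXSYS.CONTEXT PARAMETERS
     ('DATASTORE CTXSYS.DEFAULT_DATASTORE WORDLIST mywordlist');
```

Prefix and substring indexing make the index larger and slower to build, so it is probably worth trying on a copy of the data first.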

In this case you might actually have to combine fuzzy and wildcard queries using query rewrite/relaxation. In my experience, wildcard expansion tends to slow everything down significantly, although that may just be a matter of finding the right index configuration.
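
One way to sketch that combination is an Oracle Text query template with a `<progression>`, which tries each `<seq>` step in order: the fuzzy match first, then a right-truncated wildcard as the relaxed fallback. The search term `par` and the choice and order of the steps are illustrative assumptions, not something tested against the question's data:

```sql
-- Query relaxation sketch: fuzzy match first, then fall back to a prefix wildcard.
-- 'par' is a placeholder for the user's short search string.
SELECT SCORE(1), id, text
  FROM docs
 WHERE CONTAINS(text,
       '<query>
          <textquery lang="ENGLISH" grammar="CONTEXT">
            <progression>
              <seq>fuzzy(par,1,1)</seq>
              <seq>par%</seq>
            </progression>
          </textquery>
          <score datatype="INTEGER" algorithm="COUNT"/>
        </query>', 1) > 0
 ORDER BY SCORE(1) DESC;
```

Rows matched by an earlier step score higher than rows matched only by a later one, so ordering by `SCORE(1)` should put the fuzzy hits ahead of the plain prefix hits.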
