0

I need to implement fuzzy search on database layer, but I am having some minor issues. Here is my SQL code for demonstration :

SELECT *
   FROM (SELECT *
   FROM TOOLS 
   WHERE UTL_MATCH.jaro_winkler_similarity(UPPER('sample tool'), UPPER(NAME)) > 80
   ORDER BY UTL_MATCH.EDIT_DISTANCE_SIMILARITY('sample tool', NAME) DESC)
   where ROWNUM <= 10;

I am selecting 10 tools that match best the criteria of jaro winkler and edit distance similarity utl functions. Struggle I am having is, that I am not getting exact matches first. For example when I type rich the best scored candidate is 'mich' and then are tools with name 'rich' for example 'rich 12', 'rich ax', ...

  1. Is it possible to get "exact matches" first with these utl functions or is there any function that fits my requirements better? Our fuzzy search should be focused more on skipping some characters rather than replacing them for another.
  2. Is it possible to not take into account word length with these functions? (for example when I type 'di' I want to get results as 'dinosaur', but the word doesn't match my score criteria only because its length and I am getting no results.
4

1 回答 1

1

首先对结果进行排名,获取排名最高的结果。像这样的东西(阅读代码中的注释):

SQL> with
  2  tools (name) as
  3    -- sample data
  4    (select 'mich' from dual union all
  5     select 'rich 12' from dual union all
  6     select 'rich ax' from dual
  7    ),
  8  temp as
  9    -- rank similirities first
 10    (select name,
 11         utl_match.jaro_winkler_similarity('&&par_tool', name) sim,
 12         --
 13         rank() over (order by
 14           utl_match.jaro_winkler_similarity('&&par_tool', name) desc) rnk
 15     from tools
 16    )
 17  -- finally, return the "top" similar values
 18  select name, sim, rnk
 19  from temp
 20  where rnk = 1;
Enter value for par_tool: rich

NAME              SIM        RNK
---------- ---------- ----------
rich 12            91          1
rich ax            91          1

SQL> undefine par_tool
SQL> /
Enter value for par_tool: mick

NAME              SIM        RNK
---------- ---------- ----------
mich               88          1

SQL>
于 2021-07-30T20:28:09.933 回答