
I have a table in Oracle with four columns. [Image: table data in Oracle]

A user can enter an input string such as "operation Knee right" (which is valid), and my query should return the ICD code (IKR123) of the row whose DiagnosisName column matches the most words of the input.

The following is my current query, which does not give the expected output:

SELECT diagnosisname
FROM
  (SELECT diagnosisname,
    UTL_MATCH.jaro_winkler_similarity('%operation Knee right%',diagnosisname)
  FROM icd_code
  ORDER BY UTL_MATCH.EDIT_DISTANCE_SIMILARITY('%operation Knee right%',diagnosisname) DESC
  )
WHERE ROWNUM<2;

This query returns "Left Knee Operation", but I expect "Right Knee Operation".


2 Answers


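There are a few things to note about using UTL_MATCH:

  • EDIT_DISTANCE_SIMILARITY: returns an integer between 0 and 100, where 0 indicates no similarity at all and 100 indicates a perfect match.
  • JARO_WINKLER_SIMILARITY: returns an integer between 0 and 100, where 0 indicates no similarity at all and 100 indicates a perfect match, but it also tries to take possible data entry errors into account.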
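
As a quick illustration of the two functions, they can be called side by side on a single pair of strings. This is only a minimal sketch: the sample strings are assumptions chosen for the example, not data from the question. UTL_MATCH.EDIT_DISTANCE is included to show the raw edit count that the edit-distance similarity normalizes.

```sql
-- Sample strings are illustrative assumptions, not rows from icd_code.
SELECT
  -- Raw Levenshtein distance: 1 here, because one character ('s') is inserted.
  UTL_MATCH.EDIT_DISTANCE('Knee Operation', 'Knee Operations')            AS ed,
  -- The same comparison normalized to a 0..100 score.
  UTL_MATCH.EDIT_DISTANCE_SIMILARITY('Knee Operation', 'Knee Operations') AS eds,
  -- Jaro-Winkler score, also 0..100.
  UTL_MATCH.JARO_WINKLER_SIMILARITY('Knee Operation', 'Knee Operations')  AS jws
FROM dual;
```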

ORDER BY UTL_MATCH.EDIT_DISTANCE_SIMILARITY('%operation Knee right%',diagnosisname) DESC

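This will not give you the correct result, because you are only considering the possible similarity and not taking data entry errors into account. So you must use JARO_WINKLER_SIMILARITY.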
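
One further point about the quoted ORDER BY: the input is wrapped in '%' characters. UTL_MATCH compares plain strings and does not apply LIKE-style wildcards, so each '%' is counted as an ordinary character that has to be edited away. The sketch below uses a made-up short string to show the effect.

```sql
-- 'knee' is an illustrative assumption, not a value from the question.
SELECT
  -- 2: the leading and trailing '%' each cost one deletion.
  UTL_MATCH.EDIT_DISTANCE('%knee%', 'knee') AS with_percent,
  -- 0: identical strings.
  UTL_MATCH.EDIT_DISTANCE('knee', 'knee')   AS without_percent
FROM dual;
```

Dropping the '%' characters from the input therefore avoids penalizing every row by two extra edits.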

'operation Knee right'

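You also need to keep in mind the case of the input and of the column values you are comparing. They must be in the same case to match correctly. You are passing the input mostly in lowercase, whereas your column values are in INITCAP. It is better to convert both the column values and the input to the same case.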
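
A minimal sketch of why case matters, using a made-up pair of strings (an assumption for illustration only): the comparison is case-sensitive, so a difference in capitalization counts as an edit unless both sides are normalized first.

```sql
-- 'Knee' / 'knee' are illustrative assumptions.
SELECT
  -- 1: 'K' and 'k' are different characters.
  UTL_MATCH.EDIT_DISTANCE('Knee', 'knee')               AS mixed_case,
  -- 0: both sides converted to upper case before comparing.
  UTL_MATCH.EDIT_DISTANCE(UPPER('Knee'), UPPER('knee')) AS same_case
FROM dual;
```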

Let's look at the demonstration below to understand:

SQL> WITH DATA AS(
  2  SELECT 'Heart Operation' diagnosis_name, 'IH123' icd_code FROM dual UNION ALL
  3  SELECT 'Knee Operation' diagnosis_name, 'IK123' icd_code FROM dual UNION ALL
  4  SELECT 'Left Knee Operation' diagnosis_name, 'IKL123' icd_code FROM dual UNION ALL
  5  SELECT 'Right Knee Operation' diagnosis_name, 'IKR123' icd_code FROM dual UNION ALL
  6  SELECT 'Fever' diagnosis_name, 'IF123' icd_code FROM dual
  7  )
  8  SELECT t.*,
  9    utl_match.edit_distance_similarity(upper(diagnosis_name),upper('operation Knee right')) eds,
 10    UTL_MATCH.jaro_winkler_similarity (upper(diagnosis_name),upper('operation Knee right')) jws
 11  FROM DATA t
 12  ORDER BY jws DESC
 13  /

DIAGNOSIS_NAME       ICD_CO        EDS        JWS
-------------------- ------ ---------- ----------
Right Knee Operation IKR123         20         72
Knee Operation       IK123          20         70
Heart Operation      IH123          25         68
Left Knee Operation  IKL123         25         64
Fever                IF123          15         47

SQL>

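So you can see how the two differ. jaro_winkler_similarity does a better job of recognizing data entry errors and giving the closest match. Based on that, just pick the first row after sorting in descending order: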

SQL> WITH DATA AS(
  2  SELECT 'Heart Operation' diagnosis_name, 'IH123' icd_code FROM dual UNION ALL
  3  SELECT 'Knee Operation' diagnosis_name, 'IK123' icd_code FROM dual UNION ALL
  4  SELECT 'Left Knee Operation' diagnosis_name, 'IKL123' icd_code FROM dual UNION ALL
  5  SELECT 'Right Knee Operation' diagnosis_name, 'IKR123' icd_code FROM dual UNION ALL
  6  SELECT 'Fever' diagnosis_name, 'IF123' icd_code FROM dual
  7  )
  8  SELECT diagnosis_name
  9  FROM
 10    (SELECT t.*,
 11      utl_match.edit_distance_similarity(upper(diagnosis_name),upper('operation Knee right')) eds,
 12      UTL_MATCH.jaro_winkler_similarity (upper(diagnosis_name),upper('operation Knee right')) jws
 13    FROM DATA t
 14    ORDER BY jws DESC
 15    )
 16  WHERE rownum = 1
 17  /

DIAGNOSIS_NAME
--------------------
Right Knee Operation

SQL>
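
Since the question asks for the row that matches the most words of the input, a word-count ranking is another option worth considering alongside the character-based scores above. The following is only a sketch under stated assumptions: it reuses the sample rows from the demonstration, splits the input on single spaces, and the names words and matched_words are hypothetical. It requires REGEXP_COUNT, so Oracle 11g or later.

```sql
WITH data AS (
  SELECT 'Heart Operation' diagnosis_name, 'IH123' icd_code FROM dual UNION ALL
  SELECT 'Knee Operation', 'IK123' FROM dual UNION ALL
  SELECT 'Left Knee Operation', 'IKL123' FROM dual UNION ALL
  SELECT 'Right Knee Operation', 'IKR123' FROM dual UNION ALL
  SELECT 'Fever', 'IF123' FROM dual
),
words AS (
  -- One row per space-separated word of the input, upper-cased.
  SELECT UPPER(REGEXP_SUBSTR('operation Knee right', '[^ ]+', 1, LEVEL)) AS word
  FROM dual
  CONNECT BY LEVEL <= REGEXP_COUNT('operation Knee right', '[^ ]+')
)
SELECT d.diagnosis_name, d.icd_code, COUNT(*) AS matched_words
FROM data d
JOIN words w
  -- Pad with spaces so only whole words match, case-insensitively.
  ON INSTR(' ' || UPPER(d.diagnosis_name) || ' ', ' ' || w.word || ' ') > 0
GROUP BY d.diagnosis_name, d.icd_code
ORDER BY matched_words DESC;
```

For these sample rows, 'Right Knee Operation' ranks first because it is the only row containing all three input words. Rows that tie on matched_words come back in no guaranteed order, so a secondary sort key (for example the Jaro-Winkler score) may be needed in practice.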
answered 2015-04-29T06:34:43.013

Please try this query. It may help solve your problem.

SELECT diagnosisname
FROM
  (SELECT diagnosisname,
    UTL_MATCH.jaro_winkler_similarity('%operation Knee right%',diagnosisname)
  FROM icd_code
  WHERE UTL_MATCH.jaro_winkler_similarity('%operation Knee right%',diagnosisname) = 100
  ORDER BY UTL_MATCH.EDIT_DISTANCE_SIMILARITY('%operation Knee right%',diagnosisname) DESC
  )
WHERE ROWNUM<2
answered 2015-04-29T06:06:47.893