
I'm having trouble optimizing a Levenshtein distance calculation I'm working on. I need to do the following:

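  1. Get the record with the minimum distance for the source string, and also for a trimmed version of the source string
  2. Pick the record with the shortest distance
  3. If the minimum distances are equal (original vs. trimmed), pick the trimmed one with the lowest distance
  4. If there are still multiple records that fall under the two cases above, pick the one with the highest frequency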

Here is my working version:

DECLARE @Results TABLE
(
    ID int,
    [Name] nvarchar(200), 
    Distance int, 
    Frequency int, 
    Trimmed bit
)


INSERT INTO @Results
    SELECT ID, 
           [Name], 
           (dbo.Levenshtein(@Source, [Name])) As Distance,
           Frequency, 
           'False' As Trimmed
    FROM
           MyTable

INSERT INTO @Results
    SELECT ID, 
           [Name], 
           (dbo.Levenshtein(@SourceTrimmed, [Name])) As Distance,
           Frequency, 
           'True' As Trimmed
    FROM
           MyTable

SET @ResultID = (SELECT TOP 1 ID FROM @Results ORDER BY Distance, Trimmed, Frequency)
SET @Result = (SELECT TOP 1 [Name] FROM @Results ORDER BY Distance, Trimmed, Frequency)
SET @ResultDist = (SELECT TOP 1 Distance FROM @Results ORDER BY Distance, Trimmed, Frequency)
SET @ResultTrimmed = (SELECT TOP 1 Trimmed FROM @Results ORDER BY Distance, Trimmed, Frequency)

I believe what I need to do here is:

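  1. Not dump the results into a temporary table
  2. Do only one select from "MyTable"
  3. Set the results directly in that initial select statement (a select can assign variables, and a single select can assign several at once)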

I know there has to be a good way to implement this, but I can't figure it out. This is as far as I got:

SELECT top 1 @ResultID = ID, 
             @Result = [Name], 
             (dbo.Levenshtein(@Source, [Name])) As distOrig,
             (dbo.Levenshtein(@SourceTrimmed, [Name])) As distTrimmed,
             Frequency
FROM
    MyTable
WHERE /* ... yeah I'm lost */
ORDER BY distOrig, distTrimmed, Frequency 

Any ideas?


1 Answer


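I think your attempt does something different from the code you say works. The working code orders by distance first, regardless of whether it is the original or the trimmed distance. Your attempt orders by the original distance first and only then by the trimmed distance.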
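
To make the difference concrete, here is a small sketch with made-up numbers. The table variable @Demo and its distances are hypothetical and only there for illustration. The two orderings pick different rows:

```sql
-- Hypothetical sample rows; the distances are invented for illustration.
DECLARE @Demo TABLE (ID int, distOrig int, distTrimmed int, Frequency int);

INSERT INTO @Demo (ID, distOrig, distTrimmed, Frequency) VALUES (1, 3, 3, 10);
INSERT INTO @Demo (ID, distOrig, distTrimmed, Frequency) VALUES (2, 4, 1, 5);

-- Ordering used in the attempt: original distance first, so ID 1 sorts first.
SELECT TOP 1 ID
FROM @Demo
ORDER BY distOrig, distTrimmed, Frequency;

-- Ordering by the smaller of the two distances, which is what the working
-- version effectively does: ID 2 sorts first because its trimmed distance is 1.
SELECT TOP 1 ID
FROM @Demo
ORDER BY CASE WHEN distTrimmed < distOrig THEN distTrimmed ELSE distOrig END,
         CASE WHEN distTrimmed < distOrig THEN 1 ELSE 0 END,
         Frequency;
```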

I'm not sure I completely understand what you are trying to do, but does the following do what you need?

SELECT TOP 1
    @ResultId = ID,
    @Result = [Name],
    @ResultDist = distOrig,
    @ResultTrimmed = distTrimmed
FROM (
    SELECT
        ID, [Name], 
        dbo.Levenshtein(@Source, [Name]) As distOrig,
        dbo.Levenshtein(@SourceTrimmed, [Name]) As distTrimmed,
        Frequency
    FROM MyTable
) AS T
ORDER BY
    CASE WHEN distOrig > distTrimmed THEN distTrimmed ELSE distOrig END, -- Distance
    CASE WHEN distOrig > distTrimmed THEN 1 ELSE 0 END,                  -- Trimmed
    Frequency                                                            -- Frequency
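
If the variables should end up holding the same things as in your working version (the effective distance and a trimmed flag, rather than the two raw distances), the query could be taken one step further. This is only a sketch and is untested against your data. It assumes @Source, @SourceTrimmed and the four result variables are declared as in your question and that dbo.Levenshtein behaves as it does there. The aliases Distance, Trimmed and D are my own.

```sql
-- Assumes the variables and dbo.Levenshtein from the question already exist.
SELECT TOP 1
    @ResultID      = ID,
    @Result        = [Name],
    @ResultDist    = Distance,
    @ResultTrimmed = Trimmed
FROM (
    SELECT
        ID, [Name], Frequency,
        -- Effective distance: the smaller of the two distances.
        CASE WHEN distTrimmed < distOrig THEN distTrimmed ELSE distOrig END AS Distance,
        -- 1 only when the trimmed source is strictly closer than the original.
        CAST(CASE WHEN distTrimmed < distOrig THEN 1 ELSE 0 END AS bit) AS Trimmed
    FROM (
        -- Each distance is computed once per row of MyTable.
        SELECT
            ID, [Name], Frequency,
            dbo.Levenshtein(@Source, [Name])        AS distOrig,
            dbo.Levenshtein(@SourceTrimmed, [Name]) AS distTrimmed
        FROM MyTable
    ) AS D
) AS T
ORDER BY Distance, Trimmed, Frequency;
```

The ORDER BY mirrors the ascending order of your working version. Your written rules (the trimmed match wins a tie, highest frequency wins) seem to point the other way for the last two keys, so if those rules are what you actually want, the flag comparison and the sort direction of Trimmed and Frequency would need adjusting.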
answered 2010-05-21T23:34:42.387