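I'm having trouble optimizing a Levenshtein distance calculation I'm working on. I need to do the following:
- Get the record with the minimum distance for the source string, as well as for a trimmed version of the source string
- Pick the record with the smallest distance
- If the minimum distances are equal (original vs. trimmed), pick the trimmed one with the lowest distance
- If there are still multiple records that fall under the above two categories, pick the one with the highest frequency
Here is my working version: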
DECLARE @Results TABLE
(
    ID int,
    [Name] nvarchar(200),
    Distance int,
    Frequency int,
    Trimmed bit
)

-- Distances against the original source string
INSERT INTO @Results
SELECT ID,
       [Name],
       (dbo.Levenshtein(@Source, [Name])) As Distance,
       Frequency,
       'False' As Trimmed
FROM MyTable

-- Distances against the trimmed source string
INSERT INTO @Results
SELECT ID,
       [Name],
       (dbo.Levenshtein(@SourceTrimmed, [Name])) As Distance,
       Frequency,
       'True' As Trimmed
FROM MyTable
-- Smallest distance first, then trimmed over original, then highest frequency
SET @ResultID = (SELECT TOP 1 ID FROM @Results ORDER BY Distance, Trimmed DESC, Frequency DESC)
SET @Result = (SELECT TOP 1 [Name] FROM @Results ORDER BY Distance, Trimmed DESC, Frequency DESC)
SET @ResultDist = (SELECT TOP 1 Distance FROM @Results ORDER BY Distance, Trimmed DESC, Frequency DESC)
SET @ResultTrimmed = (SELECT TOP 1 Trimmed FROM @Results ORDER BY Distance, Trimmed DESC, Frequency DESC)
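I believe what I need to do here is...
- Not dump the results into a temporary table
- Do only one select from "MyTable"
- Set the results right in the initial select statement. (Since a select can set variables, you can set multiple variables in one select statement.)
I know there has to be a good implementation for this, but I can't figure it out... this is as far as I got: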
SELECT top 1 @ResultID = ID,
       @Result = [Name],
       (dbo.Levenshtein(@Source, [Name])) As distOrig,
       (dbo.Levenshtein(@SourceTrimmed, [Name])) As distTrimmed,
       Frequency
FROM MyTable
WHERE /* ... yeah I'm lost */
ORDER BY distOrig, distTrimmed, Frequency
Any ideas?
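One possible direction, as a rough sketch and not a tested solution: the three goals listed above could perhaps be combined by pairing every row of MyTable with both distances through CROSS APPLY (VALUES ...), and letting a single SELECT TOP (1) assign all four variables. The variable types, the sample value of @Source, and the ID tiebreaker are assumptions made for illustration. dbo.Levenshtein and MyTable are taken from the question as given, and the syntax assumes SQL Server 2008 or later.

```sql
-- Assumed declarations; in the real code these are probably parameters
-- or variables that already exist with their own types.
DECLARE @Source        nvarchar(200) = N' exmaple ';   -- hypothetical input
DECLARE @SourceTrimmed nvarchar(200) = LTRIM(RTRIM(@Source));
DECLARE @ResultID      int,
        @Result        nvarchar(200),
        @ResultDist    int,
        @ResultTrimmed bit;

-- One pass over MyTable, no table variable.
-- Each row is expanded into two candidates: (distance to original, Trimmed = 0)
-- and (distance to trimmed, Trimmed = 1).
SELECT TOP (1)
       @ResultID      = t.ID,
       @Result        = t.[Name],
       @ResultDist    = d.Distance,
       @ResultTrimmed = d.Trimmed
FROM MyTable AS t
CROSS APPLY (
    VALUES (dbo.Levenshtein(@Source, t.[Name]),        CAST(0 AS bit)),
           (dbo.Levenshtein(@SourceTrimmed, t.[Name]), CAST(1 AS bit))
) AS d (Distance, Trimmed)
ORDER BY d.Distance  ASC,   -- smallest distance wins
         d.Trimmed   DESC,  -- on a tie, prefer the trimmed candidate
         t.Frequency DESC,  -- then the most frequent record
         t.ID        ASC;   -- assumed tiebreaker so the pick is deterministic
```

This still calls dbo.Levenshtein twice per row, so the likely saving is the table variable and the four separate sorts, not the distance computation itself.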