
I'm trying to figure out how weighted terms work in an ISABOUT query in SQL Server.

Here is where I am so far.

Each query returns the following rows:

QUERY 1 (weight 1): initial ranking

SELECT * FROM CONTAINSTABLE(documentParts, title, 'ISABOUT ("e" weight (1) ) ') ORDER BY RANK DESC, [KEY]

KEY     RANK
306342  249
272619  156
221557  114

QUERY 2 (weight 0.8): ranks increase, initial order is preserved

SELECT * FROM CONTAINSTABLE(documentParts, title, 'ISABOUT ("e" weight (0.8) ) ') ORDER BY RANK DESC, [KEY]

 KEY     RANK
 306342  321
 272619  201
 221557  146

QUERY 3 (weight 0.2): ranks increase, initial order is preserved

SELECT * FROM CONTAINSTABLE(documentParts, title, 'ISABOUT ("e" weight (0.2) ) ') ORDER BY RANK DESC, [KEY]

 KEY    RANK
 306342 998
 272619 877
 221557 692

QUERY 4 (weight 0.17): ranks decrease, the best match is now last; the inverted behavior for this term starts at 0.17

SELECT * FROM CONTAINSTABLE(documentParts, title, 'ISABOUT ("e" weight (0.17) ) ') ORDER BY RANK DESC, [KEY]

 KEY      RANK
 272619   960
 221557   958
 306342   802

QUERY 5 (weight 0.16): ranks increase, the best match is now second

SELECT * FROM CONTAINSTABLE(documentParts, title, 'ISABOUT ("e" weight (0.16) ) ') ORDER BY RANK DESC, [KEY]

 KEY      RANK
 272619   978
 306342   935
 221557   841

QUERY 6 (weight 0.01): ranks decrease, the best match is last again

SELECT * FROM CONTAINSTABLE(documentParts, title, 'ISABOUT ("e" weight (0.01) ) ') ORDER BY RANK DESC, [KEY]

 KEY    RANK
 221557 105
 272619 77
 306342 50

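With weight 1 the best match has a rank of 249, and as the weight drops to 0.2 its rank increases to 998. From 0.2 to 0.17 the ranks decrease, and from 0.16 on the result order is inverted (the weight values that reproduce this behavior depend on the term, and possibly on the searched column...

It seems that below some point a small weight means the opposite, something like "do not include this term". Do you have any explanation for this behavior? Why does the rank increase when the weight decreases? Why does the rank drop past a certain point until the results are inverted, and how can this be predicted?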
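
One way to look at this more closely is to put the ranks for two weights next to each other, key by key. A minimal sketch, reusing the documentParts table and title column from the queries above; the aliases doc_key, rank_w100 and rank_w020 are illustrative names and not part of the original queries:

```sql
-- Rank of every hit at weight 1.0 and at weight 0.2, side by side.
-- FULL OUTER JOIN keeps keys that appear in only one of the two result sets.
SELECT COALESCE(a.[KEY], b.[KEY]) AS doc_key,
       a.[RANK] AS rank_w100,
       b.[RANK] AS rank_w020
FROM CONTAINSTABLE(documentParts, title, 'ISABOUT ("e" weight (1.0))') AS a
FULL OUTER JOIN CONTAINSTABLE(documentParts, title, 'ISABOUT ("e" weight (0.2))') AS b
    ON a.[KEY] = b.[KEY]
ORDER BY rank_w100 DESC, doc_key;
```

Changing the second weight literal (0.17, 0.16, 0.01) should make it easier to see at which value the order starts to flip for a given term.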

When a user searches for something, I use a custom "word breaker" that builds the following query:

CONTAINSTABLE(documentParts, title, 
      'ISABOUT (
          "wordA wordB wordC" weight (0.8), 
          "wordA*" NEAR "wordB*" NEAR "wordC*" weight (0.6), 
          "wordA*" weight (0.1), 
          "wordB*" weight (0.1), 
          "wordC*" weight (0.1)
       ) ')

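Should I expect large ranks for the words weighted 0.1?
Is the following query equivalent to the one above, and should I expect some odd behavior from the 0.1 weights?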

CONTAINSTABLE(documentParts, title, '
      ISABOUT ( "wordA wordB wordC" weight (0.8) )
      OR ISABOUT ( "wordA*" NEAR "wordB*" NEAR "wordC*" weight (0.6) )
      OR ISABOUT ( "wordA*" weight (0.1) )
      OR ISABOUT ( "wordB*" weight (0.1) )
      OR ISABOUT ( "wordC*" weight (0.1) )
      ')

2 Answers


In my experience, I got the best results when the weights add up to 1.

CONTAINSTABLE(documentParts, content, 
          'ISABOUT (
              "wordA wordB wordC" weight (0.5), 
              "wordA*" NEAR "wordB*" NEAR "wordC*" weight (0.2), 
              "wordA*" weight (0.1), 
              "wordB*" weight (0.1), 
              "wordC*" weight (0.1) 
           ) ')
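
If the weights come from somewhere else and do not already sum to 1, they can be rescaled by dividing each one by the total. A rough sketch, assuming the raw weights from the question; the @terms table variable and its column names are hypothetical:

```sql
-- Hypothetical raw weights for the five ISABOUT terms (values from the question).
DECLARE @terms TABLE (term NVARCHAR(200), raw_weight FLOAT);

INSERT INTO @terms (term, raw_weight) VALUES
    (N'"wordA wordB wordC"', 0.8),
    (N'"wordA*" NEAR "wordB*" NEAR "wordC*"', 0.6),
    (N'"wordA*"', 0.1),
    (N'"wordB*"', 0.1),
    (N'"wordC*"', 0.1);

-- Divide each weight by the total so the scaled weights sum to 1
-- (up to rounding), then round to three decimals for use in the query text.
SELECT term,
       ROUND(raw_weight / SUM(raw_weight) OVER (), 3) AS weight
FROM @terms;
```

The relative order of the weights is unchanged by this scaling, and every scaled value stays inside the 0.0 to 1.0 range that ISABOUT accepts.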
answered 2013-10-17T10:01:40.197

Since the clock was ticking, I ended up with the following, which gives good results:

SELECT [KEY], SUM([RANK]) AS [RANK] FROM (
    SELECT [KEY], ([RANK]*1)/(SUM([RANK]) OVER( PARTITION BY 1)/ CAST(COUNT([RANK]) OVER( PARTITION BY 1) AS FLOAT)) AS [RANK] 
        FROM CONTAINSTABLE(documentParts, content, 
              'ISABOUT (
                  "wordA wordB wordC" weight (0.8), 
                  "wordA*" NEAR "wordB*" NEAR "wordC*" weight (0.6), 
                  "wordA*" weight (0.4), 
                  "wordB*" weight (0.4), 
                  "wordC*" weight (0.4) 
               ) ') c
        WHERE c.RANK>0
        UNION ALL      
        SELECT [KEY], ([RANK]*2)/(SUM([RANK]) OVER( PARTITION BY 1)/ CAST(COUNT([RANK]) OVER( PARTITION BY 1) AS FLOAT)) AS [RANK] 
        FROM CONTAINSTABLE(documents, title, 
              'ISABOUT (
                  "wordA wordB wordC" weight (0.8), 
                  "wordA*" NEAR "wordB*" NEAR "wordC*" weight (0.6), 
                  "wordA*" weight (0.4), 
                  "wordB*" weight (0.4), 
                  "wordC*" weight (0.4) 
               ) ') c
         WHERE c.RANK>0
    ) t 
    GROUP BY [KEY]
ORDER BY [RANK] DESC

I'll hand it over to the test team and call it a day...
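
The division in the query above amounts to scaling each rank by the average rank of its own result set, so that the two CONTAINSTABLE calls become comparable before the column multiplier (1 for content, 2 for title) is applied. A small self-contained sketch of that arithmetic, using the three ranks from QUERY 1 of the question as sample data; the hits CTE and the multiplier column are made up for illustration:

```sql
-- Sample hits: keys and ranks copied from QUERY 1 in the question.
-- The multiplier column stands in for the per-column factor (*1 or *2).
WITH hits AS (
    SELECT *
    FROM (VALUES
        (306342, 249, 1),
        (272619, 156, 1),
        (221557, 114, 1)
    ) AS v([KEY], [RANK], multiplier)
)
SELECT [KEY],
       -- AVG(...) OVER () is the same value as
       -- SUM([RANK]) OVER (PARTITION BY 1) / CAST(COUNT([RANK]) OVER (PARTITION BY 1) AS FLOAT)
       [RANK] * multiplier / AVG(CAST([RANK] AS FLOAT)) OVER () AS normalized_rank
FROM hits
ORDER BY normalized_rank DESC;
```

A hit with exactly the average rank comes out as 1.0 times its multiplier, so a title hit counts roughly twice as much as a content hit of the same relative strength.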

answered 2012-12-04T15:50:58.093