0

我尝试了一切,但我无法克服这个问题。

我有一个表值函数。

当我用

SELECT * FROM Ratings o1 
    CROSS APPLY dbo.FN_RatingSimilarity(50, 497664, 'Cosine') o2 
WHERE o1.trackId = 497664

执行需要一段时间。但是当我这样做时。

SELECT * FROM Ratings o1 
    CROSS APPLY dbo.FN_RatingSimilarity(50, o1.trackId, 'Cosine') o2 
WHERE o1.trackId = 497664

它在 32 秒内执行。我创建了所有索引,但没有帮助。

顺便说一句,我的功能:

ALTER FUNCTION [dbo].[FN_RatingSimilarity]
(   
    @trackId    INT,
    @nTrackId   INT,
    @measureType    VARCHAR(100)
)
RETURNS TABLE 
WITH SCHEMABINDING
AS
    RETURN
    (
        SELECT o2.id,
               o2.name,
               o2.releaseDate,
               o2.numberOfRatings,
               o2.averageRating,
               COUNT(1) as numberOfSharedUsers,
          CASE @measureType 
               WHEN 'Cosine' THEN SUM(o3.score*o4.score)/(0.01+SQRT(SUM(POWER(o3.score,2))) * SQRT(SUM(POWER(o4.score,2)))) 
               WHEN 'AdjustedCosine' THEN SUM((o3.score-o5.averageRating)*(o4.score-o5.averageRating))/(0.01+SQRT(SUM(POWER(o3.score-o5.averageRating, 2)))*SQRT(SUM(POWER(o4.score-o5.averageRating, 2)))) 
               WHEN 'Pearson' THEN SUM((o3.score-o1.averageRating)*(o4.score-o2.averageRating))/(0.01+SQRT(SUM(POWER(o3.score-o1.averageRating, 2)))*SQRT(SUM(POWER(o4.score-o2.averageRating, 2)))) 
           END as similarityRatio
          FROM dbo.Tracks o1
    INNER JOIN dbo.Tracks o2 ON o2.id != @trackId 
    INNER JOIN dbo.Ratings o3 ON o3.trackId = o1.id 
    INNER JOIN dbo.Ratings o4 ON o4.trackId = o2.id AND o4.userId = o3.userId
    INNER JOIN dbo.Users o5 ON o5.id = o4.userId 
         WHERE o1.id = @trackId 
             AND o2.id = ISNULL(@nTrackId, o2.id)
      GROUP BY o2.id, 
               o2.name, 
               o2.releaseDate,
               o2.numberOfRatings, 
               o2.averageRating
    )

任何帮助,将不胜感激。

谢谢。恩拉

4

1 回答 1

1

我相信你的瓶颈是计算+你非常昂贵的内部连接。

您加入的方式基本上是创建一个交叉连接 - 它返回一个结果集,其中所有记录都链接到所有其他记录,除了提供 id 的记录。然后,您可以使用其他内部联接将结果集添加到该结果集中。

对于每个内部连接,SQL 都会创建一个所有行都匹配的结果集。所以你在查询中做的第一件事就是告诉 SQL 基本上在同一个表上做一个交叉连接。(我假设你还在关注,这看起来很高级,所以我会带你熟悉高级 SQL 语法和运算符)

现在在下一个内部连接中,您将 Results 表应用到新创建的巨大结果集,然后才过滤掉那些不是两个表的表。

因此,首先,看看你是否不能以相反的方式加入。(这实际上取决于您的表记录数和记录大小)。尝试首先获得最小的结果集,然后加入该结果集。

您可能要尝试的第二件事是首先限制您的结果集,甚至在连接之前。所以从一个 CTE 开始,您可以在其中过滤 o1.id = @trackId。然后从这个 CTE 中选择 *,在 CTE 上进行连接,然后在查询中过滤 o2.id = ISNULL(@nTrackId, o2.id)

我将举一个例子,敬请期待......

-- 好的,我添加了一个例子,做了一个快速测试,返回的值是一样的。通过您的数据运行此操作,并让我们知道是否有任何改进。(注意,这并没有解决所讨论的 INNER JOIN 顺序点,仍然可以解决这个问题。)

例子:

ALTER FUNCTION [dbo].[FN_RatingSimilarity_NEW] 
(    
    @trackId    INT, 
    @nTrackId   INT, 
    @measureType    VARCHAR(100) 
) 
RETURNS TABLE  
WITH SCHEMABINDING 
AS 
    RETURN 
    ( 
        WITH CTE_ALL AS 
        (
            SELECT id, 
               name, 
               releaseDate, 
               numberOfRatings, 
               averageRating
            FROM dbo.Tracks
            WHERE  id = @trackId  
        )
        SELECT o2.id, 
               o2.name, 
               o2.releaseDate, 
               o2.numberOfRatings, 
               o2.averageRating, 
               COUNT(1) as numberOfSharedUsers, 
          CASE @measureType  
               WHEN 'Cosine' THEN SUM(o3.score*o4.score)/(0.01+SQRT(SUM(POWER(o3.score,2))) * SQRT(SUM(POWER(o4.score,2))))  
               WHEN 'AdjustedCosine' THEN SUM((o3.score-o5.averageRating)*(o4.score-o5.averageRating))/(0.01+SQRT(SUM(POWER(o3.score-o5.averageRating, 2)))*SQRT(SUM(POWER(o4.score-o5.averageRating, 2))))  
               WHEN 'Pearson' THEN SUM((o3.score-o1.averageRating)*(o4.score-o2.averageRating))/(0.01+SQRT(SUM(POWER(o3.score-o1.averageRating, 2)))*SQRT(SUM(POWER(o4.score-o2.averageRating, 2))))  
           END as similarityRatio 
          FROM CTE_ALL o1 
    INNER JOIN dbo.Tracks o2 ON o2.id != @trackId  
    INNER JOIN dbo.Ratings o3 ON o3.trackId = o1.id  
    INNER JOIN dbo.Ratings o4 ON o4.trackId = o2.id AND o4.userId = o3.userId 
    INNER JOIN dbo.Users o5 ON o5.id = o4.userId  
         WHERE o2.id = ISNULL(@nTrackId, o2.id) 
      GROUP BY o2.id,  
               o2.name,  
               o2.releaseDate, 
               o2.numberOfRatings,  
               o2.averageRating 
    ) 
于 2011-12-19T11:20:02.547 回答