
SQL Fiddle: http://sqlfiddle.com/#!3/9b459/6

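I have a table that stores answers to the question "Will you attend this event?". Each user may respond several times, and every answer is stored in the table. Usually we are only interested in the latest answer, and I'm trying to build an efficient query for that. I'm using SQL Server 2008 R2.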

Table contents for one event:

[Image: table contents]

Column types: int, int, datetime, bit
Primary key: (EventId, MemberId, Timestamp)

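Note that member 18 first answered "No" and then "Yes", member 20 first answered "Yes" and then "No", and member 11 answered "No" and then "No" again. I want to filter out the first answer from each of these members. There may also be more than one answer to filter out; for example, a user might answer yes, yes, no, yes, no, no, no.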
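To pin down the intended result, here is a small plain-Python sketch (not SQL Server code) that keeps only the row with the greatest Timestamp for each (EventId, MemberId). The member ids and answer orders follow the description above; the timestamps and the extra member 99, who answers yes, yes, no, yes, no, no, no, are hypothetical.

```python
from datetime import datetime

# Hypothetical rows for event 68: (EventId, MemberId, Timestamp, Answer).
# Member ids and answer order follow the question; timestamps are invented.
ROWS = [
    (68, 18, datetime(2013, 1, 20, 9), False),   # member 18: No ...
    (68, 18, datetime(2013, 1, 21, 9), True),    # ... then Yes
    (68, 20, datetime(2013, 1, 20, 10), True),   # member 20: Yes ...
    (68, 20, datetime(2013, 1, 22, 10), False),  # ... then No
    (68, 11, datetime(2013, 1, 19, 8), False),   # member 11: No ...
    (68, 11, datetime(2013, 1, 23, 8), False),   # ... then No again
]
# Hypothetical member 99 answering yes, yes, no, yes, no, no, no.
for hour, answer in enumerate([True, True, False, True, False, False, False], start=10):
    ROWS.append((68, 99, datetime(2013, 1, 24, hour), answer))


def latest_answers(rows):
    """Return the row with the greatest Timestamp per (EventId, MemberId), sorted by MemberId."""
    latest = {}
    for row in rows:
        key = (row[0], row[1])
        # Replace the stored row whenever a later answer for the same member shows up.
        if key not in latest or row[2] > latest[key][2]:
            latest[key] = row
    return sorted(latest.values(), key=lambda r: r[1])


for event_id, member_id, ts, answer in latest_answers(ROWS):
    print(member_id, ts, "Yes" if answer else "No")
```

Members 18, 20 and 11 each keep only their second answer, and member 99 keeps only the final "No", no matter how many earlier answers were given.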

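I have tried a few different ideas and evaluated them in SQL Server Management Studio by entering all the queries, choosing Display Estimated Execution Plan, and comparing the total cost percentage of each query. Is this a good way to evaluate performance?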

The queries I have tested so far:

-----------------------------------------------------------------
-- Subquery to select Answer (does not include Timestamp)
-- Cost: 63 %
-----------------------------------------------------------------
select distinct a.EventId, a.MemberId,
(
  select top 1 Answer
  from    Attendees
  where EventId   = a.EventId
  and   MemberId  = a.MemberId
  order by Timestamp desc
) as Answer
from    Attendees a
where a.EventId = 68

-----------------------------------------------------------------
-- Where with subquery to find max(Timestamp)
-- Cost: 13 %
-----------------------------------------------------------------
select a.EventId, a.MemberId, a.Timestamp, a.Answer
from     Attendees a
where  a.EventId = 68
and    a.Timestamp =
(
  select max(Timestamp)
  from     Attendees
  where  EventId  = a.EventId
  and    MemberId = a.MemberId
)
order by a.TimeStamp;

-----------------------------------------------------------------
-- Group by to find max(Timestamp)
-- Subquery to select Answer matching max(Timestamp)
-- Cost: 23 %
-----------------------------------------------------------------
select a.EventId, a.MemberId, max(a.Timestamp),
(
  select top 1 Answer
  from    Attendees
  where EventId   = a.EventId
  and   MemberId  = a.MemberId
  and   Timestamp = max(a.Timestamp)
) as Answer
from    Attendees a
where a.EventId = 68
group by a.EventId, a.MemberId
order by max(a.TimeStamp);

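It would be best to avoid a subquery for each member. In the last query I tried using group by, but I still had to use a subquery for the Answer column. What I would really like is something like this, but of course it is not valid SQL: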

select a.EventId, a.MemberId, max(a.Timestamp), a.Answer -- <-- picked from the row selected by max(a.Timestamp)
from  Attendees a
where a.EventId = 68
group by a.EventId, a.MemberId
order by max(a.TimeStamp);

Any other ideas for an efficient query?


Edit:

I'm very impressed by SQL Fiddle, and I have now entered my actual data there: http://sqlfiddle.com/#!3/9b459/6


3 Answers


SQL Server 2008 supports common table expressions and window functions.

WITH recordsList
AS
(
    SELECT  EventID, MemberID, TimeStamp, Answer,
            ROW_NUMBER() OVER (PARTITION BY EventID, MemberID
                                ORDER BY Timestamp DESC) rn
    FROM    tableName
)
SELECT  EventID, MemberID, TimeStamp, Answer
FROM    recordsList
WHERE   rn = 1
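To check the idea locally, here is a minimal sketch that runs this CTE against an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in for SQL Server 2008 R2 (window functions need SQLite 3.25 or later), the table is called Attendees as in the question rather than tableName, and the rows are hypothetical, with timestamps stored as ISO-8601 text so that text order matches time order.

```python
import sqlite3

# Hypothetical data for event 68: (EventId, MemberId, Timestamp, Answer).
# Member ids and answer order follow the question; timestamps are invented.
ROWS = [
    (68, 18, "2013-01-20 09:00", 0), (68, 18, "2013-01-21 09:00", 1),
    (68, 20, "2013-01-20 10:00", 1), (68, 20, "2013-01-22 10:00", 0),
    (68, 11, "2013-01-19 08:00", 0), (68, 11, "2013-01-23 08:00", 0),
]
# Hypothetical member 99 answering yes, yes, no, yes, no, no, no.
ROWS += [(68, 99, f"2013-01-24 {h}:00", a)
         for h, a in enumerate([1, 1, 0, 1, 0, 0, 0], start=10)]

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Attendees (
        EventId INTEGER, MemberId INTEGER, Timestamp TEXT, Answer INTEGER,
        PRIMARY KEY (EventId, MemberId, Timestamp))""")
conn.executemany("INSERT INTO Attendees VALUES (?, ?, ?, ?)", ROWS)

# Number each member's answers from newest (rn = 1) to oldest, keep rn = 1.
QUERY = """
WITH recordsList AS (
    SELECT EventId, MemberId, Timestamp, Answer,
           ROW_NUMBER() OVER (PARTITION BY EventId, MemberId
                              ORDER BY Timestamp DESC) AS rn
    FROM Attendees
)
SELECT EventId, MemberId, Timestamp, Answer
FROM recordsList
WHERE rn = 1
ORDER BY MemberId
"""

latest = conn.execute(QUERY).fetchall()
for row in latest:
    print(row)
```

Each member ends up with exactly one row, the one numbered 1 when its answers are sorted by Timestamp descending.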
answered 2013-01-25T15:01:50.467

I also prefer the CTE approach, but here is another option using a subquery that should work:

SELECT T.EventId, T.MemberId, T.TimeStamp, T.Answer
FROM TableName T
 JOIN (
   SELECT EventId, MemberId, Max(Timestamp) MaxTimeStamp
   FROM TableName
   GROUP BY EventId, MemberId ) T2 ON T.EventId = T2.EventId 
    AND T.MemberId = T2.MemberId 
    AND T.TimeStamp = T2.MaxTimeStamp
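The same kind of hypothetical SQLite setup (Python's sqlite3 module standing in for SQL Server, invented timestamps, table named Attendees instead of TableName) can confirm that this derived-table join returns one row per member. Because the primary key is (EventId, MemberId, Timestamp), no two rows for one member can share the maximum timestamp, so the join cannot produce duplicates.

```python
import sqlite3

# Hypothetical data for event 68: (EventId, MemberId, Timestamp, Answer).
# Member ids and answer order follow the question; timestamps are invented.
ROWS = [
    (68, 18, "2013-01-20 09:00", 0), (68, 18, "2013-01-21 09:00", 1),
    (68, 20, "2013-01-20 10:00", 1), (68, 20, "2013-01-22 10:00", 0),
    (68, 11, "2013-01-19 08:00", 0), (68, 11, "2013-01-23 08:00", 0),
]
# Hypothetical member 99 answering yes, yes, no, yes, no, no, no.
ROWS += [(68, 99, f"2013-01-24 {h}:00", a)
         for h, a in enumerate([1, 1, 0, 1, 0, 0, 0], start=10)]

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Attendees (
        EventId INTEGER, MemberId INTEGER, Timestamp TEXT, Answer INTEGER,
        PRIMARY KEY (EventId, MemberId, Timestamp))""")
conn.executemany("INSERT INTO Attendees VALUES (?, ?, ?, ?)", ROWS)

# T2 holds each member's newest timestamp; joining back picks that row's Answer.
QUERY = """
SELECT T.EventId, T.MemberId, T.Timestamp, T.Answer
FROM Attendees T
JOIN (
    SELECT EventId, MemberId, MAX(Timestamp) AS MaxTimestamp
    FROM Attendees
    GROUP BY EventId, MemberId
) T2 ON T.EventId = T2.EventId
    AND T.MemberId = T2.MemberId
    AND T.Timestamp = T2.MaxTimestamp
ORDER BY T.MemberId
"""

latest = conn.execute(QUERY).fetchall()
for row in latest:
    print(row)
```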

That said, I would expect the CTE to perform better.

Edit: I'm no longer sure about performance. Here is a SQL Fiddle with both; you can see the execution plan for each.

Good luck.

answered 2013-01-25T15:08:22.253

Another option:

SELECT a.EventId, a.MemberId, a.Timestamp, a.Answer
FROM Attendees a
WHERE a.EventId = 68 AND EXISTS (
              SELECT 1
              FROM Attendees
              WHERE EventId = a.EventId             
              GROUP BY MemberId
              HAVING  MAX(Timestamp) = a.Timestamp                      
                      AND MemberId  = a.MemberId
              )

Demo on SQLFiddle
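As a local cross-check of this EXISTS variant, here is a minimal sketch using Python's sqlite3 module with hypothetical rows; SQLite is an assumed stand-in for SQL Server, not what the demo above runs on. The correlated HAVING clause keeps an outer row only if its Timestamp equals the newest one for the same member.

```python
import sqlite3

# Hypothetical data for event 68: (EventId, MemberId, Timestamp, Answer).
# Member ids and answer order follow the question; timestamps are invented.
ROWS = [
    (68, 18, "2013-01-20 09:00", 0), (68, 18, "2013-01-21 09:00", 1),
    (68, 20, "2013-01-20 10:00", 1), (68, 20, "2013-01-22 10:00", 0),
    (68, 11, "2013-01-19 08:00", 0), (68, 11, "2013-01-23 08:00", 0),
]
# Hypothetical member 99 answering yes, yes, no, yes, no, no, no.
ROWS += [(68, 99, f"2013-01-24 {h}:00", a)
         for h, a in enumerate([1, 1, 0, 1, 0, 0, 0], start=10)]

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Attendees (
        EventId INTEGER, MemberId INTEGER, Timestamp TEXT, Answer INTEGER,
        PRIMARY KEY (EventId, MemberId, Timestamp))""")
conn.executemany("INSERT INTO Attendees VALUES (?, ?, ?, ?)", ROWS)

# Unqualified columns in the subquery refer to the inner Attendees;
# a.* refers to the outer row being tested.
QUERY = """
SELECT a.EventId, a.MemberId, a.Timestamp, a.Answer
FROM Attendees a
WHERE a.EventId = 68 AND EXISTS (
    SELECT 1
    FROM Attendees
    WHERE EventId = a.EventId
    GROUP BY MemberId
    HAVING MAX(Timestamp) = a.Timestamp
       AND MemberId = a.MemberId
)
ORDER BY a.MemberId
"""

latest = conn.execute(QUERY).fetchall()
for row in latest:
    print(row)
```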

answered 2013-01-25T16:47:54.493