4

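Example

An application measures the temperature of every town in the world. A measurement is taken every 5 minutes and written to the Measurement table.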

CREATE TABLE [dbo].[Measurement](
    [MeasurementID] [int] IDENTITY(1,1) NOT NULL,
    [Town] [varchar](50) NOT NULL,
    [Date] [datetime] NOT NULL,
    [Temp] [int] NOT NULL,
CONSTRAINT [PK_Measurement] PRIMARY KEY CLUSTERED 
(
    [MeasurementID] ASC
)) ON [PRIMARY]

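Question

What is the most efficient query to get the list of towns and their current temperature?

Assume there are 100,000 towns and 10 million records.

Note: I have added a couple of possible answers, but there may be other options.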


6 Answers

4

Here are a couple that should work:

SELECT
m1.Town, m1.Temp
FROM
Measurement AS m1
LEFT JOIN
Measurement AS m2
ON
m1.Town = m2.Town
AND m1.Date < m2.Date
WHERE
m2.MeasurementID IS NULL
ORDER BY m1.Town


You need an index on Town and Date.

This technique is particularly useful for early versions of MySQL, which could not handle the more obvious form:

SELECT Town, Temp
FROM Measurement AS m1
WHERE NOT EXISTS (
SELECT 1 From Measurement
WHERE Town = m1.Town
AND Date > m1.date
)
ORDER BY Town
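
The index on Town and Date that this answer calls for could be created along these lines. This is only a sketch: the index name IX_Measurement_Town_Date is made up for illustration, and the table is assumed to be the dbo.Measurement table from the question.

```sql
-- Hypothetical index name. The key order (Town, Date) matches the
-- "same Town, later Date" probe that both queries in this answer perform.
CREATE NONCLUSTERED INDEX IX_Measurement_Town_Date
    ON dbo.Measurement (Town, [Date]);
```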

answered 2008-11-17T19:16:46.540
1

Good to see so many ways to skin this cat. Here's one using a CTE (you can also nest the query for more ANSI-ism, but I find CTEs great to avoid a lot of indenting and declaring things up front makes it pretty readable up top and down below):

WITH LastMeasurements AS (
    SELECT [Town], MAX([Date]) AS LastMeasurementDate
    FROM [Measurement]
    GROUP BY [Town]
)
SELECT [Measurement].Town, [Measurement].[Date], [Measurement].Temp
FROM [Measurement]
INNER JOIN LastMeasurements
    ON [Measurement].[Town] = LastMeasurements.[Town]
    AND [Measurement].[Date] = LastMeasurements.LastMeasurementDate

What I like about the explicit seeking back technique is that it easily gives you access to all the information in the top row selected for the group and is very flexible in changing the grouping and low on repeating yourself.

The optimizer tends to perform these pretty quickly on SQL Server - like most solutions, if you have an index on Town, Date, Temp this will be covering and will run super fast. Even if it's just on Town, Date, the bulk of the work in the GROUP BY can be done super fast anyway.
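
As a rough sketch of the covering index described above (the index name is an assumption chosen for illustration, not something from the original schema):

```sql
-- Hypothetical name. With Town, Date and Temp all in the index, the CTE
-- query can be answered from the index alone, without touching the base table.
CREATE NONCLUSTERED INDEX IX_Measurement_Town_Date_Temp
    ON dbo.Measurement (Town, [Date], Temp);

-- A possible alternative on SQL Server 2005 and later: keep the key narrow
-- and carry Temp as an included column instead.
-- CREATE NONCLUSTERED INDEX IX_Measurement_Town_Date_Temp
--     ON dbo.Measurement (Town, [Date]) INCLUDE (Temp);
```

Only one of the two variants is needed; which one performs better is worth checking against the real data.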

answered 2008-11-17T22:38:25.290
1
select *
from
(
    select distinct *,
    Rank() over (PARTITION by Town order by Date DESC) as DateOrder
    from Measurement
    where Town is not null
) CurrentMeasurement
where DateOrder = 1
answered 2008-11-17T19:09:35.540
0
select s.*
from Measurement s
where exists ( 
   select 1
   from Measurement s1
   where s.Town = s1.Town
   group by s1.Town
   having max( s1.Date )= s.Date)
   order by s.Town
answered 2008-11-17T19:08:52.343
0
select m.town, m.Temp, m.date
from Measurement m
where m.date = (select max(m2.date) from Measurement m2 where m2.town = m.town)
order by 1
answered 2008-11-17T19:26:40.933
0

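Might you have a table with the distinct list of towns? Assuming about 1,000 measurements per town, the window function solutions (such as row_number(), rank(), etc.) may not perform as well as the plain aggregate or this APPLY version: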

SELECT
   M.*
FROM
   Towns T
   OUTER APPLY (
      SELECT TOP 1 * -- add 'WITH TIES' to the 'TOP 1' if you have/want ties on date.
      FROM Measurement M
      WHERE T.Town = M.Town
      ORDER BY M.Date DESC
   ) M

If you don't have a list of towns, you can try this, though I don't know how it will stack up against the plain vanilla aggregate + lookup:

SELECT
   M.*
FROM
   (SELECT DISTINCT Town FROM Measurement) T
   OUTER APPLY (
      SELECT TOP 1 *
      FROM Measurement M
      WHERE T.Town = M.Town
      ORDER BY M.Date DESC
   ) M

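The performance of these queries depends absolutely on indexes. You need at least one on [Town], and one on [Town, Date] is best. If other tables use MeasurementID but you rarely access the Measurement table by MeasurementID, drop the clustered index, make MeasurementID a nonclustered PK, and add a (non-unique) clustered index on Town, Date. If no other tables use MeasurementID, drop the column entirely: in that case it is a useless synthetic/artificial key that bloats your table for no reason.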
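
The reclustering described above might look roughly like this. It is a sketch under assumptions: PK_Measurement is the constraint name from the question's DDL, while CIX_Measurement_Town_Date is a made-up index name. Any foreign keys in other tables that reference PK_Measurement would have to be dropped first and recreated afterwards.

```sql
-- 1. Drop the clustered primary key; the table temporarily becomes a heap.
ALTER TABLE dbo.Measurement
    DROP CONSTRAINT PK_Measurement;

-- 2. Cluster the table on (Town, Date). Non-unique, as suggested above.
--    The index name is hypothetical.
CREATE CLUSTERED INDEX CIX_Measurement_Town_Date
    ON dbo.Measurement (Town, [Date]);

-- 3. Re-add MeasurementID as a nonclustered primary key so that other
--    tables can still reference it.
ALTER TABLE dbo.Measurement
    ADD CONSTRAINT PK_Measurement PRIMARY KEY NONCLUSTERED (MeasurementID);
```

On a 10 million row table each step rewrites a lot of data, so this is better tried on a copy first.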

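These suggested index changes will help all the queries in the answers here that use aggregates or APPLY. I'm not sure about their effect on the window functions; that depends on how the optimizer builds the execution plan. (If it is smart enough to realize that it only needs to visit the max date without touching all the other rows, the same indexes will boost it enormously, though I doubt the optimizer can do that.)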
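
One way to settle the question for a particular data set is to run the candidates side by side and compare the I/O and timing figures SQL Server reports. The two queries below are illustrative rewrites of the aggregate and window function approaches from this thread, not the exact text of any answer.

```sql
-- Report logical reads and CPU/elapsed time for each statement
-- (shown in the Messages output of the client).
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Candidate 1: aggregate, then join back for the matching row.
SELECT m.Town, m.[Date], m.Temp
FROM dbo.Measurement AS m
INNER JOIN (
    SELECT Town, MAX([Date]) AS LastDate
    FROM dbo.Measurement
    GROUP BY Town
) AS latest
    ON latest.Town = m.Town
   AND latest.LastDate = m.[Date];

-- Candidate 2: window function, keeping the newest row per town.
SELECT Town, [Date], Temp
FROM (
    SELECT Town, [Date], Temp,
           ROW_NUMBER() OVER (PARTITION BY Town ORDER BY [Date] DESC) AS rn
    FROM dbo.Measurement
) AS ranked
WHERE rn = 1;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Comparing the logical reads of the two statements, before and after the index changes, shows whether the optimizer is scanning every row for the window function version.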

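Also, for performance I would definitely suggest a Town table, using a TownID instead of storing the full town name in each row. What if a town's name changes? Going from an average of 15 bytes or so per name to only 4 bytes for an int TownID will help speed. (Though testing is needed to prove this for certain.)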
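
A minimal sketch of that normalized layout follows. All object names here (dbo.Town, dbo.TownMeasurement, the constraint and index names) are assumptions made for illustration; the original schema has only dbo.Measurement.

```sql
-- Hypothetical lookup table: one row per town, referenced by a 4-byte key.
CREATE TABLE dbo.Town (
    TownID int IDENTITY(1,1) NOT NULL,
    Name varchar(50) NOT NULL,
    CONSTRAINT PK_Town PRIMARY KEY CLUSTERED (TownID),
    CONSTRAINT UQ_Town_Name UNIQUE (Name)
);

-- Hypothetical measurement table storing TownID instead of the town name.
CREATE TABLE dbo.TownMeasurement (
    TownID int NOT NULL,
    [Date] datetime NOT NULL,
    Temp int NOT NULL,
    CONSTRAINT FK_TownMeasurement_Town
        FOREIGN KEY (TownID) REFERENCES dbo.Town (TownID)
);

-- Cluster on (TownID, Date) so each town's newest row is cheap to find.
CREATE CLUSTERED INDEX CIX_TownMeasurement_TownID_Date
    ON dbo.TownMeasurement (TownID, [Date]);

-- Latest temperature per town, using the APPLY pattern from this answer.
SELECT t.Name, latest.[Date], latest.Temp
FROM dbo.Town AS t
OUTER APPLY (
    SELECT TOP 1 tm.[Date], tm.Temp
    FROM dbo.TownMeasurement AS tm
    WHERE tm.TownID = t.TownID
    ORDER BY tm.[Date] DESC
) AS latest;
```

With this layout a renamed town is a single-row update in dbo.Town, and OUTER APPLY still returns towns that have no measurements yet, with NULL date and temperature.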

answered 2012-02-12T00:25:36.827