sql - SQL query with start and end dates - what is the best option?

Question

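I'm using MS SQL Server 2005 at work to build a database. I've been told that most tables will hold 1,000,000 to 500,000,000 rows in the near future once it is built... I have never worked with datasets this large. Most of the time I don't even know what I should be considering to figure out the best way to set up schemas, queries, and so on.

So... I need to know the start and end dates for something, along with the value associated with an ID during that time frame. So... we can lay out the table in two different ways: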

create table xxx_test2 (id int identity(1,1), groupid int, dt datetime, i int) 

create table xxx_test2 (id int identity(1,1), groupid int, start_dt datetime, end_dt datetime, i int)

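Which is better? How do I even define "better"? I filled the first table with about 100,000 rows, and it takes about 10-12 seconds to reshape it into the second table's format with this query...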

    select  y.groupid,
            y.dt as [start], 
            z.dt as [end],   
            (case when z.dt is null then 1 else 0 end) as latest, 
            y.i 
    from    #x as y 
            outer apply (select top 1 * 
                            from    #x as x 
                            where   x.groupid = y.groupid and 
                                    x.dt > y.dt 
                            order by x.dt asc) as z

or
http://consultingblogs.emc.com/jamiethomson/archive/2005/01/10/t-sql-deriving-start-and-end-date-from-a-single-effective-date.aspx

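Buuuuut... with the second table... to insert a new row, I have to check whether there is a previous row and, if there is, update its end date. So... is this a trade-off between performance when retrieving data and performance when inserting/updating? It seems silly to store that end date twice, but maybe... it isn't? What should I be looking at?

This is what I used to generate my fake data... in case you want to play with it for some reason (if you change the maximum of the random number to something higher, it will generate the fake data a lot faster):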
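For reference, the insert path described above could look roughly like the sketch below. It assumes that the latest row of each group is marked by a NULL `end_dt` and that rows arrive in chronological order; the temp table `#r` and the variables `@groupid`, `@new_dt` and `@new_i` are made-up names for illustration only.

```sql
-- Hypothetical table in the second (start/end) layout.
create table #r (id int identity(1,1), groupid int,
                 start_dt datetime, end_dt datetime, i int)

-- Illustrative input values (SQL Server 2005 has no inline initialisers).
declare @groupid int, @new_dt datetime, @new_i int
set @groupid = 1
set @new_dt  = getdate()
set @new_i   = 42

begin transaction

-- Close the currently open row for this group, if there is one.
-- Assumption: "open" means end_dt is null.
update #r
set    end_dt = @new_dt
where  groupid = @groupid
  and  end_dt is null

-- Add the new row as the open-ended (latest) one.
insert into #r (groupid, start_dt, end_dt, i)
values (@groupid, @new_dt, null, @new_i)

commit transaction
```

The extra cost per insert is one update that touches at most one row per group, so how much it matters depends largely on whether that row can be found through an index.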

declare @dt datetime
declare @i int
declare @id int
set @id = 1
declare @rowcount int
set @rowcount = 0
declare @numrows int 

while (@rowcount<100000)
begin

set @i = 1
set @dt = getdate()
set @numrows = Cast(((5 + 1) - 1) * 
                Rand() + 1 As tinyint)

while @i<=@numrows
    begin
    insert into #x values (@id, dateadd(d,@i,@dt), @i)
    set @i = @i + 1
    end 

set @rowcount = @rowcount + @numrows
set @id = @id + 1
print @rowcount
end

score 3 · Accepted Answer

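For your purposes, I think option 2 is the way to go for the table design. It gives you flexibility and will save you a lot of work.

Having an effective date and an end date lets you write a query that returns only the currently effective data by including this in your where clause: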

where sysdate between effectivedate and enddate

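You can then also use it to join with other tables in a time-sensitive way.

Provided you set up the keys properly and supply the right indexes, performance (on this table at least) should not be a problem.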
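As a rough illustration of both uses on SQL Server: `sysdate` is Oracle's name for the current date and time, and `getdate()` plays that role in T-SQL. The tables `#val` and `#event` below are hypothetical, and the sketch assumes the open (latest) row carries a NULL `end_dt`. It also uses a half-open range instead of `between`, so a timestamp that falls exactly on a boundary matches only one row.

```sql
-- Hypothetical tables: #val holds the dated values, #event is "some other table".
create table #val   (groupid int, start_dt datetime, end_dt datetime, i int)
create table #event (event_id int, groupid int, event_dt datetime)

-- Rows that are in effect right now.
select v.groupid, v.i
from   #val as v
where  v.start_dt <= getdate()
  and  (v.end_dt > getdate() or v.end_dt is null)

-- Time-sensitive join: the value that was in effect when each event happened.
select e.event_id, e.event_dt, v.i
from   #event as e
       join #val as v
         on  v.groupid   = e.groupid
         and e.event_dt >= v.start_dt
         and (e.event_dt < v.end_dt or v.end_dt is null)
```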
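One possible indexing setup is sketched below. The table name `xxx_range_demo` and the index names are invented for this example, and whether these particular indexes help depends on the queries actually run, so it is worth checking the execution plans rather than taking this as a recommendation.

```sql
-- Hypothetical table in the start/end layout from the question.
create table xxx_range_demo (id int identity(1,1), groupid int,
                             start_dt datetime, end_dt datetime, i int)

-- Cluster on (groupid, start_dt) so the rows of one group are stored
-- together in date order, which suits range lookups per group.
create clustered index ix_range_demo_group_start
    on xxx_range_demo (groupid, start_dt)

-- Supports finding the open row of a group (end_dt is null) when inserting,
-- and carries i so that lookups by end date need not touch the base rows.
create nonclustered index ix_range_demo_group_end
    on xxx_range_demo (groupid, end_dt) include (i)
```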

score 0 · Accepted Answer

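For anyone who can use the LEAD analytic function of SQL Server 2012 (or Oracle, DB2, ...), retrieving the data from the first table (the one with only a single date column) will be much faster than without it: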

select
  groupid,
  dt "start",
  lead(dt) over (partition by groupid order by dt) "end",
  case when lead(dt) over (partition by groupid order by dt) is null
       then 1 else 0 end "latest",
  i
from x
