
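I am trying to pull off an insert from one table into another without using dynamic SQL. However, the only solutions I have come up with so far use dynamic SQL. Searching for similar scenarios has been tricky.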



CREATE TABLE [dbo].[_Combinations](
[AttributeID] [int] NULL,
[Value] [varchar](50) NULL
)
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (16, N'1')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (16, N'2')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (28, N'Red')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (28, N'Orange')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (28, N'Yellow')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (28, N'Green')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (28, N'Blue')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (28, N'Indigo')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (28, N'Violet')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'A')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'B')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'C')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'D')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'E')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'F')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'G')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'H')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'I')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'J')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'K')

SELECT * FROM _Combinations

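The _Combinations table holds the keys for different types of attributes (AttributeID) and the possible values for each attribute (Value).

In this case there are 3 different attributes with multiple possible values, but there can be more (up to 10).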


CREATE TABLE [dbo].[_CombinedAttributes](
[GroupKey] [int] NULL,
[AttributeID] [int] NULL,
[Value] [varchar](50) NULL
)


GroupKey    AttributeID Value
1               8         A
1               16        1
1               28        Red
2               8         B
2               16        1
2               28        Red

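This gives me what I need. Each group has an identifier, and I can track the attribute IDs and values that make up each group. I am using two scripts to get from the _Combinations table to the format of the _CombinedAttributes table: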

-- SCRIPT #1
SELECT Identity(int) AS RowNumber, * INTO #Test
FROM (
SELECT AttributeID AS Attribute1, Value AS Value1 FROM _Combinations WHERE AttributeID = 8) C1
CROSS JOIN (
SELECT AttributeID AS Attribute2, Value AS Value2 FROM _Combinations WHERE AttributeID = 16) C2
CROSS JOIN (
SELECT AttributeID AS Attribute3, Value AS Value3 FROM _Combinations WHERE AttributeID = 28) C3

-- SCRIPT #2

INSERT INTO _CombinedAttributes
SELECT RowNumber AS GroupKey, Attribute1, Value1 
FROM #Test
UNION ALL
SELECT RowNumber, Attribute2, Value2 
FROM #Test
UNION ALL
SELECT RowNumber, Attribute3, Value3
FROM #Test
ORDER BY GroupKey, Attribute1

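The two scripts above work, but they have obvious drawbacks: I need to know how many attributes I am dealing with, and the IDs are hard-coded, so I cannot generate this on the fly. The solution I came up with loops through the attributes in the _Combinations table to build the strings for script 1 and script 2, producing a long and messy execution string; I can post it if needed. Can anyone see a way to get to the final insert format without dynamic SQL?

This routine won't run very often, but it will run often enough that I would rather avoid building execution strings and use straight SQL instead.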



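When I use the second data set, Gordon's code no longer returns the correct results: the groups it creates at the end have only 1 attribute. On that second data set, however, Nathan's routine gives me the correct number of rows (the final result should have 396 rows). But as I said in my comment, with the first data set I get the opposite: Gordon's code returns the correct result, while Nathan's code produces duplicates. I'm at a loss. Here is the second data set: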

DROP TABLE [dbo].[_Combinations]
GO

CREATE TABLE [dbo].[_Combinations]( [AttributeID] [int] NULL, [Value] [varchar](50) NULL ) ON [PRIMARY]
GO

INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (16, N'1')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (16, N'2')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (28, N'<=39')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (28, N'40-44')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (28, N'45-49')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (28, N'50-54')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (28, N'55-64')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (28, N'65+')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'AA')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'JJ')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'CC')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'DD')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'EE')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'KK')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'BB')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'FF')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'GG')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'HH')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'II')

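Here is the approach. First, observe that the number of combinations in the final data is the product of the number of values of each attribute: 2*7*11 = 154. Then observe that each value appears a fixed number of times. For AttributeId = 16, each value appears 154 / 2 = 77 times, because there are two values.

So, the idea is to calculate the number of times each value appears, and then generate the list of all values. The final challenge is assigning a group number to each of them. For this, I use row_number() partitioned by the attribute id. Honestly, I am not 100% sure the group assignment is correct (it makes sense and passes the eyeball test), but I worry that I am missing a subtlety.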


with attributecount1 as (
      select c.AttributeId, count(*) as cnt
      from _Combinations c
      group by c.AttributeId
     ),
     const as (
      select exp(sum(log(cnt))) as tot, count(*) as numattr
      from attributecount1
     ),
     attributecount as (
       select a.*,
              (tot / a.cnt) as numtimes
       from attributecount1 a cross join const
     ),
     thevalues as (
      select c.AttributeId, c.Value, ac.numtimes, 1 as seqnum
      from AttributeCount ac join
           _Combinations c
           on ac.AttributeId = c.AttributeId
      union all
      select v.AttributeId, v.Value, v.numtimes, v.seqnum + 1
      from thevalues v
      where v.seqnum + 1 <= v.numtimes
     )
select row_number() over (partition by AttributeId order by seqnum, Value) as groupnum,
       AttributeId, Value
from thevalues
order by 1, 2

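The problem is solvable. The solution above works, but - as noted in my comment - only when the dimensions are pairwise coprime. The difficulty lies in assigning the group numbers to the values. It turns out that this is a problem in number theory.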
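To make the coprimality point concrete, here is a minimal Python sketch. It assumes that ordering by seqnum and then Value, as the row_number() call above does, effectively hands group g the value at position g modulo the number of values of that attribute. The function names are made up for illustration and do not come from the thread.

```python
from math import prod

def cyclic_assignment(sizes):
    """Give group g the value index g % n for every attribute with n values."""
    total = prod(sizes)
    return [tuple(g % n for n in sizes) for g in range(total)]

def covers_every_combination(sizes):
    """True when the cyclic assignment yields each combination exactly once."""
    return len(set(cyclic_assignment(sizes))) == prod(sizes)

# First data set: 2, 7 and 11 values per attribute (pairwise coprime).
print(covers_every_combination([2, 7, 11]))
# Second data set: 2, 6 and 11 values (2 and 6 share the factor 2).
print(covers_every_combination([2, 6, 11]))
```

Under that assumption, the first data set is covered completely while the second one repeats combinations, which is consistent with the behaviour reported in the question.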

Essentially, we want to enumerate the combinations. With two attributes of 2 values each, that would be:

group 0:  1    1
group 1:  1    2
group 2:  2    1
group 3:  2    2

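You can see the relationship between the group number and the assigned values: it follows the binary representation of the group number. If this were 2x3, it would look like: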

group 0:  1    1
group 1:  1    2
group 2:  1    3
group 3:  2    1
group 4:  2    2
group 5:  2    3
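The pattern in these two tables is a mixed-radix decomposition of the group number: each attribute's value index is the group number divided by a stride, modulo the number of values. A small Python sketch of that arithmetic follows; `decode_group` and `stride` are illustrative names, not anything from the thread.

```python
def decode_group(group, sizes):
    """Split a zero-based group number into one zero-based value index per attribute.

    The rightmost attribute varies fastest, matching the tables above.
    """
    digits = []
    stride = 1  # product of the sizes of all attributes to the right
    for size in reversed(sizes):
        digits.append((group // stride) % size)
        stride *= size
    return tuple(reversed(digits))

# Reproduce the 2x3 table, shifting indexes to the 1-based values shown there.
for g in range(6):
    first, second = decode_group(g, [2, 3])
    print(f"group {g}:  {first + 1}    {second + 1}")
```

Unlike taking the group number modulo each size directly, this decomposition does not depend on the sizes being coprime.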



The following implements this in Postgres:

with c as (
      select 1 as attrid, '1' as val union all
      select 1 as attrid, '2' as val union all
      select 2 as attrid, 'A' as val union all
      select 2 as attrid, 'B' as val union all
      select 3 as attrid, '10' as val union all
      select 3 as attrid, '20' as val 
     ),
     c1 as (
       select c.*, dense_rank() over (order by attrid) as attrnum,
              dense_rank() over (partition by attrid order by val) as valnum,
              count(*) over (partition by attrid) as cnt
       from c
     ),
     a1 as (
       select attrid, count(*) as cnt,
              cast(round(exp(sum(ln(count(*))) over (order by attrid rows between unbounded preceding and current row))) as int)/count(*) as cum
       from c
       group by attrid
     ),
     a2 as (
       select a.*,
              (select cast(round(exp(sum(ln(cnt)))) as int)
               from a1
               where a1.attrid <= a.attrid
              ) / cnt as cum
       from a1 a
     ),
     const as (
       select cast(round(exp(sum(ln(cnt)))) as int) as numrows
       from a1
     ),
     nums as (
       select 1 as n union all select 2 union all select 3 union all select 4 union all
       select 5 union all select 6 union all select 7 union all select 8
       from const
     ),
     ac as (
      select c1.*, a1.cum, const.numrows
      from c1 join
           a1 on c1.attrid = a1.attrid cross join
           const
     )
select *
from nums join
     ac
     on (nums.n/cum) % cnt = valnum - 1
order by 1, 2;

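(Note: for some reason, generate_series() does not work properly in certain joins, which is why the number sequence is generated manually.)

When SQL Fiddle starts working again, I should be able to translate this back to SQL Server.


Here is a version that works in SQL Server: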

with attributecount1 as (
      select c.AttributeId, count(*) as cnt
      from _Combinations c
      group by c.AttributeId
     ),
     const as (
      select cast(round(exp(sum(log(cnt))), 1) as int) as tot, count(*) as numattr
      from attributecount1
     ),
     attributecount as (
       select a.*,
              (tot / a.cnt) as numtimes,
              (select cast(round(exp(sum(log(ac1.cnt))), 1) as int)
               from attributecount1 ac1
               where ac1.AttributeId <= a.AttributeId
              ) / a.cnt as cum
       from attributecount1 a cross join const
     ),
     c as (
       select c.*, ac.numtimes, ac.cum, ac.cnt,
              dense_rank() over (order by c.AttributeId) as attrnum,
              dense_rank() over (partition by c.AttributeId order by Value) as valnum
       from _Combinations c join
            AttributeCount ac
            on ac.AttributeId = c.AttributeId
     ),
     nums as (
       select 1 as n union all
       select 1 + n
       from nums cross join const
       where 1 + n <= const.tot
     )
select *
from nums join
     c
     on (nums.n / c.cum)%c.cnt = c.valnum - 1
option (MAXRECURSION 1000)

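A few years ago I ran into a similar problem with a fixed EAV schema that differed from yours. Peter Larsson came up with the following solution to my "dynamic combinations" query.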


;with cteSource (Iteration, AttributeID, recID, Items, Unq, Perm) as 
(
    select  v.Number + 1,
            s.AttributeID,
            row_number() over (order by v.Number, s.AttributeID) - 1,
            s.Items,
            u.Unq,
            f.Perm
    from    (select AttributeID, count(*) from  _Combinations group by AttributeID) s(AttributeId, Items)
    cross join (select count(distinct AttributeID) from _Combinations) u (Unq)
    join    master..spt_values as v on v.Type = 'P'
    outer apply (
                select  top(1) cast(exp(sum(log(count(*))) over ()) as bigint)
                from    _Combinations as w
                where   w.AttributeID >= s.AttributeID
                group by w.AttributeID
                having  count(*) > 1
            ) as f(Perm)
    where   v.Number < (select top(1) exp(sum(log(count(*))) over()) from _Combinations as x group by x.AttributeID)
)
select  s.Iteration,
        s.AttributeID,
        w.Value
from    cteSource as s
cross apply (
            select  Value,
                    row_number() over (order by Value) - 1
            from    _Combinations
            where   AttributeID = s.AttributeID
        ) w(Value, recID)
where   coalesce(s.recID / (s.Perm * s.Unq / s.Items), 0) % s.Items = w.recID
order by s.Iteration, s.AttributeId;

I decided to post this just so that a procedural solution appears alongside the CTE-based ones.

The following generates a zero-based GroupKey column. If you want it to start at 1, just change @i to @i+1 in the final insert ... select.

-- Add a zero-based row number, partitioned by AttributeId
declare @Attrs table (AttributeId int,Value varchar(50),RowNum int)
insert into @Attrs
select AttributeId,Value,
  ROW_NUMBER()over(partition by AttributeId order by AttributeId,Value)-1
from _Combinations

-- AttributeId value counts
declare @AttCount table (AttributeId int,n int)
insert into @AttCount
select AttributeId,COUNT(*) n from @Attrs
group by AttributeID

-- Total number of combos -- Multiply all AttributeId counts
-- EXP(SUM(LOG(n))) didn't work as expected
-- so fall back to good old cursors...
declare @ncombos int,@num int
declare mulc cursor for select n from @AttCount
open mulc
set @ncombos=1
fetch next from mulc into @num
while @@FETCH_STATUS=0
begin
  set @ncombos=@ncombos*@num
  fetch next from mulc into @num
end
close mulc
deallocate mulc

-- Now let's get our hands dirty...
declare @i int,@m int,@atid int,@n int,@r int
declare c cursor for select AttributeId,n from @AttCount
open c
fetch next from c into @atid,@n
set @m=1
while @@FETCH_STATUS=0
begin
  set @i=0
  while @i<@ncombos
  begin
    -- value index for this group: divide by the product of the earlier counts, then wrap
    set @r=(@i/@m)%@n
    insert into _CombinedAttributes (GroupKey,AttributeId,Value)
    select @i,@atid,value from @Attrs where AttributeId=@atid and RowNum=@r
    set @i=@i+1
  end
  set @m=@m*@n
  fetch next from c into @atid,@n
end
close c
deallocate c
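
For readers who want to check the arithmetic outside the database, here is a rough Python rendering of the same two nested loops. It is only a sketch: `combined_attributes` is a made-up name, and the input is the second data set from the question typed in by hand.

```python
from collections import defaultdict

def combined_attributes(rows):
    """Turn (attribute_id, value) rows into (group_key, attribute_id, value) rows."""
    values = defaultdict(list)
    for attribute_id, value in rows:
        values[attribute_id].append(value)

    # Total number of combinations: multiply the per-attribute value counts.
    n_combos = 1
    for vals in values.values():
        n_combos *= len(vals)

    result = []
    m = 1  # product of the value counts of the attributes handled so far
    for attribute_id in sorted(values):
        vals = sorted(values[attribute_id])
        n = len(vals)
        for i in range(n_combos):
            result.append((i, attribute_id, vals[(i // m) % n]))
        m *= n
    return result

second_data_set = (
    [(16, v) for v in ["1", "2"]]
    + [(28, v) for v in ["<=39", "40-44", "45-49", "50-54", "55-64", "65+"]]
    + [(8, v) for v in ["AA", "JJ", "CC", "DD", "EE", "KK", "BB", "FF", "GG", "HH", "II"]]
)
rows = combined_attributes(second_data_set)
print(len(rows))  # 2 * 6 * 11 combinations times 3 attributes
```

With 2, 6 and 11 values this yields 132 groups of 3 rows each, i.e. the 396 rows the question expects for the second data set.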


The following is a recursive solution (a SQLFiddle was linked here):

with a as ( -- unique AttributeIDs
  select AttributeID
        ,Row_Number() over(order by AttributeID) as rowNo
        ,count(*) as cnt
    from [dbo].[_Combinations]
  group by AttributeID
),
r as (
  -- start recursion: list all values of the first attribute
  select Dense_Rank() over(order by c.[Value]) - 1 as GroupKey
        ,c.AttributeID
        ,c.[Value]
        ,a.cnt as factor
        ,1 as level
    from a
         join [dbo].[_Combinations] as c on a.AttributeID = c.AttributeID
   where a.rowNo = 1

  union all

  -- recursion step: add the combinations with the values of the next attribute
  select GroupKey
        ,case when AttributeID = 'prev' then prevAttribID else currAttribID end as AttributeID
        ,Value
        ,factor
        ,level
    from (select r.Value as prev
                ,c.Value as curr
                ,(Dense_Rank() over(order by c.[Value]) - 1) * r.factor + r.GroupKey as GroupKey
                ,r.level + 1 as level
                ,r.factor * a.cnt as factor
                ,r.AttributeID as prevAttribID
                ,a.AttributeID as currAttribID
            from r
                 join a on r.level + 1 = a.rowNo
                 join [dbo].[_Combinations] as c on a.AttributeID = c.AttributeID
         ) as p
         unpivot ( Value for AttributeID in (prev, curr)) as up
)
-- get result: this is the data from the deepest level
select distinct
       GroupKey + 1 as GroupKey -- start with one instead of zero
      ,AttributeID
      ,[Value]
  from r
 where level = (select count(*) from a)
order by GroupKey, AttributeID, [Value]
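
The key step in the recursion is the new group key, computed as rank * factor + previous GroupKey, where factor is the number of combinations built so far. The Python sketch below folds attributes in one at a time using that same formula; it is an illustration with made-up names (`fold_attributes`), not a translation of the query.

```python
def fold_attributes(values_by_attr):
    """Build {group_key: [(attribute_id, value), ...]} one attribute at a time."""
    groups = {0: []}  # a single empty combination to start from
    factor = 1        # number of combinations built so far
    for attribute_id in sorted(values_by_attr):
        vals = sorted(values_by_attr[attribute_id])
        extended = {}
        for key, members in groups.items():
            for rank, value in enumerate(vals):
                # Same formula as the recursion step: rank * factor + previous key.
                extended[rank * factor + key] = members + [(attribute_id, value)]
        groups = extended
        factor *= len(vals)
    return groups

groups = fold_attributes({1: ["1", "2"], 2: ["A", "B"], 3: ["10", "20"]})
for key in sorted(groups):
    print(key + 1, groups[key])  # shift to a one-based GroupKey
```

Because existing keys stay below factor and each new rank adds a multiple of factor, the keys never collide and run from 0 up to the total number of combinations minus one.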



declare @stmt varchar(max);
with a as ( -- unique attribute keys, cast here to avoid casting when building the dynamic statement
  select distinct cast(AttributeID as varchar(10)) as ID
    from [dbo].[_Combinations]
)
select @stmt = 'select GroupKey, Cast(SubString(AttributeIDStr, 2, 100) as int) as AttributeID, Value
from (
  select '
  + (select ' C' + ID + '.Value as V' + ID + ', ' from a for xml path(''))
  + ' Row_Number() over(order by '
  + stuff((select ', C' + ID + '.Value' from a for xml path('')), 1, 2, '')
  + ') AS GroupKey from '
  + stuff((select ' cross join [dbo].[_Combinations] as C' + ID from a for xml path('')), 1, 11, '')
  + ' where ' 
  + stuff((select ' and C' + ID + '.AttributeID = ' + ID from a for xml path('')), 1, 4, '')
  + ')  as p unpivot (Value for AttributeIDStr in ('
  + stuff((select ', V' + ID from a for xml path('')), 1, 2, '')
  + ')) as up'
exec (@stmt)

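Since SQL Server does not have the nice list-aggregation functions that other databases have, the ugly stuff((select ... for xml path(''))) expression has to be used.

The statement generated for the sample data is, apart from whitespace differences, as follows: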

select GroupKey, Cast(SubString(AttributeIDStr, 2, 100) as int) as AttributeID, Value
from (
  select C16.Value as V16
        ,C28.Value as V28
        ,C8.Value  as V8
        ,Row_Number() over(order by C16.Value, C28.Value, C8.Value) AS GroupKey
    from [dbo].[_Combinations] as C16
         cross join
         [dbo].[_Combinations] as C28
         cross join
         [dbo].[_Combinations] as C8
   where C16.AttributeID = 16
     and C28.AttributeID = 28
     and C8.AttributeID = 8
  )  as p
  unpivot ( Value for AttributeIDStr in (V16, V28, V8)) as up


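Regarding the problem with exp(sum(log(count(*))) over ()), the answer for me seems to be bringing the ROUND function into the mix. The following snippet seems to produce a reliable answer (at least so far):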

ROUND(exp(sum(log(count(*))) over ()), 0)
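
The reason rounding helps is that exp(sum(log(n))) is computed in floating point, so the result can land a hair away from the exact integer product, and a plain cast to an integer type would then truncate it. The Python sketch below shows the same idea under that assumption; `product_via_logs` is a made-up name, and the exact raw values depend on the platform's math library.

```python
import math

def product_via_logs(counts):
    """Multiply positive integers via exp(sum(log(n))), as the SQL expression does."""
    raw = math.exp(sum(math.log(n) for n in counts))
    # Round to the nearest integer instead of truncating.
    return raw, int(round(raw))

for counts in ([2, 7, 11], [2, 6, 11]):
    raw, rounded = product_via_logs(counts)
    print(counts, repr(raw), rounded)
```

For small counts like these the floating-point error is far below 0.5, so rounding recovers the exact product; with very large products that guarantee would eventually break down.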
