
All,

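I'm trying to insert rows from one table into another without using dynamic SQL. However, the only solution I've come up with so far uses dynamic SQL. Searching for similar scenarios has been tricky.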

Here are the details:

My starting point is the following legacy table:

CREATE TABLE [dbo].[_Combinations](
[AttributeID] [int] NULL,
[Value] [varchar](50) NULL
) ON [PRIMARY]
GO
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (16, N'1')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (16, N'2')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (28, N'Red')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (28, N'Orange')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (28, N'Yellow')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (28, N'Green')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (28, N'Blue')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (28, N'Indigo')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (28, N'Violet')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'A')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'B')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'C')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'D')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'E')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'F')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'G')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'H')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'I')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'J')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'K')

SELECT * FROM _Combinations

The _Combinations table contains the keys of the different attribute types (AttributeID) and the possible values for each attribute (Value).

In this case there are 3 different attributes with multiple possible values, but there can be more (up to 10).

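The requirement is then to create every possible combination of the values and store it in normalized form, because additional data will be stored for each possible combination. I need to store the attribute keys and values that make up each combination, so displaying each combination is not just a simple cross join. The target table that stores each attribute combination looks like this: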

CREATE TABLE [dbo].[_CombinedAttributes](
[GroupKey] [int] NULL,
[AttributeID] [int] NULL,
[Value] [varchar](50) NULL
) ON [PRIMARY]

So, using the data above, the attribute-combination records in the target table would look like this:

GroupKey    AttributeID Value
1               8         A
1               16        1
1               28        Red
2               8         B
2               16        1
2               28        Red

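This gives me what I need: each group has an identifier, and I can track the attribute IDs and values that make up each group. I'm using two scripts to get from the _Combinations table to the _CombinedAttributes format: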

-- SCRIPT #1
SELECT Identity(int) AS RowNumber, * INTO #Test
FROM (
SELECT AttributeID AS Attribute1, Value AS Value1 FROM _Combinations WHERE AttributeID = 8) C1
CROSS JOIN 
(
SELECT AttributeID AS Attribute2, Value AS Value2 FROM _Combinations WHERE AttributeID = 16) C2
CROSS JOIN
(
SELECT AttributeID AS Attribute3, Value AS Value3 FROM _Combinations WHERE AttributeID = 28) C3

-- SCRIPT #2

INSERT INTO _CombinedAttributes
SELECT RowNumber AS GroupKey, Attribute1, Value1 
FROM #Test
UNION ALL
SELECT RowNumber, Attribute2, Value2 
FROM #Test
UNION ALL
SELECT RowNumber, Attribute3, Value3
FROM #Test
ORDER BY GroupKey, Attribute1
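Conceptually, Script #1 is an N-way cross join of the per-attribute value lists and Script #2 unpivots each joined row back into (GroupKey, AttributeID, Value) rows. The Python sketch below models that for any number of attributes. It is only a reference model for checking expected row counts, not a replacement for the SQL, and the function name `combine` is made up for illustration.

```python
from itertools import product

def combine(rows):
    """rows: (AttributeID, Value) pairs, like the _Combinations table.
    Returns (GroupKey, AttributeID, Value) rows, like _CombinedAttributes."""
    values_by_attr = {}
    for attr_id, value in rows:
        values_by_attr.setdefault(attr_id, []).append(value)
    attr_ids = sorted(values_by_attr)
    result = []
    # Cross join: one tuple per combination (the role of Script #1).
    lists = [values_by_attr[a] for a in attr_ids]
    for group_key, combo in enumerate(product(*lists), start=1):
        # Unpivot: one output row per attribute in the combination (Script #2).
        for attr_id, value in zip(attr_ids, combo):
            result.append((group_key, attr_id, value))
    return result

first_dataset = ([(16, v) for v in "12"]
                 + [(28, v) for v in ["Red", "Orange", "Yellow", "Green",
                                      "Blue", "Indigo", "Violet"]]
                 + [(8, v) for v in "ABCDEFGHIJK"])
rows = combine(first_dataset)
print(len(rows))  # 462: 154 combinations x 3 attributes
print(rows[:3])   # [(1, 8, 'A'), (1, 16, '1'), (1, 28, 'Red')]
```

Group 1 matches the first group in the example above; the numbering of later groups depends on the enumeration order, which the requirement does not fix.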

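The two scripts above work, but they have obvious drawbacks: I need to know how many attributes I'm dealing with, and the IDs are hard-coded, so I can't generate this on the fly. The solution I came up with builds the strings for Script #1 and Script #2 by looping over the attributes in the _Combinations table, which produces long and messy execution strings; I can post it if needed. Can anyone find a way to get to the final insert format without dynamic SQL?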

This routine won't run very often, but it will run often enough that I'd rather avoid building execution strings and use straight SQL.

Thanks in advance.

Update:

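When I use a second data set, Gordon's code no longer returns correct results: it creates groups that have only 1 attribute at the end. On the second data set, however, Nathan's routine gives me the correct number of rows (the final result should have 396 rows). But, as I said in the comments, with the first data set I get the opposite result: Gordon's returns correctly, but Nathan's code produces duplicates. I'm at a loss. Here is the second data set: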

DROP TABLE [dbo].[_Combinations]
GO

CREATE TABLE [dbo].[_Combinations](
[AttributeID] [int] NULL,
[Value] [varchar](50) NULL
) ON [PRIMARY]
GO

INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (16, N'1')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (16, N'2')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (28, N'<=39')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (28, N'40-44')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (28, N'45-49')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (28, N'50-54')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (28, N'55-64')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (28, N'65+')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'AA')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'JJ')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'CC')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'DD')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'EE')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'KK')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'BB')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'FF')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'GG')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'HH')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'II')

5 Answers


I think this solves your problem.

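Here is the approach. First, observe that the number of groups in the final data is the product of the number of values of each attribute: 2*7*11 = 154. Then observe that each value appears a fixed number of times. For AttributeId = 16, each value appears 154 / 2 = 77 times, because there are two values.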
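The counting step can be checked with a few lines of Python. This is a sketch using the first data set's value counts; `counts` is just a hand-typed dictionary standing in for the `attributecount1` CTE.

```python
from math import prod

# Value counts per AttributeID in the first data set.
counts = {16: 2, 28: 7, 8: 11}

total = prod(counts.values())                          # number of groups
numtimes = {a: total // c for a, c in counts.items()}  # repeats per value

print(total)     # 154
print(numtimes)  # {16: 77, 28: 22, 8: 14}
```

Integer multiplication is used here instead of the query's `exp(sum(log(cnt)))`, so `total` is exact.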

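So the idea is to count how many times each value appears, then generate the list of all the values. The final challenge is assigning group numbers to them. For that, I use row_number() partitioned by attribute id. To be honest, I'm not 100% sure the group assignment is correct (it makes sense and passes the eyeball test), but I'm worried I've missed a subtlety.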

Here is the query:

with attributecount1 as (
      select c.AttributeId, count(*) as cnt
      from _Combinations c
      group by c.AttributeId
     ),
     const as (
      select exp(sum(log(cnt))) as tot, count(*) as numattr
      from attributecount1
     ),
     attributecount as (
       select a.*,
              (tot / a.cnt) as numtimes
       from attributecount1 a cross join const
     ),
     thevalues as (
      select c.AttributeId, c.Value, ac.numtimes, 1 as seqnum
      from AttributeCount ac join
           _Combinations c
           on ac.AttributeId = c.AttributeId
      union all
      select v.AttributeId, v.Value, v.numtimes, v.seqnum + 1
      from thevalues v
      where v.seqnum + 1 <= v.numtimes
     )
select row_number() over (partition by AttributeId order by seqnum, Value) as groupnum,
      *
from thevalues
order by 1, 2

SQL Fiddle here

Edit:

Unfortunately, I don't have access to SQL Server today, and SQL Fiddle is acting up.

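The problem is solvable. The solution above works, but (as noted in my comment) only when the dimensions are pairwise coprime. The problem is assigning group numbers to the values, and it turns out this is a question from number theory (the Chinese remainder theorem).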
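One way to see the coprime condition: after ordering by (seqnum, Value), the first query effectively gives group g the value index g mod cnt for every attribute. The Python model below assumes exact counts (it ignores the floating-point `tot`), and its function names are illustrative only. By the Chinese remainder theorem, the groups are all distinct exactly when the counts are pairwise coprime.

```python
from itertools import combinations
from math import gcd, prod

def row_number_assignment(counts):
    """Model of the first query: zero-based group g receives value
    index g % cnt for each attribute."""
    return [tuple(g % c for c in counts) for g in range(prod(counts))]

def pairwise_coprime(counts):
    return all(gcd(a, b) == 1 for a, b in combinations(counts, 2))

for counts in ([2, 7, 11], [2, 6, 11]):
    groups = row_number_assignment(counts)
    print(counts, pairwise_coprime(counts),
          len(set(groups)), "distinct of", len(groups))
# [2, 7, 11] True 154 distinct of 154
# [2, 6, 11] False 66 distinct of 132
```

With the second data set's counts (2, 6 and 11), only 66 distinct combinations come out, each repeated twice, which is consistent with the first query failing on that data.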

Essentially, we want to enumerate the combinations. With two attributes that have 2 values each, it would be:

group 0:  1    1
group 1:  1    2
group 2:  2    1
group 3:  2    2

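You can see the relationship between the group number and the assigned values: it is based on the binary representation of the group number. If this were 2x3, it would look like: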

group 0:  1    1
group 1:  1    2
group 2:  1    3
group 3:  2    1
group 4:  2    2
group 5:  2    3

Same idea, but now there is no "binary" representation: each position in the number has a different base. Not a problem.

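So the challenge is to map a number (the group number) to its individual digits. That requires the appropriate division and modulo operations.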
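A Python sketch of that division-and-modulo step, mirroring the `(n / cum) % cnt` join condition in the queries: `cum` is the product of the value counts of the attributes that come before the current one. The function name `digits` is illustrative only. In this formulation the first attribute varies fastest, which is the reverse of the column order in the tables above.

```python
from itertools import product
from math import prod

def digits(n, counts):
    """Mixed-radix decomposition of group number n: for each attribute,
    digit = (n // cum) % cnt, where cum = product of the earlier counts."""
    result, cum = [], 1
    for cnt in counts:
        result.append((n // cum) % cnt)
        cum *= cnt
    return tuple(result)

counts = [2, 6, 11]  # second data set: not pairwise coprime
groups = [digits(n, counts) for n in range(prod(counts))]
print(len(set(groups)) == len(groups))  # True: no duplicate groups
print(sorted(groups) == sorted(product(*(range(c) for c in counts))))  # True
```

Unlike the row_number() assignment, this works for any counts, coprime or not.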

The following implements this in Postgres:

with c as (
      select 1 as attrid, '1' as val union all
      select 1 as attrid, '2' as val union all
      select 2 as attrid, 'A' as val union all
      select 2 as attrid, 'B' as val union all
      select 3 as attrid, '10' as val union all
      select 3 as attrid, '20' as val 
     ),
     c1 as (
       select c.*, dense_rank() over (order by attrid) as attrnum,
              dense_rank() over (partition by attrid order by val) as valnum,
              count(*) over (partition by attrid) as cnt
       from c
     ),
     a1 as (
       select attrid, count(*) as cnt,
              cast(round(exp(sum(ln(count(*))) over (order by attrid rows between unbounded preceding and current row))) as int)/count(*) as cum
       from c
       group by attrid
     ),
     a2 as (
       select a.*,
              (select cast(round(exp(sum(ln(cnt)))) as int)
               from a1
               where a1.attrid <= a.attrid
              ) / cnt as cum
       from a1 a
     ),
     const as (
       select cast(round(exp(sum(ln(cnt)))) as int) as numrows
       from a1
     ),
     nums as (
       select 1 as n union all select 2 union all select 3 union all select 4 union all
       select 5 union all select 6 union all select 7 union all select 8
       from const
     ),
     ac as (
      select c1.*, a1.cum, const.numrows
      from c1 join
           a1 on c1.attrid = a1.attrid cross join
           const
     )
select *
from nums join
     ac
     on (nums.n/cum) % cnt = valnum - 1
order by 1, 2;

(Note: for some reason generate_series() doesn't work properly in some joins, which is why the number sequence is generated by hand.)

When SQL Fiddle starts working again, I should be able to convert this back to SQL Server.

Edit II:

Here is a version that works in SQL Server:

with attributecount1 as (
      select c.AttributeId, count(*) as cnt
      from _Combinations c
      group by c.AttributeId
     ),
     const as (
      select cast(round(exp(sum(log(cnt))), 1) as int) as tot, count(*) as numattr
      from attributecount1
     ),
     attributecount as (
       select a.*,
              (tot / a.cnt) as numtimes,
              (select cast(round(exp(sum(log(ac1.cnt))), 1) as int)
               from attributecount1 ac1
               where ac1.AttributeId <= a.AttributeId
              ) / a.cnt as cum
       from attributecount1 a cross join const
     ),
     c as (
       select c.*, ac.numtimes, ac.cum, ac.cnt,
              dense_rank() over (order by c.AttributeId) as attrnum,
              dense_rank() over (partition by c.AttributeId order by Value) as valnum
       from _Combinations c join
            AttributeCount ac
            on ac.AttributeId = c.AttributeId
     ),
     nums as (
       select 1 as n union all
       select 1 + n
       from nums cross join const
       where 1 + n <= const.tot
     )
select *
from nums join
     c
     on (nums.n / c.cum)%c.cnt = c.valnum - 1
option (MAXRECURSION 1000)

SQL Fiddle here

answered 2013-11-05T17:40:29.753

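A few years ago I ran into a similar problem with a fixed EAV schema different from yours. Peter Larsson came up with the following solution to my "dynamic combinations" query.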

I've adapted it to fit your schema. Hope this helps!

SqlFiddle here

;with cteSource (Iteration, AttributeID, recID, Items, Unq, Perm) as 
(   
    select  v.Number + 1,
            s.AttributeId,
            row_number() over (order by v.Number, s.AttributeID) - 1,
            s.Items,
            u.Unq,
            f.Perm
    from    (select AttributeID, count(*) from  _Combinations group by AttributeID) s(AttributeId, Items)
    cross 
    join    (select count(distinct AttributeID) from _Combinations) u (Unq)
    join    master..spt_values as v on v.Type = 'P'
    outer 
    apply   (
                select  top(1) cast(exp(sum(log(count(*))) over ()) as bigint)
                from    _Combinations as w
                where   w.AttributeID >= s.AttributeID
                group 
                by      w.AttributeID
                having  count(*) > 1
            ) as f(Perm)
    where   v.Number < (select top(1) exp(sum(log(count(*))) over()) from _Combinations as x group by x.AttributeID)
)
select  s.Iteration,
        s.AttributeID,
        w.Value     
from    cteSource as s
cross 
apply   (
            select  Value,
                    row_number() over (order by Value) - 1
            from    _Combinations
            where   AttributeID = s.AttributeID
        ) w(Value, recID)
where   coalesce(s.recID / (s.Perm * s.Unq / s.Items), 0) % s.Items = w.recID
order 
by      s.Iteration, s.AttributeId;
answered 2013-11-05T17:44:55.803

I decided to post this just so a procedural solution appears alongside the CTE-based ones.

The following produces a zero-based GroupKey column. If you want it to start at 1, just change @i to @i+1 in the final insert ... select.

-- Add a zero-based row number, partitioned by AttributeId
declare @Attrs table (AttributeId int,Value varchar(50),RowNum int)
insert into @Attrs
select 
  AttributeId,Value,
  ROW_NUMBER()over(partition by AttributeId order by AttributeId,Value)-1
from _Combinations

-- AttributeId value counts
declare @AttCount table (AttributeId int,n int)
insert into @AttCount
select AttributeId,COUNT(*) n from @Attrs
group by AttributeID

-- Total number of combos -- Multiply all AttributeId counts
-- EXP(SUM(LOG(n))) didn't work as expected
-- so fall back to good old cursors...
declare @ncombos int,@num int
declare mulc cursor for select n from @AttCount
open mulc
set @ncombos=1
fetch next from mulc into @num
while @@FETCH_STATUS=0
  begin
  set @ncombos=@ncombos*@num
  fetch next from mulc into @num
  end
close mulc
deallocate mulc

-- Now let's get our hands dirty...
declare @i int,@m int,@atid int,@n int,@r int
declare c cursor for select AttributeId,n from @AttCount
open c
fetch next from c into @atid,@n
set @m=1
while @@FETCH_STATUS=0
  begin
  set @i=0
  while @i<@ncombos
    begin
    set @r=(@i/@m)%@n
    insert into _CombinedAttributes (GroupKey,AttributeId,Value)
    select @i,@atid,value from @Attrs where AttributeId=@atid and RowNum=@r
    set @i=@i+1
    end
  set @m=@m*@n
  fetch next from c into @atid,@n
  end
close c
deallocate c

Hint: this is why I didn't use exp(sum(log())) to emulate a mul() aggregate.

answered 2013-11-06T12:12:52.007

Recursive solution

Here is the recursive solution (SQLFiddle here):

with a as ( -- unique AttributeIDs
  select AttributeID
        ,Row_Number() over(order by AttributeID) as rowNo
        ,count(*) as cnt
    from [dbo].[_Combinations]
  group by AttributeID
),
r as (
  -- start recursion: list all values of the first attribute
  select Dense_Rank() over(order by c.[Value]) - 1 as GroupKey
        ,c.AttributeID
        ,c.[Value]
        ,a.cnt as factor
        ,1 as level
    from a
         join [dbo].[_Combinations] as c on a.AttributeID = c.AttributeID
   where a.rowNo = 1

  union all

  -- recursion step: add the combinations with the values of the next attribute
  select GroupKey
        ,case when AttributeID = 'prev' then prevAttribID else currAttribID end as AttributeID
        ,[Value]
        ,factor
        ,level
    from (select r.Value as prev
                ,c.Value as curr
                ,(Dense_Rank() over(order by c.[Value]) - 1) * r.factor + r.GroupKey as GroupKey
                ,r.level + 1 as level
                ,r.factor * a.cnt as factor
                ,r.AttributeID as prevAttribID
                ,a.AttributeID as currAttribID
            from r
                 join a on r.level + 1 = a.rowNo
                 join [dbo].[_Combinations] as c on a.AttributeID = c.AttributeID
         ) as p
         unpivot ( Value for AttributeID in (prev, curr)) as up
)
-- get result: this is the data from the deepest level
select distinct
       GroupKey + 1 as GroupKey -- start with one instead of zero
      ,AttributeID
      ,[Value]
  from r
 where level = (select count(*) from a)
order by GroupKey, AttributeID, [Value]
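The heart of the recursion is the GroupKey formula in the recursive member: new GroupKey = (rank of the new value) * factor + previous GroupKey, with factor multiplied by the new attribute's count at each level. The Python model below is a sketch of just that arithmetic (the `recursive_groups` name and the list-of-tuples input are illustrative assumptions, and Python's `sorted` stands in for the collation used by `Dense_Rank() over(order by Value)`). Keys are zero-based, like the CTE before its final `GroupKey + 1`.

```python
def recursive_groups(attrs):
    """attrs: list of (AttributeID, values) ordered by AttributeID.
    Returns {GroupKey: [(AttributeID, Value), ...]}."""
    first_id, first_values = attrs[0]
    # Anchor member: one group per value of the first attribute.
    groups = {rank: [(first_id, v)]
              for rank, v in enumerate(sorted(first_values))}
    factor = len(first_values)
    # Recursive member: extend every group with every value of the next attribute.
    for attr_id, values in attrs[1:]:
        groups = {rank * factor + key: combo + [(attr_id, v)]
                  for key, combo in groups.items()
                  for rank, v in enumerate(sorted(values))}
        factor *= len(values)
    return groups

attrs = [(8, list("ABCDEFGHIJK")), (16, ["1", "2"]),
         (28, ["Red", "Orange", "Yellow", "Green", "Blue", "Indigo", "Violet"])]
groups = recursive_groups(attrs)
print(len(groups), sorted(groups) == list(range(len(groups))))  # 154 True
```

Because each level multiplies the previous keys by a fresh factor, the keys stay dense and unique for any counts.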

Dynamic solution

Here is a slightly shorter version that uses a dynamic statement:

declare @stmt varchar(max);
with a as ( -- unique attribute keys, cast here to avoid casting when building the dynamic statement
  select distinct cast(AttributeID as varchar(10)) as ID
    from [dbo].[_Combinations]
)
select @stmt = 'select GroupKey, Cast(SubString(AttributeIDStr, 2, 100) as int) as AttributeID, Value
  from
  (
  select '
  + (select ' C' + ID + '.Value as V' + ID + ', ' from a for xml path(''))
  + ' Row_Number() over(order by '
  + stuff((select ', C' + ID + '.Value' from a for xml path('')), 1, 2, '')
  + ') AS GroupKey from '
  + stuff((select ' cross join [dbo].[_Combinations] as C' + ID from a for xml path('')), 1, 11, '')
  + ' where ' 
  + stuff((select ' and C' + ID + '.AttributeID = ' + ID from a for xml path('')), 1, 4, '')
  + ')  as p unpivot (Value for AttributeIDStr in ('
  + stuff((select ', V' + ID from a for xml path('')), 1, 2, '')
  + ')) as up'
;
exec (@stmt)

Since SQL Server doesn't have the nice list-aggregation functions that other databases have, the ugly stuff((select ... for xml path(''))) expression has to be used.

Apart from whitespace differences, the statement generated for the sample data is:

select GroupKey, Cast(SubString(AttributeIDStr, 2, 100) as int) as AttributeID, Value
  from
  (
  select C16.Value as V16
        ,C28.Value as V28
        ,C8.Value  as V8
        ,Row_Number() over(order by C16.Value, C28.Value, C8.Value) AS GroupKey
    from [dbo].[_Combinations] as C16
         cross join
         [dbo].[_Combinations] as C28
         cross join
         [dbo].[_Combinations] as C8
   where C16.AttributeID = 16
     and C28.AttributeID = 28
     and C8.AttributeID = 8
  )  as p
  unpivot ( Value for AttributeIDStr in (V16, V28, V8)) as up

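Both solutions avoid the exp(sum(log())) multiplication-aggregate workaround used in some of the other answers, which is very sensitive to rounding errors.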

answered 2014-01-31T17:05:21.680

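Regarding the problem with exp(sum(log(count(*))) over ()), the answer seems to me to be bringing the ROUND function into the mix. The following snippet appears to produce a reliable result (at least so far):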

ROUND(exp(sum(log(count(*))) over ()), 0)
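To see why ROUND helps, here is a small Python illustration of the same floating-point trick (a sketch; `product_via_logs` is a hypothetical helper). `int()` truncates like `CAST(... AS int)`, so a result that lands just below the true product loses one, while rounding first snaps it back. The exact float values depend on the platform's math library, so no outputs are listed; for products this small, rounding recovers the exact value.

```python
import math

def product_via_logs(counts):
    # Emulates EXP(SUM(LOG(cnt))): a float that can land slightly off the true product.
    return math.exp(sum(math.log(c) for c in counts))

for counts in ([2, 7, 11], [2, 6, 11], [3] * 10):
    approx = product_via_logs(counts)
    exact = math.prod(counts)
    # int() truncates like CAST(... AS int); round() goes to the nearest integer first.
    print(counts, repr(approx), int(approx), round(approx), exact)
```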
answered 2014-08-08T17:35:56.963