I have a table that looks like this:

Table excerpt

    Owner   | Attribute | value
----------------------------------------------------
    10      | COLOR     | BLUE
    10      | COLOR     | RED
    10      | COLOR     | GREEN
    10      | SIZE      | BIG
    20      | COLOR     | GREEN
    20      | SIZE      | MEDIUM
    20      | MEMORY    | 16G
    20      | MEMORY    | 32G
    30      | COLOR     | RED
    30      | COLOR     | BLUE
    30      | MEMORY    | 64G

Is there an SQL query that computes every combination of the attributes, numbered by a single index (the last column in the result)?

Owner   | Attribute | Value | Rule_No
10      | COLOR     | BLUE  | 1
10      | SIZE      | BIG   | 1
10      | COLOR     | RED   | 2
10      | SIZE      | BIG   | 2
10      | COLOR     | GREEN | 3
10      | SIZE      | BIG   | 3
20      | COLOR     | GREEN | 1
20      | SIZE      | MEDIUM| 1
20      | MEMORY    | 16G   | 1
20      | COLOR     | GREEN | 2
20      | SIZE      | MEDIUM| 2
20      | MEMORY    | 32G   | 2
30      | COLOR     | BLUE  | 1
30      | MEMORY    | 64G   | 1
30      | COLOR     | RED   | 2
30      | MEMORY    | 64G   | 2

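Rule numbers are unique per owner (rule "1" of owner "10" is unrelated to rule "1" of owner "20").

I tried an SQL cross join, but the number of attributes is not fixed, so I cannot use it (each attribute would need its own cross join), and I want the combinations as new rows rather than new columns.

I am trying to do this with Talend Open Studio - Data Integration, but a SQL-only solution would be better for me.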

5 Answers


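Do you really want the data in the form given in your question (which would then need further aggregation over Rule_No to be useful in most likely cases), or are you ultimately looking to pivot it? That is, with each rule joined into a single row (each attribute becoming its own column), like so: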

+---------+-------+-------+--------+--------+
| Rule_No | Owner | COLOR | SIZE   | MEMORY |
+---------+-------+-------+--------+--------+
|       1 |    10 | BLUE  | BIG    | NULL   |
|       2 |    10 | RED   | BIG    | NULL   |
|       3 |    10 | GREEN | BIG    | NULL   |
|       1 |    20 | GREEN | MEDIUM | 16G    |
|       2 |    20 | GREEN | MEDIUM | 32G    |
|       1 |    30 | RED   | NULL   | 64G    |
|       2 |    30 | BLUE  | NULL   | 64G    |
+---------+-------+-------+--------+--------+

One can pivot such data with a query like this:

SELECT   @t:=IF(Owner=@o,@t,0)+1 AS Rule_No,
         @o:=Owner AS Owner,
         `COLOR`,`SIZE`,`MEMORY`
FROM     (SELECT DISTINCT Owner, @t:=0 FROM my_table) t0

  LEFT JOIN (
    SELECT Owner, value AS `COLOR`
    FROM   my_table
    WHERE  Attribute='COLOR'
  ) AS `t_COLOR` USING (Owner)

  LEFT JOIN (
    SELECT Owner, value AS `SIZE`
    FROM   my_table
    WHERE  Attribute='SIZE'
  ) AS `t_SIZE` USING (Owner)

  LEFT JOIN (
    SELECT Owner, value AS `MEMORY`
    FROM   my_table
    WHERE  Attribute='MEMORY'
  ) AS `t_MEMORY` USING (Owner)

ORDER BY Owner, Rule_No

Since the attribute list is dynamic, one can use a query to construct the above SQL, then prepare and execute a statement from it:

SELECT CONCAT('
         SELECT   @t:=IF(Owner=@o,@t,0)+1 AS Rule_No,
                  @o:=Owner AS Owner,
                  ', GROUP_CONCAT(DISTINCT CONCAT(
                    '`',REPLACE(Attribute,'`','``'),'`'
                  )), '
         FROM     (SELECT DISTINCT Owner, @t:=0 FROM my_table) t0
       ', GROUP_CONCAT(DISTINCT CONCAT('
           LEFT JOIN (
             SELECT Owner, value AS `',REPLACE(Attribute,'`','``'),'`
             FROM   my_table
             WHERE  Attribute=',QUOTE(Attribute),'
           ) AS `t_',REPLACE(Attribute,'`','``'),'` USING (Owner)
         ') SEPARATOR ''), '
         ORDER BY Owner, Rule_No
       ') INTO @sql
FROM   my_table;

PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;

See it on sqlfiddle.
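
For readers without a MySQL server at hand, the same idea (read the attribute list first, then generate one LEFT JOIN per attribute) can be sketched in Python with the standard sqlite3 module. This is only an illustration under assumptions that are not in the answer: SQLite replaces MySQL, the helper names `quote_ident` and `pivot` are made up here, and the rule numbers are assigned in Python instead of with user variables.

```python
import sqlite3
from itertools import groupby

# Sample data from the question.
ROWS = [
    (10, "COLOR", "BLUE"), (10, "COLOR", "RED"), (10, "COLOR", "GREEN"),
    (10, "SIZE", "BIG"),
    (20, "COLOR", "GREEN"), (20, "SIZE", "MEDIUM"),
    (20, "MEMORY", "16G"), (20, "MEMORY", "32G"),
    (30, "COLOR", "RED"), (30, "COLOR", "BLUE"), (30, "MEMORY", "64G"),
]


def quote_ident(name):
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def pivot(conn):
    """Build one LEFT JOIN per attribute and return (header, numbered rows)."""
    attrs = [row[0] for row in conn.execute(
        "SELECT DISTINCT Attribute FROM my_table ORDER BY Attribute")]
    aliases = [quote_ident("t_" + a) for a in attrs]
    select_list = ["t0.Owner"] + [
        f"{alias}.Value AS {quote_ident(a)}" for alias, a in zip(aliases, attrs)]
    # Attribute names used as values are bound as parameters, not concatenated.
    joins = " ".join(
        f"LEFT JOIN my_table AS {alias} "
        f"ON {alias}.Owner = t0.Owner AND {alias}.Attribute = ?"
        for alias in aliases)
    sql = (f"SELECT {', '.join(select_list)} "
           f"FROM (SELECT DISTINCT Owner FROM my_table) AS t0 {joins} "
           f"ORDER BY t0.Owner")
    rows = conn.execute(sql, attrs).fetchall()
    # Number the combinations per owner (assumption: done here, not in SQL).
    numbered = []
    for _, group in groupby(rows, key=lambda r: r[0]):
        for rule_no, row in enumerate(group, start=1):
            numbered.append((rule_no,) + row)
    return ["Rule_No", "Owner"] + attrs, numbered


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (Owner INTEGER, Attribute TEXT, Value TEXT)")
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", ROWS)
header, result = pivot(conn)
print(header)
for row in result:
    print(row)
```

Which combination receives which rule number within an owner is arbitrary in this sketch, because the query only orders by Owner.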

answered 2013-01-17T17:06:50.227

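OK, first things first: this query can be done in a single SQL select, but I would not recommend it. It may work for this small sample table, but it is not a realistic solution for large tables, and the problem can be solved in a better (faster, cleaner) way with a stored procedure.

Also, I have not completely finished it, because it is now 2:10 AM and I have already spent a couple of hours on it. Not that it is too big a challenge; the remaining part is just a copy-paste SQL rewrite based on the queries that already exist.

I posted my thought process, with sample data, on pastebin.

The basic flow is:

  1. Compute the number of possible permutations (N) for an owner
  2. Construct an SQL query that generates the numbers from 1..(N*number_of_attributes)
  3. For each row
    1. Pick an attribute based on N
    2. Pick a value for that attribute based on N

The algorithm is a generic solution for any number of attributes or values.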
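
The steps above can be sketched outside SQL to make the arithmetic visible. The following Python is one possible reading of them, not the pastebin code (which is not reproduced here): the function name `expand_rules` and the mixed-radix indexing with a running `stride` are assumptions made for illustration.

```python
from math import prod

# Sample data from the question.
ROWS = [
    (10, "COLOR", "BLUE"), (10, "COLOR", "RED"), (10, "COLOR", "GREEN"),
    (10, "SIZE", "BIG"),
    (20, "COLOR", "GREEN"), (20, "SIZE", "MEDIUM"),
    (20, "MEMORY", "16G"), (20, "MEMORY", "32G"),
    (30, "COLOR", "RED"), (30, "COLOR", "BLUE"), (30, "MEMORY", "64G"),
]


def expand_rules(rows):
    """Return (owner, attribute, value, rule_no) rows, one group per combination."""
    # Group the values by owner and attribute, keeping input order.
    by_owner = {}
    for owner, attribute, value in rows:
        by_owner.setdefault(owner, {}).setdefault(attribute, []).append(value)

    out = []
    for owner, attrs in by_owner.items():
        # Step 1: N is the product of the value counts of each attribute.
        n = prod(len(values) for values in attrs.values())
        # Steps 2 and 3: N * number_of_attributes rows, one per (rule, attribute).
        for rule_no in range(1, n + 1):
            stride = 1
            for attribute, values in attrs.items():
                # Treat rule_no - 1 as a mixed-radix number; each attribute
                # reads its own digit to pick a value.
                index = ((rule_no - 1) // stride) % len(values)
                out.append((owner, attribute, values[index], rule_no))
                stride *= len(values)
    return out


result = expand_rules(ROWS)
for row in result:
    print(row)
```

Because each attribute reads a different digit of the rule number, every combination appears exactly once per owner. The rule numbers themselves are arbitrary: for owner 30 this sketch numbers RED before BLUE, simply because that is the input order.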

answered 2013-01-18T01:19:32.563

Here is fthiella's answer adapted for SQL Server (not a final version):

If  Object_ID('tempdb..#test') Is Not Null Drop Table #test;

Select '10' As Owner,'COLOR' Attribute,'BLUE' Value Into #test
Union
Select '10','COLOR','RED'
Union
Select '10','COLOR','GREEN'
Union
Select '10','SIZE','BIG'
Union
Select '20','a','1'
Union
Select '20','a','2'
Union
Select '20','b','111'
Union
Select '20','b','222'
Union
Select '20','COLOR','GREEN'
Union
Select '20','SIZE','MEDIUM'
Union
Select '20','MEMORY','16G'
Union
Select '20','MEMORY','32G'
Union
Select '30','COLOR','RED'
Union
Select '30','COLOR','BLUE'
Union
Select '30','MEMORY','64G';



Select 
    Owner, Attribute, Value,
    RuleNo = Row_Number() Over (Partition By Owner, Attribute Order By Owner, Attribute)
From
    (Select Base.Owner, Base.Attribute, Base.Value
    From
        #Test As Base
        Inner Join
            (Select Owner, Attribute
             From #Test
             Group By Owner, Attribute
             Having Count(*) > 1) As MultipleValue
        On Base.Owner = MultipleValue.Owner
        And Base.Attribute = MultipleValue.Attribute
        Union All
        Select Sing.Owner, Sing.Attribute, Sing.Value
        From
            (Select Owner, Attribute, Value = Min(Value)
            From #Test
            Group by Owner, Attribute
            Having Count(*) = 1) As Sing
        Inner Join
            (Select Owner, Attribute
            From #Test
            Group by Owner, Attribute
            Having Count(*) > 1) As Mult
            On Sing.Owner = Mult.Owner
        Inner Join #Test As Comp
        On Mult.Owner = Comp.Owner And Mult.Attribute = Comp.Attribute) As Vals
Order By 
    Owner, RuleNo, Attribute, Value

answered 2013-01-16T20:08:56.120

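I gave this a try (and spent far too much time on it). I thought I had a solution: it produced the expected results for the given data (not identical, but acceptable, I believe). Unfortunately, it does not hold up once more data is added.

Maybe someone else can build a working solution on top of it.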

SELECT DISTINCT a.`owner`, a.`attribute`, a.`value`, a.`index` * b.`index` AS `Rule_No`
FROM (
  SELECT `owner`, `attribute`, `value`,  
    IF(
      `owner` = @_owner AND `attribute` = @_attribute,
      @_row := @_row + 1,
      @_row := 1 AND (@_owner := `owner`) AND (@_attribute := `attribute`)
    ) + 1 AS `index`
  FROM `attributes`, (SELECT @_owner := '', @_attribute := '', @_row := 0) x
  ORDER BY `owner`, `attribute`
  ) a
INNER JOIN (
  SELECT `owner`, `attribute`, `value`,  
    IF(
      `owner` = @_owner AND `attribute` = @_attribute,
      @_row := @_row + 1,
      @_row := 1 AND (@_owner := `owner`) AND (@_attribute := `attribute`)
    ) + 1 AS `index`
  FROM `attributes`, (SELECT @_owner := '', @_attribute := '', @_row := 0) x
  ORDER BY `owner`, `attribute`
  ) b
ON a.`owner` = b.`owner` AND a.`attribute` <> b.`attribute`
ORDER BY `owner`, `Rule_No`, `attribute`, `value`

SQLFiddle - working

SQLFiddle - broken (more data added)
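
One arithmetic property of `a.index * b.index` may help anyone who picks this up: a product of per-attribute indexes identifies a combination only while at most one attribute has several values. The short Python check below illustrates that; it is an observation about the multiplication, not a confirmed diagnosis of the broken fiddle, and the function name `product_ids` and the second data shape are hypothetical.

```python
from itertools import product


def product_ids(counts):
    """Group tuples of 1-based per-attribute indexes by the product of the indexes."""
    ids = {}
    for combo in product(*(range(1, c + 1) for c in counts)):
        rule_no = 1
        for i in combo:
            rule_no *= i
        ids.setdefault(rule_no, []).append(combo)
    return ids


# Shape of owner 10 in the question: 3 colors, 1 size.
one_multi = product_ids([3, 1])
# Hypothetical owner with 2 colors and 2 memory sizes.
two_multi = product_ids([2, 2])

print(one_multi)
print(two_multi)
```

In the first case every product maps to a single combination. In the second, the combinations (1, 2) and (2, 1) both multiply to 2, so two different rules would share one Rule_No.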

answered 2013-01-17T20:22:36.307

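This is far from complete, but it is the best I could do in the time I had. Maybe it will give someone else an idea? Specifically, it returns the right number of rows for this dataset, but in the wrong order.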

select a.owner, a.attribute, a.value
from test1 a
    join (
        select owner, attribute, count(distinct attribute, value) - 1 as total
        from test1
        group by owner, attribute
    ) b
        on a.owner = b.owner
            and a.attribute = b.attribute
    join (
        select owner, max(total) as total from (
            select owner, attribute, count(distinct attribute, value) as total
            from test1
            group by owner, attribute
        ) t group by owner
    ) c
        on a.owner = c.owner
    join (
        select @rownum:=@rownum+1 as num
        from test1,
            (select @rownum:=0 from dual) r
    ) temp
        on num <= c.total - b.total
order by a.owner asc
;

answered 2013-01-17T20:24:03.830