
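We have a rather complicated evaluation process for new products. Products are evaluated for different regions and applications. After each evaluation step, the product gets a new result score. This score (between 0 and 10) indicates how far the product has made it through the process. At each step it either stays the same or increases, but it never decreases, and odd numbers mark products that did not pass the evaluation. The highest score is called the product status.

I now want to select all products that had an even status (2, 4, 8, 10) at startDate (inclusive), together with their status at the endDate of the time frame.

(I also want to select all new products that entered the process during that time frame, but I think this can easily be done in a second statement.)

The problem I have is how to include the initial status in the output as well. This is my SQL statement: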

SELECT 
  MyTable.product_id, 
  MyTable.REGION, 
  MyTable.SEGMENT,
  Max(MyTable.result) AS NEW_STATUS
  FROM 
     MyTable INNER JOIN (
     SELECT
      product_id, 
      REGION, 
      SEGMENT, 
      Max(result) AS INITIAL_STATUS
    FROM
      MyTable
    WHERE
      DATE <= to_date(:startDate)
    GROUP BY
      product_id, REGION, SEGMENT
    HAVING
      Max(result) IN(2,4,8,10)
   ) initial_status ON MyTable.product_id = initial_status.product_id    
  WHERE
    MyTable.DATE <= to_date(:endDate)
  GROUP BY
    MyTable.product_id, 
    MyTable.REGION, 
    MyTable.SEGMENT;

How can I include initial_status in the output without affecting the max/group by? (It is Oracle, but I am not an expert, so maybe some Oracle-specific feature could help.)

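Edit:

The data is a one-to-many relationship: one product, many evaluations. Each evaluation has a region, segment, result and evaluation date (plus other data that is not relevant here). Here is some denormalized sample data: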

product_id    Region    Segment    Result    date
    1           US         AB         2    20.05.2012
    1           EU         TS         4    13.06.2012
    1           US         AB         4    01.09.2012
  234           US         AB         2    09.09.2012

Expected output for the sample above, with a date range from 26.08.2012 to 21.09.2012:

product_id    Region    Segment    Initial_Status    New_Status
    1            US        AB             2              4
    1            EU        TS             4              4 (this did not change)
  234            US        AB           (null)           2 ( new entry)

I know that my current SQL cannot do this, in particular showing the new entries.


2 Answers


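It sounds like you need an analytic function in a subquery and a UNION set operation. The benefit of analytic functions is that you only need a single table scan.

"I now want to select all products that had an even status (2,4,8,10) at startDate"

That would be: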

select product_id, region, segment, initial_status, new_status
  from ( select product_id, region, segment, initial_status, date
                -- The maximum status over all time per product_id,
                -- region and segment
              , max(initial_status) over 
                   ( partition by product_id, region, segment ) as new_status
           from my_table
                )
       -- Restrict on where 
 where ( date <= to_date(:start_date, <format model>)
          -- If you only want even you can use mod
         and mod(initial_status, 2) = 0
             )
    or new_status = initial_status
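
To see what the analytic MAX in the inner query contributes, here is a minimal sketch that runs the same kind of window function over the question's sample rows. It uses Python's sqlite3 module as a stand-in for Oracle (window functions need SQLite 3.25 or later). The column names result and eval_date and the ISO date strings are assumptions of this sketch, not the real schema.

```python
import sqlite3

# Sample rows from the question. Dates are rewritten as ISO strings so that
# plain string ordering matches date ordering (an assumption of this sketch).
SAMPLE = [
    (1, "US", "AB", 2, "2012-05-20"),
    (1, "EU", "TS", 4, "2012-06-13"),
    (1, "US", "AB", 4, "2012-09-01"),
    (234, "US", "AB", 2, "2012-09-09"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table "
    "(product_id INTEGER, region TEXT, segment TEXT, result INTEGER, eval_date TEXT)"
)
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?, ?)", SAMPLE)

# MAX(...) OVER (PARTITION BY ...) keeps every row and attaches the
# partition-wide maximum to each one; no GROUP BY is involved.
rows = conn.execute(
    """
    SELECT product_id, region, segment, result,
           MAX(result) OVER (PARTITION BY product_id, region, segment) AS new_status
      FROM my_table
     ORDER BY product_id, region, eval_date
    """
).fetchall()

for row in rows:
    print(row)
```

Every input row survives, and both rows for product 1 / US / AB carry a new_status of 4. That is what lets an outer query filter on individual rows while still seeing the maximum for the whole partition.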

Then you can get all the new records:

select product_id, region, segment, initial_status, new_status
  from ( select product_id, region, segment, initial_status
              , initial_status as new_status, date
                -- Minimum date this product_id, region, segment
                -- combination was entered
              , min(date) over 
                   ( partition by product_id, region, segment ) as min_date
                -- Find the most recent record for this combination
              , rank() over ( partition by product_id, region, segment
                                  order by date desc ) as rnk
           from my_table
                )
       -- By putting this condition in the outer-select
       -- you ensure you only get completely new records
 where min_date >= to_date(:start_date, <format model>)
       -- If you have multiple records that were entered for a single pk
       -- between startdate and enddate you only want the most recent one.
   and rnk = 1
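
The min_date / rnk filter above can be tried out on the question's sample rows. The following is a sketch only: it uses Python's sqlite3 module instead of Oracle, and the column names result and eval_date, the ISO date strings and the start date value are assumptions made for illustration.

```python
import sqlite3

# Sample rows from the question, with ISO date strings (assumed format).
SAMPLE = [
    (1, "US", "AB", 2, "2012-05-20"),
    (1, "EU", "TS", 4, "2012-06-13"),
    (1, "US", "AB", 4, "2012-09-01"),
    (234, "US", "AB", 2, "2012-09-09"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table "
    "(product_id INTEGER, region TEXT, segment TEXT, result INTEGER, eval_date TEXT)"
)
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?, ?)", SAMPLE)

# Inner query: earliest date per combination, plus a rank that is 1 for the
# most recent row. Outer query: keep combinations whose first row is not
# earlier than the start date, and only their most recent row.
new_rows = conn.execute(
    """
    SELECT product_id, region, segment, result
      FROM ( SELECT product_id, region, segment, result,
                    MIN(eval_date) OVER
                        (PARTITION BY product_id, region, segment) AS min_date,
                    RANK() OVER
                        (PARTITION BY product_id, region, segment
                             ORDER BY eval_date DESC) AS rnk
               FROM my_table )
     WHERE min_date >= :start_date
       AND rnk = 1
    """,
    {"start_date": "2012-08-26"},
).fetchall()

print(new_rows)
```

Only product 234 comes back. Product 1 / US / AB has a row after the start date, but its earliest row is from May, so the min_date condition excludes it. Note that RANK() assigns 1 to every row that ties on the latest date, so ties would return more than one row per combination.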

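Finally, you can use UNION to combine these. If you can guarantee that there will be no overlap, use UNION ALL instead: it does not perform a DISTINCT operation, so the query will perform better.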

select query1
 union
select query2
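
The difference between the two set operators is easy to check on a toy case. This sketch uses Python's sqlite3 module and two hypothetical tables, a and b, that share one identical row; the table names and contents are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (product_id INTEGER, status INTEGER)")
conn.execute("CREATE TABLE b (product_id INTEGER, status INTEGER)")
conn.executemany("INSERT INTO a VALUES (?, ?)", [(1, 4), (234, 2)])
# (234, 2) appears in both tables, so the two result sets overlap.
conn.executemany("INSERT INTO b VALUES (?, ?)", [(234, 2), (5, 8)])

# UNION removes duplicate rows from the combined result.
union_rows = conn.execute(
    "SELECT product_id, status FROM a "
    "UNION "
    "SELECT product_id, status FROM b "
    "ORDER BY product_id"
).fetchall()

# UNION ALL simply concatenates the two results and keeps duplicates.
union_all_rows = conn.execute(
    "SELECT product_id, status FROM a "
    "UNION ALL "
    "SELECT product_id, status FROM b "
    "ORDER BY product_id"
).fetchall()

print(len(union_rows), len(union_all_rows))
```

UNION returns three rows and UNION ALL returns four, because the shared (234, 2) row is kept twice. UNION ALL is therefore only a safe substitute when the two queries cannot return the same row.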

Note that you could combine these into a single query; it does not look pretty, but it may be more efficient:

select product_id, region, segment, initial_status, new_status
                -- date must be selected here because the outer where uses it
  from ( select product_id, region, segment, initial_status, date
              , min(date) over 
                   ( partition by product_id, region, segment ) as min_date
              , rank() over ( partition by product_id, region, segment
                                  order by date desc ) as rnk
              , max(initial_status) over 
                   ( partition by product_id, region, segment ) as new_status
           from my_table
                )
 where ( min_date >= to_date(:start_date, <format model>)
         and rnk = 1
             )
    or ( ( date <= to_date(:start_date, <format model>)
            and mod(initial_status, 2) = 0
                )
        or new_status = initial_status
           )
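
As a cross-check against the expected output in the question, here is a sketch of a different technique, conditional aggregation, which is not what the queries above use. A CASE expression inside MAX restricts the initial status to rows up to the start date, while a plain MAX gives the status at the end date. It runs on Python's sqlite3 module rather than Oracle; the column names result and eval_date, the ISO date strings and the % operator (Oracle would use MOD) are assumptions of this sketch.

```python
import sqlite3

# Sample rows from the question, with ISO date strings (assumed format).
SAMPLE = [
    (1, "US", "AB", 2, "2012-05-20"),
    (1, "EU", "TS", 4, "2012-06-13"),
    (1, "US", "AB", 4, "2012-09-01"),
    (234, "US", "AB", 2, "2012-09-09"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table "
    "(product_id INTEGER, region TEXT, segment TEXT, result INTEGER, eval_date TEXT)"
)
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?, ?)", SAMPLE)

# initial_status: highest result on or before the start date (NULL if none).
# new_status:     highest result on or before the end date.
# The outer filter keeps new entries (NULL) and even initial statuses.
status_rows = conn.execute(
    """
    SELECT product_id, region, segment, initial_status, new_status
      FROM ( SELECT product_id, region, segment,
                    MAX(CASE WHEN eval_date <= :start_date
                             THEN result END) AS initial_status,
                    MAX(result)               AS new_status
               FROM my_table
              WHERE eval_date <= :end_date
              GROUP BY product_id, region, segment )
     WHERE initial_status IS NULL
        OR initial_status % 2 = 0
     ORDER BY product_id, region DESC
    """,
    {"start_date": "2012-08-26", "end_date": "2012-09-21"},
).fetchall()

for row in status_rows:
    print(row)
```

On the sample data this yields the three rows listed in the question's expected output, with a NULL initial status for product 234. Unlike the IN (2,4,8,10) list in the question, the modulo test also accepts 0 and 6, so the filter would need adjusting if those values must be excluded.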
answered 2012-09-24T12:33:17.710

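Just for documentation, here is the query I came up with. I know it covers some requirements that were not stated in the original question. Some of them are there to deal with bad data.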

SELECT 
  product_id, 
  REGION, 
  SEGMENT,
  initial_status,
  NEW_STATUS,
  "Comment",
  Count("Comment")  OVER (PARTITION BY 
      "Comment"
    ) "Counter"
from(
SELECT DISTINCT
  myTable.product_id, 
  myTable.REGION, 
  myTable.SEGMENT,
  initial_status.initial_status,
  Max(myTable.result) 
    OVER (PARTITION BY 
      myTable.product_id, 
      myTable.REGION, 
      myTable.SEGMENT
    ) NEW_STATUS,
  CASE WHEN initial_status.initial_status <> Max(myTable.result) 
    OVER (PARTITION BY 
      myTable.product_id, 
      myTable.REGION, 
      myTable.SEGMENT
    ) THEN 'Changed' ELSE 'Same' END as "Comment"  
  FROM 
    myTable INNER JOIN (
     SELECT
      product_id, 
      REGION, 
      SEGMENT, 
      Max(result) AS INITIAL_STATUS
    FROM
      myTable
    WHERE
      DATE <= to_date(:startDate)
      OR DATE is null
    GROUP BY
      product_id, REGION, SEGMENT
    HAVING
      Max(result) IN(2,4,8,10)
   ) initial_status 
    ON 
      myTable.product_id = initial_status.product_id
      AND myTable.REGION = initial_status.REGION
      AND (
        myTable.SEGMENT = initial_status.SEGMENT
        OR (myTable.SEGMENT is null AND initial_status.SEGMENT is null)
      )
  WHERE
    myTable.DATE <= to_date(:endDate)
UNION ALL
SELECT 
  myTable.product_id, 
  myTable.REGION, 
  myTable.SEGMENT,
  null AS initial_status,
  Max(myTable.result) 
    OVER (PARTITION BY 
      myTable.product_id, 
      myTable.REGION, 
      myTable.SEGMENT
    ) NEW_STATUS,
'New' As "Comment"
FROM myTable
WHERE evaluation_date BETWEEN to_date(:startDate) + 1 AND to_date(:endDate)
AND stage <> 'Stage 0')
ORDER BY
    product_id ASC;
answered 2012-09-26T07:10:46.987