
I have data for the purchases of a product formatted like this:

Item  |  Price  |  Quantity Bought
ABC   |  10.10  |  4
DEF   |  8.30   |  12
DEF   |  7.75   |  8
ABC   |  10.50  |  20
GHI   |  15.4   |  1
GHI   |  15.2   |  12
ABC   |  10.25  |  8
...   |  ...    |  ...

Each row represents one individual purchasing a certain quantity at a certain price. I would like to aggregate this data by item and, for each item, eliminate the prices that fall below the 30th percentile of total quantity bought.

For example, in the above data set the total quantity of ABC bought was (4+20+8) = 32 units, with a volume-weighted average price of (4*10.10 + 8*10.25 + 20*10.50)/32 = 332.4/32 ≈ 10.39.
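To double-check that arithmetic, here is a minimal Python sketch (Python is used only for illustration; `rows` and `vwp_by_item` are hypothetical names, and the data are the sample rows above) that computes total volume and VWP per item as sum(price * quantity) / sum(quantity).

```python
from collections import defaultdict

# Sample rows from the table above: (item, price, quantity)
rows = [
    ("ABC", 10.10, 4), ("DEF", 8.30, 12), ("DEF", 7.75, 8),
    ("ABC", 10.50, 20), ("GHI", 15.4, 1), ("GHI", 15.2, 12),
    ("ABC", 10.25, 8),
]


def vwp_by_item(rows):
    """Return {item: (total_volume, volume_weighted_price)}."""
    notional = defaultdict(float)  # running sum of price * quantity per item
    volume = defaultdict(int)      # running sum of quantity per item
    for item, price, qty in rows:
        notional[item] += price * qty
        volume[item] += qty
    return {item: (volume[item], notional[item] / volume[item]) for item in volume}


for item, (total, vwp) in vwp_by_item(rows).items():
    print(item, round(vwp, 2), total)  # e.g. ABC 10.39 32
```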

I would like to organize the above data set like this:

Item  |  VWP    |  Total Vol  |  70th %ile min  |  70th %ile max
ABC   |  10.39  |  32         |  ???            |  ???
DEF   |  ...    |  20         |  ???            |  ???
GHI   |  ...    |  13         |  ???            |  ???

Here VWP is the volume-weighted price, and the 70th %ile min/max columns are the minimum and maximum prices within the top 70% of volume.

In other words, I want to eliminate the lowest-volume prices until the remaining prices contain 70% of the day's total volume, and then publish the minimum and maximum of the remaining prices in the 70th %ile min/max columns.
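One way to pin down that trimming rule is the Python sketch below. It is illustrative only and rests on two assumptions the question does not spell out: volume is first summed per distinct price, and a price is kept whenever dropping it would leave less than 70% of the volume. `trim_to_volume_share` is a hypothetical name.

```python
from collections import defaultdict

# Sample rows from the table above: (item, price, quantity)
rows = [
    ("ABC", 10.10, 4), ("DEF", 8.30, 12), ("DEF", 7.75, 8),
    ("ABC", 10.50, 20), ("GHI", 15.4, 1), ("GHI", 15.2, 12),
    ("ABC", 10.25, 8),
]


def trim_to_volume_share(rows, share=0.7):
    """Per item, keep the highest-volume prices until they hold at least
    `share` of the item's total volume; return {item: (min_price, max_price)}."""
    vol_at_price = defaultdict(lambda: defaultdict(int))
    for item, price, qty in rows:
        vol_at_price[item][price] += qty  # total volume traded at each distinct price

    result = {}
    for item, by_price in vol_at_price.items():
        total = sum(by_price.values())
        kept, cum = [], 0
        # Largest-volume prices first; the lowest-volume ones are what get dropped.
        for price, vol in sorted(by_price.items(), key=lambda pv: pv[1], reverse=True):
            if cum >= share * total:
                break  # the kept prices already hold `share` of the volume
            kept.append(price)
            cum += vol
        result[item] = (min(kept), max(kept))
    return result


print(trim_to_volume_share(rows))
```

Under these assumptions ABC keeps 10.50 (20 units) and 10.25 (8 units), since 20 of 32 is still below 70%, giving a range of 10.25 to 10.50.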

I tried to be as clear as possible, but if this is hard to follow, please let me know which parts need clarification.

Note: These are not the only columns contained in my dataset, and I will be selecting and calculating other values as well. I only included the columns that are relevant to this specific calculation.

EDIT:

Here is my code so far. I need to incorporate this calculation into it (variables prefixed with '@' are inputs given by the user):

SELECT Item, 
   SUM(quantity) AS Total_Vol, 
   DATEADD(day, -@DateOffset, CONVERT(date, GETDATE())) AS buyDate,
   MIN(Price) AS MinPrice,
   MAX(Price) AS MaxPrice,
   MAX(Price) - MIN(Price) AS PriceRange,
   ROUND(SUM(Price * quantity)/SUM(quantity), 6) AS VWP

FROM TransactTracker..CustData
-- @DateOffset (Number of days data is offset by)
-- @StartTime (Time to start data in hours)
-- @EndTime (Time to stop data in hours)

WHERE (DATEDIFF(day, TradeDateTime, GETDATE()) = (@DateOffset+1)
       AND DATEPART(hh, TradeDateTime) >= @StartTime
       AND HitTake = '')
   OR (DATEDIFF(day, TradeDateTime, GETDATE()) = @DateOffset
       AND DATEPART(hh, TradeDateTime) < @EndTime
       AND HitTake = '')

GROUP BY Item
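For reference, the WHERE clause selects trades from @StartTime onward on the day @DateOffset+1 days ago, plus trades before @EndTime on the day @DateOffset days ago. A Python sketch of that predicate (illustrative only; the `HitTake` check is left out, and `in_window` and the sample parameter values are hypothetical) might look like:

```python
from datetime import date, datetime


def in_window(ts: datetime, today: date, date_offset: int,
              start_hour: int, end_hour: int) -> bool:
    """Mirror of the WHERE clause's date/hour logic (HitTake check omitted)."""
    # DATEDIFF(day, ts, GETDATE()) counts midnights crossed, i.e. the calendar-day difference
    days_back = (today - ts.date()).days
    return ((days_back == date_offset + 1 and ts.hour >= start_hour)
            or (days_back == date_offset and ts.hour < end_hour))


# With @DateOffset = 0, @StartTime = 17, @EndTime = 9 (illustrative values):
today = date(2013, 5, 15)
print(in_window(datetime(2013, 5, 14, 18, 30), today, 0, 17, 9))  # True: yesterday after 17:00
print(in_window(datetime(2013, 5, 15, 10, 0), today, 0, 17, 9))   # False: today after 09:00
```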

EDIT 2:

FROM (SELECT p.*,
(SELECT SUM(quantity) from TransactTracker..CustData p2 
    where p2.Series = p.Series and p2.Size >= p.Size) as volCum
FROM TransactTracker..CustData p
) p

EDIT 3:

(case when CAST(qcum AS FLOAT) / SUM(quantity) <= 0.7 THEN MIN(Price) END) AS min70px,
(case when CAST(qcum AS FLOAT) / SUM(quantity) <= 0.7 THEN MAX(Price) END) AS max70px


FROM (select p.*,
  (select SUM(quantity) from TransactTracker..CustData p2 
  where p2.Item = p.Item and p2.quantity >= p.quantity) 
  as qcum from TransactTracker..CustData p) cd

1 Answer


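There is some ambiguity in how you define the 70% when a price pushes the cumulative volume over the threshold. However, the challenge is twofold: after identifying the cumulative proportion, the query also needs to select the appropriate row. This suggests using row_number() for the selection.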

This solution uses SQL Server 2012 syntax to calculate the cumulative sum. It then assigns a sequential value based on how close that ratio is to 70%.

select item,
       SUM(price * quantity) / SUM(quantity) as vwp,
       SUM(quantity) as total_vol,
       min(case when seqnum = 1 then price end) as min70price,
       max(case when seqnum = 1 then price end) as max70price
from (select p.*,
             ROW_NUMBER() over (partition by item order by abs(0.7 - 1.0*qcum/qtot)) as seqnum  -- 1.0* avoids integer division
      from (select p.*,
                   SUM(quantity) over (partition by item order by quantity desc) as qcum,
                   SUM(quantity) over (partition by item) as qtot
            from purchases p
           ) p
     ) p
group by item;
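To see which row the `seqnum = 1` filter selects, here is a small Python emulation of the query on the question's sample rows (a sketch: `closest_to_share` is a hypothetical name, and ties in `abs(...)` are broken by input order here, whereas `ROW_NUMBER()` makes no such guarantee). As written, only one row per item gets `seqnum = 1`, so `min70price` and `max70price` both come from that single row.

```python
from collections import defaultdict

# Sample rows from the question: (item, price, quantity)
rows = [
    ("ABC", 10.10, 4), ("DEF", 8.30, 12), ("DEF", 7.75, 8),
    ("ABC", 10.50, 20), ("GHI", 15.4, 1), ("GHI", 15.2, 12),
    ("ABC", 10.25, 8),
]


def closest_to_share(rows, share=0.7):
    """For each item, return the price of the row whose qcum/qtot is closest to `share`."""
    by_item = defaultdict(list)
    for item, price, qty in rows:
        by_item[item].append((price, qty))

    picked = {}
    for item, item_rows in by_item.items():
        qtot = sum(qty for _, qty in item_rows)  # SUM(quantity) OVER (PARTITION BY item)

        def distance(row):
            # qcum: all quantities >= this row's (ties included), like the running sum by quantity desc
            qcum = sum(q2 for _, q2 in item_rows if q2 >= row[1])
            return abs(share - qcum / qtot)

        picked[item] = min(item_rows, key=distance)[0]  # the seqnum = 1 row
    return picked


print(closest_to_share(rows))
```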

To get the largest cumulative quantity below 70% of the total, you can use:

max(case when qcum < qtot*0.7 then qcum end) over (partition by item) as lastqcum

The case expression in the outer select would then be:

min(case when lastqcum = qcum then price end) . . 
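A Python sketch of this `lastqcum` variant for a single item, taking `(price, qcum)` pairs already computed as in the query (illustrative only; `price_at_last_qcum_below` is a hypothetical name). When no running total is below 70%, the windowed `max` is NULL and no row matches, which the sketch models as `None`.

```python
def price_at_last_qcum_below(rows_with_qcum, qtot, share=0.7):
    """rows_with_qcum: [(price, qcum), ...] for one item."""
    below = [qcum for _, qcum in rows_with_qcum if qcum < qtot * share]
    if not below:
        return None  # the windowed max() is NULL, so no row matches
    lastqcum = max(below)
    return min(price for price, qcum in rows_with_qcum if qcum == lastqcum)


# ABC from the question: running totals 20, 28, 32 out of 32 (70% = 22.4)
print(price_at_last_qcum_below([(10.50, 20), (10.25, 28), (10.10, 32)], 32))  # 10.5
```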

In earlier versions of SQL Server, you can get the same effect with a correlated subquery:

select item,
       SUM(price * quantity) / SUM(quantity) as vwp,
       SUM(quantity) as total_vol,
       min(case when seqnum = 1 then price end) as min70price,
       max(case when seqnum = 1 then price end) as max70price
from (select p.*,
             ROW_NUMBER() over (partition by item order by abs(0.7 - 1.0*qcum/qtot)) as seqnum
      from (select p.*,
                   (select SUM(quantity) from purchases p2 where p2.item = p.item and p2.quantity >= p.quantity
                   ) as qcum,
                   SUM(quantity) over (partition by item) as qtot
            from purchases p
           ) p
     ) p
group by item
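The correlated subquery sums every row with `quantity >= p.quantity`, so rows with equal quantities get the same total. The SQL Server 2012 window version behaves the same way, because `SUM(...) OVER (... ORDER BY ...)` without an explicit frame defaults to `RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`, which includes tied (peer) rows. A small Python check of that equivalence, on illustrative quantities with a tie (not from the question):

```python
from itertools import accumulate

quantities = [20, 8, 8, 4]  # one item's purchases, with a tie at 8 (illustrative)

# Correlated-subquery form: each row sums every quantity >= its own.
correlated = [sum(q2 for q2 in quantities if q2 >= q) for q in quantities]

# Window form: running total in descending order; with a RANGE frame, tied rows
# (peers) all see the total through the last peer.
ordered = sorted(quantities, reverse=True)
peer_total = {}
for q, running in zip(ordered, accumulate(ordered)):
    peer_total[q] = running  # later peers overwrite, leaving the peer-group total
window = [peer_total[q] for q in quantities]

print(correlated, window)  # [20, 36, 36, 40] [20, 36, 36, 40]
```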

Here it is applied to your code:

-- @DateOffset (Number of days data is offset by)
-- @StartTime (Time to start data in hours)
-- @EndTime (Time to stop data in hours)

-- Filter first, so qcum and qtot only cover the selected time window
-- (terminate any preceding statement with ; before WITH)
WITH filtered AS (
    SELECT *
    FROM TransactTracker..CustData
    WHERE HitTake = ''
      AND (   (DATEDIFF(day, TradeDateTime, GETDATE()) = (@DateOffset+1)
               AND DATEPART(hh, TradeDateTime) >= @StartTime)
           OR (DATEDIFF(day, TradeDateTime, GETDATE()) = @DateOffset
               AND DATEPART(hh, TradeDateTime) < @EndTime))
)
SELECT Item, 
   SUM(quantity) AS Total_Vol, 
   DATEADD(day, -@DateOffset, CONVERT(date, GETDATE())) AS buyDate,
   MIN(Price) AS MinPrice,
   MAX(Price) AS MaxPrice,
   MAX(Price) - MIN(Price) AS PriceRange,
   ROUND(SUM(Price * quantity)/SUM(quantity), 6) AS VWP,
   min(case when seqnum = 1 then price end) as min70price,
   max(case when seqnum = 1 then price end) as max70price
from (select p.*,
             ROW_NUMBER() over (partition by item order by abs(0.7 - 1.0*qcum/qtot)) as seqnum
      from (select p.*,
                   (select SUM(quantity) from filtered p2 where p2.item = p.item and p2.quantity >= p.quantity
                   ) as qcum,
                   SUM(quantity) over (partition by item) as qtot
            from filtered p
           ) p
     ) cd
GROUP BY Item
answered 2013-05-15T13:42:48.023