My goal is to compute the average over exactly 5 records, but only if they meet the left join criteria against another table. Let's say we have table one (left) with records:

RECNUM | ID  | DATE       | JOB
1      | cat | 2019.01.01 | meow
2      | dog | 2019.01.01 | bark

And we have table two (right) with records:

RECNUM | ID  | Action_ID  | DATE       | REWARD
1      | cat | 1          | 2019.01.02 | 20
2      | cat | 99         | 2018.12.30 | 1
3      | cat | 23         | 2019.12.28 | 20
4      | cat | 54         | 2018.01.01 | 20
5      | cat | 32         | 2018.01.02 | 20
6      | cat | 21         | 2018.01.03 | 20
7      | cat | 43         | 2018.12.28 | 1
8      | cat | 65         | 2018.12.29 | 1
9      | cat | 87         | 2018.09.12 | 1
10     | cat | 98         | 2018.10.11 | 1
11     | dog | 56         | 2018.09.01 | 99
12     | dog | 42         | 2019.09.02 | 99

A result should return:

ID  | AVG(Reward_from_latest_5_jobs)
cat | 1

The criteria are: for each JOB in the left table, find the 5 latest unique Action_ID(s) for the same ID in the right table that are older than the job's DATE, and calculate the average REWARD over them. In other words, the dog has barked, we do not know what reward to give him, so we take the average of the latest five rewards he got. If fewer than 5 are found, return nothing (or null); if more are found, discard the oldest ones.

This is how I tried to do it:

         SELECT a."ID", COUNT(b."Action_ID"), AVG(b."REWARD")
         FROM (
             SELECT "ID", "DATE"
             FROM :left_table
         ) a
         LEFT JOIN (
             SELECT "ID", "Action_ID", "DATE", "REWARD"
             FROM :right_table
         ) b
             ON a."ID" = b."ID"
         WHERE a."DATE" > b."DATE"
         GROUP BY a."ID"
         HAVING COUNT(b."Action_ID") >= 5;

But this calculates the average over all the Action_ID(s) that match the criteria, not only the five latest ones. Could you please tell me how to achieve the expected result? I can use sub-tables, and it does not have to be done in one SQL statement. Procedures are not allowed for this use case. Any input is highly appreciated.
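To make the problem concrete, here is a minimal sketch that runs the attempted query against the sample data. It assumes SQLite (through Python's `sqlite3` module) as a stand-in for the unspecified database, and the table names `left_table` and `right_table` with lower-case column names are substitutions for the `:left_table` and `:right_table` placeholders.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the real one (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE left_table  (id TEXT, date TEXT, job TEXT);
CREATE TABLE right_table (id TEXT, action_id INTEGER, date TEXT, reward INTEGER);
INSERT INTO left_table VALUES
  ('cat', '2019.01.01', 'meow'),
  ('dog', '2019.01.01', 'bark');
INSERT INTO right_table VALUES
  ('cat',  1, '2019.01.02', 20), ('cat', 99, '2018.12.30',  1),
  ('cat', 23, '2019.12.28', 20), ('cat', 54, '2018.01.01', 20),
  ('cat', 32, '2018.01.02', 20), ('cat', 21, '2018.01.03', 20),
  ('cat', 43, '2018.12.28',  1), ('cat', 65, '2018.12.29',  1),
  ('cat', 87, '2018.09.12',  1), ('cat', 98, '2018.10.11',  1),
  ('dog', 56, '2018.09.01', 99), ('dog', 42, '2019.09.02', 99);
""")

# The attempted query: it aggregates over every older row, not the latest five.
# Dates are stored as 'YYYY.MM.DD' text, so string comparison orders them correctly.
rows = conn.execute("""
SELECT a.id, COUNT(b.action_id), AVG(b.reward)
FROM left_table a
LEFT JOIN right_table b ON a.id = b.id
WHERE a.date > b.date
GROUP BY a.id
HAVING COUNT(b.action_id) >= 5
""").fetchall()

print(rows)  # [('cat', 8, 8.125)]
```

All eight older cat rows are averaged, giving 8.125 instead of the expected 1. Note also that the `WHERE a.date > b.date` filter discards the unmatched rows a `LEFT JOIN` would otherwise keep, so the join behaves like an inner join here.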

3 Answers

You could use window functions, then aggregation:

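select 
    id,
    avg(reward) avg_reward
from (
    select 
        t1.id, 
        t2.reward, 
        -- number of older right-table rows available for this id
        count(*) over(partition by t1.id) cnt,
        -- 1 = most recent older row; tied dates share a rank
        rank() over(partition by t1.id order by t2.date desc) rn
    from lefttable t1
    -- keep only right-table rows strictly older than the left-table row
    inner join righttable t2 on t1.id = t2.id and t2.date < t1.date
) t
where cnt >= 5 and rn <= 5
group by id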

The inner query joins the tables according to your requirement, computes a window count of the matching records for each id, and ranks each id's records by descending date.

The outer query then keeps only the ids that have at least 5 matching records and computes the average of the top 5 records for each id.
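One caveat with ranking: `rank()` assigns the same rank to rows with equal dates, so a "top N" filter can return more than N rows, while `row_number()` always returns at most N. The sketch below illustrates this on a small hypothetical table (`rewards`, with a deliberately tied date, and N = 2 for brevity). It assumes SQLite 3.25 or later, which is when window functions were added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical data: two rows share the date 2018.12.29.
CREATE TABLE rewards (id TEXT, date TEXT, reward INTEGER);
INSERT INTO rewards VALUES
  ('cat', '2018.12.30', 1),
  ('cat', '2018.12.29', 1),
  ('cat', '2018.12.29', 20);
""")

def top_n_count(func, n=2):
    """Count rows kept by a 'top n per id' filter using the given window function."""
    sql = f"""
    SELECT COUNT(*) FROM (
        SELECT {func}() OVER (PARTITION BY id ORDER BY date DESC) AS rn
        FROM rewards
    ) WHERE rn <= ?
    """
    return conn.execute(sql, (n,)).fetchone()[0]

rank_rows = top_n_count("rank")              # ranks are 1, 2, 2
row_number_rows = top_n_count("row_number")  # numbers are 1, 2, 3

print(rank_rows, row_number_rows)  # 3 2
```

If dates are unique per id, the two functions behave the same. If ties are possible and exactly five rows are required, `row_number()` with an extra tie-breaker column in the `order by` is the safer choice.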

answered 2019-11-18T20:07:55.623

Use window functions to get the top 5:

select id, avg(reward)
from (select r.*,
             row_number() over (partition by l.id order by r.date desc) as seqnum
      from table1 l join
           table2 r
           on l.id = r.id and l.date > r.date
     ) r
where seqnum <= 5
group by id
having count(*) >= 5;

The having clause then filters out the ids that have fewer than five matching rows.
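As a quick check, this approach can be run against the sample data from the question. This is only a sketch: it assumes SQLite 3.25+ (for window functions) via Python's `sqlite3` module, and the table names `left_table` and `right_table` with lower-case columns are stand-ins for the real ones.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE left_table  (id TEXT, date TEXT, job TEXT);
CREATE TABLE right_table (id TEXT, action_id INTEGER, date TEXT, reward INTEGER);
INSERT INTO left_table VALUES
  ('cat', '2019.01.01', 'meow'),
  ('dog', '2019.01.01', 'bark');
INSERT INTO right_table VALUES
  ('cat',  1, '2019.01.02', 20), ('cat', 99, '2018.12.30',  1),
  ('cat', 23, '2019.12.28', 20), ('cat', 54, '2018.01.01', 20),
  ('cat', 32, '2018.01.02', 20), ('cat', 21, '2018.01.03', 20),
  ('cat', 43, '2018.12.28',  1), ('cat', 65, '2018.12.29',  1),
  ('cat', 87, '2018.09.12',  1), ('cat', 98, '2018.10.11',  1),
  ('dog', 56, '2018.09.01', 99), ('dog', 42, '2019.09.02', 99);
""")

# Number the older right-table rows per id (newest first), keep the first
# five, and drop ids that end up with fewer than five rows.
rows = conn.execute("""
SELECT id, AVG(reward)
FROM (
    SELECT r.id, r.reward,
           ROW_NUMBER() OVER (PARTITION BY l.id ORDER BY r.date DESC) AS seqnum
    FROM left_table l
    JOIN right_table r ON l.id = r.id AND l.date > r.date
) t
WHERE seqnum <= 5
GROUP BY id
HAVING COUNT(*) >= 5
""").fetchall()

print(rows)  # [('cat', 1.0)]
```

The five newest cat rows before 2019.01.01 all carry a reward of 1, which matches the expected result. The dog has only one older row, so it is filtered out.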

answered 2019-11-18T20:08:07.607

Here is how to do it with a join (if there are more joins you want to do, just repeat this method for every join):

  SELECT ONE.ID, 
         CASE WHEN MAX(J1.RN) < 5 THEN NULL ELSE AVG(J1.REWARD) END AS REWARD_AVG
         -- we could also use count
       --CASE WHEN COUNT(*) = 5 THEN AVG(J1.REWARD) ELSE NULL END AS REWARD_AVG
  FROM TABLE_ONE ONE
  -- LATERAL is needed so the derived table may reference ONE.DATE;
  -- a plain derived table cannot see ONE (SQL Server uses CROSS APPLY instead)
  JOIN LATERAL (
    SELECT
      ID,
      REWARD,
      ROW_NUMBER() OVER (PARTITION BY ID ORDER BY DATE DESC) AS RN
    FROM TABLE_TWO
    WHERE TABLE_TWO.DATE < ONE.DATE
  ) AS J1 ON J1.ID = ONE.ID and RN <= 5 -- take first five only
  GROUP BY ONE.ID
answered 2019-11-18T20:13:19.733