All,

I have a table that looks like this:

Date     Pitcher        WHIP
-------- -------------- -----
7/4/12   JACKSON, E     1.129
7/4/12   YOUNG, C       1.400
7/4/12   CORREIA, K     1.301
7/4/12   WOLF, R        1.594
...
6/28/12  JACKSON, E     1.137
6/27/12  YOUNG, C       1.750
...
6/19/12  JACKSON, E     1.215
6/17/12  YOUNG, C       1.851

I've set up a SQLFiddle here: http://sqlfiddle.com/#!2/addfe/1

In other words, the table lists the starting pitcher for every game of the MLB season, along with that pitcher's current WHIP (WHIP is a measure of the pitcher's performance).

What I'd like to obtain from my query is this: how much has that pitcher's WHIP changed in the last 30 days?

Or, more precisely, how much has that pitcher's WHIP changed since his most recent start that was at least 30 days ago?

So, for example, if E. Jackson's WHIP on 7/4/12 was 1.129, and his WHIP on 6/3/12 was 1.500, then I'd like to know that his WHIP changed by -0.371.

This is easy to figure out for any individual, but I want to calculate that for all pitchers, on all dates.

One of the things that makes this tricky is that there isn't data for every date. For example, if E. Jackson pitched on 7/4/12, the most recent start that's at least 30 days ago might be on 5/28/2012.

However, for K. Correia, who also pitched on 7/4/12, the most recent start that's at least 30 days ago might be 5/26/2012.

I'm assuming that I need to join the table to itself, but I'm not sure how to do it.

Here's my first stab:

select
    t1.home_pitcher,
    t1.date,
    t1.All_starts_whip,
    t2.All_starts_whip
from
    mlb_data t1
join
    mlb_data t2
ON
    t1.home_pitcher = t2.home_pitcher
and
    t2.date = (select max(date) from mlb_data t3 where t3.home_pitcher = t1.home_pitcher and t3.date < date_sub(t1.date, interval 1 month))

This seems to work (and hopefully illustrates what I'm trying to capture), but it takes HORRENDOUSLY long. My table goes back a few seasons and has about 6,250 rows, and this query took 7,289 seconds (yes, that's correct: more than 2 hours). I'm sure this is a classic case of the absolute worst way to write a query.

[UPDATE] Some clarification...

The query should produce a value for EACH pitcher for EACH start.

In other words, if E. Jackson pitched in 10 games, he'd be listed in the result set 10 times.

Date     Pitcher        WHIP  WHIP_30d_ago
-------- -------------- ----- ------------
7/4/12   JACKSON, E     1.129 1.111
...
5/18/12  JACKSON, E     1.111 2.222
...
4/14/12  JACKSON, E     2.222 3.333

In other words, I'm looking for a 30-day trailing WHIP for each start.

Many thanks in advance!

1 Answer

I don't think you need a self-join. You can use a correlated subquery like this:

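select
    p.home_pitcher,
    p.date,
    p.all_starts_whip,
    p.previous_whip,
    -- change since the most recent start more than one month earlier
    p.all_starts_whip - p.previous_whip as whip_change
from (
    select
        t1.home_pitcher,
        t1.date,
        t1.all_starts_whip,
        -- correlated subquery: latest qualifying WHIP for the same pitcher
        (select t2.all_starts_whip
         from mlb_data t2
         where t2.home_pitcher = t1.home_pitcher
           and t2.date < date_sub(t1.date, interval 1 month)
         order by t2.date desc
         limit 1) as previous_whip
    from mlb_data t1
) p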

So, for each WHIP value of each pitcher, you get the most recent value from more than a month earlier and compute the change.

Check it out: http://sqlfiddle.com/#!2/addfe/8 (some entries have no start more than a month earlier to compare against, so the difference is NULL for them)
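If the correlated subquery is still slow on the full table, a composite index may help, because the subquery filters on home_pitcher and then looks for the latest date below a cutoff. This is only a sketch: it assumes mlb_data has no such index yet (the real schema isn't shown here), and the index name idx_pitcher_date is made up for illustration.

```sql
-- Hypothetical index name; assumes no equivalent index already exists.
-- Column order matters: equality column first, then the range/sort column.
CREATE INDEX idx_pitcher_date ON mlb_data (home_pitcher, date);
```

Running EXPLAIN on the query before and after adding the index should show whether MySQL actually uses it for the subquery.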

Answered 2012-07-18T17:45:14.650