
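I'm trying to work out how to build a "multi-row" formula in U-SQL. I have sorted my data by date, and for each row I want to find the first value of "Port" that is not equal to the current row's value. In a similar way, I want to find the last row with the current port value, so I can calculate how many days a ship has been in a port. Keep in mind that these have to be rows with the same port name, with no new/other ports in between.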

I'm loading my data like this:

@res = SELECT
        Port,
        Date
        FROM @data;

This is how my data is structured:

Port      |   Date       |
Port A    |   1/1/2017   |
Port A    |   1/1/2017   |
Port A    |   1/2/2017   |
Port B    |   1/4/2017   |
Port B    |   1/4/2017   |
Port B    |   1/4/2017   |
Port B    |   1/5/2017   |
Port B    |   1/6/2017   |
Port C    |   1/9/2017   |
Port C    |   1/10/2017  |
Port C    |   1/11/2017  |
Port A    |   1/14/2017  |
Port A    |   1/15/2017  |

This is how I would like the data to be structured:

Port      |   Date       |  Time in Port   | Previous Port
Port A    |   1/1/2017   |      0          |   N/A
Port A    |   1/1/2017   |      0          |   N/A
Port A    |   1/2/2017   |      1          |   N/A
Port B    |   1/4/2017   |      0          |   Port  A
Port B    |   1/4/2017   |      0          |   Port  A
Port B    |   1/4/2017   |      0          |   Port  A
Port B    |   1/5/2017   |      1          |   Port  A
Port B    |   1/6/2017   |      2          |   Port  A
Port C    |   1/9/2017   |      0          |   Port  B
Port C    |   1/10/2017  |      1          |   Port  B
Port C    |   1/11/2017  |      2          |   Port  B
Port A    |   1/14/2017  |      0          |   Port  C
Port A    |   1/15/2017  |      1          |   Port  C

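I'm new to U-SQL, so I'm having some trouble working out how to approach this. My first instinct is to use some combination of LEAD()/LAG() and ROW_NUMBER() OVER(PARTITION BY xx ORDER BY Date), but I'm not sure how to get the exact effect I'm looking for.

Could anyone point me in the right direction?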

1 Answer

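You can use the so-called Ranking and Analytic functions like LAG and DENSE_RANK, together with the OVER clause, to do what you need, although it's not entirely straightforward. This simple rig works for your test data; I would suggest testing thoroughly with a larger and more complex dataset.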

// Test data
@input = SELECT *
     FROM (
        VALUES
        ( "Port A", DateTime.Parse("1/1/2017", new CultureInfo("en-US") ), 0 ),
        ( "Port A", DateTime.Parse("1/1/2017", new CultureInfo("en-US") ), 0 ),
        ( "Port A", DateTime.Parse("1/2/2017", new CultureInfo("en-US") ), 1 ),
        ( "Port B", DateTime.Parse("1/4/2017", new CultureInfo("en-US") ), 0 ),
        ( "Port B", DateTime.Parse("1/4/2017", new CultureInfo("en-US") ), 0 ),
        ( "Port B", DateTime.Parse("1/4/2017", new CultureInfo("en-US") ), 0 ),
        ( "Port B", DateTime.Parse("1/5/2017", new CultureInfo("en-US") ), 1 ),
        ( "Port B", DateTime.Parse("1/6/2017", new CultureInfo("en-US") ), 2 ),
        ( "Port C", DateTime.Parse("1/9/2017", new CultureInfo("en-US") ), 0 ),
        ( "Port C", DateTime.Parse("1/10/2017", new CultureInfo("en-US") ), 1 ),
        ( "Port C", DateTime.Parse("1/11/2017", new CultureInfo("en-US") ), 2 ),
        ( "Port A", DateTime.Parse("1/14/2017", new CultureInfo("en-US") ), 0 ),
        ( "Port A", DateTime.Parse("1/15/2017", new CultureInfo("en-US") ), 1 )
     ) AS x ( Port, Date, timeInPort );



// Add a group id to the dataset
@working =
    SELECT Port,
           Date,
           timeInPort,
           DENSE_RANK() OVER(ORDER BY Date) - DENSE_RANK() OVER(PARTITION BY Port ORDER BY Date) AS groupId

    FROM @input;


// Use the group id to work out the datediff with previous row
@working =
    SELECT Port,
           Date,
           timeInPort,
           groupId,
           Date.Date.Subtract((DateTime)(LAG(Date) OVER(PARTITION BY groupId ORDER BY Date) ?? Date)).TotalDays AS diff    // datediff

    FROM @working;


// Work out the previous port, based on group id
@ports =
    SELECT Port, groupId
    FROM @working
    GROUP BY Port, groupId;

@ports =
    SELECT Port, groupId, LAG(Port) OVER( ORDER BY groupId ) AS previousPort
    FROM @ports;


// Prep the final output
@output =
    SELECT w.Port,
           w.Date.ToString("M/d/yyyy") AS Date,
           // Running total of the day differences since the first row of the group
           SUM(w.diff) OVER( PARTITION BY w.groupId ORDER BY w.Date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS timeInPort,
           p.previousPort
    FROM @working AS w
         INNER JOIN
             @ports AS p
         ON w.Port == p.Port
            AND w.groupId == p.groupId;


OUTPUT @output TO "/output/output.csv"
ORDER BY Date, Port       
USING Outputters.Csv(quoting:false);
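If it helps to see why the group id step works, here is a minimal Python sketch (not U-SQL) of the same rank-difference idea, run on the test data above. The helper names `dense_rank` and `group_ids` are made up for illustration; the only thing taken from the script is the formula "overall dense rank minus per-port dense rank".

```python
from datetime import date

# Same rows as the U-SQL test data: (Port, Date)
rows = [
    ("Port A", date(2017, 1, 1)), ("Port A", date(2017, 1, 1)),
    ("Port A", date(2017, 1, 2)),
    ("Port B", date(2017, 1, 4)), ("Port B", date(2017, 1, 4)),
    ("Port B", date(2017, 1, 4)), ("Port B", date(2017, 1, 5)),
    ("Port B", date(2017, 1, 6)),
    ("Port C", date(2017, 1, 9)), ("Port C", date(2017, 1, 10)),
    ("Port C", date(2017, 1, 11)),
    ("Port A", date(2017, 1, 14)), ("Port A", date(2017, 1, 15)),
]


def dense_rank(values):
    """Map each distinct value to its 1-based position in sorted order."""
    return {v: i for i, v in enumerate(sorted(set(values)), start=1)}


def group_ids(rows):
    """Overall dense rank of the date minus its dense rank within the port."""
    overall = dense_rank(d for _, d in rows)
    per_port = {
        port: dense_rank(d for p, d in rows if p == port)
        for port in {p for p, _ in rows}
    }
    return [overall[d] - per_port[p][d] for p, d in rows]


for (port, day), gid in zip(rows, group_ids(rows)):
    print(port, day.isoformat(), gid)
```

Within one visit both ranks go up in step, so the difference stays constant; it only jumps when another port's dates come in between. One thing worth checking on real data: the id is only guaranteed to be distinct within a port. For example, the sequence A, B, C, A on four consecutive days gives ids 0, 1, 2, 2, so it is probably safer to treat (Port, groupId) together as the key.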

My results:

[Image: results]
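For cross-checking the script's output on a bigger dataset, a plain single-pass reference can be handy. The Python sketch below (again not U-SQL, and `annotate` is a hypothetical helper name) walks the rows in date order and tracks the current visit. It assumes a single ship and rows already sorted by date, with no two ports sharing the same date.

```python
from datetime import date

# Same rows as the test data in the script: (Port, Date), sorted by date
rows = [
    ("Port A", date(2017, 1, 1)), ("Port A", date(2017, 1, 1)),
    ("Port A", date(2017, 1, 2)),
    ("Port B", date(2017, 1, 4)), ("Port B", date(2017, 1, 4)),
    ("Port B", date(2017, 1, 4)), ("Port B", date(2017, 1, 5)),
    ("Port B", date(2017, 1, 6)),
    ("Port C", date(2017, 1, 9)), ("Port C", date(2017, 1, 10)),
    ("Port C", date(2017, 1, 11)),
    ("Port A", date(2017, 1, 14)), ("Port A", date(2017, 1, 15)),
]


def annotate(rows):
    """Return (port, date, days_in_port, previous_port) for date-sorted rows."""
    out = []
    current = None   # port of the visit in progress
    arrival = None   # first date of the visit in progress
    previous = None  # port of the visit before the current one
    for port, day in rows:
        if port != current:
            # A new visit starts: the old current port becomes the previous one.
            previous, current, arrival = current, port, day
        out.append((port, day, (day - arrival).days, previous or "N/A"))
    return out


for port, day, days_in_port, previous_port in annotate(rows):
    print(port, day.isoformat(), days_in_port, previous_port)
```

On the test data this gives the same "Time in Port" and "Previous Port" columns as the table in the question, so it can serve as a baseline when trying the U-SQL script on more complex data.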

Answered 2017-12-08T18:17:22.887