2

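I have a large table with 6,000,000 records in this format: (Acc, sDate, Serial, Amount, ...). Acc, sDate and Serial together form the primary key.

To illustrate my problem, I wrote this small piece of code: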

public class Cheque
{
    public string Account { get; set; }
    public string Serial { get; set; }
    public string StartDate { get; set; }
    // ... public string Amount { get; set; }    ...
}

var list = new List<Cheque>();
list.Add(new Cheque() { Account = "1", Serial = "1", StartDate = "20080120" });
list.Add(new Cheque() { Account = "1", Serial = "2", StartDate = "20080120" });
list.Add(new Cheque() { Account = "1", Serial = "3", StartDate = "20080120" });
list.Add(new Cheque() { Account = "1", Serial = "4", StartDate = "20080120" });
// Each account has 100 to 300 records per date; only a few are added here for simplicity.

list.Add(new Cheque() { Account = "1", Serial = "1", StartDate = "20110120" });
list.Add(new Cheque() { Account = "1", Serial = "2", StartDate = "20110120" });

list.Add(new Cheque() { Account = "1", Serial = "1", StartDate = "20120120" });
list.Add(new Cheque() { Account = "1", Serial = "2", StartDate = "20120120" });
list.Add(new Cheque() { Account = "1", Serial = "3", StartDate = "20120120" });

list.Add(new Cheque() { Account = "2", Serial = "1", StartDate = "20100417" });
list.Add(new Cheque() { Account = "2", Serial = "2", StartDate = "20100417" });

list.Add(new Cheque() { Account = "2", Serial = "1", StartDate = "20120314" });

list.Add(new Cheque() { Account = "2", Serial = "1", StartDate = "20070301" });
list.Add(new Cheque() { Account = "2", Serial = "1", StartDate = "20070301" });
list.Add(new Cheque() { Account = "2", Serial = "1", StartDate = "20070301" });

The expected list contains only the records for the two most recent dates of each account:

Account  Serial  Date

"1", "1", "20120120"   // first result set for Account = 1
"1", "2", "20120120"
"1", "3", "20120120"
"1", "1", "20110120"   // second result set for Account = 1
"1", "2", "20110120"
"2", "1", "20120314"   // first result set for Account = 2
"2", "1", "20100417"   // second result set for Account = 2
"2", "2", "20100417"

How can I write this query with LINQ? How do I group by (or use Distinct) and take the first groups, as in the result above?

4 Answers

2

The trick is to group by Account and Serial. Take the top two dates, then flatten the list again with SelectMany:

var result = list
    .GroupBy(x => new { x.Account, x.Serial })
    .Select(g => new
    {
        // One record per distinct date, keeping the two newest dates.
        FirstTwo = g.GroupBy(x => x.StartDate)
                    .Select(x => x.FirstOrDefault())
                    .OrderByDescending(x => x.StartDate)
                    .Take(2)
    })
    .SelectMany(g => g.FirstTwo)
    .OrderBy(x => x.Account)
    .ThenByDescending(x => x.StartDate)
    .ThenBy(x => x.Serial);

Result:

1   1   20120120
1   2   20120120
1   3   20120120
1   1   20110120
1   2   20110120
1   3   20110120
2   1   20120314
2   2   20120314
2   1   20100417
2   2   20100417
answered 2013-09-15T13:08:08.500
0

After searching and reading Stack Overflow, I used this code to produce the desired result.

    // Step 1: the distinct (Account, StartDate) pairs.
    var groupedList = from t in list
                      group t by new { t.Account, t.StartDate } into g
                      select new { g.Key.Account, g.Key.StartDate };

    // Step 2: keep the two most recent dates of each account.
    var filteredList = groupedList
        .GroupBy(x => x.Account)
        .SelectMany(g => g.OrderByDescending(t => t.StartDate).Take(2));

    // Step 3: join back to the full list to fetch the matching records.
    var result = (from c in list
                  join k in filteredList
                      on new { c.StartDate, c.Account }
                      equals new { k.StartDate, k.Account }
                  select c).ToList();

    /* Or as a lambda method chain:
    var groupedList = list.GroupBy(t => new { t.StartDate, t.Account })
        .Select(g => new { g.Key.StartDate, g.Key.Account })
        .GroupBy(x => x.Account)
        .SelectMany(g => g.OrderByDescending(t => t.StartDate).Take(2));

    var result = list.Join(inner: groupedList,
            outerKeySelector: c => new { c.StartDate, c.Account },
            innerKeySelector: k => new { k.StartDate, k.Account },
            resultSelector: (c, k) => c)
        .OrderBy(e => e.Account).ThenByDescending(e => e.StartDate).ToList();
    */

    // Print each record; passing the list itself would only print its type name.
    foreach (var c in result)
        Console.WriteLine("{0} {1} {2}", c.Account, c.Serial, c.StartDate);

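Many thanks to LINQPad (the best tool for LINQ) and to all the friends on Stack Overflow (the best professional developers in the world).

But I think my code is overly complex (three levels of filtering) and does not perform optimally. :)

If anyone has a better suggestion, please let me know.

I would love to get some improvements!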
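
One possible simplification, as a minimal sketch only: group by account, then group each account's records by date, keep the two newest date groups, and flatten. This avoids the join back to the full list. The helper name `LatestDatesPerAccount`, the class `ChequeQueries` and the `dateCount` parameter are hypothetical names introduced for this illustration. The sketch assumes LINQ to Objects on an in-memory list and dates stored as `yyyyMMdd` strings, so that ordinal string order matches chronological order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Cheque
{
    public string Account { get; set; }
    public string Serial { get; set; }
    public string StartDate { get; set; }
}

public static class ChequeQueries
{
    // Hypothetical helper: keeps the records that belong to the
    // `dateCount` most recent dates of each account.
    public static List<Cheque> LatestDatesPerAccount(IEnumerable<Cheque> source, int dateCount)
    {
        return source
            .GroupBy(c => c.Account)
            .OrderBy(byAccount => byAccount.Key, StringComparer.Ordinal)
            .SelectMany(byAccount => byAccount
                .GroupBy(c => c.StartDate)
                // Assumption: yyyyMMdd strings, so ordinal order is chronological.
                .OrderByDescending(byDate => byDate.Key, StringComparer.Ordinal)
                .Take(dateCount)
                // Records inside a date group keep their input order.
                .SelectMany(byDate => byDate))
            .ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        // A reduced version of the sample data from the question.
        var list = new List<Cheque>
        {
            new Cheque { Account = "1", Serial = "1", StartDate = "20080120" },
            new Cheque { Account = "1", Serial = "1", StartDate = "20110120" },
            new Cheque { Account = "1", Serial = "2", StartDate = "20110120" },
            new Cheque { Account = "1", Serial = "1", StartDate = "20120120" },
            new Cheque { Account = "1", Serial = "2", StartDate = "20120120" },
            new Cheque { Account = "1", Serial = "3", StartDate = "20120120" },
            new Cheque { Account = "2", Serial = "1", StartDate = "20100417" },
            new Cheque { Account = "2", Serial = "2", StartDate = "20100417" },
            new Cheque { Account = "2", Serial = "1", StartDate = "20120314" },
            new Cheque { Account = "2", Serial = "1", StartDate = "20070301" },
        };

        foreach (var c in ChequeQueries.LatestDatesPerAccount(list, 2))
            Console.WriteLine("{0} {1} {2}", c.Account, c.Serial, c.StartDate);
    }
}
```

With this reduced data the loop should print the eight rows of the expected result from the question, in the same order. Serials are not sorted explicitly; they come out in input order, so an extra ordering step would be needed if the source is unsorted. Whether a query of this shape translates to efficient SQL on the 6,000,000-row table depends on the LINQ provider, which is not specified here.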

answered 2013-09-15T12:12:08.720
0

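To get the top two from each group, the query would look like this. Update: in this case, however, the combination of account ID and start date must be unique.

.ToList()
.GroupBy(x => new { x.Account, x.StartDate })
.SelectMany(y => y.OrderByDescending(z => z.StartDate).Take(2));

I use similar code in my own project and know that it works correctly.

answered 2013-09-15T12:29:17.130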
0

Finally, I found a single statement that produces the expected result.

var result = (from cheque in list.OrderBy(a => a.Account).ThenByDescending(a => a.StartDate)
              // In LINQ to Objects, groups are yielded in order of first appearance,
              // so after the sort each account's newest dates come first.
              group cheque by new { cheque.Account, cheque.StartDate } into gr
              group gr by gr.Key.Account into secondGrouping
              from second in secondGrouping.Take(2)
              from Cheque f in second
              select f).ToList();
answered 2013-09-18T12:20:41.457