I have around 200K records in a list, and I'm looping through them to build another collection. This works fine on my local 64-bit Windows 7 machine, but when I move it to Windows Server 2008 R2 it takes far longer. The difference is almost an hour!

I tried looking at Compiled Queries and am still figuring them out.
For various reasons, we can't do a database join to retrieve the child values.

Here is the code:

//listOfDetails is another collection
List<SomeDetails> myDetails = null;
foreach (CustomerDetails myItem in customerDetails)
{

    var myList = from ss in listOfDetails
                 where ss.CustomerNumber == myItem.CustomerNum
                 && ss.ID == myItem.ID
                 select ss;
    myDetails = (List<SomeDetails>)(myList.ToList());
    myItem.SomeDetails = myDetails;
}

4 Answers


I would do it like this:

var lookup = listOfDetails.ToLookup(x => new { x.CustomerNumber, x.ID });
foreach(var item in customerDetails)
{
    var key = new { CustomerNumber = item.CustomerNum, item.ID };
    item.SomeDetails = lookup[key].ToList();
}

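The big advantage of this code is that it loops through listOfDetails only once to build the lookup, which is nothing more than a hash map. After that, we simply fetch the values by key, which is very fast because that is exactly what a hash map is built for.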
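
To make the lookup idea concrete, here is a minimal, self-contained sketch of the same approach. The class shapes are assumptions: the question does not show `CustomerDetails` or `SomeDetails`, so the `int` key types and the `Note` property below are hypothetical stand-ins.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the question's types; only the members used here are modeled.
public class SomeDetails
{
    public int CustomerNumber { get; set; }
    public int ID { get; set; }
    public string Note { get; set; } // illustrative payload, not in the question
}

public class CustomerDetails
{
    public int CustomerNum { get; set; }
    public int ID { get; set; }
    public List<SomeDetails> SomeDetails { get; set; }
}

public static class LookupDemo
{
    // One pass over listOfDetails builds the lookup; each customer then costs one hashed probe.
    public static void AttachDetails(List<CustomerDetails> customerDetails, List<SomeDetails> listOfDetails)
    {
        var lookup = listOfDetails.ToLookup(x => new { x.CustomerNumber, x.ID });
        foreach (var item in customerDetails)
        {
            // Same property names, types, and order as the lookup key, so both are the same anonymous type.
            var key = new { CustomerNumber = item.CustomerNum, item.ID };
            // A missing key yields an empty sequence, so unmatched customers get an empty list, not null.
            item.SomeDetails = lookup[key].ToList();
        }
    }

    public static void Main()
    {
        var details = new List<SomeDetails>
        {
            new SomeDetails { CustomerNumber = 1, ID = 10, Note = "a" },
            new SomeDetails { CustomerNumber = 1, ID = 10, Note = "b" },
            new SomeDetails { CustomerNumber = 2, ID = 20, Note = "c" },
        };
        var customers = new List<CustomerDetails>
        {
            new CustomerDetails { CustomerNum = 1, ID = 10 },
            new CustomerDetails { CustomerNum = 3, ID = 30 },
        };

        AttachDetails(customers, details);

        foreach (var c in customers)
            Console.WriteLine("{0}/{1}: {2} detail(s)", c.CustomerNum, c.ID, c.SomeDetails.Count);
    }
}
```

Note that the anonymous-type key only matches when the property names, types, and order agree on both sides, which is why the probe key renames `CustomerNum` to `CustomerNumber`.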

Answered 2012-09-11T16:57:35.187

I don't know why you are seeing a difference in performance, but you should be able to make the code perform better.

// myDetails holds the full detail collection (listOfDetails in the question)
List<SomeDetails> myDetails = ...;
var detailsGrouped = myDetails.ToLookup(x => new { x.CustomerNumber, x.ID });
foreach (CustomerDetails myItem in customerDetails)
{ 
    var myList = detailsGrouped[new { CustomerNumber = myItem.CustomerNum, myItem.ID }];
    myItem.SomeDetails = myList.ToList();
}

The idea here is to avoid looping through myDetails repeatedly and to build a hash-based lookup instead. Once it is built, each lookup is very cheap.

Answered 2012-09-11T16:55:59.853

The inner ToList() forces evaluation on every loop iteration, which has to hurt. SelectMany might let you avoid the ToList, like this:

// SelectMany flattens the per-customer matches into one lazily evaluated sequence
var details = customerDetails.SelectMany( item => listOfDetails
    .Where( detail => detail.CustomerNumber == item.CustomerNum)
    .Where( detail => detail.ID == item.ID)
);

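If you fetch all the SomeDetails first and then assign them to the items, it might speed things up. Or it might not. You should really profile to see where the time is being spent.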
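
As a rough first measurement before reaching for a real profiler, a `Stopwatch` comparison along these lines can show whether the nested scan is the bottleneck. This is only a sketch under stated assumptions: `Detail`, `Customer`, and the synthetic data are hypothetical stand-ins for the question's types, and the input size is kept small so the quadratic version finishes quickly.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Hypothetical minimal types; the question's real classes are not shown.
public class Detail { public int CustomerNumber; public int ID; }
public class Customer { public int CustomerNum; public int ID; public List<Detail> Details; }

public static class TimingSketch
{
    // The question's approach: rescans every detail for every customer.
    public static void NestedScan(List<Customer> customers, List<Detail> details)
    {
        foreach (var c in customers)
            c.Details = details.Where(d => d.CustomerNumber == c.CustomerNum && d.ID == c.ID).ToList();
    }

    // The ToLookup approach: one pass to index the details, then one hashed probe per customer.
    public static void HashedLookup(List<Customer> customers, List<Detail> details)
    {
        var lookup = details.ToLookup(d => new { d.CustomerNumber, d.ID });
        foreach (var c in customers)
            c.Details = lookup[new { CustomerNumber = c.CustomerNum, c.ID }].ToList();
    }

    // Wall-clock time of a single run; a coarse measurement, not a substitute for a profiler.
    public static TimeSpan Time(Action action)
    {
        var sw = Stopwatch.StartNew();
        action();
        sw.Stop();
        return sw.Elapsed;
    }

    public static void Main()
    {
        const int n = 5000; // kept small so the nested scan finishes quickly
        var details = Enumerable.Range(0, n).Select(i => new Detail { CustomerNumber = i, ID = i }).ToList();
        var customers = Enumerable.Range(0, n).Select(i => new Customer { CustomerNum = i, ID = i }).ToList();

        Console.WriteLine("nested scan:   {0}", Time(() => NestedScan(customers, details)));
        Console.WriteLine("hashed lookup: {0}", Time(() => HashedLookup(customers, details)));
    }
}
```

Single-run timings like these are noisy (JIT warm-up, garbage collection), so treat the numbers as an indication of the order of magnitude only.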

Answered 2012-09-11T17:01:19.260

I think you could benefit from a join here, so:

var mods = customerDetails
    .Join(
        listOfDetails, 
        x => Tuple.Create(x.ID, x.CustomerNum), 
        x => Tuple.Create(x.ID, x.CustomerNumber),
        (a, b) => new { custDet = a, listDet = b })
    .GroupBy(x => x.custDet)
    .Select(g => new { custDet = g.Key, items = g.Select(x => x.listDet).ToList() });

foreach(var mod in mods)
{
    mod.custDet.SomeDetails = mod.items;
}

I haven't compiled this code...

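Join works by building a hash-table-like collection (a Lookup) of the second list in O(n) time; after that, it is just a matter of iterating over the first list and pulling items from the Lookup. Since pulling data from a hash table is O(1), the iterate/match phase also takes only O(n), as does the subsequent GroupBy. So the whole operation should take ~O(3n), which is equivalent to O(n), where n is the length of the longer list.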
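
The two phases described above can be written out by hand. The following is an illustrative sketch of a hash join, not the actual `Enumerable.Join` source; `HashJoin` and the tuple-based sample data are hypothetical names introduced only for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class JoinSketch
{
    // Roughly what a hash join does: index the inner sequence once, then probe it per outer element.
    public static IEnumerable<TResult> HashJoin<TOuter, TInner, TKey, TResult>(
        IEnumerable<TOuter> outer,
        IEnumerable<TInner> inner,
        Func<TOuter, TKey> outerKey,
        Func<TInner, TKey> innerKey,
        Func<TOuter, TInner, TResult> result)
    {
        // Phase 1: a single pass over the inner sequence builds the hash-based lookup.
        var lookup = inner.ToLookup(innerKey);

        // Phase 2: a single pass over the outer sequence, one hashed probe per element.
        foreach (var o in outer)
            foreach (var i in lookup[outerKey(o)])
                yield return result(o, i);
    }

    public static void Main()
    {
        // (CustomerNum, ID) pairs and (CustomerNumber, ID, label) triples as stand-in data.
        var customers = new[] { Tuple.Create(1, 10), Tuple.Create(2, 20) };
        var details = new[] { Tuple.Create(1, 10, "a"), Tuple.Create(1, 10, "b"), Tuple.Create(3, 30, "c") };

        var labels = HashJoin(
            customers, details,
            c => Tuple.Create(c.Item1, c.Item2),
            d => Tuple.Create(d.Item1, d.Item2),
            (c, d) => d.Item3);

        Console.WriteLine(string.Join(",", labels));
    }
}
```

`Tuple` keys work here for the same reason they work in the Join above: tuples compare by value, so two separately created keys with equal components hash to the same bucket.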

Answered 2012-09-11T17:30:28.000