c# - Rules for LINQ to SQL across method boundaries

Question

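To keep my code cleaner, I often try to factor parts of my LINQ to SQL data access code out into private sub-methods, just as I would with plain old business logic code. Let me give a very simple example: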

public IEnumerable<Item> GetItemsFromRepository()
{
    var setA = from a in this.dataContext.TableA
               where /* criteria */
               select a.Prop;

    return DoSubQuery(setA);
}

private IEnumerable<Item> DoSubQuery(IEnumerable<DateTimeOffset> set)
{
     return from item in set
            where /* criteria */
            select item;
}

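I'm sure nobody would have to stretch their imagination to picture more complex examples with deeper nesting, or ones that use the results of one set to filter other queries.

My basic question is this: I've seen some significant performance differences, and even exceptions being thrown, just from reorganizing LINQ to SQL code into private methods. Can anyone explain the rules for these behaviors so that I can make informed decisions about how to write efficient, clean data access code?

Some questions I have:

1) When does passing a System.Data.Linq.Table<T> instance to a method cause query execution?

2) When does using a System.Data.Linq.Table<T> in another query cause execution?

3) Are there limits on the types of operations (Take, First, Last, order by, etc.) that can be applied to a System.Data.Linq.Table<T> passed as a parameter into a method?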

score 5 · Accepted Answer

The most important rule in terms of LINQ-to-SQL would be: don't return IEnumerable<T> unless you must, as the semantics are unclear. There are two schools of thought beyond that:

if you return IQueryable<T>, it is composable, meaning the where from later queries is combined to make a single TSQL, but as a down-side, it is hard to fully test
otherwise, return List<T> or similar, so it is clear that everything beyond that point is LINQ-to-Objects

Currently, you are doing something in the middle: collapsing it to LINQ-to-Objects (via IEnumerable<T>), but without it being obvious - and keeping the connection open in the middle (again, only a problem because it isn't obvious)
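
As a rough illustration of the two schools, here is a minimal sketch. It assumes an in-memory array wrapped with AsQueryable() as a stand-in for the real table so that it runs without a database; Item, ItemRepository, QueryLowIds and LoadLowIds are hypothetical names, not anything from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Item
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class ItemRepository
{
    private readonly IQueryable<Item> items;

    // Assumption: in real LINQ to SQL code this would come from the DataContext;
    // an in-memory stand-in keeps the sketch runnable on its own.
    public ItemRepository(IEnumerable<Item> source)
    {
        items = source.AsQueryable();
    }

    // School 1: composable. Nothing executes here; callers can add more
    // operators and the provider sees the combined query.
    public IQueryable<Item> QueryLowIds()
    {
        return items.Where(i => i.Id < 23);
    }

    // School 2: materialized. The query runs inside this method, and
    // everything the caller does afterwards is LINQ-to-Objects.
    public List<Item> LoadLowIds()
    {
        return items.Where(i => i.Id < 23).ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        var repo = new ItemRepository(new[]
        {
            new Item { Id = 1, Name = "a" },
            new Item { Id = 50, Name = "b" },
        });

        // Still one deferred query; the extra Where is composed onto it.
        IQueryable<Item> composed = repo.QueryLowIds().Where(i => i.Name == "a");
        Console.WriteLine(composed.Count());

        // Already executed; this Where runs in memory over the list.
        List<Item> loaded = repo.LoadLowIds();
        Console.WriteLine(loaded.Count(i => i.Name == "a"));
    }
}
```

Either return type tells the caller where the boundary is, which is the point being made above.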

score 3 · Accepted Answer

Remove the implicit conversion:

public IQueryable<Item> GetItemsFromRepository()
{
    var setA = from a in this.dataContext.TableA
               where /* criteria */
               select a.Prop;

    return DoSubQuery(setA);
}

private IQueryable<Item> DoSubQuery(IQueryable<DateTimeOffset> set)
{
     return from item in set
            where /* criteria */
            select item;
}

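The implicit conversion from IQueryable<Item> to IEnumerable<Item> is essentially the same as calling AsEnumerable() on your IQueryable<Item>. There are of course times when you want that, but you should leave things as IQueryable by default, so that the entire query can be performed on the database, rather than just the GetItemsFromRepository() part with the rest being done in memory.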
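
To make the effect of the parameter type visible, here is a small sketch. It assumes an in-memory queryable instead of a real table, and ConversionDemo, FilterInMemory and FilterInProvider are hypothetical names. The same lambda binds to Enumerable.Where or Queryable.Where depending only on the static type it is called on.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ConversionDemo
{
    // Parameter typed as IEnumerable<int>: Where binds to Enumerable.Where,
    // so the filter is a compiled delegate the query provider never sees.
    public static IEnumerable<int> FilterInMemory(IEnumerable<int> set)
    {
        return set.Where(x => x < 23);
    }

    // Parameter typed as IQueryable<int>: Where binds to Queryable.Where,
    // so the filter is appended to the expression tree.
    public static IQueryable<int> FilterInProvider(IQueryable<int> set)
    {
        return set.Where(x => x < 23);
    }

    public static void Main()
    {
        IQueryable<int> source = new[] { 1, 30, 5 }.AsQueryable();

        IEnumerable<int> viaEnumerable = FilterInMemory(source); // implicit conversion here
        IQueryable<int> viaQueryable = FilterInProvider(source);

        // Is the result of the IEnumerable version still a queryable?
        Console.WriteLine(viaEnumerable is IQueryable<int>);
        // The IQueryable version still carries the whole query as an expression.
        Console.WriteLine(viaQueryable.Expression);
    }
}
```

With a real LINQ to SQL table, only what ends up in that expression can be translated to SQL.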

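As for the secondary questions:

1) When does passing a System.Data.Linq.Table<T> instance to a method cause query execution?

When something needs a final result, such as Max(), ToList(), etc.; that is, a result that is neither a queryable object nor an enumerable that is loaded as it goes.

Note, though, that while AsEnumerable() does not cause query execution, it does mean that when execution happens, only the part before the AsEnumerable() is performed against the source data source. That produces an on-demand in-memory data source, against which the rest is performed.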
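
The timing can be observed with a small sketch. It assumes a counting iterator as a hypothetical stand-in for the table (DeferredDemo, ReadRows and RowsRead are illustrative names) and is not LINQ to SQL itself, but the deferred-execution rule it shows is the same.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    public static int RowsRead;

    // Hypothetical stand-in for a table: counts every row handed out.
    public static IEnumerable<int> ReadRows()
    {
        foreach (var id in new[] { 5, 40, 12, 99 })
        {
            RowsRead++;
            yield return id;
        }
    }

    public static void Main()
    {
        // Building and composing the query reads nothing.
        IQueryable<int> query = ReadRows().AsQueryable().Where(id => id < 23);
        IQueryable<int> ordered = query.OrderBy(id => id);
        Console.WriteLine(RowsRead);

        // Max() needs a final value, so the query runs now.
        int max = ordered.Max();
        Console.WriteLine(max + " after reading " + RowsRead + " rows");

        // ToList() needs every result, so the query runs again.
        List<int> list = ordered.ToList();
        Console.WriteLine(list.Count + " items after reading " + RowsRead + " rows");
    }
}
```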

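2) When does using a System.Data.Linq.Table<T> in another query cause execution?

The same as above. Table<T> implements IQueryable<T>. If you, for example, join two of them together, that won't yet cause anything to be executed.

3) Are there limits on the types of operations (Take, First, Last, order by, etc.) that can be applied to a System.Data.Linq.Table<T> passed as a parameter into a method?

Those that are defined for IQueryable<T>.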

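Edit: Some clarification on the similarities and differences between IEnumerable and IQueryable.

Just about anything you can do on an IQueryable can be done on an IEnumerable and vice versa, but how it is performed will be different.

Any given IQueryable implementation can be used in LINQ queries and will have all the linqy extension methods such as Take(), Select(), GroupBy() and so on.

Just how this is done depends on the implementation. For example, System.Data.Linq.Table<T> implements those methods by turning the query into an SQL query, the results of which are turned into objects as they are loaded. So if mySource is a table, then: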

var filtered = from item in mySource
  where item.ID < 23
  select new{item.ID, item.Name};

foreach(var i in filtered)
  Console.WriteLine(i.Name);

gets turned into SQL like:

select id, name from mySourceTable where id < 23

An enumerator is then created from that, such that each call to MoveNext() reads another row from the results and creates a new anonymous object from it.
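
For reference, this is roughly what the foreach loop does with that enumerator. The sketch assumes an in-memory queryable of KeyValuePair<int, string> standing in for the table (key as ID, value as Name); EnumeratorDemo and ReadNames are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EnumeratorDemo
{
    public static List<string> ReadNames(IQueryable<KeyValuePair<int, string>> mySource)
    {
        IQueryable<string> filtered = mySource
            .Where(item => item.Key < 23)
            .Select(item => item.Value);

        var names = new List<string>();

        // Roughly what foreach expands to: each MoveNext() call
        // fetches one more result, which is then read from Current.
        using (IEnumerator<string> rows = filtered.GetEnumerator())
        {
            while (rows.MoveNext())
            {
                names.Add(rows.Current);
            }
        }

        return names;
    }

    public static void Main()
    {
        var mySource = new[]
        {
            new KeyValuePair<int, string>(1, "first"),
            new KeyValuePair<int, string>(99, "last"),
        }.AsQueryable();

        foreach (var name in ReadNames(mySource))
        {
            Console.WriteLine(name);
        }
    }
}
```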

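On the other hand, if mySource were a List or a HashSet, or anything else that implements IEnumerable<T> but doesn't have its own query engine, then the linq-to-objects code will turn it into something like: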

foreach(var item in mySource)
  if(item.ID < 23)
    yield return new {item.ID, item.Name};

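That is about as efficient as this code can be when run in memory. The results will be the same, but the way of getting them differs.

Now, since every IQueryable<T> can be converted into the equivalent IEnumerable<T>, we could, if we wanted to, take the first mySource (where execution happens in the database) and do the following instead: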

var filtered = from item in mySource.AsEnumerable()
  where item.ID < 23
  select new{item.ID, item.Name};

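Here, while nothing is executed against the database until we iterate through the results or call something that examines all of them, once we do so, it is as if we had split the execution into two separate steps: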

var asEnum = mySource.AsEnumerable();
var filtered = from item in asEnum
  where item.ID < 23
  select new{item.ID, item.Name};

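The implementation of the first line is to execute the SQL SELECT * FROM mySourceTable, and the rest of the execution is then like the linq-to-objects example above.

It's not hard to see that, if the database contained 10 items with an id < 23 and 50,000 items with a higher id, this is now much, much less performant.

As well as offering an explicit AsEnumerable() method, every IQueryable<T> can be implicitly converted to IEnumerable<T>. This lets us foreach over them and use them with any other existing code that handles IEnumerable<T>, but if we accidentally do it at an inappropriate time, we can make queries much slower. This is what was happening when your DoSubQuery was defined to take an IEnumerable<DateTimeOffset> and return an IEnumerable<Item>: it implicitly called AsEnumerable() on your IQueryable<DateTimeOffset> and your IQueryable<Item>, and caused what could likely have been performed on the database to be performed in memory.

For this reason, 99% of the time, we want to keep dealing in IQueryable until the very last moment.

As an example of the opposite, though, just to point out that AsEnumerable() and the conversions to IEnumerable<T> aren't there out of madness, we should consider two things. The first is that IEnumerable<T> lets us do things that can't be done otherwise, such as joining two completely different sources that don't know about each other (e.g. two different databases, a database and an XML file, etc.).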
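
A minimal sketch of such a cross-source join, assuming an in-memory queryable standing in for the database rows and a made-up XML layout for the second source; CrossSourceJoin, Describe and the stock element are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class CrossSourceJoin
{
    public static List<string> Describe(
        IQueryable<KeyValuePair<int, string>> dbRows, XDocument stockXml)
    {
        // Filter while still queryable, so a real provider could do this
        // part in the database, then switch to in-memory enumeration.
        IEnumerable<KeyValuePair<int, string>> fromDb =
            dbRows.Where(r => r.Key < 23).AsEnumerable();

        var fromXml = stockXml.Root.Elements("stock")
            .Select(e => new { Id = (int)e.Attribute("id"), Count = (int)e });

        // Neither source knows about the other, so the join runs in memory.
        return fromDb
            .Join(fromXml, r => r.Key, s => s.Id, (r, s) => r.Value + ": " + s.Count)
            .ToList();
    }

    public static void Main()
    {
        var dbRows = new[]
        {
            new KeyValuePair<int, string>(1, "widget"),
            new KeyValuePair<int, string>(40, "gadget"),
        }.AsQueryable();

        var stockXml = XDocument.Parse(
            "<stocks><stock id=\"1\">7</stock><stock id=\"40\">3</stock></stocks>");

        foreach (var line in Describe(dbRows, stockXml))
        {
            Console.WriteLine(line);
        }
    }
}
```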

The other is that sometimes IEnumerable<T> is actually more efficient too. Consider:

IQueryable<IGrouping<string, int>> groupingQuery = from item in mySource group item.ID by item.Name;
var list1 = groupingQuery.Select(grp => new {Name=grp.Key, Count=grp.Count()}).ToList();//fine
foreach(var grp in groupingQuery)//disaster!
  Console.WriteLine(grp.Count());

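Here groupingQuery is set up as a queryable that does some grouping, but it hasn't been executed in any way. When we create list1, we first create a new IQueryable based on it, and the query engine does its best to work out the most suitable SQL for it, coming up with something like: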

select name, count(id) from mySourceTable group by name

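This is performed pretty efficiently. The rows are then each turned into objects, which are put into a list.

With the second query, on the other hand, there isn't as natural an SQL translation for a group by that doesn't perform aggregate methods on all of the non-grouped items, so the best the query engine can come up with is to first do:

select distinct name from mySourceTable

And then, for every name it receives, to do: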

select id from mySourceTable where name = '{name found in last query goes here}'

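And so on, whether that means 2 SQL queries or 200,000.

In this case, we're much better off working on mySource.AsEnumerable(), because here it is more efficient to grab the whole table into memory first. (Better still would be to work on mySource.Select(item => new {item.ID, item.Name}).AsEnumerable(), because then we still only retrieve the columns we care about from the database, and switch to in-memory at that point.)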

The last bit is worth remembering because it breaks our rule that we should stay with IQueryable<T> as long as possible. It isn't something to worry about much, but it is worth keeping an eye on if you do grouping and find yourself with a very slow query.
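
The project-then-AsEnumerable() approach for the grouping case could be sketched like this. It assumes an in-memory queryable in place of the table; Row, GroupingDemo and IdsByName are hypothetical names, and Notes stands for any column the query doesn't need.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Row
{
    public int ID { get; set; }
    public string Name { get; set; }
    public string Notes { get; set; } // a column this query does not need
}

public static class GroupingDemo
{
    public static Dictionary<string, List<int>> IdsByName(IQueryable<Row> mySource)
    {
        return mySource
            .Select(item => new { item.ID, item.Name })   // provider side: two columns only
            .AsEnumerable()                                // switch to LINQ-to-Objects
            .GroupBy(item => item.Name, item => item.ID)   // grouped in memory
            .ToDictionary(g => g.Key, g => g.ToList());
    }

    public static void Main()
    {
        var mySource = new[]
        {
            new Row { ID = 1, Name = "a", Notes = "..." },
            new Row { ID = 2, Name = "b", Notes = "..." },
            new Row { ID = 3, Name = "a", Notes = "..." },
        }.AsQueryable();

        foreach (var pair in IdsByName(mySource))
        {
            Console.WriteLine(pair.Key + ": " + pair.Value.Count);
        }
    }
}
```

Against a real table this should amount to a single SELECT of the two columns, with the grouping done afterwards in memory.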
