I'm currently working on a fairly complex database. Our object model is designed to be mapped onto it, and we're using EF 5 with manually generated POCO classes.

Everything works, but there are complaints about performance. I've never had performance problems with EF, so I'm wondering whether this time I did something terribly wrong, or whether the problem lies somewhere else.

The main query is built from dynamic parameters. I have several if and switch blocks that conceptually look like this:

if (parameter != null) { query = query.Where(c => c.Field == parameter); }

For some complex And/Or combinations I'm also using Albahari's LinqKit extensions.
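LinqKit's PredicateBuilder is the usual tool for these And/Or combinations. To show roughly what such a combinator does, here is a minimal hand-rolled sketch; `PredicateCombiner` and `ParameterSwap` are hypothetical names, and this is not LinqKit's implementation. It merges two predicates into a single expression tree by rebinding the second lambda's parameter, so the result should stay translatable by a LINQ provider instead of becoming an opaque delegate.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical helper (not LinqKit's API): merges two predicates into ONE expression
// tree, so a LINQ provider can still translate the result instead of seeing a delegate.
public static class PredicateCombiner
{
    public static Expression<Func<T, bool>> Or<T>(
        Expression<Func<T, bool>> left, Expression<Func<T, bool>> right)
        => Combine(left, right, Expression.OrElse);

    public static Expression<Func<T, bool>> And<T>(
        Expression<Func<T, bool>> left, Expression<Func<T, bool>> right)
        => Combine(left, right, Expression.AndAlso);

    private static Expression<Func<T, bool>> Combine<T>(
        Expression<Func<T, bool>> left,
        Expression<Func<T, bool>> right,
        Func<Expression, Expression, BinaryExpression> merge)
    {
        // Keep the left lambda's parameter and rewrite the right body to use it,
        // rather than wrapping with Expression.Invoke, which LINQ to Entities rejects.
        var parameter = left.Parameters[0];
        var rightBody = new ParameterSwap(right.Parameters[0], parameter).Visit(right.Body);
        return Expression.Lambda<Func<T, bool>>(merge(left.Body, rightBody), parameter);
    }

    // Replaces every occurrence of one parameter with another inside an expression tree.
    private sealed class ParameterSwap : ExpressionVisitor
    {
        private readonly ParameterExpression _from;
        private readonly ParameterExpression _to;

        public ParameterSwap(ParameterExpression from, ParameterExpression to)
        {
            _from = from;
            _to = to;
        }

        protected override Expression VisitParameter(ParameterExpression node)
            => node == _from ? _to : base.VisitParameter(node);
    }
}
```

Usage would look like `query = query.Where(PredicateCombiner.Or(byCustomer, byStatus));`, where `byCustomer` and `byStatus` are hypothetical `Expression<Func<Order, bool>>` values built by the if/switch blocks.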

The query runs against a big "Orders" table holding years and years of data, although typical use is a two-month date-range filter.

Once the main query is composed, it is paginated with Skip/Take, with Take set to 10 elements.
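The conditional `Where` pattern and the Skip/Take paging described above can be sketched end to end as follows. `Order`, `OrderFilter`, `OrderQueries` and their members are hypothetical stand-ins for the real model; `AsQueryable()` over an in-memory list is used so the sketch runs without a database, whereas against EF the same code would compose a single SQL statement.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity and filter shapes, for illustration only.
public sealed class Order
{
    public int Id { get; set; }
    public DateTime OrderDate { get; set; }
    public string CustomerCode { get; set; }
}

public sealed class OrderFilter
{
    public DateTime? From { get; set; }
    public DateTime? To { get; set; }
    public string CustomerCode { get; set; }
}

public static class OrderQueries
{
    // Each optional parameter adds a Where clause only when it is set; nothing runs here,
    // the provider just receives one composed expression tree.
    public static IQueryable<Order> ApplyFilter(IQueryable<Order> query, OrderFilter filter)
    {
        if (filter.From.HasValue)
        {
            var from = filter.From.Value;  // copy to a local so a plain DateTime is captured
            query = query.Where(o => o.OrderDate >= from);
        }
        if (filter.To.HasValue)
        {
            var to = filter.To.Value;
            query = query.Where(o => o.OrderDate < to);
        }
        if (filter.CustomerCode != null)
        {
            var code = filter.CustomerCode;
            query = query.Where(o => o.CustomerCode == code);
        }
        return query;
    }

    // Deterministic ordering before Skip/Take: LINQ to Entities rejects Skip on unordered
    // input, and an unstable order can repeat or drop rows between pages.
    public static List<Order> GetPage(IQueryable<Order> query, int pageIndex, int pageSize = 10)
        => query.OrderByDescending(o => o.OrderDate)
                .ThenBy(o => o.Id)
                .Skip(pageIndex * pageSize)
                .Take(pageSize)
                .ToList();
}
```

The `ThenBy(o => o.Id)` tie-breaker is a design choice, not something the question specifies; any unique key would do.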

The resulting IQueryable is then passed up through the layers until it reaches the MVC layer, where AutoMapper is used.

There, when AutoMapper starts iterating (which is when the query actually executes), it accesses a bunch of navigation properties, which have their own navigation properties, and so on. Everything is lazy-loaded, following the EF recommendation to avoid eager loading when you have more than 3 or 4 distinct entities to include. My scenario looks something like this:

  • Orders (maximum 10)
    • Many navigation properties under Order
      • Some of these have their own navigation properties (localization entities)
    • Order details (many order details per order)
      • Many navigation properties under each Order detail
        • Some of these have their own navigation properties (localization entities)

This easily adds up to 300+ queries for a single rendered page. Each of those queries is very fast, running in a few milliseconds, but there are still two main concerns:

  • The lazy-loaded properties are loaded sequentially, not in parallel, so their times add up
  • As a consequence, there is dead time around every single query: for each one, the SQL has to be sent to the database, executed, and the results returned.
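To make the round-trip arithmetic concrete, the toy model below (not EF; `CountingStore` and `AccessPatterns` are hypothetical) only counts how many times the store is hit. Loading the details of 10 orders one at a time costs 10 round trips, while a single batched lookup costs 1. In the real page every further level of lazy navigation adds another query per loaded entity, which is how a 10-row page can reach 300+ queries.

```csharp
using System.Collections.Generic;
using System.Linq;

// Toy store (not EF) that only counts round trips, to model the two access patterns.
public sealed class CountingStore
{
    private readonly Dictionary<int, List<int>> _detailIdsByOrder;
    public int RoundTrips { get; private set; }

    public CountingStore(Dictionary<int, List<int>> detailIdsByOrder)
        => _detailIdsByOrder = detailIdsByOrder;

    // One round trip per order, like touching a lazy-loaded Order.Details collection.
    public List<int> DetailsForOne(int orderId)
    {
        RoundTrips++;
        return _detailIdsByOrder.TryGetValue(orderId, out var ids) ? ids : new List<int>();
    }

    // One round trip for the whole page, like Details.Where(d => orderIds.Contains(d.OrderId)).
    public ILookup<int, int> DetailsForMany(IEnumerable<int> orderIds)
    {
        RoundTrips++;
        var wanted = new HashSet<int>(orderIds);
        return _detailIdsByOrder
            .Where(kv => wanted.Contains(kv.Key))
            .SelectMany(kv => kv.Value, (kv, detailId) => new { OrderId = kv.Key, DetailId = detailId })
            .ToLookup(x => x.OrderId, x => x.DetailId);
    }
}

public static class AccessPatterns
{
    // Lazy pattern: one query per order (plus the page query itself: the "N+1" problem).
    public static int LazyRoundTrips(CountingStore store, IReadOnlyList<int> orderIds)
    {
        foreach (var id in orderIds)
            store.DetailsForOne(id);
        return store.RoundTrips;
    }

    // Batched pattern: one query for all orders on the page, then stitched in memory.
    public static int BatchedRoundTrips(CountingStore store, IReadOnlyList<int> orderIds)
    {
        var lookup = store.DetailsForMany(orderIds);
        foreach (var id in orderIds)
            _ = lookup[id].ToList();  // in-memory access, no further round trips
        return store.RoundTrips;
    }
}
```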

Just to see what would happen, I tried the same query with eager loading and, as I predicted, it was a total disaster: the translated SQL was more than 7K lines long (yes, seven thousand) and much slower overall.

Still, I'm reluctant to conclude that EF and LINQ are the wrong choice for this scenario. Some people say that a stored procedure fetching all the needed data would run tens of times faster. I don't believe that's true, and we would lose the automatic materialization of all the related entities.

I've thought of a few things I could do to improve it:

  • Table splitting to reduce the selected columns
  • Turning off object tracking, since this scenario is read-only (untracked entities)

All that said, the main complaint is that the results page (built with MVC 4) renders too slowly, and after some diagnostics it looks like it's all "Server Time" rather than "Network Time": about 8 to 12 seconds of server time.

In my experience this shouldn't happen. I'm wondering whether I'm approaching this query requirement the wrong way, or whether I should look at something else (maybe a badly configured IIS server, or something else I'm completely clueless about). Needless to say, the database indexes are fine; our DBA checked them very carefully.

So if anyone has any tips, advice, or best practices I'm missing here, or can simply tell me I'm dead wrong to use EF with lazy loading for this scenario... all input is welcome.

4 Answers

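For very complex queries that produce a lot of hierarchical data, a stored procedure usually won't let you beat LINQ/EF on performance, provided you take the right approach. As you pointed out, EF's two "out of the box" options (lazy loading and eager loading) don't work well here. There are still several good ways to optimize this, though: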

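(1) Instead of reading a bunch of entities into memory and then mapping them with AutoMapper, do the "auto-mapping" directly in the query wherever possible. For example: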

var mapped = myOrdersQuery.Select(o => new OrderInfo { Order = o, DetailCount = o.Details.Count, ... })
    // by deferring the load until here, we can bring only the information we actually need 
    // into memory with a single query
    .ToList();

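This works very well if you only need a subset of the fields in a complex hierarchy. Also, if you need to return something more complex than flat tabular data, EF's ability to select hierarchical data makes this easier than it would be with a stored procedure.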
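A runnable sketch of approach (1), assuming hypothetical `Order`/`OrderDetail`/`Product` entities and flat `OrderRow`/`DetailRow` view models. Against LINQ to Entities the `Select` would be translated into one SQL statement that reads only the referenced columns; here it runs over `AsQueryable()` so no database is needed. `Lines` is typed `IEnumerable<DetailRow>` rather than `List<DetailRow>` because, as far as I know, EF 5/6 does not translate a nested `ToList()` inside a projection.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entities standing in for the EF POCOs.
public sealed class Product { public int Id { get; set; } public string Name { get; set; } }

public sealed class OrderDetail
{
    public int Id { get; set; }
    public int Quantity { get; set; }
    public Product Product { get; set; }
}

public sealed class Order
{
    public int Id { get; set; }
    public DateTime OrderDate { get; set; }
    public ICollection<OrderDetail> Details { get; set; } = new List<OrderDetail>();
}

// Flat view models: only the columns the page actually renders.
public sealed class DetailRow { public string ProductName { get; set; } public int Quantity { get; set; } }

public sealed class OrderRow
{
    public int Id { get; set; }
    public DateTime OrderDate { get; set; }
    public int DetailCount { get; set; }
    public IEnumerable<DetailRow> Lines { get; set; }
}

public static class OrderProjections
{
    // The whole shape is described in one Select, so no navigation property is
    // lazy-loaded afterwards; only the outer ToList() executes the query.
    public static List<OrderRow> ToRows(IQueryable<Order> orders) =>
        orders.Select(o => new OrderRow
        {
            Id = o.Id,
            OrderDate = o.OrderDate,
            DetailCount = o.Details.Count,
            Lines = o.Details.Select(d => new DetailRow
            {
                ProductName = d.Product.Name,
                Quantity = d.Quantity
            })
        }).ToList();
}
```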

(2) Run several LINQ queries manually and assemble the results in memory. For example:

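// read with AsNoTracking() since we'll be manually setting associations
var myOrders = myOrdersQuery.AsNoTracking().ToList();
// materialize the ids once; EF translates Contains() on a local list into an IN (...) clause
var orderIds = myOrders.Select(o => o.Id).ToList();
var myDetails = context.Details.AsNoTracking()
    .Where(d => orderIds.Contains(d.OrderId))
    .ToLookup(d => d.OrderId);
// reassemble in memory: one round trip for all details instead of one per order
myOrders.ForEach(o => o.Details = myDetails[o.Id].ToList());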

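This works well when you need all of the data and still want to rely on EF materialization as much as possible. Note that in most cases a stored procedure can't do better than this (it works with raw SQL, so it also has to run several tabular queries), and it can't reuse the logic you've already written in LINQ.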

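(3) Use Include() to control manually which associations are eager-loaded. This can be combined with #2, letting EF load some associations while keeping the flexibility to load others manually.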

answered 2013-08-13T12:40:04.843

Try to come up with an efficient, simple SQL query that fetches the data for the view.

Is that even possible?

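If not, try decomposing (denormalizing) your tables so that fewer joins are needed to fetch the data. Also, do the table columns have effective indexes to speed up data retrieval?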

If it is, forget EF: write a stored procedure and use it to fetch the data.

For read-only scenarios, you must turn off tracking for the selected queries. Take a look at my numbers:

http://netpl.blogspot.com/2013/05/yet-another-orm-micro-benchmark-part-23_15.html

As you can see, the difference between the tracking and no-tracking scenarios is significant.

I would try eager loading, but not everywhere (so you don't end up with a 7K-line query), only in selected subqueries.

answered 2013-05-24T19:13:43.063

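One thing to consider is that EF definitely shortens development time. However, keep in mind that when you return large amounts of data from the database, EF uses dynamic SQL. This means that 1. EF has to generate the SQL, and 2. SQL Server then has to create an execution plan, all before the query actually runs.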

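With a stored procedure, SQL Server can cache the execution plan (which can be tuned for better performance), and that really is faster than going through EF. But... you can always create a stored procedure and execute it from EF. I convert any complex procedure or query into a stored procedure and call it from EF. Then you can measure the performance gain and re-evaluate from there.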

answered 2013-05-24T19:12:41.340

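In some cases you can use Compiled Queries (MSDN) to improve query performance substantially. The idea is that if you have a common query that runs many times, likely generating the same SQL with different parameters, you can compile the query binding on the first run and then pass it around as a delegate, eliminating the overhead of Entity Framework regenerating the SQL on every subsequent call.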
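The "compile once, reuse the delegate" idea can be illustrated without a database. The sketch below is only an analogy using plain expression trees, not EF's compiled-query API (which needs an EF context to run): `CompiledOnce` is a hypothetical helper that pays the `Expression.Compile()` cost on first use and afterwards only binds new argument values.

```csharp
using System;
using System.Linq.Expressions;
using System.Threading;

// Hypothetical helper, an analogy only (not EF's compiled-query API): the expression tree
// is compiled on first use; later calls just bind new argument values to the cached delegate.
public sealed class CompiledOnce<T1, T2, TResult>
{
    private readonly Lazy<Func<T1, T2, TResult>> _compiled;
    private int _compileCount;

    public CompiledOnce(Expression<Func<T1, T2, TResult>> expression)
    {
        // Lazy<T> defaults to thread-safe, run-once initialization.
        _compiled = new Lazy<Func<T1, T2, TResult>>(() =>
        {
            Interlocked.Increment(ref _compileCount);
            return expression.Compile();  // the expensive translation step
        });
    }

    // How many times the tree was actually compiled (should never exceed 1).
    public int CompileCount => Volatile.Read(ref _compileCount);

    public TResult Invoke(T1 first, T2 second) => _compiled.Value(first, second);
}
```

Usage: create one instance, for example as a static field, and call `Invoke` with different parameter values; only the first call compiles.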

answered 2013-08-13T13:45:03.847