architecture - 3-tiers pattern and large amounts of data

Question

Here is my situation: I am trying to follow the 3-tier pattern (i.e. Presentation, Business and Data layers) as strictly as I can. When I need data from the DB, the Business layer calls the Data layer, which returns the information. The Data layer never returns a SqlDataReader or DataTable object, but usually an enumeration of custom objects known to the Data Access Layer. This works pretty well when the Data layer only has to return a list with a few objects.

I am now facing this problem: my application (the business layer) must process 500,000 records. I could simply add another method to my Data layer and return an IEnumerable, but this sounds very bad to me: I don't want to load half a million records into memory.

My question is: considering the 3-tier model, how should I handle this case? If I weren't following the 3-tier pattern, I would simply use a SqlDataReader in my business classes. Any suggestions?

UPDATE: The data will not be displayed, so this is not a paging issue (the presentation layer is not involved at all here). I simply have to analyze each record and then keep some of them.

Thanks

score 2 · Accepted Answer

I assume you're not displaying 500,000 records to the front end at once? You're probably doing some pagination, right? If so, only return one page's worth of data from the database at a time.
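A minimal sketch of that idea in Python, using the standard `sqlite3` module as a stand-in for the real database; the `records` table and its columns are hypothetical. `LIMIT`/`OFFSET` returns one page per call, so only that page is held in memory:

```python
import sqlite3


def fetch_page(conn: sqlite3.Connection, page: int, page_size: int) -> list[tuple]:
    """Return one page of rows (0-based page index) from a hypothetical table."""
    # ORDER BY makes the paging deterministic between calls.
    cur = conn.execute(
        "SELECT id, value FROM records ORDER BY id LIMIT ? OFFSET ?",
        (page_size, page * page_size),
    )
    return cur.fetchall()


# Demo: an in-memory table standing in for the real one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, value INTEGER)")
conn.executemany(
    "INSERT INTO records (id, value) VALUES (?, ?)",
    [(i, i * 10) for i in range(1, 26)],
)

first_page = fetch_page(conn, 0, 10)
last_page = fetch_page(conn, 2, 10)
print(len(first_page), first_page[0], last_page)
```

With large offsets the engine still has to step past the skipped rows, which is one reason keyset paging (`WHERE id > ?`) is often preferred for big tables.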

score 1 · Accepted Answer

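You can build an abstraction on top of the SqlReader class. That way you don't have to pass the SqlReader around directly, but you can still process one object at a time.

Think iterators.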
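To illustrate the iterator idea, here is a sketch in Python with `sqlite3` standing in for ADO.NET (in C#, the analogous technique would be a method that uses `yield return` over a `SqlDataReader`). `Record`, `iter_records` and `keep_large` are hypothetical names. The data layer hides the cursor and yields one domain object at a time, so the business layer never touches the reader and the full result set is never materialized as a list:

```python
import sqlite3
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Record:
    """Hypothetical domain object exposed by the data layer."""
    id: int
    amount: int


def iter_records(conn: sqlite3.Connection) -> Iterator[Record]:
    """Data layer: stream rows as Record objects, one at a time.

    The cursor never leaves this function; rows are produced lazily
    as the caller iterates.
    """
    cur = conn.execute("SELECT id, amount FROM records ORDER BY id")
    try:
        for row_id, amount in cur:
            yield Record(row_id, amount)
    finally:
        cur.close()


def keep_large(records: Iterable[Record], threshold: int) -> list[Record]:
    """Business layer: analyze each record and keep only some of them."""
    return [r for r in records if r.amount >= threshold]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, amount INTEGER)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?)", [(i, i % 7) for i in range(1, 101)]
)

kept = keep_large(iter_records(conn), threshold=6)
print(len(kept), kept[0])
```

The business layer depends only on "an iterable of `Record`", so the data layer is free to change how it reads rows without leaking a reader type upward.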

score 1 · Accepted Answer

Yes, your instinct is correct.

I'm betting that your UI client does not want to look at half a million records at once. Google doesn't return every hit in a single page; you won't, either.

You have a choice as to where and when your application processes those half a million records. You can chunk them into smaller units of work; you can process them asynchronously; you can write a stored procedure and process them in the database without bringing them all over to the middle tier.
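As a sketch of the "chunk them into smaller units of work" option, assuming a hypothetical `records` table with a positive integer primary key, the loop below reads the table in fixed-size chunks using keyset pagination. Each chunk could be processed, committed or retried on its own; `process_chunk` is a placeholder for the real work:

```python
import sqlite3
from typing import Iterator


def iter_chunks(conn: sqlite3.Connection, chunk_size: int) -> Iterator[list[tuple[int, int]]]:
    """Yield rows in chunks using keyset pagination (WHERE id > last_id).

    Each query seeks to the next key via the primary-key index instead of
    skipping over all earlier rows the way OFFSET does.
    """
    last_id = 0  # assumes ids are positive integers
    while True:
        rows = conn.execute(
            "SELECT id, amount FROM records WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, chunk_size),
        ).fetchall()
        if not rows:
            return
        yield rows
        last_id = rows[-1][0]


def process_chunk(rows: list[tuple[int, int]]) -> int:
    """Hypothetical unit of work: sum the amounts in one chunk."""
    return sum(amount for _, amount in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, amount INTEGER)")
conn.executemany("INSERT INTO records VALUES (?, ?)", [(i, 1) for i in range(1, 1001)])

chunk_totals = [process_chunk(chunk) for chunk in iter_chunks(conn, chunk_size=300)]
print(chunk_totals)
```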

The MVC pattern is wonderful, but it's not holy writ. Make the choices that work for your app.

score 1 · Accepted Answer

Paper never beats reality. If your specific problem requires breaking the 3-tier paradigm, then go ahead and do it.

score 1 · Accepted Answer

There are cases where you have to break the 3-tier boundaries. But before you do, you could ask yourself:

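When you "analyze each record and keep some of them", is that really part of the business logic? Or is it a data-access function? It might belong in the data access layer.
If it is part of the business logic, do you need all 500,000 records to decide whether to "keep" any individual record? It may be that the business layer should process one record at a time. Making 500,000 consecutive database calls isn't pretty, but if that is what the application should be doing from a conceptual standpoint, there are ways to mitigate it.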
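One way to mitigate the "500,000 database calls" concern, sketched in Python/`sqlite3` with hypothetical names: issue a single query and pull its rows in batches with `fetchmany()`, while the business layer still decides about each record individually through a rule it supplies:

```python
import sqlite3
from typing import Callable, Iterable, Iterator


def iter_rows_batched(conn: sqlite3.Connection, batch_size: int = 1000) -> Iterator[tuple]:
    """Data layer: issue ONE query and pull its rows in batches via fetchmany().

    This replaces "one query per record" with a single result set, while
    callers still consume the rows one at a time.
    """
    cur = conn.execute("SELECT id, amount FROM records ORDER BY id")
    try:
        while True:
            batch = cur.fetchmany(batch_size)
            if not batch:
                break
            yield from batch
    finally:
        cur.close()


def select_ids_to_keep(rows: Iterable[tuple], should_keep: Callable[[tuple], bool]) -> list[int]:
    """Business layer: each keep/drop decision looks at a single record only."""
    return [row[0] for row in rows if should_keep(row)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, amount INTEGER)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?)", [(i, (i * 3) % 10) for i in range(1, 51)]
)

# Hypothetical business rule: keep records whose amount is above 7.
kept_ids = select_ids_to_keep(iter_rows_batched(conn, batch_size=7), lambda row: row[1] > 7)
print(kept_ids)
```

The keep/drop rule stays in the business layer, and the data layer only decides how rows are fetched.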

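I wouldn't recommend doing anything silly just to keep the 3 tiers separate. But sometimes, when you think you have to cross the line, it's because something in the design needs to be revisited.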

--
bmb

score 1 · Accepted Answer

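Do the filtering in the database. There is no need to bring over 500,000 records that are going to be filtered out anyway. Why bring them all to the middle tier just to discard them? Use the SQL engine on the back end (a sproc) to process the data as early as possible. That is the most efficient approach, much like performing basic input checks in the presentation layer before anything is sent to IIS.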
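SQLite has no stored procedures, so in this sketch a parameterized `WHERE` clause stands in for the sproc suggested above; the table, column and threshold are hypothetical. The point is that the engine filters the rows and only the survivors cross into the middle tier:

```python
import sqlite3


def fetch_candidates(conn: sqlite3.Connection, min_amount: int) -> list[tuple[int, int]]:
    """Return only the rows that pass the filter; the engine does the work."""
    return conn.execute(
        "SELECT id, amount FROM records WHERE amount >= ? ORDER BY id",
        (min_amount,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, amount INTEGER)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?)", [(i, i % 100) for i in range(1, 1001)]
)

candidates = fetch_candidates(conn, min_amount=99)
print(len(candidates), candidates[0])
```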

score 0 · Accepted Answer

This is not an uncommon problem; it occurs frequently when you need to consolidate large amounts of data and present summaries to the user (reports are a typical example). Your solution should be designed with these considerations in mind. It makes no sense to ignore the efficiencies offered by SQL readers (or similar tools) when strict adherence to a particular architectural model makes your application inefficient. You can often overcome these problems by adapting the architectural model to your needs. Generic architectural models are rarely applicable out of the box; they are guidelines to be applied to your particular needs.
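For the reporting case, one option is to let the database do the consolidation and return only the summary rows, rather than streaming every detail row to the application. A sketch with a hypothetical `sales` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("north", 10), ("south", 5), ("north", 20), ("east", 7), ("south", 15)],
)


def region_summary(conn: sqlite3.Connection) -> list[tuple[str, int, int]]:
    """Aggregate in the engine: one row per region (count, total amount)."""
    return conn.execute(
        "SELECT region, COUNT(*), SUM(amount) FROM sales GROUP BY region ORDER BY region"
    ).fetchall()


summary = region_summary(conn)
print(summary)
```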

score 0 · Accepted Answer

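If I understand this correctly, you want to "analyze" the records, then keep some of them and delete the rest. In that case, I think it is best handled inside the database itself (PL/SQL or T-SQL). A requirement like this should take priority over the architecture. Since you are not just displaying the analysis, it is better to do it in a stored procedure.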
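If the goal really is "keep some, delete the rest", the whole operation can be a single set-based statement executed inside the database. A sketch in Python/`sqlite3` (standing in for PL/SQL or T-SQL), with a hypothetical retention rule `amount >= min_amount`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, amount INTEGER)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?)", [(i, i % 10) for i in range(1, 101)]
)


def prune_records(conn: sqlite3.Connection, min_amount: int) -> int:
    """Delete every record that fails the (hypothetical) retention rule."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute("DELETE FROM records WHERE amount < ?", (min_amount,))
    return cur.rowcount  # number of rows deleted


deleted = prune_records(conn, min_amount=8)
remaining = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]
print(deleted, remaining)
```

No rows travel to the application at all; only the count of deleted rows comes back.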

score 0 · Accepted Answer

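There's no shame in doing whatever analysis you need at the database level. If you can use stored procedures to slice and dice what you need, or to do the necessary correlations, and use the application for the more complex operations, you should be fine.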

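The question is: does the user expect to press a button, have all 500K records processed, and see the results? If so, are they willing to sit and watch a spinning GIF, or would some kind of notification once the process completes be satisfactory? If processing the 500K records is paramount, does your data model need to change to support this process? There are processing approaches such as Hadoop and message queues that suit this kind of volume, but do you need to go that far? You may be able to set the users' expectations before you start pulling your hair out over performance.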
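The "notify when done" option can be sketched with Python's `concurrent.futures`: the job runs on a background thread and a callback fires when it completes. `analyze_records` and `notify_user` are hypothetical placeholders; a real system might instead hand the job to a message queue, as mentioned above:

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor


def analyze_records(record_ids: range) -> list[int]:
    """Hypothetical long-running job: keep ids divisible by 1000."""
    return [rid for rid in record_ids if rid % 1000 == 0]


notifications: list[str] = []
done = threading.Event()


def notify_user(future: Future) -> None:
    """Hypothetical notification hook (e-mail, queue message, UI toast...)."""
    kept = future.result()
    notifications.append(f"Processing finished: kept {len(kept)} records")
    done.set()


executor = ThreadPoolExecutor(max_workers=1)
future = executor.submit(analyze_records, range(1, 500_001))
future.add_done_callback(notify_user)

# A real request handler would return here and let the job run;
# this demo waits so that it can print the notification.
done.wait(timeout=30)
executor.shutdown()
print(notifications)
```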
