
Most guides, tutorials, and manuals explain how to use these methods in the context of a manageable database.

So if User.where(some condition) returns tens or hundreds of results, it is reasonable to assume that Rails/the DB/the server can handle it.

What if the same query returns thousands, or hundreds of thousands, of records? Dare I say millions of records?

What does that depend on? What are the limits of Rails or of the hardware, if any?

Most importantly, is there a way to handle such large data sets in Rails (without everything crashing)?


2 Answers


The basic point is that PostgreSQL will materialize a result set to disk if it grows too large. That costs you speed, but it keeps memory free for other operations.
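If a client genuinely has to walk a huge result, a server-side cursor lets it fetch in chunks so that neither side materializes the whole set at once. A minimal sketch using the Ruby pg gem; the database name and the users table are placeholders, not anything from the question:

    require "pg"

    conn = PG.connect(dbname: "myapp_production")  # hypothetical database

    conn.transaction do |c|
      # The cursor lives on the server; the client pulls 1,000 rows at a
      # time, so neither side ever holds the full result set in memory.
      c.exec("DECLARE big_cur CURSOR FOR SELECT id, email FROM users")
      loop do
        batch = c.exec("FETCH 1000 FROM big_cur")
        break if batch.ntuples.zero?
        batch.each { |row| puts row["email"] }
      end
      c.exec("CLOSE big_cur")
    end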

In general there is rarely a need in PostgreSQL to send hundreds of thousands or millions of rows to a client. The key is to build your queries (using proper SQL extensions where needed) so that the database returns only the data your front-end needs, properly aggregated. I have met a number of people who think that putting such aggregation logic in the db slows it down (and there is a CPU-time cost), but that cost tends to be repaid many times over in saved disk I/O wait time and the like.
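To make that concrete, here is a hedged ActiveRecord sketch; the Order model and its columns are invented for illustration. The first version drags every matching row into Ruby, the second lets PostgreSQL do the aggregation and sends back only the summary rows:

    # Anti-pattern: load every matching row, then aggregate in Ruby.
    totals = Order.where(year: 2013).group_by(&:status)
                  .transform_values { |orders| orders.sum(&:amount) }

    # Better: aggregate in the database; only a handful of rows come back.
    totals = Order.where(year: 2013).group(:status).sum(:amount)
    # => { "pending" => 1234.50, "shipped" => 98765.00 }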

The fundamental question I would ask is "why do you need to see millions of records?" You are basically saying you want to keep these in memory or on disk, transfer them across the network, receive them, and then process them, which is not the paragon of efficiency. It is far better to process millions of records close to the storage and trade some CPU cost for savings everywhere else.
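A small Rails illustration of that trade (the column names are assumptions): the first form ships every row to the app server and back, the second runs a single UPDATE where the data lives:

    # Round-trips every matching row to the app server, one save at a time.
    User.where(active: false).find_each { |u| u.update(archived: true) }

    # One UPDATE statement executed inside PostgreSQL; no rows cross the wire.
    User.where(active: false).update_all(archived: true)

Note that update_all bypasses validations and callbacks, which is part of why it is so much cheaper.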

If you need more complex intra-query parallelism in a mixed or data-warehouse environment, go with Postgres-XC instead of vanilla PostgreSQL. It carries a significant complexity cost, but in large environments it makes otherwise unsolvable problems solvable.

Answered 2013-10-07T03:14:11.073

OK, let's start:

What happens if you try to pour a bucket of water into a glass?

The same idea applies here:

  1. The first dependency is the size of your database.
  2. Selecting millions of rows needs roughly (millions × row size) of space, which means a lot of Spool Space; if the spooled result is then joined further, the space requirement grows dramatically. (A batching sketch follows this list.)
  3. If the database does not support parallelism and lacks a smart optimizer, row counts on that scale will hurt performance and make queries much slower.
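As referenced in point 2, ActiveRecord's batching API is the usual way to avoid holding everything at once: it walks the table in fixed-size slices, so memory use stays flat however many rows match. A sketch with an invented model and condition:

    # find_each pages through the result ("WHERE id > last_id LIMIT 1000"
    # under the hood), keeping only one batch of records in memory.
    User.where("last_login_at < ?", 1.year.ago).find_each(batch_size: 1000) do |user|
      user.deactivate!  # hypothetical per-record work
    end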

In any case, I think that if you are dealing with millions or trillions of rows, you should consider moving to a modern data warehouse such as Teradata.

Answered 2013-06-19T16:34:25.487