sql - Fast way to select small sample from huge table

Question

The table I have is huge, with 100+ million entries, and it is ordered by default by column 'A'. There can be many rows with the same value of A, and A increases from 0 up to some big number. I tried TABLESAMPLE, but it does not select a good number of rows for each value of A: it skips some of them, or maybe I am not using it well. So I would like to select the same number of rows for each value of A, and I would like the total number of selected rows to be a given number, let's say 10 million, or let's call it B.

score 2 · Accepted Answer

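Although I'm not clear on what you need to accomplish, when I have needed a large sample subset that is well distributed across parent values and/or common attribute values, I have done this: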

SELECT  *
FROM    YourTable
WHERE   (YourID % 10) = 3

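This also has the advantage that you can get another, completely different sample by changing the "3" to another number. You can also change the subsample size by adjusting the "10".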
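As a rough sketch of how the "10" could be derived from a target sample size instead of being hard-coded: the variable names below are made up for illustration, and it assumes YourID is an integer column whose values are spread fairly evenly (an identity column, for example).

```sql
-- Hypothetical names: @TargetRows, @TotalRows and @Modulus are for illustration only.
DECLARE @TargetRows bigint = 10000000;   -- desired sample size (B in the question)
DECLARE @TotalRows  bigint = (SELECT COUNT_BIG(*) FROM YourTable);

-- Integer division: keep roughly 1 row out of every @Modulus rows.
DECLARE @Modulus bigint = @TotalRows / @TargetRows;
IF @Modulus < 1 SET @Modulus = 1;        -- table smaller than the target: keep every row

SELECT  *
FROM    YourTable
WHERE   (YourID % @Modulus) = 0;         -- any remainder from 0 to @Modulus - 1 works
```

With 100 million rows and a target of 10 million this gives a modulus of 10, the same value as in the query above. The size of the result is only approximate, since it depends on how evenly the ID values are distributed.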

score 1 · Accepted Answer

You can use NEWID():

SELECT TOP 100
  *
FROM
  YourTable
ORDER BY NEWID()

Answered 2012-04-30T00:57:13.183
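The question asks for the same number of rows for each value of A, which a plain TOP ... ORDER BY NEWID() does not guarantee. One possible extension, not part of the answer above, is to number the rows randomly within each A group and keep the first few of each. This is a sketch only: @B, @PerGroup and rn are hypothetical names, and it assumes the table has a column named A as described in the question.

```sql
-- Hypothetical names: @B, @PerGroup and rn are for illustration only.
DECLARE @B bigint = 10000000;            -- desired total sample size

-- Same quota for every distinct A (integer division; NULLIF avoids dividing by zero).
DECLARE @PerGroup bigint =
    @B / NULLIF((SELECT COUNT(DISTINCT A) FROM YourTable), 0);

WITH Ranked AS (
    SELECT  *,
            -- Random numbering 1, 2, 3, ... restarted for each value of A
            ROW_NUMBER() OVER (PARTITION BY A ORDER BY NEWID()) AS rn
    FROM    YourTable
)
SELECT  *
FROM    Ranked
WHERE   rn <= @PerGroup;                 -- keep at most @PerGroup rows per A
```

Values of A with fewer than @PerGroup rows contribute all of their rows, so the total can come out below @B. The query also has to read and sort every row, which may be slow on a table with 100+ million entries.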

score 0 · Accepted Answer

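@RBarryYoung's solution is correct and general: it works for any column with a constant (uniform) statistical distribution, such as an id sequence (or any auto-increment column). However, sometimes your distribution is not constant, or you may run into performance problems (SQL Server has to scan all index entries to evaluate the WHERE clause).

If either of these affects your problem, consider the built-in T-SQL TOP operator, which may suit your needs: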

SELECT TOP (30) PERCENT *
FROM YourTable;
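One caveat worth spelling out: TOP without an ORDER BY does not promise which rows come back, so on a table stored in A order it may return mostly low values of A. A possible variation, assuming a full sort of the table is affordable, is to combine it with NEWID(). The percentage here is just an example value.

```sql
-- Sketch only: 10 percent is an arbitrary example value.
SELECT TOP (10) PERCENT *
FROM YourTable
ORDER BY NEWID();   -- random order, so the rows kept are a random 10 percent
```

This trades the speed of a bare TOP for a sample that is spread across the whole table.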
