0

The table I have is huge, with more than 100 million rows, and it is ordered by column A by default. Many rows can share the same value of A, and A ranges from 0 up to a very large number. I tried TABLESAMPLE, but it does not pick a good number of rows for each value of A: it skips some values entirely, or maybe I am not using it correctly. I would like to select the same number of rows for each value of A, and I would like the total number of selected rows to be a fixed number, say 10 million; let's call it B.

3 Answers

2

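Although it is not clear to me what you need to achieve, when I have needed a large sample subset that is well distributed across parent values and/or common attribute values, I have done it like this: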

SELECT  *
FROM    YourTable
WHERE   (YourID % 10) = 3

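This also has the advantage that you can get another, completely different sample just by changing the "3" to another number. You can also change the size of the subsample by adjusting the "10".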
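
To make those two knobs explicit, here is a minimal sketch of the same query with the modulus and the remainder pulled out into variables. The names @Modulus and @Remainder are made up for illustration; YourTable and YourID are the placeholders from the query above. It assumes YourID is a non-negative integer that is spread fairly evenly, such as an identity column.

```sql
-- Hypothetical variable names; only YourTable and YourID come from the answer above.
DECLARE @Modulus   int = 10;  -- keeps roughly 1 row in every @Modulus rows
DECLARE @Remainder int = 3;   -- any value from 0 to @Modulus - 1 selects a different, non-overlapping sample

SELECT  *
FROM    YourTable
WHERE   (YourID % @Modulus) = @Remainder;
```

With about 100 million rows, a modulus of 10 should return something close to the 10 million rows asked for, provided the IDs really are evenly spread. It does not guarantee the same number of rows for every value of A.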

Answered 2012-04-29T23:24:06.113
1

You can use NEWID():

SELECT TOP 100
  *
FROM
  YourTable
ORDER BY NEWID()
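
If the goal is the same number of rows for each value of A, the NEWID() idea can be combined with ROW_NUMBER(). This is only a sketch under assumptions: @RowsPerA is a made-up name, and its value would have to be chosen as roughly B divided by the number of distinct A values. Values of A with fewer rows than that will simply return all of their rows.

```sql
-- Hypothetical: @RowsPerA is an illustrative name; YourTable and column A come from the thread.
DECLARE @RowsPerA int = 100;

WITH Ranked AS (
    SELECT  *,
            -- number the rows within each A value in random order
            ROW_NUMBER() OVER (PARTITION BY A ORDER BY NEWID()) AS rn
    FROM    YourTable
)
SELECT  *
FROM    Ranked
WHERE   rn <= @RowsPerA;
```

This has to generate a NEWID() for every row and sort within each A value, so on a table with 100+ million rows it is likely to be slow.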
Answered 2012-04-30T00:57:13.183
0

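@RBarryYoung's solution is correct and generic: it works for any uniformly distributed column, such as an id sequence (or any auto-increment column). Sometimes, however, your distribution is not uniform, or you may run into performance problems (SQL Server has to scan all the index entries to evaluate the WHERE clause).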

If either of those affects your problem, consider the built-in T-SQL operator TOP, which may suit your needs:

SELECT TOP (30) PERCENT *
FROM YourTable;
Answered 2012-04-30T10:29:13.920