The table I have is huge, with more than 100 million rows, and it is ordered by column A by default. Many rows can share the same value of A, and A increases from 0 up to a large number. I tried TABLESAMPLE, but it does not select a good number of rows for each value of A: it skips some values, or maybe I am not using it correctly. I would like to select the same number of rows for each value of A, and I would like the total number of selected rows to be a given number, say 10 million; call it B.
Viewed 1430 times
3 answers
2
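Although I'm not clear on exactly what you need to achieve, this is what I have done when I needed a large sample subset that is well distributed across parent values and/or common attribute values: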
SELECT *
FROM YourTable
WHERE (YourID % 10) = 3
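This has the added benefit that you can get another, completely different sample by changing the "3" to another number. You can also change the size of the subsample by adjusting the "10".
answered 2012-04-29T23:24:06.113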
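As a rough sketch of the two knobs described above, the divisor and the remainder can be pulled out into variables. YourTable and YourID are the placeholder names already used in this answer; @Modulus and @Remainder are hypothetical names introduced only for illustration. The sketch assumes YourID is a non-negative integer whose values are spread roughly evenly across remainders.

```sql
-- Sketch only: YourTable and YourID are placeholders from the answer;
-- @Modulus and @Remainder are hypothetical variable names.
DECLARE @Modulus   int = 10;  -- keeps roughly 1 row out of every @Modulus
DECLARE @Remainder int = 3;   -- any value from 0 to @Modulus - 1 selects a different slice

SELECT *
FROM YourTable
WHERE (YourID % @Modulus) = @Remainder;
```

Under those assumptions, a table of about 100 million rows with @Modulus = 10 would yield roughly 10 million rows, which matches the B from the question. This does not by itself guarantee the same number of rows for each value of A.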
1
You can use NEWID():
SELECT TOP 100 *
FROM YourTable
ORDER BY NEWID()
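The same NEWID() ordering can plausibly be applied within each value of A, which is closer to what the question asks for. The following is a minimal sketch, not a tuned solution: YourTable and A come from the thread, while @PerGroup, Ranked, and rn are hypothetical names. It assumes a random order per group is acceptable and that the cost of sorting the whole table is tolerable.

```sql
-- Sketch only: @PerGroup, Ranked and rn are hypothetical names.
DECLARE @PerGroup int = 100;  -- rows to keep for each value of A

WITH Ranked AS (
    SELECT *,
           -- number the rows of each A group in a random order
           ROW_NUMBER() OVER (PARTITION BY A ORDER BY NEWID()) AS rn
    FROM YourTable
)
SELECT *
FROM Ranked
WHERE rn <= @PerGroup;
```

To aim for a total of about B rows, @PerGroup could be set to B divided by the number of distinct values of A. Groups with fewer than @PerGroup rows contribute only the rows they have, so the total may come out below B.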
answered 2012-04-30T00:57:13.183
0
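@RBarryYoung's solution is correct and generic; it works for any uniformly distributed column, such as an id sequence (or any auto-increment column). However, sometimes your distribution is not uniform, or you may run into performance problems (SQL Server has to scan all index entries to evaluate the WHERE clause).
If either of these affects your problem, consider the built-in T-SQL TOP operator, which may fit your needs: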
SELECT TOP (30) PERCENT *
FROM YourTable;
answered 2012-04-30T10:29:13.920