Using SQL Server 2008 R2, I have the following result set returned by a query:

QID    QcID    QtID    QsID
21      1       SC      3
4       1       SC      1
8       1       MC      1
2       1       SC      1
23      1       SC      3
24      1       SC      3
5       1       SC      1
22      1       SC      3
1       1       SC      1
29      1       MC      3
10      1       MC      1
30      1       MC      3
26      1       MC      3
25      1       SC      3
6       1       MC      1
27      1       MC      3
7       1       MC      1
3       1       SC      1
28      1       MC      3
9       1       MC      1

Now I want to pick a random set of 15 QIDs that must include:

 9 rows with QsID = 1
 6 rows with QsID = 3
 9 rows with QtID = SC
 6 rows with QtID = MC
 15 rows with QcID = 1

Since the table may hold tens of thousands of records, how can I do this with performance in mind?

@Damien_The_Unbeliever The expected output could be:

21      1       SC       3
4       1       SC       1
8       1       MC       1
2       1       SC       1
23      1       SC       3
24      1       SC       3
5       1       SC       1
1       1       SC       1
10      1       MC       1
25      1       SC       3
6       1       MC       1
27      1       MC       3
7       1       MC       1
3       1       SC       1
28      1       MC       3

Also, setting randomness aside, how can I select a set that satisfies all the conditions?

2 Answers

Edit 2:

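So, how about using a stored procedure?
Suppose you have 4 sets of conditions for sampling the data. I derived them from the output you provided. They may not be correct, but you can adjust them as needed.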
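
The four groups can be read off the expected output in the question by counting rows per (QtID, QsID) pair. A minimal sketch, assuming the expected output is loaded into a table variable (@expected is a name made up for this illustration):

```sql
-- @expected is a hypothetical table variable holding the question's expected output.
DECLARE @expected TABLE (QID INT, QcID INT, QtID CHAR(2), QsID INT);

INSERT INTO @expected VALUES
 (21,1,'SC',3), (4,1,'SC',1),  (8,1,'MC',1),  (2,1,'SC',1),  (23,1,'SC',3),
 (24,1,'SC',3), (5,1,'SC',1),  (1,1,'SC',1),  (10,1,'MC',1), (25,1,'SC',3),
 (6,1,'MC',1),  (27,1,'MC',3), (7,1,'MC',1),  (3,1,'SC',1),  (28,1,'MC',3);

-- Count the rows in each (QtID, QsID) group.
SELECT QtID, QsID, COUNT(*) AS cnt
FROM @expected
GROUP BY QtID, QsID
ORDER BY QtID, QsID;
-- MC/1 = 4, MC/3 = 2, SC/1 = 5, SC/3 = 4
```

These four counts add up to 15 and meet every requirement at once: 4 + 5 = 9 rows with QsID = 1, 2 + 4 = 6 rows with QsID = 3, 5 + 4 = 9 SC rows, and 4 + 2 = 6 MC rows.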

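Parameters:

  • @SIZE: the size of the sampled result set, in rows
  • @P1-@P3: the percentage of the sampled result set to fill with random rows from each of the first three sets of conditions
  • @N4=@SIZE-(@N1+@N2+@N3): the fourth set of conditions fills the remaining rows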
CREATE PROCEDURE sqlsampling 
    @SIZE INT, @P1 DECIMAL(6,4), @P2 DECIMAL(6,4), @P3 DECIMAL(6,4) 
AS 

-- Convert the percentages into row counts; the fourth group gets the remainder.
DECLARE @N1 INT, @N2 INT, @N3 INT, @N4 INT;
SET @N1=CEILING(@SIZE*@P1*0.01);
SET @N2=CEILING(@SIZE*@P2*0.01);
SET @N3=CEILING(@SIZE*@P3*0.01);
SET @N4=@SIZE-(@N1+@N2+@N3);

CREATE TABLE #sample(QID INT, QcID INT, QtID CHAR(2), QsID INT);

INSERT INTO #sample 
SELECT TOP(@N1) * FROM mytable 
WHERE QtID = 'MC' AND QsID = 1 
ORDER BY CHECKSUM(NEWID());

INSERT INTO #sample 
SELECT TOP(@N2) * FROM mytable 
WHERE QtID = 'MC' AND QsID = 3 AND QID NOT IN(SELECT QID FROM #sample)
ORDER BY CHECKSUM(NEWID());

INSERT INTO #sample 
SELECT TOP(@N3) * FROM mytable 
WHERE QtID = 'SC' AND QsID = 1 AND QID NOT IN(SELECT QID FROM #sample)
ORDER BY CHECKSUM(NEWID()); 

INSERT INTO #sample
SELECT TOP(@N4) * FROM mytable 
WHERE QtID = 'SC' AND QsID = 3 AND QID NOT IN(SELECT QID FROM #sample)
ORDER BY CHECKSUM(NEWID());

SELECT * FROM #sample;
DROP TABLE #sample;
GO
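
The percentage-to-count conversion inside the procedure can be checked on its own. The sketch below repeats only that arithmetic for a sample size of 15 and the illustrative percentages 26.666, 13.333 and 33.333; it does not touch any table.

```sql
-- Stand-alone check of the percentage-to-count arithmetic (illustrative values).
DECLARE @SIZE INT = 15,
        @P1 DECIMAL(6,4) = 26.666,
        @P2 DECIMAL(6,4) = 13.333,
        @P3 DECIMAL(6,4) = 33.333;
DECLARE @N1 INT, @N2 INT, @N3 INT, @N4 INT;

SET @N1 = CEILING(@SIZE * @P1 * 0.01);  -- CEILING(3.9999)  = 4
SET @N2 = CEILING(@SIZE * @P2 * 0.01);  -- CEILING(1.99995) = 2
SET @N3 = CEILING(@SIZE * @P3 * 0.01);  -- CEILING(4.99995) = 5
SET @N4 = @SIZE - (@N1 + @N2 + @N3);    -- 15 - 11          = 4

SELECT @N1 AS N1, @N2 AS N2, @N3 AS N3, @N4 AS N4;
```

Because of CEILING, the percentages have to be rounded down rather than up: passing 26.667 instead of 26.666 gives 15 * 26.667 * 0.01 = 4.00005, which rounds up to 5 rows instead of 4.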

If we execute it against your sample data like this:

EXEC sqlsampling @SIZE=15, @P1=26.666, @P2=13.333, @P3=33.333;

it gives us output such as:

QID QCID QTID QSID
10  1    MC   1
9   1    MC   1
7   1    MC   1
6   1    MC   1
27  1    MC   3
30  1    MC   3
1   1    SC   1
4   1    SC   1
3   1    SC   1
5   1    SC   1
2   1    SC   1
21  1    SC   3
23  1    SC   3
25  1    SC   3
24  1    SC   3
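
To confirm that a sample meets the requirements from the question, the procedure's result can be captured and counted. This is a sketch only: #result is a temporary table name chosen for this illustration, and it assumes the sqlsampling procedure and mytable exist as shown above.

```sql
-- #result is a hypothetical temp table used only to capture the procedure output.
CREATE TABLE #result (QID INT, QcID INT, QtID CHAR(2), QsID INT);

INSERT INTO #result
EXEC sqlsampling @SIZE=15, @P1=26.666, @P2=13.333, @P3=33.333;

-- One row of counts, one column per requirement.
SELECT COUNT(*)                                     AS total_rows,
       SUM(CASE WHEN QsID = 1    THEN 1 ELSE 0 END) AS qs_1,
       SUM(CASE WHEN QsID = 3    THEN 1 ELSE 0 END) AS qs_3,
       SUM(CASE WHEN QtID = 'SC' THEN 1 ELSE 0 END) AS qt_sc,
       SUM(CASE WHEN QtID = 'MC' THEN 1 ELSE 0 END) AS qt_mc,
       SUM(CASE WHEN QcID = 1    THEN 1 ELSE 0 END) AS qc_1
FROM #result;

DROP TABLE #result;
```

On the sample data the counts should come out as 15, 9, 6, 9, 6 and 15, whichever random rows were picked.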

Other notes:

  • Proper indexes for your sets of conditions should help
  • Using CHECKSUM(NEWID()) helps with the penalty we pay for using NEWID()
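
As one possible reading of the indexing note, an index keyed on the two filter columns lets each of the four queries seek directly to its group. The index name and the choice of included columns are assumptions made for this sketch; whether it pays off depends on the real table and its existing indexes.

```sql
-- Hypothetical index: key on the filter columns, include the columns that are selected.
CREATE NONCLUSTERED INDEX IX_mytable_QtID_QsID
ON mytable (QtID, QsID)
INCLUDE (QID, QcID);
```

Each query still has to sort every row in its group by a random value, so the index narrows the rows that get sorted but does not remove the sort itself.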

Original answer:

You can do something like this:

SELECT TOP 15 * FROM
(
SELECT * FROM (SELECT TOP 9 QID FROM mytable 
WHERE QsID = 1 
ORDER BY CHECKSUM(NEWID())) a
 UNION
SELECT * FROM (SELECT TOP 6 QID FROM mytable 
WHERE QsID = 3 
ORDER BY CHECKSUM(NEWID())) b
...
) z ORDER BY CHECKSUM(NEWID())
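
The per-group sampling can also be sketched as a single statement with ROW_NUMBER(), which numbers the rows of each (QtID, QsID) group in random order and keeps only as many as that group's quota. This is an alternative sketch, not the method used above: @quota and its counts (4, 2, 5, 4) are assumptions taken from the expected output in the question.

```sql
-- @quota is a hypothetical table variable: how many rows to take from each group.
DECLARE @quota TABLE (QtID CHAR(2), QsID INT, n INT);
INSERT INTO @quota VALUES ('MC', 1, 4), ('MC', 3, 2), ('SC', 1, 5), ('SC', 3, 4);

SELECT r.QID, r.QcID, r.QtID, r.QsID
FROM (
    -- Number the rows of each group in random order.
    SELECT m.QID, m.QcID, m.QtID, m.QsID,
           ROW_NUMBER() OVER (PARTITION BY m.QtID, m.QsID
                              ORDER BY NEWID()) AS rn
    FROM mytable AS m
) AS r
JOIN @quota AS q
  ON q.QtID = r.QtID AND q.QsID = r.QsID
WHERE r.rn <= q.n;  -- keep only the first n rows of each group
```

This version numbers every row in the table before filtering, so on a large table it may cost more than four separate TOP queries.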
answered 2013-01-24T08:53:11.593

Maybe you could try TOP ... PERCENT. This is not a complete answer, but it should shed some light in the right direction.

select  * from demo where qid in 
(select top 40 percent qid
 from demo order by newid())
;

Here is also a reference for TABLESAMPLE: http://msdn.microsoft.com/en-us/library/ms189108.aspx
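
For comparison with the TOP ... PERCENT query, a TABLESAMPLE version could look like the sketch below. It reuses the demo table name from the query above; the 40 percent figure and the TOP (15) cap are illustrative.

```sql
-- Sample roughly 40 percent of the table's pages (not rows).
SELECT *
FROM demo TABLESAMPLE (40 PERCENT);

-- Cap the sampled rows at a fixed size, in random order.
SELECT TOP (15) *
FROM demo TABLESAMPLE (40 PERCENT)
ORDER BY NEWID();
```

TABLESAMPLE picks whole data pages, so the number of rows returned is only approximate, and on a small table it may return every row or none. It also applies no per-group quotas, so the QsID and QtID counts still have to be enforced separately.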

answered 2013-01-24T09:40:43.343