sql - SQL Server：根据类型仅包括一定百分比的行

Question

我有一个输入的问题表：Select One, True False，Select All 我正在尝试编写一个查询，该查询将返回一定数量的问题，同时将其中一种类型限制为返回的类型的特定百分比。

例子：

SELECT QuestionID, QuestionText, QuestionType
FROM Question
WHERE/HAVING --limit QuestionTypeID = 4 to be only 10% of total returned

我最初的想法是单独提取查询

SELECT TOP 10 PERCENT QuestionID, QuestionText, QuestionType
FROM Question
WHERE QuestionTypeID <> 4
UNION ALL
SELECT QuestionID, QuestionText, QuestionType
FROM Question
WHERE QuestionTypeID = 4

然而，似乎有一种更有效的计算方法。

编辑：

让我澄清一下我要完成的工作......我需要返回 90% 的其他类型问题的混合体，而只有 10% 的类型 4。

我需要将结果随机化，所以我一直在使用SET ROWCOUNT和ORDER BY NEWID()

我可以使用子查询，但不确定在SET ROWCOUNT子查询中使用的方式...此时似乎需要临时表，但如果有更好的方法请告诉我...

到目前为止我所拥有的......带有临时表

DECLARE @ReturnPercent [int] --variable that holds percentage for bottom questions
DECLARE @ReturnCount [int] --variable that holds how many questions I'd like returned

CREATE TABLE #Temp1(
        QuestionID [int], 
        QuestionText [nvarchar](256), 
        QuestionTypeID [int]
);

DECLARE @TOP [int] = @ReturnCount-CAST(@ReturnCount*@ReturnPercent AS INT);
DECLARE @BOTTOM [int] = CAST(@ReturnCount(@ReturnPercent AS INT);

SET ROWCOUNT @TOP
INSERT INTO #Temp1(QuestionID, QuestionText, QuestionTypeID)
SELECT QuestionID, QuestionText, QuestionTypeID
FROM Question
WHERE QuestionTypeID <> 4
ORDER BY NEWID()
SET ROWCOUNT 0

SET ROWCOUNT @BOTTOM
INSERT INTO #Temp1(QuestionID, QuestionText, QuestionTypeID)
SELECT QuestionID, QuestionText, QuestionTypeID
FROM Question
WHERE QuestionTypeID = 4
ORDER BY NEWID()
SET ROWCOUNT 0

--Query to join them with other data(omitted) 
SET ROWCOUNT @ReturnCount
SELECT a.QuestionID, a.QuestionText, a.QuestionTypeID
FROM #Temp1 a
JOIN --OTHER TABLES FOR FULL QUERY
ORDER BY NEWID()
SET ROWCOUNT 0

score 0 · Accepted Answer

union all 很可能是最有效的方法。

有一种为所有行分配行号的有效方法：

SELECT row_number() over (order by (select NULL)) as seqnum

但是，如果您添加分区子句，我认为这不是有效的。如果这是有效的，并且如果您有关于问题 id 的索引，则可以使用：

select q.QuestionID, q.QuestionText, q.QuestionType
from (select QuestionID, QuestionText, QuestionType,
             row_number() over (partition by QuestionId order by (select NULL)) as seqnum
      from Question
     ) q join
     (select QuestionId, count(*) as cnt
      From Question
      group by Questionid
     ) qsum
     q.questionid = qsum.questionid
where questionid <> 4 or seqnum*1.0/cnt <= 0.1

（这个想法是 group by 将由索引满足，连接将非常便宜，并且 row_number() 有效。）

但是，我认为，工会都可能更快。

嗯，现在我想起来了，你可以用一个近似值走得很远：

select q.QuestionID, q.QuestionText, q.QuestionType
from (select QuestionID, QuestionText, QuestionType,
             row_number() over (order by (select NULL)) as seqnum
      from Question
     ) q
where mod(seqnum, 10) <> 0 and questionid <> 4 or
      mod(seqnum, 10) = 0 and questionid = 4

这占用了 questionid <> 4 的大约 90% 的行和 questionid = 4 的大约 10% 的行。这应该非常有效，因为这个短语中的 row_number() 分配了一个没有排序或分区的序列号。

sql - SQL Server：根据类型仅包括一定百分比的行

1 回答 1

Related

Reference