1

我有一个输入的问题表:Select One, True FalseSelect All 我正在尝试编写一个查询,该查询将返回一定数量的问题,同时将其中一种类型限制为返回的类型的特定百分比。

例子:

SELECT QuestionID, QuestionText, QuestionType
FROM Question
WHERE/HAVING --limit QuestionTypeID = 4 to be only 10% of total returned

我最初的想法是单独提取查询

SELECT TOP 10 PERCENT QuestionID, QuestionText, QuestionType
FROM Question
WHERE QuestionTypeID <> 4
UNION ALL
SELECT QuestionID, QuestionText, QuestionType
FROM Question
WHERE QuestionTypeID = 4

然而,似乎有一种更有效的计算方法。


编辑:

让我澄清一下我要完成的工作......我需要返回 90% 的其他类型问题的混合体,而只有 10% 的类型 4。

我需要将结果随机化,所以我一直在使用SET ROWCOUNTORDER BY NEWID()

我可以使用子查询,但不确定在SET ROWCOUNT子查询中使用的方式...此时似乎需要临时表,但如果有更好的方法请告诉我...

到目前为止我所拥有的......带有临时表

DECLARE @ReturnPercent [int] --variable that holds percentage for bottom questions
DECLARE @ReturnCount [int] --variable that holds how many questions I'd like returned

CREATE TABLE #Temp1(
        QuestionID [int], 
        QuestionText [nvarchar](256), 
        QuestionTypeID [int]
);

DECLARE @TOP [int] = @ReturnCount-CAST(@ReturnCount*@ReturnPercent AS INT);
DECLARE @BOTTOM [int] = CAST(@ReturnCount(@ReturnPercent AS INT);

SET ROWCOUNT @TOP
INSERT INTO #Temp1(QuestionID, QuestionText, QuestionTypeID)
SELECT QuestionID, QuestionText, QuestionTypeID
FROM Question
WHERE QuestionTypeID <> 4
ORDER BY NEWID()
SET ROWCOUNT 0

SET ROWCOUNT @BOTTOM
INSERT INTO #Temp1(QuestionID, QuestionText, QuestionTypeID)
SELECT QuestionID, QuestionText, QuestionTypeID
FROM Question
WHERE QuestionTypeID = 4
ORDER BY NEWID()
SET ROWCOUNT 0

--Query to join them with other data(omitted) 
SET ROWCOUNT @ReturnCount
SELECT a.QuestionID, a.QuestionText, a.QuestionTypeID
FROM #Temp1 a
JOIN --OTHER TABLES FOR FULL QUERY
ORDER BY NEWID()
SET ROWCOUNT 0
4

1 回答 1

0

union all 很可能是最有效的方法。

有一种为所有行分配行号的有效方法:

SELECT row_number() over (order by (select NULL)) as seqnum

但是,如果您添加分区子句,我认为这不是有效的。如果这是有效的,并且如果您有关于问题 id 的索引,则可以使用:

select q.QuestionID, q.QuestionText, q.QuestionType
from (select QuestionID, QuestionText, QuestionType,
             row_number() over (partition by QuestionId order by (select NULL)) as seqnum
      from Question
     ) q join
     (select QuestionId, count(*) as cnt
      From Question
      group by Questionid
     ) qsum
     q.questionid = qsum.questionid
where questionid <> 4 or seqnum*1.0/cnt <= 0.1

(这个想法是 group by 将由索引满足,连接将非常便宜,并且 row_number() 有效。)

但是,我认为,工会都可能更快。

嗯,现在我想起来了,你可以用一个近似值走得很远:

select q.QuestionID, q.QuestionText, q.QuestionType
from (select QuestionID, QuestionText, QuestionType,
             row_number() over (order by (select NULL)) as seqnum
      from Question
     ) q
where mod(seqnum, 10) <> 0 and questionid <> 4 or
      mod(seqnum, 10) = 0 and questionid = 4

这占用了 questionid <> 4 的大约 90% 的行和 questionid = 4 的大约 10% 的行。这应该非常有效,因为这个短语中的 row_number() 分配了一个没有排序或分区的序列号。

于 2012-06-02T01:51:34.460 回答