sql - Giving a range to a SQL column

Question

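I have a SQL table with a column and a Probability. I want to select one row from it at random, but rows with a higher weighted probability should get a better chance of being picked. I can do this with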

    Order By abs(checksum(newid()))

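But the difference between the probabilities is too large, so the highest probability gets far too many chances. For example, after picking that value 74 times, it picks another value once and then picks the first one about 74 times again. I want to reduce this, so that it is picked, say, 3-4 times and then some other value, and so on. I am thinking of giving a range to the probabilities, like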

    Row[i] = Row[i-1]+Row[i]

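How can I do this? Do I need to create a function? Is there another way to achieve it? I am a newbie. Any help will be appreciated. Thanks.

EDIT: I have a solution to my problem, but I have one more question. If I have a table like the one below: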

    Column1   Column2
     1         50
     2         30
     3         20

can I get this?

    Column1   Column2  Column3
     1         50       50
     2         30       80
     3         20       100

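Each row should add its own value to the total of the rows before it. Is there any way to do that?

UPDATE: Finally got a solution after 3 hours. I just take the square root of my probabilities so that the differences between them shrink. For example, I add a column like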

    sqrt(sqrt(sqrt(Probability)))....:-)

score 0 · Accepted Answer

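Here is a basic example of how to select one row from a table while taking the assigned row weights into account.

Suppose we have this table: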

    CREATE TABLE TableWithWeights(
      Id int NOT NULL PRIMARY KEY,
      DataColumn nvarchar(50) NOT NULL,
      Weight decimal(18, 6) NOT NULL -- Weight column
    )

Let's populate the table with sample data.

    INSERT INTO TableWithWeights VALUES(1, 'Frequent', 50)
    INSERT INTO TableWithWeights VALUES(2, 'Common', 30)
    INSERT INTO TableWithWeights VALUES(3, 'Rare', 20)

This is the query that returns one random row, taking the given row weights into account.

    SELECT *
    FROM
        (SELECT tww1.*,     -- Select original table data
           -- Add column with the sum of all weights of previous rows
           (SELECT SUM(tww2.Weight) - tww1.Weight
            FROM TableWithWeights tww2
            WHERE tww2.Id <= tww1.Id) AS SumOfWeightsOfPreviousRows
         FROM TableWithWeights tww1) AS tww,
        -- Add column with random number within the range [0, SumOfWeights)
        (SELECT RAND() * SUM(Weight) AS rnd
         FROM TableWithWeights) AS r
    WHERE tww.SumOfWeightsOfPreviousRows <= r.rnd
      AND r.rnd < tww.SumOfWeightsOfPreviousRows + tww.Weight
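To see why the WHERE clause favors heavier rows, it can help to list the half-open interval that each row owns. The following is a minimal sketch that assumes the TableWithWeights sample data above; the aliases RangeStart and RangeEnd are made up for illustration and are not part of the original answer.

```sql
-- Each row owns the interval [RangeStart, RangeEnd); its width equals the row's weight.
SELECT tww1.Id,
       tww1.DataColumn,
       tww1.Weight,
       (SELECT SUM(tww2.Weight)
        FROM TableWithWeights tww2
        WHERE tww2.Id <= tww1.Id) - tww1.Weight AS RangeStart,
       (SELECT SUM(tww2.Weight)
        FROM TableWithWeights tww2
        WHERE tww2.Id <= tww1.Id) AS RangeEnd
FROM TableWithWeights tww1
ORDER BY tww1.Id;
```

With the sample weights 50, 30 and 20 the intervals are [0, 50), [50, 80) and [80, 100), so a uniform random number in [0, 100) lands in the first row's interval about half of the time.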

To check the query results, we can run it 100 times.

    DECLARE @count AS int;
    SET @count = 0;
    WHILE (@count < 100)
    BEGIN
        -- This is the query that returns one random row,
        -- taking into account the given row weights
        SELECT *
        FROM
            (SELECT tww1.*,     -- Select original table data
               -- Add column with the sum of all weights of previous rows
               (SELECT SUM(tww2.Weight) - tww1.Weight
                FROM TableWithWeights tww2
                WHERE tww2.Id <= tww1.Id) AS SumOfWeightsOfPreviousRows
             FROM TableWithWeights tww1) AS tww,
            -- Add column with random number within the range [0, SumOfWeights)
            (SELECT RAND() * SUM(Weight) AS rnd
             FROM TableWithWeights) AS r
        WHERE tww.SumOfWeightsOfPreviousRows <= r.rnd
          AND r.rnd < tww.SumOfWeightsOfPreviousRows + tww.Weight

        -- Increase counter
        SET @count += 1
    END

PS The query was tested on SQL Server 2008 R2. Of course the query can be optimized (that is easy to do once you get the idea).
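The loop above returns 100 separate result sets, which are tedious to compare by eye. One possible variation, sketched here under the assumption that the same TableWithWeights table exists, stores each pick in a table variable and prints a single tally at the end. The names @picks, @i and PrevSum are hypothetical and not part of the original answer.

```sql
-- Collect the picked Ids, then count how often each row was chosen.
DECLARE @picks TABLE (Id int NOT NULL);
DECLARE @i int = 0;

WHILE @i < 1000
BEGIN
    INSERT INTO @picks (Id)
    SELECT tww.Id
    FROM (SELECT t1.Id,
                 t1.Weight,
                 -- Sum of the weights of all rows before this one
                 (SELECT SUM(t2.Weight)
                  FROM TableWithWeights t2
                  WHERE t2.Id <= t1.Id) - t1.Weight AS PrevSum
          FROM TableWithWeights t1) AS tww
    CROSS JOIN (SELECT RAND() * SUM(Weight) AS rnd
                FROM TableWithWeights) AS r
    WHERE tww.PrevSum <= r.rnd
      AND r.rnd < tww.PrevSum + tww.Weight;

    SET @i += 1;
END;

SELECT Id, COUNT(*) AS Picks
FROM @picks
GROUP BY Id
ORDER BY Id;
```

With the sample weights, the counts should come out roughly in a 50:30:20 ratio, although the exact numbers differ from run to run.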

score 0 · Accepted Answer

To answer your latest question:

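    -- YourTable is a placeholder for the real table name
    -- (TABLE itself is a reserved word and cannot be used unquoted)
    SELECT t.Column1,
           t.Column2,
           (SELECT SUM(t2.Column2)
            FROM YourTable t2
            WHERE t2.Column1 <= t.Column1) AS Column3
    FROM YourTable t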
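On SQL Server 2012 and later, the same running total can also be written with a windowed SUM instead of a correlated subquery. This is only a sketch of that alternative; YourTable is a placeholder name, and the column names are taken from the example table in the question.

```sql
-- Running total of Column2 in Column1 order (requires SQL Server 2012 or later).
SELECT t.Column1,
       t.Column2,
       SUM(t.Column2) OVER (ORDER BY t.Column1
                            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Column3
FROM YourTable t
ORDER BY t.Column1;
```

For the rows (1, 50), (2, 30), (3, 20) this yields 50, 80 and 100 in Column3, matching the table in the question.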

score 0 · Accepted Answer

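I would handle it with something like

    ORDER BY rand()*pow(<probability-field-name>,<n>)

For different values of n you warp your linear probabilities into a simple polynomial. Small values of n (e.g. 0.5) compress the probabilities toward 1 and so make the less probable choices more likely; large values of n (e.g. 2) do the opposite and further reduce the probability of values that are already unlikely.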
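The expression above reads like MySQL syntax. On SQL Server, which the NEWID() call in the question suggests, the power function is POWER, and RAND() returns the same value for every row of a single query, so a per-row source of randomness such as NEWID() is needed. The following is a rough sketch of one possible adaptation, assuming the TableWithWeights table from the first answer and n = 0.5. It sorts descending so that the largest score wins, and it only biases the pick toward heavier rows; it does not reproduce the weights as exact probabilities.

```sql
-- Pick one row, biased toward larger weights, with the weights flattened by n = 0.5.
SELECT TOP (1) Id, DataColumn, Weight
FROM TableWithWeights
ORDER BY
    -- Per-row pseudo-random value in [0, 1); the modulo is taken before ABS
    -- so that ABS never sees the smallest possible int.
    (ABS(CHECKSUM(NEWID()) % 1000000) / 1000000.0)
    * POWER(CAST(Weight AS float), 0.5) DESC;
```

Raising the exponent toward 1 or above makes the heavy rows dominate again, while lowering it toward 0 moves the pick closer to a uniform choice.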

score 0 · Accepted Answer

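Since the difference in probability is too great, you need to add a computed field with a revised weighting that has a more even probability distribution. How you do that depends on your data and on the distribution you prefer. One way is to "normalize" the weighting to an integer between 1 and 10, so that the lowest probability is never less than one tenth of the highest.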
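One way to read the "normalize to 1 through 10" suggestion is a linear min-max rescaling. The sketch below is an assumption about what such a mapping could look like, not the answerer's own formula; it reuses the TableWithWeights table from the first answer, and NormalizedWeight, MinWeight and MaxWeight are hypothetical names.

```sql
-- Map the smallest weight to 1 and the largest to 10, linearly in between.
SELECT t.Id,
       t.DataColumn,
       t.Weight,
       COALESCE(
           1 + CAST(ROUND(9.0 * (t.Weight - s.MinWeight)
                          / NULLIF(s.MaxWeight - s.MinWeight, 0), 0) AS int),
           1) AS NormalizedWeight  -- falls back to 1 when all weights are equal
FROM TableWithWeights t
CROSS JOIN (SELECT MIN(Weight) AS MinWeight,
                   MAX(Weight) AS MaxWeight
            FROM TableWithWeights) s;
```

For the sample weights 50, 30 and 20 this gives 10, 4 and 1. The normalized column can then be used in place of Weight in any of the weighted-selection queries.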
