
I have a database table with, essentially, these columns: date Date, int UserId, double Value

I'd like to be able to run a query that returns the 10th and 90th percentiles of Value across all users for each date, e.g. SELECT Date, Pct10(Value), Pct90(Value) from Table group by Date.

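I know the various ways of calculating percentiles in MySQL by counting rows with COUNT(*) and LIMIT. However, I don't know how to apply this to each date value in turn within a single statement.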
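A minimal sketch of one way to apply the count-based idea to every date in a single statement, without LIMIT: for each date, take the smallest Value whose number of rows at or below it (on the same date) reaches p% of that date's rows, i.e. the nearest-rank percentile. Which percentile definition is wanted is an assumption here, as are the table name mydata, the column types and the function name percentiles. The SQL is run through Python's sqlite3 module on the sample data from this question so it can be checked; it uses only correlated subqueries, so it should carry over to MySQL, but that has not been verified.

```python
import sqlite3

# Sample rows from the question: (Date, UserId, Value).
ROWS = [("2013-01-01", uid, v)
        for uid, v in enumerate([1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 9])]
ROWS += [("2013-01-02", 1, 1), ("2013-01-02", 9, 1)]

# Nearest-rank percentile per date, no LIMIT: the smallest Value on that
# Date whose count of rows <= it reaches pct% of the day's n rows.
# For an integer count k, k >= ceil(pct*n/100) is the same as
# 100*k >= pct*n, so integer percents keep the comparison exact.
PCT_SQL = """
SELECT d.Date,
  (SELECT MIN(t.Value) FROM mydata AS t
    WHERE t.Date = d.Date
      AND 100 * (SELECT COUNT(*) FROM mydata AS s
                  WHERE s.Date = t.Date AND s.Value <= t.Value) >= :lo * d.n) AS Pct10,
  (SELECT MIN(t.Value) FROM mydata AS t
    WHERE t.Date = d.Date
      AND 100 * (SELECT COUNT(*) FROM mydata AS s
                  WHERE s.Date = t.Date AND s.Value <= t.Value) >= :hi * d.n) AS Pct90
FROM (SELECT Date, COUNT(*) AS n FROM mydata GROUP BY Date) AS d
ORDER BY d.Date
"""


def percentiles(rows, lo=10, hi=90):
    """Return [(Date, PctLo, PctHi), ...] using an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mydata (Date TEXT, UserId INTEGER, Value REAL)")
    conn.executemany("INSERT INTO mydata VALUES (?, ?, ?)", rows)
    result = conn.execute(PCT_SQL, {"lo": lo, "hi": hi}).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in percentiles(ROWS):
        print(row)
```

On the sample data this returns ('2013-01-01', 1.0, 2.0) and ('2013-01-02', 1.0, 1.0), which matches the expected result. Without an index, each correlated COUNT(*) rescans that date's rows, so the cost grows roughly with the square of the number of rows per date.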

Sample data:

Date       | UserId  | Value
2013-01-01 |      0  |     1
2013-01-01 |      1  |     1
2013-01-01 |      2  |     1
2013-01-01 |      3  |     1
2013-01-01 |      4  |     2
2013-01-01 |      5  |     2
2013-01-01 |      6  |     2
2013-01-01 |      7  |     2
2013-01-01 |      8  |     2
2013-01-01 |      9  |     2
2013-01-01 |     10  |     9
2013-01-02 |      1  |     1
2013-01-02 |      9  |     1

The expected result is:

Date       | Pct10  | Pct90
2013-01-01 |     1  |     2
2013-01-02 |     1  |     1
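Several definitions of "percentile" are in use. The expected result is consistent with the nearest-rank method: sort a date's values and take the one at 1-based rank ceil(p * n). For 2013-01-01, n = 11, so the 10th percentile is at rank ceil(1.1) = 2 (value 1) and the 90th at rank ceil(9.9) = 10 (value 2); for 2013-01-02, n = 2 gives ranks 1 and 2, both value 1. A plain-Python sketch of that rule, assuming nearest-rank is the intended definition (the function names are hypothetical):

```python
from collections import defaultdict

# Sample rows from the question: (Date, UserId, Value).
ROWS = [("2013-01-01", uid, v)
        for uid, v in enumerate([1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 9])]
ROWS += [("2013-01-02", 1, 1), ("2013-01-02", 9, 1)]


def nearest_rank(sorted_values, pct):
    """Value at 1-based rank ceil(pct * n / 100) (nearest-rank method)."""
    n = len(sorted_values)
    rank = max(1, -(-pct * n // 100))  # integer ceiling, no float rounding
    return sorted_values[rank - 1]


def percentiles_by_date(rows, lo=10, hi=90):
    """Map each Date to (lo-th percentile, hi-th percentile) of its Values."""
    by_date = defaultdict(list)
    for date, _user_id, value in rows:
        by_date[date].append(value)
    result = {}
    for date in sorted(by_date):
        values = sorted(by_date[date])
        result[date] = (nearest_rank(values, lo), nearest_rank(values, hi))
    return result


print(percentiles_by_date(ROWS))
# {'2013-01-01': (1, 2), '2013-01-02': (1, 1)}
```

Percentiles are passed as integers (10, 90) so the rank is computed exactly; with a float p, p * n can land just above a whole number and push the ceiling one rank too far.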

1 Answer


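I'm not sure about the percentile calculation itself. I'm working from a subquery based on "select nth percentile from mysql", but I'm not entirely sure I adapted it correctly. My answer focuses on how to combine the subqueries.

The following query will be slow, and will get slower and slower as the table grows, but it should do what you need: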

SELECT p10.Date, Pct10, Pct90
FROM (
    SELECT Date, count(Value) AS Pct10
    FROM mydata
    GROUP BY Date, Value
    ORDER BY ABS(0.1-(count(Value)/(select count(*) from mydata)))
    LIMIT 1) AS p10
INNER JOIN (
    SELECT Date, count(Value) AS Pct90
    FROM mydata
    GROUP BY Date, Value
    ORDER BY ABS(0.9-(count(Value)/(select count(*) from mydata)))
    LIMIT 1) AS p90 ON p10.Date = p90.Date
GROUP BY p10.Date

Here is my second idea. If it works, it will be faster and more efficient than the first one, but it will still be fairly slow on larger tables.

SELECT p10.Date, count(p10.Value) AS Pct10, Pct90
FROM mydata p10
INNER JOIN (
    SELECT Date, count(Value) AS Pct90
    FROM mydata
    GROUP BY Date, Value
    ORDER BY ABS(0.9-(count(Value)/(select count(*) from mydata)))
    LIMIT 1) AS p90 ON p10.Date = p90.Date
GROUP BY p10.Date, p10.Value
ORDER BY ABS(0.1-(count(p10.Value)/(select count(*) from mydata)))
LIMIT 1

Edit

OK, brainstorming time. Given this subquery for the percentile of a single date (I'm not even sure how it works):

    SELECT Date, count(Value) AS Pct90
    FROM mydata 
    WHERE Date = ?
    GROUP BY Value
    ORDER BY ABS(0.9-(count(Value)/(select count(*) from mydata WHERE Date = ?)))
    LIMIT 1
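If a single statement is not strictly required, the per-date query above can simply be executed once per date from client code, binding both ? placeholders to the same date. A sketch using Python's sqlite3 module as a stand-in for a MySQL client (an assumption; the function name run_per_date is hypothetical). Like the query it wraps, it returns COUNT(Value) for the chosen Value group, not the Value itself.

```python
import sqlite3

# Sample rows from the question: (Date, UserId, Value).
ROWS = [("2013-01-01", uid, v)
        for uid, v in enumerate([1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 9])]
ROWS += [("2013-01-02", 1, 1), ("2013-01-02", 9, 1)]

# The single-date query from the answer. Both "?" placeholders are bound
# to the same date. "* 1.0" forces real division: SQLite, unlike MySQL,
# divides two integers as integers.
PER_DATE_SQL = """
SELECT Date, COUNT(Value) AS Pct90
FROM mydata
WHERE Date = ?
GROUP BY Value
ORDER BY ABS(0.9 - (COUNT(Value) * 1.0 / (SELECT COUNT(*) FROM mydata WHERE Date = ?)))
LIMIT 1
"""


def run_per_date(rows):
    """Return one (Date, Pct90) row per date by looping in client code."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mydata (Date TEXT, UserId INTEGER, Value REAL)")
    conn.executemany("INSERT INTO mydata VALUES (?, ?, ?)", rows)
    dates = [d for (d,) in conn.execute("SELECT DISTINCT Date FROM mydata ORDER BY Date")]
    # LIMIT 1 is harmless here because each execution sees a single date.
    results = [conn.execute(PER_DATE_SQL, (d, d)).fetchone() for d in dates]
    conn.close()
    return results


if __name__ == "__main__":
    print(run_per_date(ROWS))
```

On the sample data the two executions return ('2013-01-01', 6) and ('2013-01-02', 2). The price of this approach is one extra round trip per date.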

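Then let's try to fix the ORDER BY by joining in each date's total row count instead of using a separate per-date subquery: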

    SELECT mydata.Date, count(Value) AS Pct90
    FROM mydata
    INNER JOIN (SELECT Date, COUNT(*) AS DateTotal FROM mydata GROUP BY Date) AS d
        ON d.Date = mydata.Date
    GROUP BY mydata.Date, Value
    ORDER BY (ABS(0.9-(COUNT(Value)/d.DateTotal)))
    LIMIT 1
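To see what the ORDER BY above is ranking, this sketch runs the same join without ORDER BY and LIMIT and lists each (Date, Value) group with its share of that date's rows. It uses Python's sqlite3 as a stand-in for MySQL; the Cnt and Fraction columns and the group_shares name are illustrative additions, and the * 1.0 is needed because SQLite, unlike MySQL, divides two integers as integers.

```python
import sqlite3

# Sample rows from the question: (Date, UserId, Value).
ROWS = [("2013-01-01", uid, v)
        for uid, v in enumerate([1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 9])]
ROWS += [("2013-01-02", 1, 1), ("2013-01-02", 9, 1)]

# Each (Date, Value) group, its row count, the date's total row count
# (from derived table d) and the group's share of that date's rows.
GROUPS_SQL = """
SELECT mydata.Date, Value, COUNT(Value) AS Cnt, d.DateTotal,
       ROUND(COUNT(Value) * 1.0 / d.DateTotal, 3) AS Fraction
FROM mydata
INNER JOIN (SELECT Date, COUNT(*) AS DateTotal FROM mydata GROUP BY Date) AS d
    ON d.Date = mydata.Date
GROUP BY mydata.Date, Value, d.DateTotal
ORDER BY mydata.Date, Value
"""


def group_shares(rows):
    """Return (Date, Value, Cnt, DateTotal, Fraction) for every group."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mydata (Date TEXT, UserId INTEGER, Value REAL)")
    conn.executemany("INSERT INTO mydata VALUES (?, ?, ?)", rows)
    result = conn.execute(GROUPS_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in group_shares(ROWS):
        print(row)
```

On the sample data the shares are 0.364, 0.545 and 0.091 for Values 1, 2 and 9 on 2013-01-01, and 1.0 for Value 1 on 2013-01-02.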

If you use this pattern in my earlier examples, maybe it will work.

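Edit 2

So here we go again, because we can't use LIMIT 1 (which I should have realized earlier): LIMIT 1 applies to the whole result set, not to each date, so only a single row comes back in total. I actually tested the following on my own database (hopefully I changed all the field and table names back to what they should be!), and it seems to work. You would have to do the same again for p10 and combine the two.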
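A small demonstration of the LIMIT 1 problem, using Python's sqlite3 as a stand-in for MySQL (the function name run_limit_one is hypothetical): the grouped query is sorted by distance from 90% across all dates, and LIMIT 1 then keeps a single row for the whole table.

```python
import sqlite3

# Sample rows from the question: (Date, UserId, Value).
ROWS = [("2013-01-01", uid, v)
        for uid, v in enumerate([1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 9])]
ROWS += [("2013-01-02", 1, 1), ("2013-01-02", 9, 1)]

# Per-date totals via a derived table, then ORDER BY ... LIMIT 1.
# LIMIT is applied after grouping and sorting, to the entire result,
# so it cannot keep "one row per date".
LIMIT_ONE_SQL = """
SELECT mydata.Date, COUNT(Value) AS Pct90
FROM mydata
INNER JOIN (SELECT Date, COUNT(*) AS DateTotal FROM mydata GROUP BY Date) AS d
    ON d.Date = mydata.Date
GROUP BY mydata.Date, Value, d.DateTotal
ORDER BY ABS(0.9 - COUNT(Value) * 1.0 / d.DateTotal)
LIMIT 1
"""


def run_limit_one(rows):
    """Run the LIMIT 1 query on an in-memory copy of the rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mydata (Date TEXT, UserId INTEGER, Value REAL)")
    conn.executemany("INSERT INTO mydata VALUES (?, ?, ?)", rows)
    result = conn.execute(LIMIT_ONE_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(run_limit_one(ROWS))
```

On the sample data only ('2013-01-02', 2) comes back, because its share (2 of 2 rows) is the closest to 0.9 of any group on any date.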

--- removed due to typos ---

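Edit 3

I found some errors in Edit 2, so I removed it. Here is the complete percentile query. As far as I can tell, it works on my database (with different fields and tables).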

SELECT n.Date, n.Pct AS Pct10, n.Value AS Pct10Value, q.Pct AS Pct90, q.Value AS Pct90Value FROM (
    -- n: for each date, the Value whose share of that date's rows is closest to 10%
    SELECT p.Date, p.Pct, p.Value, m.Selector FROM (
        -- Pct = number of rows with this Value on this Date
        SELECT mydata.Date, Value, COUNT(Value) AS Pct, (ABS(0.1-(COUNT(Value)/d.DateTotal))) AS Abs10
        FROM mydata
        INNER JOIN (SELECT Date, COUNT(*) AS DateTotal FROM mydata GROUP BY Date) AS d
            ON d.Date = mydata.Date
        GROUP BY mydata.Date, Value
        ) p
    INNER JOIN (
        -- smallest distance from 10% on each date
        SELECT Date, MIN(Abs10) AS Selector FROM (
            SELECT mydata.Date, Value, COUNT(Value) AS Pct, (ABS(0.1-(COUNT(Value)/d.DateTotal))) AS Abs10
            FROM mydata
            INNER JOIN (SELECT Date, COUNT(*) AS DateTotal FROM mydata GROUP BY Date) AS d
                ON d.Date = mydata.Date
            GROUP BY mydata.Date, Value
        ) x GROUP BY Date
    -- match on Date too, or another date's minimum could select this row
    ) AS m ON m.Date = p.Date AND m.Selector = p.Abs10
    GROUP BY p.Date) n
INNER JOIN (
    -- q: the same for 90%
    SELECT p.Date, p.Pct, p.Value, m.Selector FROM (
        SELECT mydata.Date, Value, COUNT(Value) AS Pct, (ABS(0.9-(COUNT(Value)/d.DateTotal))) AS Abs90
        FROM mydata
        INNER JOIN (SELECT Date, COUNT(*) AS DateTotal FROM mydata GROUP BY Date) AS d
            ON d.Date = mydata.Date
        GROUP BY mydata.Date, Value
        ) p
    INNER JOIN (
        SELECT Date, MIN(Abs90) AS Selector FROM (
            SELECT mydata.Date, Value, COUNT(Value) AS Pct, (ABS(0.9-(COUNT(Value)/d.DateTotal))) AS Abs90
            FROM mydata
            INNER JOIN (SELECT Date, COUNT(*) AS DateTotal FROM mydata GROUP BY Date) AS d
                ON d.Date = mydata.Date
            GROUP BY mydata.Date, Value
        ) x GROUP BY Date
    ) AS m ON m.Date = p.Date AND m.Selector = p.Abs90
    GROUP BY p.Date) q ON q.Date = n.Date
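A runnable sketch of the Edit 3 selection rule for a single target p, using Python's sqlite3 as a stand-in for MySQL. It computes each (Date, Value) group's distance from p once in a WITH clause (available in MySQL 8.0 and later; older versions need the repeated subquery as written above) and joins each date's minimum back on both Date and distance. The names closest_share and Dist are hypothetical.

```python
import sqlite3

# Sample rows from the question: (Date, UserId, Value).
ROWS = [("2013-01-01", uid, v)
        for uid, v in enumerate([1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 9])]
ROWS += [("2013-01-02", 1, 1), ("2013-01-02", 9, 1)]

# For each (Date, Value) group: Pct = row count, Dist = |p - share of the
# day's rows|. Keep the group(s) whose Dist equals that date's minimum;
# equal distances on the same date would return more than one row.
CLOSEST_SQL = """
WITH g AS (
    SELECT mydata.Date AS Date, Value, COUNT(Value) AS Pct,
           ABS(:p - COUNT(Value) * 1.0 / d.DateTotal) AS Dist
    FROM mydata
    INNER JOIN (SELECT Date, COUNT(*) AS DateTotal FROM mydata GROUP BY Date) AS d
        ON d.Date = mydata.Date
    GROUP BY mydata.Date, Value, d.DateTotal
)
SELECT g.Date, g.Pct, g.Value
FROM g
INNER JOIN (SELECT Date, MIN(Dist) AS Selector FROM g GROUP BY Date) AS m
    ON m.Date = g.Date AND m.Selector = g.Dist
ORDER BY g.Date
"""


def closest_share(rows, p):
    """Return (Date, Pct, Value) per date for the group closest to share p."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mydata (Date TEXT, UserId INTEGER, Value REAL)")
    conn.executemany("INSERT INTO mydata VALUES (?, ?, ?)", rows)
    result = conn.execute(CLOSEST_SQL, {"p": p}).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(closest_share(ROWS, 0.1))
    print(closest_share(ROWS, 0.9))
```

On the sample data, p = 0.1 picks Value 9 for 2013-01-01 (1 of 11 rows, about 9%, the share closest to 10%), whereas the question expects a Pct10 of 1: this rule matches on how often a value occurs, not on where it falls in sorted order. With p = 0.9 it picks Value 2 (6 of 11 rows) for 2013-01-01 and Value 1 for 2013-01-02.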
answered 2013-10-21T18:33:57.277