Context:

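I'm trying to work through a set of market transactions and determine how much value actually moved for each item type. This is pretty much my first attempt at MySQL, so the query is ugly, but the following almost works: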

SELECT types.typename,
       averages.type,
       averages.price,
       movement.sold,
       ( averages.price * movement.sold ) AS value
FROM   (SELECT type,
               Round(Avg(price)) AS price
        FROM   orders
        GROUP  BY type) AS averages
       INNER JOIN (SELECT type,
                          ( startingvolume - currentvolume ) AS sold
                   FROM   (SELECT type,
                                  Sum(volume)        AS currentVolume,
                                  Sum(volumeentered) startingVolume
                           FROM   orders
                           GROUP  BY type) AS movement
                   WHERE  ( startingvolume - currentvolume ) > 10000
                   ORDER  BY sold) AS movement
               ON averages.type = movement.type
       INNER JOIN invtypes AS types
               ON types.typeid = averages.type
ORDER  BY value DESC
LIMIT  10 ;

+------------------------------------+-------+---------+------------+------------------+
| typeName                           | type  | price   | sold       | value            |
+------------------------------------+-------+---------+------------+------------------+
| Dirt                               |    34 | 1904767 | 2670581874 | 5086836224393358 |
| Light Wood                         |  2629 |   42999 |    2756595 |     118530828405 |
| Dark Wood                          | 24509 |   47344 |    1107771 |      52446310224 |
| Stone                              | 21922 |   18386 |    1505884 |      27687183224 |
| Grass                              |   238 |    5643 |    4554470 |      25700874210 |
| Paper                              |  3814 |   25635 |     861006 |      22071888810 |
| Iron                               |  3699 |  320270 |      58833 |      18842444910 |
| Ink                                | 16275 |    8552 |    2200545 |      18819060840 |
| Loam                               |  2679 |    5759 |    2608771 |      15023912189 |
| Copper                             |   672 |  904612 |      14989 |      13559229268 |
+------------------------------------+-------+---------+------------+------------------+

The problem with the data above is that the raw market data is inevitably corrupted by outliers, as shown here:

select type, price from orders where type = 34 order by price desc limit 10;

+------+-----------+
| type | price     |
+------+-----------+
|   34 | 200000000 |
|   34 |     15.99 |
|   34 |     12.06 |
|   34 |        10 |
|   34 |      7.67 |
|   34 |       7.5 |
|   34 |       7.3 |
|   34 |      7.17 |
|   34 |       7.1 |
|   34 |      7.06 |
+------+-----------+

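Core question:

99% of the market data is clean, but the outliers wreck the averages, and MySQL doesn't seem to have a median function. I have found several examples of how to find the median of an entire column, but I need the median per item.

How would I determine the median per item instead of the average per item, or otherwise efficiently clean these outliers out of the data before running the main query?

Note: I tried excluding results by standard deviation, but item prices range from $17 to $10B, and the deviation still stays relatively low regardless of the price range.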

1 Answer

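I won't touch your original query, since it is fairly complex, but one option is to use a subquery to remove statistical outliers. For example, to exclude from the orders table any row whose price lies more than two standard deviations from the mean price for its type, you could use: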

SELECT t1.type,
       t1.price
FROM orders t1
INNER JOIN
(
    -- per-type mean and standard deviation of the price
    SELECT type,
           AVG(price) AS avg_price,
           STD(price) AS std_price
    FROM orders
    GROUP BY type
) t2
    ON t1.type = t2.type
WHERE ABS(t1.price - t2.avg_price) <= 2 * t2.std_price  -- keep only rows within 2 standard deviations
                                                        -- of the mean price for their type
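
As a rough sketch of how a filter of this kind could feed a per-type average, the query below aggregates only the rows that lie within two standard deviations of their type's mean. The aliases `o`, `s`, `avg_price`, and `std_price` are names made up for this illustration, and the two-standard-deviation cutoff is an assumption that you may need to tune for your data.

```sql
-- Sketch: average price per type, ignoring rows that lie more than
-- 2 standard deviations from that type's mean price.
SELECT o.type,
       ROUND(AVG(o.price)) AS price
FROM orders AS o
INNER JOIN
(
    -- per-type statistics computed over all rows, outliers included
    SELECT type,
           AVG(price) AS avg_price,
           STD(price) AS std_price
    FROM orders
    GROUP BY type
) AS s
    ON o.type = s.type
WHERE ABS(o.price - s.avg_price) <= 2 * s.std_price
GROUP BY o.type;
```

If this behaves as intended on your data, it could stand in for the `averages` derived table in the original query. Note that a type with very few orders may keep its outlier, because a single extreme price inflates the standard deviation it is measured against.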

A demo of the outlier filter is here:

SQLFiddle
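
If a true per-type median is preferred over filtering, one possible approach is sketched below. It assumes MySQL 8.0 or later, where window functions such as `ROW_NUMBER()` are available; on older versions it will not run. The aliases `ranked`, `rn`, `cnt`, and `median_price` are illustrative names, not part of your schema.

```sql
-- Sketch (assumes MySQL 8.0+): median price per type.
-- Each row is numbered within its type in price order; the middle row
-- (odd count) or the two middle rows (even count) are then averaged.
SELECT type,
       AVG(price) AS median_price
FROM (
    SELECT type,
           price,
           ROW_NUMBER() OVER (PARTITION BY type ORDER BY price) AS rn,
           COUNT(*)     OVER (PARTITION BY type)                AS cnt
    FROM orders
) AS ranked
WHERE rn IN (FLOOR((cnt + 1) / 2), FLOOR((cnt + 2) / 2))
GROUP BY type;
```

For a type with 5 orders both expressions evaluate to 3, so only the third row is used; for 4 orders they evaluate to 2 and 3, so the two middle prices are averaged.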

Answered 2016-08-26T03:50:41.870