[tbl_votes]
- id <!-- unique id of the vote -->
- item_id <!-- vote belongs to item <id> -->
- vote <!-- number 1-10 -->

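Of course, we could solve this by fetching:

  • smallest observation (so)
  • lower quartile (lq)
  • median (me)
  • upper quartile (uq)
  • largest observation (lo)

...one after another using multiple queries, but I am wondering whether it can be done with a single query.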
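For reference, the five values listed above can be sketched outside the database in a few lines of Python. This is only an illustration of what the query should return, not a MySQL solution; the function name `five_number_summary` is hypothetical, and the interpolating quartile rule used by `statistics.quantiles` (default "exclusive" method) is an assumption, since the question does not say which quartile convention it wants.

```python
import statistics

def five_number_summary(votes):
    """Return (so, lq, me, uq, lo) for a non-empty list of at least two votes.

    Assumption: quartiles are interpolated with the default 'exclusive'
    method of statistics.quantiles; other conventions give other values.
    """
    lq, me, uq = statistics.quantiles(votes, n=4)
    return (min(votes), lq, me, uq, max(votes))

# Hypothetical votes for a single item_id, each in the range 1-10.
summary = five_number_summary([3, 7, 8, 5, 10, 1, 6, 9, 2, 4])
```

A nearest-rank rule (picking the element at position count/4, count/2, and so on) is simpler to express in SQL and will generally return slightly different quartiles than this interpolating rule.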

In Oracle, I could use COUNT OVER and RATIO_TO_REPORT, but these are not supported in MySQL.

For those who don't know what a box plot is: http://en.wikipedia.org/wiki/Box_plot

Any help would be appreciated.


3 Answers


I found a solution in PostgreSQL using PL/Python.

However, I will leave this question open in case someone else comes up with a solution in MySQL.

CREATE TYPE boxplot_values AS (
  min       numeric,
  q1        numeric,
  median    numeric,
  q3        numeric,
  max       numeric
);

CREATE OR REPLACE FUNCTION _final_boxplot(strarr numeric[])
   RETURNS boxplot_values AS
$$
    x = strarr.replace("{","[").replace("}","]")
    a = eval(str(x))

    a.sort()
    i = len(a)
    return ( a[0], a[i/4], a[i/2], a[i*3/4], a[-1] )
$$
LANGUAGE 'plpythonu' IMMUTABLE;

CREATE AGGREGATE boxplot(numeric) (
  SFUNC=array_append,
  STYPE=numeric[],
  FINALFUNC=_final_boxplot,
  INITCOND='{}'
);

Example:

SELECT customer_id as cid, (boxplot(price)).*
FROM orders
GROUP BY customer_id;

   cid |   min   |   q1    | median  |   q3    |   max
-------+---------+---------+---------+---------+---------
  1001 | 7.40209 | 7.80031 |  7.9551 | 7.99059 | 7.99903
  1002 | 3.44229 | 4.38172 | 4.72498 | 5.25214 | 5.98736

Source: http://www.christian-rossow.de/articles/PostgreSQL_boxplot_median_quartiles_aggregate_function.php
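The index rule used in `_final_boxplot` relies on Python 2 integer division (`i/4`). As a minimal sketch, assuming the values arrive as an ordinary Python list rather than as an array string, the same nearest-rank selection can be written for Python 3 with `//`. The name `final_boxplot` and the `None` result for empty input are choices made for this illustration, not part of the original function.

```python
def final_boxplot(values):
    """Nearest-rank five-number summary: (min, q1, median, q3, max).

    Mirrors the index rule a[i/4], a[i/2], a[i*3/4] from the aggregate's
    final function, using // so the indexes stay integers in Python 3.
    """
    a = sorted(values)
    n = len(a)
    if n == 0:
        return None  # assumption: no summary for an empty group
    return (a[0], a[n // 4], a[n // 2], a[n * 3 // 4], a[-1])

# Eleven values: the indexes picked are 0, 2, 5, 8 and the last element.
result = final_boxplot([5, 1, 4, 2, 3, 9, 8, 7, 6, 10, 11])
```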

answered 2011-12-26T22:00:13.417

Well, I can do it in two queries. Run the first query to get the positions of the quartiles, then use LIMIT in the second query to fetch the values at those positions.

mysql> select (select floor(count(*)/4)) as first_q, (select floor(count(*)/2) from customer_data) as mid_pos, (select floor(count(*)/4*3) from customer_data) as third_q from customer_data order by measure limit 1;

mysql> select min(measure), (select measure from customer_data order by measure limit 0,1) as firstq, (select measure from customer_data order by measure limit 5,1) as median, (select measure from customer_data order by measure limit 8,1) as last_q, max(measure) from customer_data;
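The two-step idea can be sketched end to end as follows. This is an illustration only: it uses Python's built-in `sqlite3` module as a stand-in for MySQL so that it runs without a server, the table contents are made up, and the helper names `boxplot_two_step` and `value_at` are hypothetical. Every position is computed from the same row count taken from `customer_data`.

```python
import sqlite3

def boxplot_two_step(conn):
    """Step 1: count the rows. Step 2: fetch the value at each position."""
    (n,) = conn.execute("SELECT COUNT(*) FROM customer_data").fetchone()
    if n == 0:
        return None

    def value_at(offset):
        # Fetch the single value at a zero-based position in sorted order.
        row = conn.execute(
            "SELECT measure FROM customer_data "
            "ORDER BY measure LIMIT 1 OFFSET ?",
            (offset,),
        ).fetchone()
        return row[0]

    lo, hi = conn.execute(
        "SELECT MIN(measure), MAX(measure) FROM customer_data"
    ).fetchone()
    return (lo, value_at(n // 4), value_at(n // 2), value_at(n * 3 // 4), hi)

# Made-up data: eleven measures, 1.0 through 11.0.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_data (measure REAL)")
conn.executemany(
    "INSERT INTO customer_data VALUES (?)",
    [(float(v),) for v in range(1, 12)],
)
summary = boxplot_two_step(conn)
```

With eleven rows the positions are 2, 5 and 8, so the sketch reads the third, sixth and ninth smallest values.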

answered 2012-01-04T18:08:38.267

Here is an example of calculating the quartiles of the e256 values within groups of e32; an index on (e32, e256) is a must in this case:

SELECT
  @group:=IF(e32=@group, e32, GREATEST(@index:=-1, e32)) as e32_,
  MIN(e256) as so,
  MAX(IF(lq_i=(@index:=@index+1), e256, NULL)) as lq,
  MAX(IF(me_i=@index, e256, NULL)) as me,
  MAX(IF(uq_i=@index, e256, NULL)) as uq,
  MAX(e256) as lo
FROM (SELECT @index:=NULL, @group:=NULL) as init, test t
JOIN (
  SELECT e32,
    COUNT(*) as cnt,
    (COUNT(*) div 4) as lq_i,    -- lq value index within the group
    (COUNT(*) div 2) as me_i,    -- me value index within the group
    (COUNT(*) * 3 div 4) as uq_i -- uq value index within the group
  FROM test
  GROUP BY e32
) as cnts
USING (e32)
GROUP BY e32;

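If grouping is not needed, the query will be slightly simpler.

P.S. test is my playground table of random values, where e32 is the result of Python's int(random.expovariate(1.0) * 32), etc.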
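To cross-check what the grouped query is expected to return, the same per-group index rule (count div 4, count div 2, count * 3 div 4, zero-based within each sorted group) can be sketched in plain Python. The function name `grouped_quartiles` is hypothetical, and generating e256 as `int(random.expovariate(1.0) * 256)` is an assumption by analogy with e32; the answer only spells out the e32 formula.

```python
import random
from collections import defaultdict

def grouped_quartiles(rows):
    """Map each e32 group to (so, lq, me, uq, lo) of its e256 values."""
    groups = defaultdict(list)
    for e32, e256 in rows:
        groups[e32].append(e256)

    result = {}
    for key, vals in groups.items():
        vals.sort()
        cnt = len(vals)
        # Same zero-based positions as lq_i, me_i and uq_i in the query.
        result[key] = (
            vals[0],
            vals[cnt // 4],
            vals[cnt // 2],
            vals[cnt * 3 // 4],
            vals[-1],
        )
    return result

# Playground data in the spirit of the test table (e256 formula assumed).
random.seed(0)
rows = [
    (int(random.expovariate(1.0) * 32), int(random.expovariate(1.0) * 256))
    for _ in range(1000)
]
quartiles_by_group = grouped_quartiles(rows)
```

Loading the same rows into the test table and comparing the query output with `quartiles_by_group` is one way to check the user-variable logic.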

answered 2012-01-04T21:05:44.627