I need to query a table that has a "gender" column, like this:

| id | gender | name    |
-------------------------
| 1  | M      | Michael |
-------------------------
| 2  | F      | Hannah  |
-------------------------
| 3  | M      | Louis   |
-------------------------

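I need to extract the top N results with, for example, 80% males and 20% females. So if I need 1000 results, I want to retrieve 800 males and 200 females.

  1. Can this be done in a single query? If so, how?

  2. If I don't have enough records (say I only have 700 males in the example above), is it possible to automatically select 700 / 300 instead?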

3 Answers

2

Basically, you want to take as many 'M' rows as possible without exceeding your percentage, then take enough 'F' rows to reach 1000 rows in total:

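with cte_m as (
    -- at most 80% of the 1000 requested rows may be 'M'
    select * from Table1 where gender = 'M' limit (1000 * 0.8)
), cte as (
    -- ord = 0 sorts the 'M' rows first, so the outer limit only trims 'F' rows
    select *, 0 as ord from cte_m
    union all
    select *, 1 as ord from Table1 where gender = 'F'
    order by ord
    limit 1000
)
select id, gender, name
from cte;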

sql fiddle demo
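
To check the fallback case from the question, here is a minimal sketch that runs essentially the same query against an in-memory SQLite database through Python's sqlite3 module. The test data (700 males, 1000 females), the integer form of the limit (1000 * 80 / 100), and the added order by id are assumptions for illustration, and SQLite only stands in for PostgreSQL here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table Table1 (id integer, gender text, name text)")

# Hypothetical data: ids 1..700 are male, ids 701..1700 are female.
rows = [(i, "M" if i <= 700 else "F", f"row_{i}") for i in range(1, 1701)]
conn.executemany("insert into Table1 values (?, ?, ?)", rows)

query = """
with cte_m as (
    -- cap the males at 80% of the 1000 requested rows
    select * from Table1 where gender = 'M' order by id limit 1000 * 80 / 100
), cte as (
    select *, 0 as ord from cte_m
    union all
    select *, 1 as ord from Table1 where gender = 'F'
    order by ord      -- males first, so the limit only cuts females
    limit 1000
)
select gender, count(*) from cte group by gender
"""

counts = dict(conn.execute(query).fetchall())
print(counts["M"], counts["F"])  # prints: 700 300
```

With only 700 males available, the female branch fills the remaining 300 slots, which matches the 700 / 300 split asked about in the question.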

answered 2013-08-20T04:55:05.583
0

How about the following, assuming you supply the row count ("lmt") and the M/F distribution as floats:

create table gen (
id     integer,
gender text,
name   text
);

-- inserts 80% males and 20% females into the source table ("gen"):
-- every fifth row is 'F', all others are 'M'
insert into gen
select n,
       case when mod(n,5) = 0 then 'F' else 'M' end,
       (case when mod(n,5) = 0 then 'F' else 'M' end)||'_'||n::text
from generate_series(1,20000) n;


-- extract 80/20 M vs F
with conf as (select 1000 as lmt, .80::FLOAT as mpct, .20::FLOAT as fpct),
     g as (select id,gender,name,row_number() over (partition by gender order by gender) rn from gen)
select *
from g
where (gender = 'M' and rn <= (select lmt*mpct from conf))
or (gender = 'F' and rn <= (select lmt*fpct from conf));


-- Same query, to show the percent M vs F:
with conf as (select 1000 as lmt, .80::FLOAT as mpct, .20::FLOAT as fpct),
     g as (select id,gender,name,row_number() over (partition by gender order by gender) rn from gen)
select gender,count(*)
from (
    select *
    from g
    where (gender = 'M' and rn <= (select lmt*mpct from conf))
    or (gender = 'F' and rn <= (select lmt*fpct from conf))
    ) y
group by gender
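
One property of the per-gender row_number() cap is worth checking: each gender is limited independently, so a shortage of males is not made up with extra females. The following is a minimal sketch of that behaviour, run on SQLite through Python's sqlite3 module. The test data (700 males, 1000 females), the integer percentages, and ordering the window by id are assumptions for illustration, and window functions require SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table gen (id integer, gender text, name text)")

# Hypothetical data: only 700 males, but plenty of females.
rows = [(i, "M" if i <= 700 else "F", f"row_{i}") for i in range(1, 1701)]
conn.executemany("insert into gen values (?, ?, ?)", rows)

query = """
with conf as (select 1000 as lmt, 80 as mpct, 20 as fpct),
     g as (select id, gender, name,
                  row_number() over (partition by gender order by id) as rn
           from gen)
select gender, count(*)
from g
where (gender = 'M' and rn <= (select lmt * mpct / 100 from conf))
   or (gender = 'F' and rn <= (select lmt * fpct / 100 from conf))
group by gender
"""

counts = dict(conn.execute(query).fetchall())
print(counts["M"], counts["F"], sum(counts.values()))  # prints: 700 200 900
```

In this scenario the query returns 900 rows rather than 1000, so this approach covers the fixed 80/20 split but not the automatic 700 / 300 fallback.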
answered 2013-08-20T02:10:47.633
-1

I don't have PostgreSQL at hand, but the first scenario is easy to do in MS SQL 2012 with a union. I assume you can do something similar in Postgres:

declare @MaxRows            INT
        ,@PercentageMale    INT
        ,@PercentageFemale  INT

select      @MaxRows = 1000
            ,@PercentageMale = 80
            ,@PercentageFemale = 20

select  top (@MaxRows*@PercentageMale/100)  *
FROM        someTable
WHERE       Gender = 'M'
UNION ALL   --the two halves cannot overlap, and UNION ALL keeps duplicate rows that UNION would drop
select  top (@MaxRows*@PercentageFemale/100)    *
FROM        someTable
WHERE       Gender = 'F'

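The second part is actually easy too. Basically you want to select the top percentage of males, then fill the rest of the list with females up to the total row count. The percentage of females is not actually relevant: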

declare @MaxRows            INT
        ,@PercentageMale    INT

select      @MaxRows = 1000
            ,@PercentageMale = 80

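SELECT TOP (@MaxRows) *   --TOP needs parentheses when given a variable
FROM
(
    select  top (@MaxRows*@PercentageMale/100)  *
    FROM        someTable
    WHERE       Gender = 'M'
    UNION ALL
    select  top (@MaxRows)  * --we never want more than @MaxRows 
                              --so no need to check for a %, 
                              --just fill in the rest of the data set
    FROM        someTable
    WHERE       Gender = 'F'
) a
ORDER BY CASE WHEN Gender = 'M' THEN 0 ELSE 1 END --males first, so the outer TOP only trims females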
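
The "cap the males, then fill with females" rule described here comes down to a few lines of integer arithmetic. Here is a small sketch in Python. The function name split_quota and its parameters are hypothetical, and the integer division mirrors the @MaxRows*@PercentageMale/100 expression for non-negative values.

```python
def split_quota(max_rows, pct_male, males_available, females_available):
    """Return (males, females) row counts for a capped-then-filled selection."""
    # Males are capped at their percentage of the requested total.
    male_cap = max_rows * pct_male // 100
    males = min(male_cap, males_available)
    # Females fill whatever is left, limited by how many exist.
    females = min(max_rows - males, females_available)
    return males, females


print(split_quota(1000, 80, 5000, 5000))  # prints: (800, 200)
print(split_quota(1000, 80, 700, 5000))   # prints: (700, 300)
```

If there are too few females to fill the gap, the total simply falls short of max_rows, which is also what the query does.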
answered 2013-08-20T01:18:59.410