I have the following table in Postgres 11:

trial_id     name_split                drug_name_who
NCT01877395  imovax® rabies            imovax
NCT01877395  imovax® rabies            imovax rabies
NCT01877395  imovax® rabies            rabies
NCT01877395  imovax® rabies            rabies imovax
NCT00000374  olanzapine                olanzapine
NCT00000390  imipramine hydrochloride  imipramine hydrochloride
NCT00000390  imipramine hydrochloride  imipramine

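For each (trial_id, name_split) pair, I would like to fetch the rows whose drug_name_who value has the maximum length.

I tried the following query: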

with x as (

        SELECT distinct on (trial_id,name_split) *
        FROM table
        WHERE 
            regexp_replace(name_split, '[^\w]', '#', 'g') ~* ('\y'||regexp_replace(drug_name_who, '[^\w]', '#', 'g')||'\y')
            and (length(drug_name_who) > 2)
            or (drug_name_who is null)
            ORDER  BY trial_id, name_split, length(drug_name_who) DESC NULLS LAST)
            
select * from x; 

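The query returns the correct rows when the drug_name_who lengths differ within a trial_id, but when the lengths are equal it selects only one row. For example, for NCT01877395 the following row is missing: NCT01877395, imovax® rabies, imovax.

The desired output is: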

trial_id     name_split                drug_name_who
NCT01877395  imovax® rabies            imovax
NCT01877395  imovax® rabies            rabies
NCT00000374  olanzapine                olanzapine
NCT00000390  imipramine hydrochloride  imipramine hydrochloride

Any help here is highly appreciated.

1 Answer

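DISTINCT ON always returns exactly one row per group. If the ORDER BY clause is not deterministic (here, several rows in a group tie on length(drug_name_who)), you get an arbitrary one of the tied rows.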
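As a small illustration of that point, the sketch below assumes you do want exactly one row per group and just want the choice to be repeatable. The CTE name sample is hypothetical and holds two tied rows copied from the question; the extra drug_name_who sort key is an added tie-breaker, not part of the original query.

```sql
-- Hypothetical inline sample: two rows from the question that tie on length (6).
with sample (trial_id, name_split, drug_name_who) as (
    values
        ('NCT01877395', 'imovax® rabies', 'imovax'),
        ('NCT01877395', 'imovax® rabies', 'rabies')
)
select distinct on (trial_id, name_split) *
from sample
order by trial_id, name_split,
         length(drug_name_who) desc nulls last,
         drug_name_who;  -- tie-breaker: makes the single surviving row predictable
```

This still keeps only one row per (trial_id, name_split), so it does not produce the desired output from the question; it only removes the arbitrariness.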

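If you want to allow ties, you can use rank() in a subquery. rank() gives tied rows the same rank, so every row that shares the maximum length gets rank 1: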

select *
from (
    select 
        t.*, 
        rank() over(
            partition by trial_id, name_split 
            order by length(drug_name_who) desc
        ) rn
    from mytable t
    where ...
) t
where rn = 1
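
For reference, here is a self-contained sketch of the same idea with the filter from the question filled in. It makes a few assumptions that are not in the original answer: the CTE name trials is hypothetical and simply inlines the sample rows, the AND/OR condition is parenthesized the way SQL precedence already evaluates it, and NULLS LAST is added because a descending sort in Postgres puts NULLs first by default, which would otherwise let a NULL drug_name_who take rank 1.

```sql
-- Hypothetical inline copy of the sample data from the question.
with trials (trial_id, name_split, drug_name_who) as (
    values
        ('NCT01877395', 'imovax® rabies', 'imovax'),
        ('NCT01877395', 'imovax® rabies', 'imovax rabies'),
        ('NCT01877395', 'imovax® rabies', 'rabies'),
        ('NCT01877395', 'imovax® rabies', 'rabies imovax'),
        ('NCT00000374', 'olanzapine', 'olanzapine'),
        ('NCT00000390', 'imipramine hydrochloride', 'imipramine hydrochloride'),
        ('NCT00000390', 'imipramine hydrochloride', 'imipramine')
),
ranked as (
    select
        t.*,
        -- Tied lengths share the same rank, so all longest matches get rn = 1.
        rank() over (
            partition by trial_id, name_split
            order by length(drug_name_who) desc nulls last
        ) as rn
    from trials t
    -- Same filter as in the question, with explicit parentheses.
    where (
              regexp_replace(name_split, '[^\w]', '#', 'g')
                  ~* ('\y' || regexp_replace(drug_name_who, '[^\w]', '#', 'g') || '\y')
              and length(drug_name_who) > 2
          )
       or drug_name_who is null
)
select trial_id, name_split, drug_name_who
from ranked
where rn = 1
order by trial_id, drug_name_who;
```

The filter runs before the window function, so rows whose drug_name_who does not match name_split are dropped first and the ranking is computed only over the remaining candidates.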
answered 2020-06-29T20:48:06.700