
This is my table:

     chr  pos refalt
     ---------------
     chr1 123 AA
     chr1 123 AA
     chr1 123 AA
     chr1 123 AA
     chr1 123 AA
     chr1 123 AC
     chr1 123 AC
     chr1 123 AC
     chr2 456 TC
     chr3 789 GC

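I need to calculate allele frequencies. Here is an example:

Each row is one patient, so "chr1 123 AA" has 5 patients and "chr1 123 AC" has 3.

I want to know the frequency of A.

The calculation is: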

13 (A) / 16, because 13 of the alleles at "chr1 123" are A, out of 16 alleles in total: 5×A (ref) + 5×A (alt) + 3×A (ref) + 3×C (alt).

For C:

3 (C) / 16, because only 3 patients have C.
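To make the arithmetic concrete, here is a minimal sketch in Python that counts the alleles by hand. It covers only the eight "chr1 123" rows from the table and is not the SQL solution I am after; the variable names are made up for illustration.

```python
from collections import Counter

# The eight patients at "chr1 123" from the table, one refalt value each.
refalts = ["AA"] * 5 + ["AC"] * 3

# Every patient contributes two alleles: the ref and the alt character.
allele_counts = Counter(allele for refalt in refalts for allele in refalt)
total_alleles = 2 * len(refalts)

for allele, count in allele_counts.items():
    print(f"{allele}: {count}/{total_alleles} = {count / total_alleles}")
```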

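How can I do this in SQL? Is it too complicated?

refalt is a varchar column, so I need to split each value to get ref and alt.

I know this is a bit complicated, so please ask me if you need more details.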


1 Answer


For anyone (especially biologists) wondering how to achieve this:

    -- Stack the ref and alt alleles into one column, then count per allele.
    -- Grouping by allele (not by refalt) keeps the A's from "AA" and "AC" together.
    select allele,
           count(*)::numeric /
           (select 2*count(*) from ft_variants where pos_chr = 'chr1 12783') as frequency
    from (
        -- first character of refalt: the ref allele
        select substring(refalt from 1 for 1) as allele
        from ft_variants
        where pos_chr = 'chr1 12783'
        union all  -- keep duplicates; a plain union would collapse identical rows
        -- second character of refalt: the alt allele
        select substring(refalt from 2 for 1) as allele
        from ft_variants
        where pos_chr = 'chr1 12783'
    ) as alleles
    group by allele;
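If you want every position at once rather than one hard-coded position, a sketch along these lines may help. It assumes the table layout from the question (separate chr and pos columns) under a hypothetical table name, variants, and it runs against SQLite through Python's sqlite3 module purely so that it is self-contained. substr and UNION ALL should carry over to PostgreSQL, but treat that port as untested.

```python
import sqlite3

# Example rows from the question: one row per patient.
rows = (
    [("chr1", 123, "AA")] * 5
    + [("chr1", 123, "AC")] * 3
    + [("chr2", 456, "TC"), ("chr3", 789, "GC")]
)

conn = sqlite3.connect(":memory:")
# Hypothetical table name and columns, mirroring the question's table.
conn.execute("CREATE TABLE variants (chr TEXT, pos INTEGER, refalt TEXT)")
conn.executemany("INSERT INTO variants VALUES (?, ?, ?)", rows)

QUERY = """
SELECT a.chr, a.pos, a.allele,
       COUNT(*) AS allele_count,
       t.total  AS total_alleles
FROM (
    -- one row per allele: ref character, then alt character
    SELECT chr, pos, substr(refalt, 1, 1) AS allele FROM variants
    UNION ALL
    SELECT chr, pos, substr(refalt, 2, 1) AS allele FROM variants
) AS a
JOIN (
    -- two alleles per patient at each position
    SELECT chr, pos, 2 * COUNT(*) AS total FROM variants GROUP BY chr, pos
) AS t ON t.chr = a.chr AND t.pos = a.pos
GROUP BY a.chr, a.pos, a.allele, t.total
ORDER BY a.chr, a.pos, a.allele
"""

# Divide in Python to avoid integer division inside SQL.
frequencies = {
    (chrom, pos, allele): count / total
    for chrom, pos, allele, count, total in conn.execute(QUERY)
}
for key, freq in frequencies.items():
    print(key, freq)
```

For the question's data this gives 13/16 for A and 3/16 for C at chr1 123, matching the hand calculation.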
answered 2016-11-12T14:34:56.517