regex - Getting values from columns of two files

Question

My original observations look like this:

name Analyte
spring 0.1
winter 0.4

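To calculate p-values, I ran a bootstrap simulation:

name Analyte
spring 0.001
winter 0
spring 0
winter 0.2
spring 0.03
winter 0
spring 0.01
winter 0.02
spring 0.1
winter 0.5
spring 0
winter 0.04
spring 0.2
winter 0
spring 0
winter 0.06
spring 0
winter 0
......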

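Now I want to calculate empirical p-values. In the original data, the winter Analyte is 0.4. If the bootstrap winter Analyte was >= 0.4 some number of times (for example 1) and bootstrapping was done some number of times (for example 100), then the empirical p-value for the winter Analyte is:

1/100 = 0.01

(the number of times the bootstrap value was equal to or higher than the original value, divided by the total number of observations). For the spring Analyte, the p-value is:

2/100 = 0.02

I want to calculate these p-values with awk. My solution for spring was: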

awk -v VAR="spring" '($1==VAR && $2>=0.1) {n++} END {print VAR,"p-value=",n/100}'

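spring p-value= 0.02

What I need help with is passing the original file (the names spring and winter with their analytes, the observations, and the number of observations) into awk and assigning those values.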

score 4 · Accepted Answer

Explanation and script content:

Run it like this: `awk -f script.awk original bootstrap`

# Read the original file into array a, skipping its header line.
# The index is the type (first column); the value is the original value.
NR==FNR && NR>1 {
    a[$1]=$2
    next
}

# In the bootstrap file (header skipped), count the lines whose second
# column is strictly greater than the original value for that type.
# Note: the question counts values "equal to or higher"; use >= for that.
FNR>1 && $2>a[$1] {
    b[$1]++
}

# Count every line per type to get the number of bootstrap replicates.
# Header lines are counted under "name", which is never printed.
{
    c[$1]++
}

# For each type in a, print the empirical p-value: the number of higher
# bootstrap values divided by the number of bootstrap replicates.
END {
    for (type in a) {
        print type, b[type]/c[type]
    }
}
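
The `NR==FNR` test in the first rule is what separates the two input files. As a minimal sketch of that idiom on its own (the file names `first` and `second`, their contents, and the variable `idiom_demo` are made up for illustration and are not part of the answer):

```shell
#!/bin/sh
# Hypothetical demo files; their names and contents are made up for illustration.
tmpdir=$(mktemp -d)
printf 'a 1\nb 2\n' > "$tmpdir/first"
printf 'a 10\nb 20\nc 30\n' > "$tmpdir/second"

# NR counts lines across all input files; FNR restarts at 1 for each file.
# NR == FNR is therefore true only while the first file is being read.
idiom_demo=$(awk '
NR == FNR { seen[$1] = $2; next }      # first file: store value, skip other rules
$1 in seen { print $1, seen[$1], $2 }  # second file: pair stored and new value
' "$tmpdir/first" "$tmpdir/second")

printf '%s\n' "$idiom_demo"
rm -rf "$tmpdir"
```

This should print `a 1 10` and `b 2 20`; the `c` line is dropped because it has no entry from the first file. One caveat: if the first file is empty, `NR==FNR` is also true while the second file is read, so the idiom assumes a non-empty first file.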

Test:

$ cat original 
name Analyte
spring 0.1
winter 0.4

$ cat bootstrap 
name Analyte
spring 0.001
winter 0
spring 0
winter 0.2
spring 0.03
winter 0
spring 0.01
winter 0.02
spring 0.1
winter 0.5
spring 0
winter 0.04
spring 0.2
winter 0
spring 0
winter 0.06
spring 0
winter 0

$ awk -f script.awk original bootstrap
spring 0.111111
winter 0.111111

Analysis:

Spring original value is 0.1
Winter original value is 0.4
Bootstrapping was done 9 times for this sample file
Count of values higher than spring's original value = 1
Count of values higher than winter's original value = 1
So, 1/9 = 0.111111
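
The counts above use a strict "higher than" comparison, while the question defines the empirical p-value with "equal to or higher". A sketch of that `>=` variant on the same sample files follows; the array names `obs`, `hits`, and `total`, the variable `pvalues`, the temporary directory, and the `sort` at the end are assumptions added for illustration, not part of the answer.

```shell
#!/bin/sh
# Recreate the sample files from the test above in a temporary directory.
tmpdir=$(mktemp -d)
cat > "$tmpdir/original" <<'EOF'
name Analyte
spring 0.1
winter 0.4
EOF
cat > "$tmpdir/bootstrap" <<'EOF'
name Analyte
spring 0.001
winter 0
spring 0
winter 0.2
spring 0.03
winter 0
spring 0.01
winter 0.02
spring 0.1
winter 0.5
spring 0
winter 0.04
spring 0.2
winter 0
spring 0
winter 0.06
spring 0
winter 0
EOF

pvalues=$(awk '
FNR == 1  { next }                         # skip the header line of each file
NR == FNR { obs[$1] = $2 + 0; next }       # original file: observed value per name
!($1 in obs) { next }                      # ignore names missing from the original
{ total[$1]++ }                            # count every bootstrap replicate
($2 + 0) >= obs[$1] { hits[$1]++ }         # replicate at or above the observed value
END {
    for (name in obs)
        if (total[name] > 0)               # avoid dividing by zero
            print name, hits[name] / total[name]
}
' "$tmpdir/original" "$tmpdir/bootstrap" | sort)   # sort: for-in order is unspecified

printf '%s\n' "$pvalues"
rm -rf "$tmpdir"
```

With these files, spring has two bootstrap values at or above 0.1 (0.1 and 0.2) and winter has one at or above 0.4 (0.5), so this should print `spring 0.222222` and `winter 0.111111`.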

score 2 · Accepted Answer

This works for me (GNU awk 3.1.6):

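# First file: store the original value for each name.
# Note: every line is treated as data, so a header line would be
# stored and reported as a "name" entry.
FNR == NR {
    a[$1] = $2
    next
}

# Second file: count values strictly greater than the original value.
$2 > a[$1] {
    b[$1]++
}

# Second file: count all lines per name.
{
    c[$1]++
}

END {
    for (i in a) print i, "p-value=", b[i]/c[i]
}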

The output is:

winter p-value= 0.111111
spring p-value= 0.111111
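
This script has no header handling, so with the header-bearing sample files from the question it would also report a `name` line. One possible way around that, sketched here under the assumption that every input file has the same number of header lines, is to pass that number in with `-v`. The variable names `skip` and `pvalues_skip`, the temporary directory, and the `sort` are hypothetical additions; the comparison is kept strict, as in the answer.

```shell
#!/bin/sh
# Sample files with a header line, as in the question.
tmpdir=$(mktemp -d)
cat > "$tmpdir/original" <<'EOF'
name Analyte
spring 0.1
winter 0.4
EOF
cat > "$tmpdir/bootstrap" <<'EOF'
name Analyte
spring 0.001
winter 0
spring 0
winter 0.2
spring 0.03
winter 0
spring 0.01
winter 0.02
spring 0.1
winter 0.5
spring 0
winter 0.04
spring 0.2
winter 0
spring 0
winter 0.06
spring 0
winter 0
EOF

# skip is a hypothetical variable: the number of header lines per file.
pvalues_skip=$(awk -v skip=1 '
FNR <= skip { next }                   # drop the header lines of each file
FNR == NR   { a[$1] = $2; next }       # first file: original values
$2 > a[$1]  { b[$1]++ }                # second file: strictly greater values
            { c[$1]++ }                # second file: all replicates
END {
    for (i in a) print i, "p-value=", b[i]/c[i]
}
' "$tmpdir/original" "$tmpdir/bootstrap" | sort)   # sort: for-in order is unspecified

printf '%s\n' "$pvalues_skip"
rm -rf "$tmpdir"
```

Run with `skip=0`, this behaves like the script above on files without a header.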
