I have a wig file that I read into a GRanges-like object using a function I wrote that calls the rtracklayer
package:
read_wig <- function(x, format='wig', genome='mm9') {
  suppressMessages(library(rtracklayer))
  # Import the wig track, then keep only the standard mouse chromosomes
  merged_wig <- import.wig(x, format=format, genome=genome)
  merged_wig <- keepSeqlevels(merged_wig, paste0('chr', c(seq(1,19), 'X', 'Y')), pruning.mode="coarse")
  return(merged_wig)
}
wig <- read_wig('~/path/to/wig')
The code above returns:
> wig
UCSC track 'MEFES_K27AC.downsampled.sorted'
UCSCData object with 13274466 ranges and 1 metadata column:
           seqnames               ranges strand |     score
              <Rle>            <IRanges>  <Rle> | <numeric>
       [1]     chr1 [       1,      200]      * |         1
       [2]     chr1 [     201,      400]      * |         2
       [3]     chr1 [     401,      600]      * |         3
       [4]     chr1 [     601,      800]      * |         4
       [5]     chr1 [     801,     1000]      * |         0
       ...      ...                  ...    ... .       ...
[13274462]     chrY [15901401, 15901600]      * |         0
[13274463]     chrY [15901601, 15901800]      * |         0
[13274464]     chrY [15901801, 15902000]      * |         0
[13274465]     chrY [15902001, 15902200]      * |         0
[13274466]     chrY [15902201, 15902400]      * |         0
-------
seqinfo: 21 sequences from mm9 genome
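Now that I have this object, I want to compute, for every row, the sum of the scores within a window around that row's range. For example, I want the sum of the scores over the range 1-10000 (123 in this example), added as a column next to score. I want to do this for every row.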
> expected_output
UCSC track 'MEFES_K27AC.downsampled.sorted'
UCSCData object with 13274466 ranges and 2 metadata columns:
           seqnames               ranges strand |     score score_10000
              <Rle>            <IRanges>  <Rle> | <numeric>   <numeric>
       [1]     chr1 [       1,      200]      * |         1         123
       [2]     chr1 [     201,      400]      * |         2         ...
       [3]     chr1 [     401,      600]      * |         3         ...
       [4]     chr1 [     601,      800]      * |         4         ...
       [5]     chr1 [     801,     1000]      * |         0         ...
       ...      ...                  ...    ... .       ...         ...
[13274462]     chrY [15901401, 15901600]      * |         0         ...
[13274463]     chrY [15901601, 15901800]      * |         0         ...
[13274464]     chrY [15901801, 15902000]      * |         0         ...
[13274465]     chrY [15902001, 15902200]      * |         0         ...
[13274466]     chrY [15902201, 15902400]      * |         0         ...
-------
seqinfo: 21 sequences from mm9 genome
Ideally, I would like to add columns with the score sums for the ranges 1-10000, 1-20000, 1-30000, and so on up to 100000.
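To make the requested computation concrete, here is a minimal base R sketch of one possible reading of it. It is not a tested rtracklayer or GenomicRanges recipe. It assumes (none of this is confirmed by the data) that the bins are sorted within each chromosome, contiguous, and all 200 bp wide, and that the window for a row starts at that row's start and extends forward, truncated at the chromosome end. The function name `window_sum` and its arguments are hypothetical, and the toy vectors stand in for the real track.

```r
# Forward window sum over fixed-width bins, computed separately per chromosome.
# ASSUMPTIONS: bins are sorted, contiguous and `bin_width` bp wide;
# a window covers window %/% bin_width bins starting at the current bin.
window_sum <- function(chrom, score, window, bin_width = 200) {
  k <- window %/% bin_width                  # bins per window
  stopifnot(k >= 1)
  out <- numeric(length(score))
  for (idx in split(seq_along(score), chrom, drop = TRUE)) {
    n    <- length(idx)
    cs   <- c(0, cumsum(score[idx]))         # cs[j + 1] = sum of the first j scores
    last <- pmin(seq_len(n) + k - 1, n)      # last bin of each window, clipped
    out[idx] <- cs[last + 1] - cs[seq_len(n)]
  }
  out
}

# Toy data standing in for the real track (two chromosomes, 200 bp bins).
chrom <- c(rep("chr1", 5), rep("chr2", 3))
score <- c(1, 2, 3, 4, 0, 5, 5, 5)

# One column per window size; integer sizes keep the column names free of
# scientific notation.
windows <- c(400L, 600L)
sums <- sapply(windows, function(w) window_sum(chrom, score, w))
colnames(sums) <- sprintf("score_%d", windows)
print(sums)

# On the real object this might look like the following (untested here):
# for (w in seq(10000L, 100000L, by = 10000L)) {
#   mcols(wig)[[sprintf("score_%d", w)]] <-
#     window_sum(as.character(seqnames(wig)), wig$score, w)
# }
```

Using cumulative sums keeps the work linear in the number of bins for each window size, which matters with roughly 13 million ranges. If the bins have gaps or varying widths, this shortcut no longer applies and an overlap-based approach would be needed instead.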
Any help would be greatly appreciated!
Edit:
The wig file can be found here.