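I am trying to generate a genome index with STAR for a mutant library at 99,50 hours post-fertilization (99H50), using the annotation from the Lawson lab. The code I used is: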
module load STAR; STAR --runThreadN 10 --runMode genomeGenerate --genomeDir /gpfs/ysm/scratch60/polimanti/ag2646/99H50_new_annotation/z10starindex75/ --genomeFastaFiles /gpfs/ysm/scratch60/polimanti/ag2646/Lawsonreference/genome.fa --sjdbGTFfile /gpfs/ysm/scratch60/polimanti/ag2646/Lawsonreference/genes.gtf --sjdbOverhang 75
The command used to submit the index-generation job is:
dsq --job-file z10starindex75.txt --job-name z10starindex75 -c 10 --mem=100G -t 10:00:00 --mail-type=ALL --mail-user=aranyak.goswami@yale.edu
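For reference, a dSQ job file is a plain text file with one complete shell command per line, and each line becomes one task of a Slurm job array (which is why the job appears as array task 47861791_0 in the summary). A minimal sketch of how `z10starindex75.txt` might be written, assuming it contains only the single STAR command above. Note that `--runThreadN 10` matches the `-c 10` passed to dsq.

```shell
#!/bin/sh
# Sketch (assumption): z10starindex75.txt holds exactly one task, the STAR
# command shown above. dSQ runs each line of the job file as one array task,
# so the whole command must stay on a single line.
set -eu

job_file=z10starindex75.txt
ref=/gpfs/ysm/scratch60/polimanti/ag2646/Lawsonreference
out=/gpfs/ysm/scratch60/polimanti/ag2646/99H50_new_annotation/z10starindex75/

printf '%s\n' "module load STAR; STAR --runThreadN 10 --runMode genomeGenerate --genomeDir $out --genomeFastaFiles $ref/genome.fa --sjdbGTFfile $ref/genes.gtf --sjdbOverhang 75" > "$job_file"

# One line in the job file -> one array task (index 0).
n_tasks=$(wc -l < "$job_file")
echo "tasks in $job_file: $n_tasks"
```

The file is then submitted with the dsq command above.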
When I run this on my HPC cluster, it fails with the following error:
Jan 22 22:41:39 ..... started STAR run
Jan 22 22:41:39 ... starting to generate Genome files
Jan 22 22:42:04 ... starting to sort Suffix Array. This may take a long time...
Jan 22 22:42:09 ... sorting Suffix Array chunks and saving them to disk...
Jan 22 22:47:18 ... loading chunks from disk, packing SA...
Jan 22 22:47:42 ... finished generating suffix array
Jan 22 22:47:42 ... generating Suffix Array index
Jan 22 22:49:38 ... completed Suffix Array index
Jan 22 22:49:38 ..... processing annotations GTF
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
/bin/sh: line 1: 186783 Aborted STAR --runThreadN 10 --runMode genomeGenerate --genomeDir /gpfs/ysm/scratch60/polimanti/ag2646/99H50_new_annotation/z10starindex75/ --genomeFastaFiles /gpfs/ysm/scratch60/polimanti/ag2646/Lawsonreference/genome.fa --sjdbGTFfile /gpfs/ysm/scratch60/polimanti/ag2646/Lawsonreference/genes.gtf --sjdbOverhang 75
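I googled it and found that this error can stem from memory allocation, so I ran the job on the cluster with plenty of memory requested (`--mem=100G`).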
The resource usage for this job, by job ID, is:
Job ID: 47861791
Array Job ID: 47861791_0
Cluster: farnam
User/Group: ag2646/nicoli
State: FAILED (exit code 134)
Nodes: 1
Cores per node: 10
CPU Utilized: 00:36:34
CPU Efficiency: 45.14% of 01:21:00 core-walltime
Job Wall-clock time: 00:08:06
Memory Utilized: 25.64 GB
Memory Efficiency: 25.64% of 100.00 GB
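The efficiency figures in the job summary follow directly from the raw numbers: 10 cores × 8 min 6 s of wall-clock time gives the 01:21:00 core-walltime, and 25.64 GB out of the 100 GB requested is 25.64%. A small sketch that recomputes them, with the values copied from the summary:

```shell
#!/bin/sh
# Recompute the efficiency figures from the job summary for 47861791_0.
set -eu

cores=10
wall_s=$(( 8 * 60 + 6 ))          # Job Wall-clock time 00:08:06
cpu_s=$(( 36 * 60 + 34 ))         # CPU Utilized 00:36:34
core_wall_s=$(( cores * wall_s )) # 4860 s = 01:21:00 core-walltime

cpu_eff=$(awk -v c="$cpu_s" -v w="$core_wall_s" 'BEGIN { printf "%.2f", 100 * c / w }')
mem_eff=$(awk -v u=25.64 -v r=100 'BEGIN { printf "%.2f", 100 * u / r }')

echo "core-walltime: ${core_wall_s} s"
echo "CPU efficiency: ${cpu_eff}%"
echo "Memory efficiency: ${mem_eff}%"
```

Keep in mind that Slurm samples memory use periodically, so "Memory Utilized" is the peak it observed, not necessarily the size of the request that triggered `std::bad_alloc`.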
I searched online for a solution and tried the following:
- I reduced the number of threads from 10 to 1 to lower memory usage.
- I tried to set an explicit memory limit with `--limitGenomeGenerateRAM 48000000000`.
- I tried `--genomeChrBinNbits 16`.
The error still occurs.
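Put together, and assuming all three changes were applied in the same run, the retried command would look roughly like the sketch below. It only assembles and prints the command (a dry run), and converts the limit: `--limitGenomeGenerateRAM` is given in bytes, so 48000000000 is 48 GB (about 44.7 GiB), within the `--mem=100G` requested from Slurm.

```shell
#!/bin/sh
# Dry run (assumption: all three changes combined in one run). The command is
# only printed; remove the leading "echo" to actually run it on the cluster.
set -eu

ref=/gpfs/ysm/scratch60/polimanti/ag2646/Lawsonreference
out=/gpfs/ysm/scratch60/polimanti/ag2646/99H50_new_annotation/z10starindex75/
ram_bytes=48000000000   # --limitGenomeGenerateRAM expects a value in bytes

# Positional parameters act as a portable argument list.
set -- \
  --runThreadN 1 \
  --runMode genomeGenerate \
  --genomeDir "$out" \
  --genomeFastaFiles "$ref/genome.fa" \
  --sjdbGTFfile "$ref/genes.gtf" \
  --sjdbOverhang 75 \
  --limitGenomeGenerateRAM "$ram_bytes" \
  --genomeChrBinNbits 16
n_args=$#

echo STAR "$@"

# Convert the limit to decimal GB and binary GiB.
ram_gib=$(awk -v b="$ram_bytes" 'BEGIN { printf "%.1f", b / 2^30 }')
echo "limit: $(( ram_bytes / 1000000000 )) GB = ${ram_gib} GiB"
```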
The first few lines of my GTF file are:
chr12 UMMS gene 6160446 6177944 . - . gene_id "LL0000000001"; gene_name "a1cf";
chr12 UMMS exon 6160446 6161260 . - . gene_id "LL0000000001"; gene_name "a1cf"; transcript_id "ENSDART00000152292";
chr12 UMMS exon 6163727 6163869 . - . gene_id "LL0000000001"; gene_name "a1cf"; transcript_id "ENSDART00000152292";
chr12 UMMS exon 6165086 6165222 . - . gene_id "LL0000000001"; gene_name "a1cf"; transcript_id "ENSDART00000152292";
chr12 UMMS exon 6165305 6165498 . - . gene_id "LL0000000001"; gene_name "a1cf"; transcript_id "ENSDART00000152292";
chr12 UMMS exon 6167117 6167396 . - . gene_id "LL0000000001"; gene_name "a1cf"; transcript_id "ENSDART00000152292";
chr12 UMMS exon 6168940 6169037 . - . gene_id "LL0000000001"; gene_name "a1cf"; transcript_id "ENSDART00000152292";
chr12 UMMS exon 6169982 6170146 . - . gene_id "LL0000000001"; gene_name "a1cf"; transcript_id "ENSDART00000152292";
chr12 UMMS exon 6170412 6170650 . - . gene_id "LL0000000001"; gene_name "a1cf"; transcript_id "ENSDART00000152292";
chr12 UMMS exon 6170731 6170861 . - . gene_id "LL0000000001"; gene_name "a1cf"; transcript_id "ENSDART00000152292";
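By default STAR builds its annotation from the lines whose feature type is `exon` (`--sjdbGTFfeatureExon exon`), linking them to transcripts and genes through the `transcript_id` and `gene_id` attributes (`--sjdbGTFtagExonParentTranscript`, `--sjdbGTFtagExonParentGene`). A quick sketch, assuming a standard tab-separated GTF, that counts feature types and flags exon lines missing either attribute or any line with start > end. The demo uses the first two lines shown above with the columns re-joined by tabs; the `gene` line has no `transcript_id`, which is expected since it is not an exon line.

```shell
#!/bin/sh
# Sketch: summarise a tab-separated GTF before giving it to STAR.
set -eu

gtf_summary() {
  # $1 = GTF path. Prints "<feature>\t<count>" per feature type, then two checks.
  awk -F'\t' '
    /^#/ { next }                    # skip comment/header lines
    {
      type[$3]++
      if ($3 == "exon" && ($9 !~ /gene_id "/ || $9 !~ /transcript_id "/)) bad++
      if ($4 + 0 > $5 + 0) inverted++
    }
    END {
      for (t in type) printf "%s\t%d\n", t, type[t]
      printf "exon_missing_ids\t%d\n", bad + 0
      printf "start_after_end\t%d\n", inverted + 0
    }' "$1"
}

# Demo on the first two lines of genes.gtf from the post.
sample_gtf=$(mktemp)
trap 'rm -f "$sample_gtf"' EXIT
printf 'chr12\tUMMS\tgene\t6160446\t6177944\t.\t-\t.\tgene_id "LL0000000001"; gene_name "a1cf";\n' > "$sample_gtf"
printf 'chr12\tUMMS\texon\t6160446\t6161260\t.\t-\t.\tgene_id "LL0000000001"; gene_name "a1cf"; transcript_id "ENSDART00000152292";\n' >> "$sample_gtf"

gtf_summary "$sample_gtf"
```

On the cluster it would be run as `gtf_summary /gpfs/ysm/scratch60/polimanti/ag2646/Lawsonreference/genes.gtf`.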
Here are a few lines of the genome FASTA file:
>chr1
gatcttaaacatttattccccctgcaaacattttcaatcattacattgtc
atttcccctccaaattaaatttagccagaggcgcacaacatacgacctct
aaaaaaggtgctgtaacatgtacctatatgcagcaccactatatgagagc
ggcatagcagtgtttagtcacttggttgctttgtttatattaacttgaaa
gtgtgttttagctattgagtttaaacaaagggagcggtttacattgaatt
aaaggcaactactgatgggttgtgtaatgtttcaaagagctgttgcagca
tgagtggaaaataaaaccgtattagtgctgcctggcccagtttggcacaa
aatggagcgattccattaagagaacgattcagcataagtggaacagcTAA
AGtttatgaaaatttttaatctggatgtagagaatctcataacacagaaa
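Since the annotation is matched to the genome by sequence name, one more check is whether every chromosome name in column 1 of the GTF appears as a FASTA header. A sketch, assuming (as is usual) that only the first word after `>` counts as the sequence name. The tiny demo files are hypothetical and contain only `chr1`, so the `chr12` used by the GTF lines above is reported; on the real files, an empty result means every GTF sequence name has a matching header.

```shell
#!/bin/sh
# Sketch: print GTF sequence names (column 1) that have no ">" header in the FASTA.
set -eu

missing_seqnames() {
  # $1 = genome FASTA, $2 = GTF
  awk -F'\t' '
    FNR == NR {                      # first file: collect FASTA header names
      if (/^>/) { name = substr($0, 2); sub(/[ \t].*/, "", name); fa[name] = 1 }
      next
    }
    /^#/ { next }                    # second file: skip GTF comments
    !($1 in fa) && !seen[$1]++ { print $1 }
  ' "$1" "$2"
}

# Hypothetical demo files: a one-sequence FASTA and a one-line GTF.
demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
printf '>chr1\ngatcttaaacatttattccccctgcaaacattttcaatcattacattgtc\n' > "$demo_dir/genome.fa"
printf 'chr12\tUMMS\texon\t6160446\t6161260\t.\t-\t.\tgene_id "LL0000000001"; transcript_id "ENSDART00000152292";\n' > "$demo_dir/genes.gtf"

missing_seqnames "$demo_dir/genome.fa" "$demo_dir/genes.gtf"
```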
I've tried to provide as much detail as possible; any help would be appreciated.