I am trying to reshape a list into a matrix, but for some reason it isn't working:
import numpy as np
dcd=np.load('dcd_250.npy')
#4. write the dcd into an array
print 'Length of dcd', len(dcd)
al_gtps = np.array(dcd).reshape(250000,5416) # reshape(SNP no, ind no)
print 'Size of al_gtps', al_gtps.size
gtps_T=al_gtps.T
print 'Size of gtps_T', gtps_T.size
allelic_gtps=[]
check=[]
#5. turn into strings
for k in gtps_T:
    check=k
    allelic_gtps.append("%s" % ' '.join(map(str,k)))
print 'Length of allelic_gtps', len(allelic_gtps)
together=[]
for each in allelic_gtps:
    for ch in each:
        if ch!=' ':
            together.append(ch)
        else:
            pass
matrix=np.array(together).reshape(5416,500000)
np.save('matrix.npy', matrix)
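To make the shapes in the script above concrete, here is a minimal sketch of the same pipeline on tiny toy data. It assumes (hypothetically, since the real dtype of dcd isn't shown) that each genotype is a two-character string such as "AG", which would match together being exactly twice the length of dcd. It is written for Python 3, unlike the Python 2 script. It also shows a swapped-in vectorized alternative, using a NumPy dtype view instead of the character loop.

```python
import numpy as np

# Hypothetical toy data: 3 SNPs x 2 individuals, each genotype assumed
# to be exactly two characters (the real dtype of dcd is not known).
n_snp, n_ind = 3, 2
dcd = np.array(["AG", "CC", "TT", "AT", "GG", "CA"], dtype="U2")

al_gtps = dcd.reshape(n_snp, n_ind)  # (SNP no, ind no), as in the question
gtps_T = al_gtps.T                   # (ind no, SNP no)

# Same idea as the string loop in the question: one character per cell.
together = [ch for row in gtps_T for g in row for ch in str(g)]
matrix = np.array(together).reshape(n_ind, 2 * n_snp)

# Vectorized alternative (not from the question): copy the transposed array
# into C order, then reinterpret each 2-char string as two 1-char strings.
matrix_vec = np.ascontiguousarray(gtps_T).view("<U1")

print(matrix)  # two rows (individuals), six columns (2 chars x 3 SNPs)
```

The view avoids building a Python list with one entry per character, which for the full data would hold 2 708 000 000 items. It relies on every genotype being exactly two characters; a shorter string would be padded and show up as an empty cell.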
The array lengths are what they should be:
Length of dcd 1354000000
Size of al_gtps 1354000000
Size of gtps_T 1354000000
Length of allelic_gtps 5416
Length of together 2708000000
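My final matrix should have 5416 rows with 500 000 columns each. That gives 2 708 000 000 elements, which is exactly the length of "together". However, I get the following error message: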
Traceback (most recent call last):
  File "p3_gtp_format.py", line 51, in <module>
    matrix=np.array(together).reshape(5416,500000)
ValueError: total size of new array must be unchanged
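This shouldn't be a memory issue, since I'm running on a large-memory machine. The same script works on a smaller dataset, where the matrix has 5416 rows and 200 000 columns. Any ideas?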
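The sizes quoted in the question can be checked with a few lines of arithmetic. One hypothesis worth testing, which is an assumption and not something the error message confirms, is that a signed 32-bit size limit is involved somewhere: the failing 5416 x 500 000 shape exceeds 2**31 - 1, while dcd itself and the 5416 x 200 000 shape that worked both stay below it.

```python
# Sizes taken from the question's printed output and target shapes.
n_values = 1354000000                       # len(dcd)
n_rows, n_cols = 5416, 500000               # shape that fails
n_rows_small, n_cols_small = 5416, 200000   # shape that worked

# Largest value a signed 32-bit integer can hold.
INT32_MAX = 2**31 - 1

# The reshape target matches len(together), so the sizes themselves agree.
assert 250000 * 5416 == n_values
assert n_rows * n_cols == 2 * n_values == 2708000000

print(n_values > INT32_MAX)                     # dcd itself
print(n_rows * n_cols > INT32_MAX)              # failing matrix
print(n_rows_small * n_cols_small > INT32_MAX)  # matrix that worked
```

If this hypothesis holds, checking whether the Python and NumPy builds are 64-bit would be a natural next step.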