I'm trying to turn a list into a matrix, but for some reason it doesn't work...

    import numpy as np

    dcd=np.load('dcd_250.npy')
    #4. write the dcd into an array
    print 'Length of dcd', len(dcd)

    al_gtps = np.array(dcd).reshape(250000,5416)            # reshape(SNP no, ind no)
    print 'Size of al_gtps', al_gtps.size                                                 

    gtps_T=al_gtps.T

    print 'Size of gtps_T', gtps_T.size

    allelic_gtps=[]
    check=[]
    #5. turn into strings
    for k in gtps_T:
        check=k
        allelic_gtps.append("%s" % ' '.join(map(str,k)))

    print 'Length of allelic_gtps', len(allelic_gtps)


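    together=[]
    # split each joined string back into single characters, skipping the separators
    for each in allelic_gtps:
        for ch in each:
            if ch!=' ':
                together.append(ch)

    print 'Length of together', len(together)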

    matrix=np.array(together).reshape(5416,500000)

    np.save('matrix.npy', matrix)

The lengths of the arrays should be:

    Length of dcd 1354000000
    Size of al_gtps 1354000000
    Size of gtps_T 1354000000
    Length of allelic_gtps 5416
    Length of together 2708000000
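
To make the intended transformation concrete, here is a toy-sized sketch of the same steps (reshape, transpose, split into single characters, reshape again). It assumes each entry of `dcd` is a two-character genotype string, which is not stated outright but would explain why 250,000 SNPs become 500,000 columns. The name `dcd_small` and the sample values are hypothetical.

```python
import numpy as np

# Hypothetical toy data: 3 SNPs x 4 individuals, stored SNP by SNP.
# Assumption: every entry is a two-character genotype string.
n_snps, n_ind = 3, 4
dcd_small = np.array(['AA', 'AG', 'GG', 'AG',
                      'CC', 'CT', 'TT', 'CC',
                      'GG', 'GG', 'GA', 'AA'])

al_gtps = dcd_small.reshape(n_snps, n_ind)  # rows are SNPs
gtps_T = al_gtps.T                          # rows are individuals

# Split every genotype into its single characters, individual by individual.
together = [ch for row in gtps_T for gtp in row for ch in gtp]

# One row per individual, two columns per SNP.
matrix = np.array(together).reshape(n_ind, n_snps * 2)

print(al_gtps.size)
print(len(together))
print(matrix.shape)
```

At this scale the element count doubles between `al_gtps` and `together`, mirroring the jump from 1,354,000,000 to 2,708,000,000 in the lengths listed above.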

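My final matrix should have 5416 rows with 500,000 columns each. That gives 2,708,000,000 elements, which is exactly what I have in `together`. However, I get the following error message: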

    Traceback (most recent call last):
    File "p3_gtp_format.py", line 51, in <module>
    matrix=np.array(together).reshape(5416,500000)
    ValueError: total size of new array must be unchanged

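This shouldn't be a memory problem, since I'm working on a large-memory machine. The same script works for a smaller data set, where the matrix has 5416 rows and 200,000 columns. Any ideas?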
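
For reference, the element counts for both data sets can be double-checked with a few lines of Python. This is only a sketch of the arithmetic: the variable names are made up for illustration, and the factor of two characters per entry is an assumption inferred from the column counts. The comparison against `2**31 - 1` (the largest signed 32-bit integer) is included purely as a data point that separates the two cases, not as a confirmed cause of the error.

```python
n_rows = 5416
chars_per_entry = 2  # assumption: each dcd entry contributes two characters

# Full data set: 250,000 SNPs, so 500,000 columns per row.
large = n_rows * 250000 * chars_per_entry

# Smaller data set: the question states 200,000 columns per row.
small = n_rows * 200000

int32_max = 2**31 - 1  # largest signed 32-bit integer

print("large: %d, above int32 max: %s" % (large, large > int32_max))
print("small: %d, above int32 max: %s" % (small, small > int32_max))
```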
