python - Plotting a very large file (5 GB) in Python with an x-axis offset

Question

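I am trying to plot a very large file (~5 GB) using Python and matplotlib. I can load the whole file into memory (the machine has 16 GB in total), but when I plot it with a simple imshow I get a segmentation fault. This is most likely due to my ulimit, which is set to 15000 and which I cannot raise. I have concluded that I need to plot my array in batches, so I wrote some simple code to do that. My main problem is that when I plot one batch of the big array, the x coordinates always start at 0, so I cannot overlay the images to build the final large one. If you have any suggestions, please let me know. Also, I cannot install new packages such as "Image" on this machine because I lack administrative rights. Here is a sample of the code, which reads the first 12 rows of my array and makes 3 plots.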

import os
import sys
import scipy
import numpy as np
import pylab as pl
import matplotlib as mpl
import matplotlib.cm as cm
from optparse import OptionParser
from scipy import fftpack
from scipy.fftpack import *
from cmath import *
from pylab import *
import pp
import fileinput
import matplotlib.pylab as plt
import pickle

def readalllines(file1,rows,freqs):
    file = open(file1,'r')
    sizer = int(rows*freqs)
    i = 0
    q = np.zeros(sizer,'float')
    for i in range(rows*freqs):
        s =file.readline()
        s = s.split()
        #print s[4],q[i]
        q[i] = float(s[4])
        if i%262144 == 0:
            print '\r ',int(i*100.0/(337*262144)),'  percent complete',
        i += 1
    file.close()
    return q

parser = OptionParser()
parser.add_option('-f',dest="filename",help="Read dynamic spectrum from FILE",metavar="FILE")
parser.add_option('-t',dest="dtime",help="The time integration used in seconds, default 10",default=10)
parser.add_option('-n',dest="dfreq",help="The bandwidth of each frequency channel in Hz",default=11.92092896)
parser.add_option('-w',dest="reduce",help="The chuncker divider in frequency channels, integer default 16",default=16)
(opts,args) = parser.parse_args()
rows=12
freqs = 262144

file1 = opts.filename

s = readalllines(file1,rows,freqs)
s = np.reshape(s,(rows,freqs))
s = s.T
print s.shape
#raw_input()

#s_shift = scipy.fftpack.fftshift(s)


#fig = plt.figure()

#fig.patch.set_alpha(0.0)
#axes = plt.axes()
#axes.patch.set_alpha(0.0)
###plt.ylim(0,8)

plt.ion()

i = 0
for o in range(0,rows,4):

    fig = plt.figure()
    #plt.clf()

    plt.imshow(s[:,o:o+4],interpolation='nearest',aspect='auto', cmap=cm.gray_r, origin='lower')
    if o == 0:
        axis([0,rows,0,freqs])
    fdf, fdff = xticks()
    print fdf
    xticks(fdf+o)
    print xticks()
    #axis([o,o+4,0,freqs])
    plt.draw()

    #w, h = fig.canvas.get_width_height()
    #buf = np.fromstring(fig.canvas.tostring_argb(), dtype=np.uint8)
    #buf.shape = (w,h,4)

    #buf = np.rol(buf, 3, axis=2)
    #w,h,_ = buf.shape
    #img = Image.fromstring("RGBA", (w,h),buf.tostring())

    #if prev:
    #    prev.paste(img)
    #    del prev
    #prev = img
    i += 1
pl.colorbar()
pl.show()

score 4 · Accepted Answer

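If you plot any array larger than ~2k pixels across, something in your graphics chain will down-sample the image in some way in order to display it on your monitor. I would recommend down-sampling in a controlled way instead, something like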

import numpy as np

data = convert_raw_data_to_fft(args)  # placeholder for your own loader; make sure data is row major

def ds_decimate(row, step=100):
    # keep every step-th sample
    return row[::step]

def ds_sum(row, step=100):
    # sum each block of `step` samples, dropping the ragged tail
    return np.sum(row[:step*(len(row)//step)].reshape(-1, step), 1)

# as per suggestion from tom10 in comments
def ds_max(row, step=100):
    return np.max(row[:step*(len(row)//step)].reshape(-1, step), 1)

data_plotable = [ds_sum(d) for d in data]  # plug in whichever function you want

or interpolation.
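As a self-contained illustration of the block-wise down-sampling idea above, here is a minimal sketch that reduces a 2-D array along its long axis. The helper name block_reduce_rows and the synthetic demo array are assumptions made for this example, not part of the original answer; the real data would come from the asker's own loader.

```python
import numpy as np

def block_reduce_rows(arr, step, func=np.max):
    """Reduce axis 0 of a 2-D array in blocks of `step` rows using `func`."""
    n_blocks = arr.shape[0] // step
    trimmed = arr[:n_blocks * step]                      # drop the ragged tail
    blocks = trimmed.reshape(n_blocks, step, arr.shape[1])
    return func(blocks, axis=1)                          # collapse each block to one row

# Synthetic stand-in for the real data: 1000 frequency channels x 12 time rows
demo = np.arange(1000 * 12, dtype=float).reshape(1000, 12)
small = block_reduce_rows(demo, step=100, func=np.max)
print(small.shape)
```

Using np.max keeps narrow peaks visible after the reduction, while np.sum or np.mean averages them out; which one is appropriate depends on what the plot is meant to show.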

score 2 · Accepted Answer

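Matplotlib is quite memory-inefficient when plotting images. It creates several full-resolution intermediate arrays, which is probably why your program is crashing.

As @tcaswell suggests, one solution is to down-sample the image before feeding it to matplotlib.

I also wrote some wrapper code that does this down-sampling automatically, based on your screen resolution. It is at https://github.com/ChrisBeaumont/mpl-modest-image, in case it is useful. It has the further advantage that it resamples the image on the fly, so you can still pan and zoom without giving up resolution where you need it.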
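To make the "down-sample to the screen resolution" idea concrete without relying on any particular package, the sketch below picks a decimation step from a target pixel count. The function name pick_step and the default of 2000 pixels are assumptions for illustration; this is not the API of mpl-modest-image.

```python
import math

def pick_step(n_samples, target_pixels=2000):
    """Smallest integer step such that n_samples / step <= target_pixels."""
    return max(1, math.ceil(n_samples / target_pixels))

# 262144 frequency channels, as in the question
step = pick_step(262144)
n_out = 262144 // step          # number of samples left after block-wise reduction
print(step, n_out)
```

In a real script the target could be taken from the figure itself, for example the figure height in inches multiplied by its dpi, rather than from a fixed constant.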

score 0 · Accepted Answer

I think you are just missing the extent=(left, right, bottom, top) keyword argument to plt.imshow.

import numpy as np
import matplotlib.pyplot as plt

x = np.random.randn(2, 10)
y = np.ones((4, 10))
x[0] = 0  # To make it clear which side is up, etc
y[0] = -1

plt.imshow(x, extent=(0, 10, 0, 2))
plt.imshow(y, extent=(0, 10, 2, 6))
# This is necessary, else the plot gets scaled and only shows the last array
plt.ylim(0, 6)
plt.colorbar()
plt.show()
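Applied to the batch loop in the question, the same idea might look like the sketch below, where each batch of 4 columns is drawn at its own x offset on a single axes. The array sizes are small stand-ins (the question uses freqs = 262144), the random data and the output file name batches.png are assumptions, and the Agg backend is selected only so the sketch runs without a display.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt

rows, freqs, batch = 12, 256, 4          # small stand-in sizes
s = np.random.rand(freqs, rows)          # same layout as the question's transposed array

fig, ax = plt.subplots()
vmin, vmax = s.min(), s.max()            # one shared color scale for all batches
for o in range(0, rows, batch):
    # extent places this batch at x in [o, o + batch] instead of starting at 0
    im = ax.imshow(s[:, o:o + batch], extent=(o, o + batch, 0, freqs),
                   vmin=vmin, vmax=vmax, interpolation='nearest',
                   aspect='auto', cmap='gray_r', origin='lower')

ax.set_xlim(0, rows)                     # otherwise the view follows the last batch only
ax.set_ylim(0, freqs)
fig.colorbar(im, ax=ax)
fig.savefig("batches.png")
```

Passing the same vmin and vmax to every call matters here: without them each batch is normalized to its own range, and the single colorbar would describe only the last image.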

