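I have a list (mylist) of 80 5-D zarr files, each with the dimensions (T, F, B, Az, El). The array shape is [24 x 4096 x 2016 x 24 x 8].
I want to extract a slice of the data and compute probabilities along some of the axes using the following function: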
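For a sense of scale, here is a rough element count. It assumes the stated shape applies to each of the 80 files and to each variable read from them. The 1-byte and 8-byte element sizes are illustrative assumptions only, since the stored dtype is not stated.

```python
import math

shape = (24, 4096, 2016, 24, 8)  # (T, F, B, Az, El), as stated above
n_files = 80

# Number of elements in one array of this shape.
elements_per_array = math.prod(shape)
print(f"{elements_per_array:,} elements per array")

# Illustrative element sizes only; the stored dtype is not given in the post.
for itemsize in (1, 8):
    gib = elements_per_array * itemsize / 2**30
    print(f"{itemsize} byte(s)/element: {gib:.1f} GiB per array, "
          f"{gib * n_files:.1f} GiB across {n_files} files")
```

Each array holds roughly 3.8e10 elements, so every full reduction has to read a large volume of data from disk regardless of the dtype.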
    import numpy as np
    import xarray as xr

    def GetPolarData(mylist, freq, FreqLo, FreqHi):
        '''
        Open each zarr file in mylist (dims: T, F, B, Az, El), select the
        channels between FreqLo and FreqHi, and return a list with one array
        of azimuth/elevation probabilities per file.
        '''
        # FreqCut (defined elsewhere) returns the indices of the selected channels.
        ChanIndx = FreqCut(FreqLo, FreqHi, freq)
        if len(ChanIndx) == 0:
            # Return early so the summary below never sees an undefined MyData.
            print("Something went wrong in Frequency selection")
            return []
        MyData = []
        for i in range(len(mylist)):
            print('Adding file {} : {}'.format(i, mylist[i][32:]))
            try:
                zarrf = xr.open_zarr(mylist[i], group='arr')
                # After summing over time and baseline, frequency is the first axis.
                m = zarrf.master.sum(dim=['time', 'baseline'])
                m = m[ChanIndx].sum(dim=['frequency'])
                c = zarrf.counter.sum(dim=['time', 'baseline'])
                c = c[ChanIndx].sum(dim=['frequency'])
                p = m.astype(float) / c.astype(float)
                MyData.append(p)
            except Exception as e:
                print(e)
                continue
        print("##########################################")
        print("This will be contribution to selected band")
        print("##########################################")
        print(f"Min {np.nanmin(MyData)*100:.3f}% ")
        print(f"Max {np.nanmax(MyData)*100:.3f}% ")
        print(f"Average {np.nanmean(MyData)*100:.3f}% ")
        return MyData
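To make the per-file reduction concrete, here is a minimal sketch on a tiny synthetic array in plain NumPy, so it runs without the zarr files. The axis order (T, F, B, Az, El) follows the question. The small shape, the random data and the channel indices in chan_indx are hypothetical stand-ins for the real files and for the output of FreqCut. It also checks that selecting the channels before summing gives the same result as summing first; whether that order is faster on the real data has not been measured here.

```python
import numpy as np

# Tiny synthetic stand-in for one file; axes are (T, F, B, Az, El).
shape = (2, 6, 3, 4, 2)
rng = np.random.default_rng(0)
counter = rng.integers(1, 10, size=shape)
master = np.minimum(rng.integers(0, 10, size=shape), counter)  # master <= counter

chan_indx = np.array([1, 2, 4])  # hypothetical channel indices (FreqCut output)

# Order used in the function: sum over T and B, then pick channels, then sum F.
m_sum_first = master.sum(axis=(0, 2))[chan_indx].sum(axis=0)
c_sum_first = counter.sum(axis=(0, 2))[chan_indx].sum(axis=0)

# Alternative order: pick channels first, then sum over T, F and B in one pass.
m_select_first = master[:, chan_indx].sum(axis=(0, 1, 2))
c_select_first = counter[:, chan_indx].sum(axis=(0, 1, 2))

# Probability per (Az, El) cell, as in p = m / c.
p = m_select_first.astype(float) / c_select_first.astype(float)
print(p.shape)
print(np.array_equal(m_sum_first, m_select_first))
```

Selecting first means the unselected channels are never reduced at all, which matters when only part of the 4096 channels falls inside the band.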
If I call the function like this:
    FreqLo = 470.
    FreqHi = 854.
    MyTVData = np.array(GetPolarData(AllZarrList, Freq, FreqLo, FreqHi))
it takes a very long time (more than 3 hours) on a machine with 40 cores and 256 GB of RAM.
Is there a way to make it run faster?
Thanks