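I want to extract data from a ROOT file and shape it so that I end up with a numpy array/tensor to feed into a neural network. I can already get the track data I want by padding it and converting it into a numpy array, but I want to extend that array with the data of the jets the tracks originate from. So I have the information for all tracks, the information for each jet, and the track interval that corresponds to each jet. My first instinct was to build an array in the shape of the track array and use something like np.dstack to merge the two.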
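To make the np.dstack idea concrete, here is a minimal sketch on made-up toy values (the array names and numbers are purely illustrative, not taken from the ROOT file). It assumes the jet values have already been broadcast to the track shape, which is the part still missing.

```python
import numpy as np

# Hypothetical toy data: 2 events, 3 padded track slots each.
track_pt = np.array([[1.0, 2.0, 0.0], [3.0, 0.0, 0.0]])
track_eta = np.array([[0.1, 0.2, 0.0], [0.3, 0.0, 0.0]])

# Stacking 2-D (events, tracks) arrays gives (events, tracks, variables).
track = np.dstack([track_pt, track_eta])

# One jet variable, assumed to be already broadcast to the track shape.
jet_per_track = np.array([[50.0, 50.0, 0.0], [80.0, 0.0, 0.0]])[:, :, np.newaxis]

# For 3-D inputs, dstack concatenates along the last axis.
merged = np.dstack([track, jet_per_track])
print(merged.shape)  # (2, 3, 3)
```

Each track slot then carries its own variables followed by the variables of its jet.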
import uproot4 as uproot
import numpy as np
import awkward1 as ak

def ak_into_np(ak_array):
    # Stack the per-variable (events, padding) arrays into (events, padding, variables).
    data = np.dstack([ak.to_numpy(x) for x in ak_array])
    return data

def get_data(filename, padding_size):
    f = uproot.open(filename)
    events = f["btagana/ttree;1"]
    track_data = events.arrays(filter_name=["Track_pt", "Track_phi", "Track_eta", "Track_dxy", "Track_dz", "Track_charge"])
    jet_interval = events.arrays(filter_name=["Jet_nFirstTrack", "Jet_nLastTrack"])
    # Number of tracks that belong to each jet.
    jet_interval = jet_interval["Jet_nLastTrack"] - jet_interval["Jet_nFirstTrack"]
    jet_data = events.arrays(filter_name=["Jet_pt", "Jet_phi", "Jet_eta"])
    # Pad every event to at least padding_size entries and replace the padding (None) with 0.
    arrays_track = ak.unzip(ak.fill_none(ak.pad_none(track_data, padding_size), 0))
    arrays_interval = ak.unzip(ak.fill_none(ak.pad_none(jet_interval, padding_size), 0))
    arrays_jet = ak.unzip(ak.fill_none(ak.pad_none(jet_data, padding_size), 0))
    track = ak_into_np(arrays_track)
    jet = ak_into_np(arrays_jet)
    interval = ak_into_np(arrays_interval)
    return track, jet, interval
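This is where I have got to so far. For efficiency reasons, I would like to be able to do this in Awkward before converting to numpy. I tried the following in numpy: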
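def extend(track, jet, interval):
    events, tracks, varstrack = np.shape(track)
    events, jets, varsjet = np.shape(jet)
    jet_into_track_data = []
    for i in range(events):
        dataloop = []
        for k in range(jets):
            # Skip jets without tracks (including the zero-padded jet slots).
            if interval[i][k][0] != 0:
                # Repeat the jet's variables once per track in its interval.
                dataloop.append(np.broadcast_to(jet[i][k], (interval[i][k][0], varsjet)))
        # Collect the per-jet blocks of this event.
        jet_into_track_data.append(dataloop)
    return jet_into_track_data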
But it already takes about 3 seconds for just 2000 events, and it does not even achieve my goal. The goal is basically [track_variables] -> [track_variables, jet_variables if track is in interval],
to be stored as [(event1)[[track_1],...,[track_padding_size]],...,(eventn)[[track_1],...,[track_padding_size]]]
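For illustration, the target layout can be reached without Python loops along the following lines. This is only a sketch in plain NumPy, not an Awkward solution, and it rests on an assumption the data may not satisfy: within each event the tracks are stored jet by jet, starting at slot 0 with no gaps, so the per-jet track counts alone determine which jet owns which track slot. The helper name jets_per_track and the toy values are hypothetical.

```python
import numpy as np

def jets_per_track(jet, counts, n_tracks):
    """Expand jet rows onto track slots (hypothetical helper).

    jet:    (events, jets, jet_vars) padded jet variables
    counts: (events, jets) integer number of tracks per jet
    Returns (events, n_tracks, jet_vars); slots owned by no jet are 0.
    Assumes each event's tracks are stored jet by jet, starting at slot 0.
    """
    n_events, n_jets, _ = jet.shape
    ends = np.cumsum(counts, axis=1)      # slot index one past each jet's last track
    slots = np.arange(n_tracks)
    # Owner of a slot = number of jets that end at or before that slot.
    owner = (slots[None, :, None] >= ends[:, None, :]).sum(axis=2)
    valid = owner < n_jets                # False for slots past the last track
    owner = np.where(valid, owner, 0)     # keep indices in range
    out = jet[np.arange(n_events)[:, None], owner]   # (events, n_tracks, jet_vars)
    return np.where(valid[:, :, None], out, 0)

# Toy input: 1 event, 3 padded jets with 2 variables, 4 padded track slots.
jet = np.array([[[10.0, 1.0], [20.0, 2.0], [0.0, 0.0]]])
counts = np.array([[2, 1, 0]])
track = np.arange(12, dtype=float).reshape(1, 4, 3)

jet_on_tracks = jets_per_track(jet, counts, n_tracks=track.shape[1])
extended = np.concatenate([track, jet_on_tracks], axis=2)
print(extended.shape)  # (1, 4, 5)
```

With the arrays returned by get_data, counts would be interval[:, :, 0]. The comparison builds a temporary boolean array of shape (events, tracks, jets), so memory grows with the product of the two padding sizes.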