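Although you cannot explicitly append rows to an HDF5 dataset, you can pass the maxshape keyword when creating the dataset, which lets you resize the dataset later to accommodate new data. (See http://docs.h5py.org/en/latest/faq.html#appending-data-to-a-dataset)
Assuming your data always has the same number of columns, your code will end up looking something like this: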
import h5py

output_file = h5py.File('your_output_file.h5', 'w')

# keep track of the total number of rows
total_rows = 0

# file_list is your list of input files
for n, f in enumerate(file_list):
    # load_data is a hypothetical placeholder: read f into a 2-D array however suits your data
    your_data = load_data(f)
    total_rows = total_rows + your_data.shape[0]
    total_columns = your_data.shape[1]

    if n == 0:
        # first file: create the dataset with no maximum shape so it can be resized later
        create_dataset = output_file.create_dataset("Name", (total_rows, total_columns), maxshape=(None, None))
        # fill the first section of the dataset
        create_dataset[:, :] = your_data
        where_to_start_appending = total_rows
    else:
        # resize the dataset to accommodate the new data
        create_dataset.resize(total_rows, axis=0)
        create_dataset[where_to_start_appending:total_rows, :] = your_data
        where_to_start_appending = total_rows

output_file.close()
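
For a version you can run as-is, here is a minimal sketch of the same resize-then-write idea. The helper name append_rows, the in-memory chunks list (standing in for the arrays you would read from your files), and the temporary output path are all assumptions made for illustration. It also starts from an empty dataset, so the first file needs no special case, and it sets dtype explicitly because h5py otherwise defaults to single-precision floats.

```python
import os
import tempfile

import h5py
import numpy as np


def append_rows(dataset, rows):
    """Grow `dataset` along axis 0 and write `rows` into the new space."""
    start = dataset.shape[0]
    dataset.resize(start + rows.shape[0], axis=0)
    dataset[start:] = rows


# Illustrative stand-ins for the arrays you would read from each input file.
chunks = [np.full((2, 3), i, dtype="f8") for i in range(3)]

# Temporary location chosen only for this example.
path = os.path.join(tempfile.mkdtemp(), "appended.h5")

with h5py.File(path, "w") as output_file:
    # Start with zero rows; maxshape=(None, 3) leaves axis 0 unlimited
    # while fixing the number of columns at 3.
    dataset = output_file.create_dataset(
        "Name", shape=(0, 3), maxshape=(None, 3), dtype="f8"
    )
    for chunk in chunks:
        append_rows(dataset, chunk)

    final_shape = dataset.shape
    stored = dataset[...]  # read everything back before the file closes
```

Using the file as a context manager means it is closed even if one of the writes raises, which the explicit close() call above would not guarantee.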