I have prepared my dataset the way I used to for Keras' ImageDataGenerator.flow_from_directory: I have three folders, "Train", "Valid", and "Test", and inside each one there is a subfolder named after the class it represents. However, instead of images, the data in those subfolders is saved as compressed NumPy files (.npz).
I found that it is possible to create an input pipeline that reads .npz files using tf.data; however, the example in the documentation only shows how to load a dataset whose labels are stored inside the .npz file itself:
import numpy as np
import tensorflow as tf

with np.load(path) as data:
    train_examples = data['x_train']
    train_labels = data['y_train']
    test_examples = data['x_test']
    test_labels = data['y_test']

train_dataset = tf.data.Dataset.from_tensor_slices((train_examples, train_labels))
There is no explanation of how to build a dataset that automatically assigns labels to the data based on its parent folder (the way flow_from_directory does). Is there a way to achieve that, or should I manually load the data from each folder and assign a one-hot-encoded label to each subset? Thank you!
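For reference, here is a minimal sketch of what I mean, assuming each .npz file stores its sample under the key 'data' (the key, the helper name make_dataset, and the synthetic class names are just placeholders for illustration). It scans one split folder, derives an integer label from each file's parent directory, and builds a tf.data.Dataset from the stacked arrays:

```python
import tempfile
from pathlib import Path

import numpy as np
import tensorflow as tf


def make_dataset(split_dir):
    """Build a tf.data.Dataset from split_dir/<class_name>/*.npz,
    labeling each sample with the index of its parent folder.
    Assumes every .npz stores its array under the key 'data'."""
    split_dir = Path(split_dir)
    # Sorted class folders -> stable integer label per class.
    class_names = sorted(p.name for p in split_dir.iterdir() if p.is_dir())
    class_to_index = {name: i for i, name in enumerate(class_names)}

    examples, labels = [], []
    for class_name in class_names:
        for npz_path in sorted((split_dir / class_name).glob("*.npz")):
            with np.load(npz_path) as data:
                examples.append(data["data"])
            labels.append(class_to_index[class_name])

    x = np.stack(examples)
    y = np.array(labels, dtype=np.int64)
    return tf.data.Dataset.from_tensor_slices((x, y)), class_names


# Tiny synthetic "Train" tree with two classes, two files each.
root = Path(tempfile.mkdtemp()) / "Train"
for cls in ("cat", "dog"):
    (root / cls).mkdir(parents=True)
    for i in range(2):
        np.savez_compressed(root / cls / f"sample{i}.npz",
                            data=np.full((4, 4), float(i)))

ds, class_names = make_dataset(root)
print(class_names)  # ['cat', 'dog']
for x, y in ds.take(1):
    print(x.shape, y.numpy())  # (4, 4) 0
```

If one-hot labels are needed, `ds.map(lambda x, y: (x, tf.one_hot(y, len(class_names))))` converts the integer labels afterwards. This eager approach loads everything into memory; for large datasets the same path/label lists could instead feed a pipeline that loads each file lazily.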