I'm trying to use deep learning (a 3D CNN) to perform brain disease classification. Currently the input size is set to 96*96*96. The original scans have a size of 256*256*256; I first removed the background by cropping to 192*192*192 and then downsampled by a factor of 2.
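For reference, a minimal sketch of that preprocessing step, assuming the scans are NIfTI files and the brain is roughly centered (nibabel for I/O, plain NumPy slicing for the crop and downsampling; the file path is a placeholder):

```python
import nibabel as nib
import numpy as np

def preprocess(path):
    """Load a 256^3 scan, crop the background to 192^3, downsample to 96^3."""
    vol = nib.load(path).get_fdata().astype(np.float32)   # (256, 256, 256)
    start = (256 - 192) // 2                               # 32-voxel margin on each side
    crop = vol[start:start + 192, start:start + 192, start:start + 192]
    return crop[::2, ::2, ::2]                             # factor-2 downsampling -> 96^3
```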
However, my dataset only contains 825 subjects. I want to augment it to a sufficient size, but this has been giving me a lot of trouble.
First of all, 96^3 results in about 884k input voxels. From my past experience, the number of training samples should be much larger than the number of input units. So my first question is: am I right that the number of training samples should exceed the number of input units (in this case, more than 884k)?
Secondly, what data augmentation techniques are recommended? So far I have tried rotations around the 3 axes at 10-degree intervals, but that only augments the dataset by a factor of 100.
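For context, here is a minimal sketch of the kind of rotation augmentation I mean, assuming scipy is available (the angles and interpolation settings are arbitrary choices, not a recommendation):

```python
from scipy import ndimage

def rotate_volume(vol, angles=(10, 0, 0)):
    """Rotate a 3D volume by the given angles (degrees) around its three axis planes."""
    out = vol
    for angle, axes in zip(angles, [(0, 1), (0, 2), (1, 2)]):
        if angle:
            out = ndimage.rotate(out, angle, axes=axes, reshape=False,
                                 order=1, mode='nearest')
    return out
```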
Thirdly, when training models I used to append the input data to a list and split it with sklearn's train_test_split function. Another way is to use Keras' ImageDataGenerator.flow_from_directory. However, now that I'm dealing with 3D data, no amount of memory can afford loading thousands of 3D arrays at once, and ImageDataGenerator does not support the NIfTI file format. Is there any way I can prepare all my 3D arrays for training without exhausting my memory? (I imagine something like ImageDataGenerator; my understanding is that a data generator feeds data to the model one batch at a time. Correct me if I'm wrong.)
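To frame the question, here is a minimal sketch of the kind of batch-wise loader I have in mind, assuming a Keras Sequence subclass that reads NIfTI files with nibabel on demand (the file list, labels, and batch size are placeholders):

```python
import numpy as np
import nibabel as nib
from tensorflow.keras.utils import Sequence

class NiftiSequence(Sequence):
    """Loads one batch of NIfTI volumes from disk at a time."""
    def __init__(self, file_paths, labels, batch_size=4):
        self.file_paths = file_paths          # list of .nii / .nii.gz paths
        self.labels = np.asarray(labels)
        self.batch_size = batch_size

    def __len__(self):
        return int(np.ceil(len(self.file_paths) / self.batch_size))

    def __getitem__(self, idx):
        paths = self.file_paths[idx * self.batch_size:(idx + 1) * self.batch_size]
        y = self.labels[idx * self.batch_size:(idx + 1) * self.batch_size]
        x = np.stack([nib.load(p).get_fdata().astype(np.float32) for p in paths])
        return x[..., np.newaxis], y          # add a channel dimension for the 3D CNN
```

Such a sequence could then be passed directly to model.fit, so only one batch of volumes is in memory at a time.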
I ran into exactly the same problem with MRI data, so I can't help with that part; I'd like to find the answer myself. However, regarding the generator: I prepare my dataset before training. In a separate script I do all the transformations (resizing, 3D array preparation, train/test split, ...), and then I put everything into 3 big arrays (train, validation and test). I save these arrays with NumPy's save function. You can also split each array into smaller arrays. If it still doesn't fit in your RAM, you can create a subclass of Keras Sequence.
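As a rough illustration of that approach (array names, shapes, and paths are placeholders, with dummy data standing in for the real preprocessed volumes), saving the split as .npy files and memory-mapping them at training time might look like this:

```python
import numpy as np

# Preprocessing script: build the big arrays once (dummy data here for illustration).
x_train = np.zeros((8, 96, 96, 96), dtype=np.float32)
y_train = np.zeros(8, dtype=np.int64)
np.save('x_train.npy', x_train)
np.save('y_train.npy', y_train)

# Training script: memory-map the volumes so batches are read from disk lazily.
x_train = np.load('x_train.npy', mmap_mode='r')
y_train = np.load('y_train.npy')
```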
Answered 2020-04-07T08:55:56.000