To use DeepDIVA with your own image dataset, you need to prepare the data in the correct format. The data should be organized in the following general format:
root_folder/<split>/<class_name>/<*.jpg>
More specifically, the root folder must contain the following split sub-folders:
root_folder/train
root_folder/val
root_folder/test
where each split contains its respective data:
root_folder/train - contains all training data
root_folder/val - contains all validation data
root_folder/test - contains all testing data
In each of the three splits (train, val, test), each class must be in a separate folder named after the class. File names are arbitrary: for example, MNIST images of class 0 do not have to be named 0-*.
In general: root_folder/train/<class_name>/*.png
Example:
root_folder/train/dog/whatever.png
root_folder/train/dog/you.png
root_folder/train/dog/like.png
root_folder/train/cat/123.png
root_folder/train/cat/nsdf3.png
root_folder/train/cat/asd932_.png
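Before training, it can help to check that a dataset really follows this layout. The following is a minimal sketch in plain Python; check_dataset_layout is a hypothetical helper written for this guide, not part of DeepDIVA, and the set of accepted image suffixes is an assumption you should adapt to your data.

```python
from pathlib import Path

# Assumption: adjust this set to the image formats in your dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}
SPLITS = ("train", "val", "test")


def check_dataset_layout(root):
    """Hypothetical helper (not part of DeepDIVA).

    Checks root/<split>/<class_name>/<image> and returns a list of
    human-readable problems; an empty list means the layout looks fine.
    """
    root = Path(root)
    problems = []
    class_sets = {}
    for split in SPLITS:
        split_dir = root / split
        if not split_dir.is_dir():
            problems.append(f"missing split folder: {split_dir}")
            continue
        entries = sorted(split_dir.iterdir())
        class_dirs = [p for p in entries if p.is_dir()]
        class_sets[split] = {p.name for p in class_dirs}
        if not class_dirs:
            problems.append(f"no class folders in {split_dir}")
        # A file placed directly inside a split folder has no class label.
        for p in entries:
            if p.is_file():
                problems.append(f"file outside a class folder: {p}")
        # Each class folder should contain at least one image.
        for class_dir in class_dirs:
            has_images = any(p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
                             for p in class_dir.iterdir())
            if not has_images:
                problems.append(f"no images in {class_dir}")
    # Every class found in val/test should also exist in train.
    for split in ("val", "test"):
        extra = class_sets.get(split, set()) - class_sets.get("train", set())
        if extra:
            problems.append(f"classes in {split} but not in train: {sorted(extra)}")
    return problems
```

For a correctly organized dataset, print(check_dataset_layout("root_folder")) should print an empty list.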
If you do not have train, validation and test splits for your data, the following section will help you prepare them.
If you do not have validation/test splits for your data:
Note: Set aside a portion of your data (between 10% and 33% of the entire dataset) as a test set. The test set must never be used for training or for hyper-parameter optimization.
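If you need to set aside such a test set yourself, a minimal sketch could look like the following. carve_out_test_set is a hypothetical helper, not part of DeepDIVA; it assumes your images currently sit in root_folder/train/<class_name>/ and moves a random fraction of each class to root_folder/test/<class_name>/, so every class is represented in the test set in roughly the same proportion.

```python
import random
import shutil
from pathlib import Path


def carve_out_test_set(root, fraction=0.2, seed=0):
    """Hypothetical helper (not part of DeepDIVA).

    Moves a random `fraction` of the files of every class from
    root/train/<class_name>/ to root/test/<class_name>/.
    The note above suggests a fraction between 0.1 and 0.33.
    Returns {class_name: number_of_files_moved}.
    """
    if not 0 < fraction < 1:
        raise ValueError("fraction must be strictly between 0 and 1")
    root = Path(root)
    rng = random.Random(seed)  # fixed seed -> reproducible split
    moved = {}
    for class_dir in sorted(p for p in (root / "train").iterdir() if p.is_dir()):
        files = sorted(p for p in class_dir.iterdir() if p.is_file())
        rng.shuffle(files)
        n_test = round(len(files) * fraction)
        target = root / "test" / class_dir.name
        target.mkdir(parents=True, exist_ok=True)
        # Moving (not copying) guarantees the test images leave the training set.
        for f in files[:n_test]:
            shutil.move(str(f), str(target / f.name))
        moved[class_dir.name] = n_test
    return moved
```

Splitting per class (rather than over all files at once) keeps the class balance of the test set close to that of the original data.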
To create a validation set using DeepDIVA:
Put all pictures of cats in root_folder/train/cat and all pictures of dogs in root_folder/train/dog (one sub-folder per class).
Then run python util/data/dataset_splitter.py --dataset-folder <path_to_dataset>
This creates the train and val folders and renames the original folder to original_train.
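For intuition, the idea of a per-class train/validation split can be sketched in plain Python as below. This is a hypothetical illustration, not the code of util/data/dataset_splitter.py; the function name, the val_fraction default of 0.2 and the fixed seed are assumptions. The symbolic parameter imitates the --symbolic option of the script by creating symbolic links instead of copies.

```python
import os
import random
import shutil
from pathlib import Path


def split_train_val(root, val_fraction=0.2, seed=0, symbolic=False):
    """Hypothetical sketch of a train/val split, NOT DeepDIVA's code.

    Renames root/train to root/original_train (if not done yet), then
    fills root/train/<class_name>/ and root/val/<class_name>/ with a
    random per-class split. Returns {class_name: (n_train, n_val)}.
    """
    root = Path(root)
    source = root / "original_train"
    if not source.exists():
        (root / "train").rename(source)  # mirror the rename step described above
    rng = random.Random(seed)  # fixed seed -> reproducible split
    counts = {}
    for class_dir in sorted(p for p in source.iterdir() if p.is_dir()):
        files = sorted(p for p in class_dir.iterdir() if p.is_file())
        rng.shuffle(files)
        n_val = round(len(files) * val_fraction)
        for split, subset in (("val", files[:n_val]), ("train", files[n_val:])):
            target = root / split / class_dir.name
            target.mkdir(parents=True, exist_ok=True)
            for f in subset:
                if symbolic:
                    # Link to the file in original_train: saves disk space.
                    os.symlink(f.resolve(), target / f.name)
                else:
                    # Hard copy, preserving file metadata.
                    shutil.copy2(f, target / f.name)
        counts[class_dir.name] = (len(files) - n_val, n_val)
    return counts
```

Because the original images stay untouched in original_train, a split produced this way can be discarded and redone with a different seed or fraction.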
To create symbolic links to the files in original_train instead of making hard copies (in case you want to save space), add the flag --symbolic to the command.
If you do not have a test set, you can create one by renaming the newly created val folder to test and repeating the procedure to create another validation set.
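The renaming step, together with a quick check that no image ended up in both the training and the test set, can be sketched as follows. Both functions are hypothetical helpers, not part of DeepDIVA; the overlap check assumes that identical file contents (compared by SHA-256 digest) indicate the same image, so it will not catch near-duplicates such as re-encoded or resized copies.

```python
import hashlib
from pathlib import Path


def promote_val_to_test(root):
    """Hypothetical helper: rename root/val to root/test."""
    root = Path(root)
    test_dir = root / "test"
    if test_dir.exists():
        # Never overwrite an existing test set.
        raise FileExistsError(f"{test_dir} already exists")
    (root / "val").rename(test_dir)
    return test_dir


def file_digests(split_dir):
    """SHA-256 digests of every file below split_dir (symlinks are followed)."""
    return {hashlib.sha256(p.read_bytes()).hexdigest()
            for p in Path(split_dir).rglob("*") if p.is_file()}


def overlapping_images(root, a="train", b="test"):
    """Number of distinct file contents present in both splits a and b."""
    root = Path(root)
    return len(file_digests(root / a) & file_digests(root / b))
```

A result of 0 from overlapping_images(root) is a necessary (but not sufficient) condition for the test set being kept strictly separate from the training data.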