Prepare your Data

To use DeepDIVA with your own image dataset, you need to prepare the data in the correct format. The data should be organized as follows:

root_folder/<split>/<class_name>/<*.jpg>

In more detail, the root folder must contain the split sub-folders as follows:

root_folder/train
root_folder/val
root_folder/test

where each split contains its respective data:

  • root_folder/train - contains all training data
  • root_folder/val - contains all validation data
  • root_folder/test - contains all testing data

In each of the three splits (train, val, test), each class must be in a separate folder named after the class. The file names can be arbitrary, e.g. they do not have to be 0-* for class 0 of MNIST.

In general: root_folder/train/<class_name>/*.png

Example:

root_folder/train/dog/whatever.png
root_folder/train/dog/you.png
root_folder/train/dog/like.png

root_folder/train/cat/123.png
root_folder/train/cat/nsdf3.png
root_folder/train/cat/asd932_.png
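Before training, it can be useful to check that a folder actually follows this layout. The following is a minimal sketch and not part of DeepDIVA: the function name summarize_dataset and the list of accepted image extensions are assumptions made for illustration. It only walks the folder tree described above and counts the images found for each class in each split.

```python
from pathlib import Path

# Assumption: the image extensions to count; adjust to match your data.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}
SPLITS = ("train", "val", "test")


def summarize_dataset(root_folder):
    """Return {split: {class_name: image_count}} for the expected layout.

    Hypothetical helper (not a DeepDIVA function). Raises FileNotFoundError
    if one of the train/val/test folders is missing.
    """
    root = Path(root_folder)
    summary = {}
    for split in SPLITS:
        split_dir = root / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"Missing split folder: {split_dir}")
        counts = {}
        # Every sub-folder of a split is treated as one class.
        for class_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
            counts[class_dir.name] = sum(
                1
                for f in class_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
            )
        summary[split] = counts
    return summary
```

Calling summarize_dataset("root_folder") returns a nested dictionary, so a class that is missing from one split or has zero images is easy to spot.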

If you do not have train, validation, and test splits for your data, the following section will help you prepare them.

Splitting your data

If you do not have validation/test splits for your data:

Note: Set aside a portion of your data (between 10% and 33% of the entire dataset) as a test set. The test set must never be used for training or for hyper-parameter optimization.
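One way to set aside such a test set by hand is sketched below. This is an illustration only and not a DeepDIVA utility: the function name hold_out_test_set, the default fraction of 0.2, and the fixed random seed are all assumptions. The sketch moves a random fraction of each class from root_folder/train into root_folder/test, so that every class stays represented in both splits. Because it moves files rather than copying them, it is safest to try it on a copy of your data first.

```python
import random
import shutil
from pathlib import Path


def hold_out_test_set(root_folder, fraction=0.2, seed=42):
    """Move `fraction` of each class's files from train/ to test/.

    Hypothetical helper (not a DeepDIVA function).
    Returns {class_name: number_of_files_moved}.
    """
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be strictly between 0 and 1")
    root = Path(root_folder)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    moved = {}
    for class_dir in sorted(p for p in (root / "train").iterdir() if p.is_dir()):
        # Sort before shuffling so the result does not depend on OS listing order.
        files = sorted(f for f in class_dir.iterdir() if f.is_file())
        rng.shuffle(files)
        n_test = int(len(files) * fraction)  # rounds down
        target = root / "test" / class_dir.name
        target.mkdir(parents=True, exist_ok=True)
        for f in files[:n_test]:
            shutil.move(str(f), str(target / f.name))
        moved[class_dir.name] = n_test
    return moved
```

The split is done per class rather than over the whole dataset so that a small class cannot end up entirely in one split by chance.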

To create a validation set using DeepDIVA:

  • Put all of your training data in the train folder of the dataset.
    • E.g., for a dataset of cats and dogs, put all pictures of cats in root_folder/train/cat and all pictures of dogs in root_folder/train/dog.
  • Run the command: python util/data/dataset_splitter.py --dataset-folder <path_to_dataset>
    • This command splits the dataset into train and val folders and renames the original train folder to original_train.
    • To save space, add the flag --symbolic to the command. It creates symbolic links to the data in original_train instead of copying the files.

If you do not have a test set, you can create one by renaming the newly created val folder to test and repeating the procedure to create another validation set.