datamodules.RGB package

Submodules

datamodules.RGB.datamodule module

class DataModuleRGB(data_dir: str, data_folder_name: str, gt_folder_name: str, train_folder_name: str = 'train', val_folder_name: str = 'val', test_folder_name: str = 'test', pred_file_path_list: Optional[List[str]] = None, selection_train: Optional[Union[int, List[str]]] = None, selection_val: Optional[Union[int, List[str]]] = None, selection_test: Optional[Union[int, List[str]]] = None, num_workers: int = 4, batch_size: int = 8, shuffle: bool = True, drop_last: bool = True)[source]

Bases: AbstractDatamodule

The data module for a dataset where the classes of the ground truth are encoded as colors in the image. This data module expects full-sized images: they do not need to be in their original resolution, but they must not be cropped. If you want to work with cropped images, use DataModuleCroppedRGB. A short usage sketch follows the parameter list below.

The structure of the folder should be as follows:

data_dir
├── train_folder_name
│   ├── data_folder_name
│   │   ├── image1.png
│   │   ├── ...
│   │   └── imageN.png
│   └── gt_folder_name
│       ├── image1.png
│       ├── ...
│       └── imageN.png
├── val_folder_name
│   ├── data_folder_name
│   │   ├── image1.png
│   │   ├── ...
│   │   └── imageN.png
│   └── gt_folder_name
│       ├── image1.png
│       ├── ...
│       └── imageN.png
└── test_folder_name
    ├── data_folder_name
    │   ├── image1.png
    │   ├── ...
    │   └── imageN.png
    └── gt_folder_name
        ├── image1.png
        ├── ...
        └── imageN.png
Parameters:
  • data_dir (str) – Path to the dataset folder.

  • data_folder_name (str) – Name of the folder where the images are stored.

  • gt_folder_name (str) – Name of the folder where the ground truth is stored.

  • train_folder_name (str) – Name of the folder where the training data is stored.

  • val_folder_name (str) – Name of the folder where the validation data is stored.

  • test_folder_name (str) – Name of the folder where the test data is stored.

  • pred_file_path_list (List[str]) – List of file paths to the images that should be predicted.

  • selection_train (Union[int, List[str], None]) – Selection of the training data, given either as a number of files or as a list of file names.

  • selection_val (Union[int, List[str], None]) – Selection of the validation data, given either as a number of files or as a list of file names.

  • selection_test (Union[int, List[str], None]) – Selection of the test data, given either as a number of files or as a list of file names.

  • num_workers (int) – Number of workers for the dataloaders.

  • batch_size (int) – Batch size.

  • shuffle (bool) – Whether to shuffle the data.

  • drop_last (bool) – Whether to drop the last batch if it is smaller than the batch size.
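
A minimal usage sketch (the data_dir path, the folder names "data" and "gt", and the model are hypothetical; the Trainer calls are standard PyTorch Lightning API):

from pytorch_lightning import Trainer

# Hypothetical dataset location and folder names; adjust to your layout.
dm = DataModuleRGB(
    data_dir="/path/to/dataset",
    data_folder_name="data",
    gt_folder_name="gt",
    batch_size=4,
    num_workers=8,
)

trainer = Trainer(max_epochs=10)
# `model` is assumed to be a LightningModule compatible with this datamodule.
trainer.fit(model, datamodule=dm)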

get_output_filename_predict(index: int) str[source]

Returns the original filename of the document image. This should only be used during prediction.

Parameters:

index (int) – index of the sample

Returns:

original filename of the doc image

Return type:

str

get_output_filename_test(index: int) str[source]

Returns the original filename of the document image. This should only be used during testing.

Parameters:

index (int) – index of the sample

Returns:

original filename of the doc image

Return type:

str
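
A short sketch of using the filename helpers when writing results back to disk (the output directory is hypothetical, and the datamodule dm comes from the usage sketch above):

from pathlib import Path

# Hypothetical output directory for predicted segmentation maps.
out_dir = Path("predictions")
out_dir.mkdir(exist_ok=True)

# Recover the original document name of the first test sample and
# use it as the output filename for that sample's prediction.
doc_name = dm.get_output_filename_test(index=0)
out_path = out_dir / doc_name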

predict_dataloader() Union[DataLoader, List[DataLoader]][source]

Implement one or multiple PyTorch DataLoaders for prediction.

It’s recommended that all data downloads and preparation happen in prepare_data().

Note

Lightning adds the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.

Returns:

A torch.utils.data.DataLoader or a sequence of them specifying prediction samples.

Note

In the case where you return multiple prediction dataloaders, the predict_step() will have an argument dataloader_idx which matches the order here.
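
A sketch of running prediction through this hook (the file paths are hypothetical, and trainer and model are reused from the usage sketch above; trainer.predict() is the standard Lightning call):

# Hypothetical: predict on two new page images with a trained model.
dm_pred = DataModuleRGB(
    data_dir="/path/to/dataset",
    data_folder_name="data",
    gt_folder_name="gt",
    pred_file_path_list=["/path/to/new/page_1.png", "/path/to/new/page_2.png"],
)
predictions = trainer.predict(model, datamodule=dm_pred)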

setup(stage: Optional[str] = None)[source]

Called at the beginning of fit (train + validate), validate, test, or predict. This is a good hook when you need to build models dynamically or adjust something about them. This hook is called on every process when using DDP.

Parameters:

stage – either 'fit', 'validate', 'test', or 'predict'

Example:

class LitModel(...):
    def __init__(self):
        self.l1 = None

    def prepare_data(self):
        download_data()
        tokenize()

        # don't do this: state set here is not shared across processes
        self.something = ...

    def setup(self, stage):
        data = load_data(...)
        self.l1 = nn.Linear(28, data.num_classes)
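
Outside a Trainer, the hooks can also be called manually, for example to inspect the datasets; a sketch, assuming the datamodule dm from the usage sketch above (normally the Trainer invokes these hooks itself):

dm.prepare_data()
dm.setup(stage="fit")
train_loader = dm.train_dataloader()
print(len(train_loader))  # number of training batches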
test_dataloader(*args, **kwargs) Union[DataLoader, List[DataLoader]][source]

Implement one or multiple PyTorch DataLoaders for testing.

For data processing use the following pattern:

  • download in prepare_data()

  • process and split in setup()

However, the above are only necessary for distributed processing.

Warning

do not assign state in prepare_data

Note

Lightning adds the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.

Returns:

A torch.utils.data.DataLoader or a sequence of them specifying testing samples.

Example:

def test_dataloader(self):
    transform = transforms.Compose([transforms.ToTensor(),
                                    transforms.Normalize((0.5,), (1.0,))])
    dataset = MNIST(root='/path/to/mnist/', train=False, transform=transform,
                    download=True)
    loader = torch.utils.data.DataLoader(
        dataset=dataset,
        batch_size=self.batch_size,
        shuffle=False
    )

    return loader

# can also return multiple dataloaders
def test_dataloader(self):
    return [loader_a, loader_b, ..., loader_n]

Note

If you don’t need a test dataset and a test_step(), you don’t need to implement this method.

Note

In the case where you return multiple test dataloaders, the test_step() will have an argument dataloader_idx which matches the order here.
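
With this datamodule, the test split is typically evaluated through the Trainer (a sketch reusing the trainer, model, and dm from the sketches above):

# Run the model on the test split provided by the datamodule.
trainer.test(model, datamodule=dm)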

train_dataloader(*args, **kwargs) DataLoader[source]

Implement one or more PyTorch DataLoaders for training.

Returns:

A collection of torch.utils.data.DataLoader specifying training samples. In the case of multiple dataloaders, please refer to the PyTorch Lightning documentation on multiple dataloaders.

The dataloader you return will not be reloaded unless you set reload_dataloaders_every_n_epochs on the Trainer to a positive integer.

For data processing use the following pattern:

  • download in prepare_data()

  • process and split in setup()

However, the above are only necessary for distributed processing.

Warning

do not assign state in prepare_data

Note

Lightning adds the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.

Example:

# single dataloader
def train_dataloader(self):
    transform = transforms.Compose([transforms.ToTensor(),
                                    transforms.Normalize((0.5,), (1.0,))])
    dataset = MNIST(root='/path/to/mnist/', train=True, transform=transform,
                    download=True)
    loader = torch.utils.data.DataLoader(
        dataset=dataset,
        batch_size=self.batch_size,
        shuffle=True
    )
    return loader

# multiple dataloaders, return as list
def train_dataloader(self):
    mnist = MNIST(...)
    cifar = CIFAR(...)
    mnist_loader = torch.utils.data.DataLoader(
        dataset=mnist, batch_size=self.batch_size, shuffle=True
    )
    cifar_loader = torch.utils.data.DataLoader(
        dataset=cifar, batch_size=self.batch_size, shuffle=True
    )
    # each batch will be a list of tensors: [batch_mnist, batch_cifar]
    return [mnist_loader, cifar_loader]

# multiple dataloaders, return as dict
def train_dataloader(self):
    mnist = MNIST(...)
    cifar = CIFAR(...)
    mnist_loader = torch.utils.data.DataLoader(
        dataset=mnist, batch_size=self.batch_size, shuffle=True
    )
    cifar_loader = torch.utils.data.DataLoader(
        dataset=cifar, batch_size=self.batch_size, shuffle=True
    )
    # each batch will be a dict of tensors: {'mnist': batch_mnist, 'cifar': batch_cifar}
    return {'mnist': mnist_loader, 'cifar': cifar_loader}
val_dataloader(*args, **kwargs) Union[DataLoader, List[DataLoader]][source]

Implement one or multiple PyTorch DataLoaders for validation.

The dataloader you return will not be reloaded unless you set reload_dataloaders_every_n_epochs on the Trainer to a positive integer.

It’s recommended that all data downloads and preparation happen in prepare_data().

Related Trainer methods and datamodule hooks:

  • fit()

  • validate()

  • prepare_data()

  • setup()

Note

Lightning adds the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.

Returns:

A torch.utils.data.DataLoader or a sequence of them specifying validation samples.

Examples:

def val_dataloader(self):
    transform = transforms.Compose([transforms.ToTensor(),
                                    transforms.Normalize((0.5,), (1.0,))])
    dataset = MNIST(root='/path/to/mnist/', train=False,
                    transform=transform, download=True)
    loader = torch.utils.data.DataLoader(
        dataset=dataset,
        batch_size=self.batch_size,
        shuffle=False
    )

    return loader

# can also return multiple dataloaders
def val_dataloader(self):
    return [loader_a, loader_b, ..., loader_n]

Note

If you don’t need a validation dataset and a validation_step(), you don’t need to implement this method.

Note

In the case where you return multiple validation dataloaders, the validation_step() will have an argument dataloader_idx which matches the order here.

datamodules.RGB.datamodule_cropped module

class DataModuleCroppedRGB(data_dir: str, data_folder_name: str, gt_folder_name: str, train_folder_name: str = 'train', val_folder_name: str = 'val', test_folder_name: str = 'test', selection_train: Optional[Union[int, List[str]]] = None, selection_val: Optional[Union[int, List[str]]] = None, selection_test: Optional[Union[int, List[str]]] = None, crop_size: int = 256, num_workers: int = 4, batch_size: int = 8, shuffle: bool = True, drop_last: bool = True)[source]

Bases: AbstractDatamodule

The data module for a dataset where the classes of the ground truth are encoded as colors in the image. This data module expects cropped images with a specific structure. The cropping can be done with the script tools/generate_cropped_dataset.py; if you do not use the script, make sure that the images are cropped and named in the same way as the script does. If you want to work with uncropped images, use DataModuleRGB. A short usage sketch follows the parameter list below.

The structure of the folder should be as follows:

data_dir
├── data_folder_name
│   ├── train_folder_name
│   │   ├── original_image_name_1
│   │   │   ├── image_crop_1.png
│   │   │   ├── image_crop_2.png
│   │   │   ├── ...
│   │   │   └── image_crop_N.png
│   ├── val_folder_name
│   │   ├── original_image_name_1
│   │   │   ├── image_crop_1.png
│   │   │   ├── image_crop_2.png
│   │   │   ├── ...
│   │   │   └── image_crop_N.png
│   └── test_folder_name
│       ├── original_image_name_1
│       │   ├── image_crop_1.png
│       │   ├── image_crop_2.png
│       │   ├── ...
│       │   └── image_crop_N.png
└── gt_folder_name
    ├── train_folder_name
    │   ├── original_image_name_1
    │   │   ├── image_crop_1.png
    │   │   ├── image_crop_2.png
    │   │   ├── ...
    │   │   └── image_crop_N.png
    ├── val_folder_name
    │   ├── original_image_name_1
    │   │   ├── image_crop_1.png
    │   │   ├── image_crop_2.png
    │   │   ├── ...
    │   │   └── image_crop_N.png
    └── test_folder_name
        ├── original_image_name_1
        │   ├── image_crop_1.png
        │   ├── image_crop_2.png
        │   ├── ...
        │   └── image_crop_N.png
Parameters:
  • data_dir (str) – Path to the dataset folder.

  • data_folder_name (str) – Name of the folder where the images are stored.

  • gt_folder_name (str) – Name of the folder where the ground truth is stored.

  • train_folder_name (str) – Name of the folder where the training data is stored.

  • val_folder_name (str) – Name of the folder where the validation data is stored.

  • test_folder_name (str) – Name of the folder where the test data is stored.

  • selection_train (Union[int, List[str], None]) – Selection of the training data, given either as a number of files or as a list of file names.

  • selection_val (Union[int, List[str], None]) – Selection of the validation data, given either as a number of files or as a list of file names.

  • selection_test (Union[int, List[str], None]) – Selection of the test data, given either as a number of files or as a list of file names.

  • crop_size (int) – Size of the crops in pixels.

  • num_workers (int) – Number of workers for the dataloaders.

  • batch_size (int) – Batch size.

  • shuffle (bool) – Whether to shuffle the data.

  • drop_last (bool) – Whether to drop the last batch if it is smaller than the batch size.
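
A minimal usage sketch, assuming the cropped layout above was generated beforehand (the path and the folder names "data" and "gt" are hypothetical; trainer and model are reused from an existing setup):

# Hypothetical cropped dataset, e.g. produced by tools/generate_cropped_dataset.py.
dm_cropped = DataModuleCroppedRGB(
    data_dir="/path/to/cropped_dataset",
    data_folder_name="data",
    gt_folder_name="gt",
    crop_size=256,
    batch_size=16,
)
trainer.fit(model, datamodule=dm_cropped)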

get_img_name_coordinates(index: int)[source]

Returns the original filename of the crop and its coordinates for the given index. This should only be used during testing.

Parameters:

index (int) – index of the crop

Returns:

filename of the crop and its coordinate

Return type:

Tuple[str, Tuple[int, int, int, int]]
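
A sketch of recovering the provenance of a test crop (the four integers are assumed to describe the crop's position within the original image; dm_cropped comes from the usage sketch above):

# Hypothetical: find out which page and which position test crop 0 came from.
img_name, coordinates = dm_cropped.get_img_name_coordinates(index=0)
print(img_name, coordinates)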

setup(stage: Optional[str] = None)[source]

Called at the beginning of fit (train + validate), validate, test, or predict. This is a good hook when you need to build models dynamically or adjust something about them. This hook is called on every process when using DDP.

Parameters:

stage – either 'fit', 'validate', 'test', or 'predict'

Example:

class LitModel(...):
    def __init__(self):
        self.l1 = None

    def prepare_data(self):
        download_data()
        tokenize()

        # don't do this: state set here is not shared across processes
        self.something = ...

    def setup(self, stage):
        data = load_data(...)
        self.l1 = nn.Linear(28, data.num_classes)
test_dataloader(*args, **kwargs) Union[DataLoader, List[DataLoader]][source]

Implement one or multiple PyTorch DataLoaders for testing.

For data processing use the following pattern:

  • download in prepare_data()

  • process and split in setup()

However, the above are only necessary for distributed processing.

Warning

do not assign state in prepare_data

Note

Lightning adds the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.

Returns:

A torch.utils.data.DataLoader or a sequence of them specifying testing samples.

Example:

def test_dataloader(self):
    transform = transforms.Compose([transforms.ToTensor(),
                                    transforms.Normalize((0.5,), (1.0,))])
    dataset = MNIST(root='/path/to/mnist/', train=False, transform=transform,
                    download=True)
    loader = torch.utils.data.DataLoader(
        dataset=dataset,
        batch_size=self.batch_size,
        shuffle=False
    )

    return loader

# can also return multiple dataloaders
def test_dataloader(self):
    return [loader_a, loader_b, ..., loader_n]

Note

If you don’t need a test dataset and a test_step(), you don’t need to implement this method.

Note

In the case where you return multiple test dataloaders, the test_step() will have an argument dataloader_idx which matches the order here.

train_dataloader(*args, **kwargs) DataLoader[source]

Implement one or more PyTorch DataLoaders for training.

Returns:

A collection of torch.utils.data.DataLoader specifying training samples. In the case of multiple dataloaders, please refer to the PyTorch Lightning documentation on multiple dataloaders.

The dataloader you return will not be reloaded unless you set reload_dataloaders_every_n_epochs on the Trainer to a positive integer.

For data processing use the following pattern:

  • download in prepare_data()

  • process and split in setup()

However, the above are only necessary for distributed processing.

Warning

do not assign state in prepare_data

Note

Lightning adds the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.

Example:

# single dataloader
def train_dataloader(self):
    transform = transforms.Compose([transforms.ToTensor(),
                                    transforms.Normalize((0.5,), (1.0,))])
    dataset = MNIST(root='/path/to/mnist/', train=True, transform=transform,
                    download=True)
    loader = torch.utils.data.DataLoader(
        dataset=dataset,
        batch_size=self.batch_size,
        shuffle=True
    )
    return loader

# multiple dataloaders, return as list
def train_dataloader(self):
    mnist = MNIST(...)
    cifar = CIFAR(...)
    mnist_loader = torch.utils.data.DataLoader(
        dataset=mnist, batch_size=self.batch_size, shuffle=True
    )
    cifar_loader = torch.utils.data.DataLoader(
        dataset=cifar, batch_size=self.batch_size, shuffle=True
    )
    # each batch will be a list of tensors: [batch_mnist, batch_cifar]
    return [mnist_loader, cifar_loader]

# multiple dataloaders, return as dict
def train_dataloader(self):
    mnist = MNIST(...)
    cifar = CIFAR(...)
    mnist_loader = torch.utils.data.DataLoader(
        dataset=mnist, batch_size=self.batch_size, shuffle=True
    )
    cifar_loader = torch.utils.data.DataLoader(
        dataset=cifar, batch_size=self.batch_size, shuffle=True
    )
    # each batch will be a dict of tensors: {'mnist': batch_mnist, 'cifar': batch_cifar}
    return {'mnist': mnist_loader, 'cifar': cifar_loader}
val_dataloader(*args, **kwargs) Union[DataLoader, List[DataLoader]][source]

Implement one or multiple PyTorch DataLoaders for validation.

The dataloader you return will not be reloaded unless you set reload_dataloaders_every_n_epochs on the Trainer to a positive integer.

It’s recommended that all data downloads and preparation happen in prepare_data().

Related Trainer methods and datamodule hooks:

  • fit()

  • validate()

  • prepare_data()

  • setup()

Note

Lightning adds the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.

Returns:

A torch.utils.data.DataLoader or a sequence of them specifying validation samples.

Examples:

def val_dataloader(self):
    transform = transforms.Compose([transforms.ToTensor(),
                                    transforms.Normalize((0.5,), (1.0,))])
    dataset = MNIST(root='/path/to/mnist/', train=False,
                    transform=transform, download=True)
    loader = torch.utils.data.DataLoader(
        dataset=dataset,
        batch_size=self.batch_size,
        shuffle=False
    )

    return loader

# can also return multiple dataloaders
def val_dataloader(self):
    return [loader_a, loader_b, ..., loader_n]

Note

If you don’t need a validation dataset and a validation_step(), you don’t need to implement this method.

Note

In the case where you return multiple validation dataloaders, the validation_step() will have an argument dataloader_idx which matches the order here.

Module contents