datamodules.RGB package

Submodules

datamodules.RGB.datamodule module

class DataModuleRGB(data_dir: str, data_folder_name: str, gt_folder_name: str, train_folder_name: str = 'train', val_folder_name: str = 'val', test_folder_name: str = 'test', pred_file_path_list: Optional[List[str]] = None, selection_train: Optional[Union[int, List[str]]] = None, selection_val: Optional[Union[int, List[str]]] = None, selection_test: Optional[Union[int, List[str]]] = None, num_workers: int = 4, batch_size: int = 8, shuffle: bool = True, drop_last: bool = True)[source]

Bases: AbstractDatamodule

The data module for a dataset where the classes of the ground truth are encoded as colors in the image. This data module expects full-sized images: they do not need to be in their original resolution, but they must not be cropped. If you want to work with cropped images, use DataModuleCroppedRGB. A short usage sketch follows the parameter list below.

The structure of the folder should be as follows:

data_dir
├── train_folder_name
│   ├── data_folder_name
│   │   ├── image1.png
│   │   ├── ...
│   │   └── imageN.png
│   └── gt_folder_name
│       ├── image1.png
│       ├── ...
│       └── imageN.png
├── val_folder_name
│   ├── data_folder_name
│   │   ├── image1.png
│   │   ├── ...
│   │   └── imageN.png
│   └── gt_folder_name
│       ├── image1.png
│       ├── ...
│       └── imageN.png
└── test_folder_name
    ├── data_folder_name
    │   ├── image1.png
    │   ├── ...
    │   └── imageN.png
    └── gt_folder_name
        ├── image1.png
        ├── ...
        └── imageN.png
Parameters:
  • data_dir (str) – Path to the dataset folder.

  • data_folder_name (str) – Name of the folder where the images are stored.

  • gt_folder_name (str) – Name of the folder where the ground truth is stored.

  • train_folder_name (str) – Name of the folder where the training data is stored.

  • val_folder_name (str) – Name of the folder where the validation data is stored.

  • test_folder_name (str) – Name of the folder where the test data is stored.

  • pred_file_path_list (List[str]) – List of file paths to the images that should be predicted.

  • selection_train (Union[int, List[str], None]) – Selection of the training data, given either as a number of files or as a list of file names.

  • selection_val (Union[int, List[str], None]) – Selection of the validation data, given either as a number of files or as a list of file names.

  • selection_test (Union[int, List[str], None]) – Selection of the test data, given either as a number of files or as a list of file names.

  • num_workers (int) – Number of workers for the dataloaders.

  • batch_size (int) – Batch size.

  • shuffle (bool) – Whether to shuffle the data.

  • drop_last (bool) – Whether to drop the last batch if it is smaller than the batch size.
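
A minimal usage sketch (the data_dir path, the folder names "data" and "gt", and the model are hypothetical; the Trainer calls are standard PyTorch Lightning API):

from pytorch_lightning import Trainer

# Hypothetical dataset location and folder names; adjust to your layout.
dm = DataModuleRGB(
    data_dir="/path/to/dataset",
    data_folder_name="data",
    gt_folder_name="gt",
    batch_size=4,
    num_workers=8,
)

trainer = Trainer(max_epochs=10)
# `model` is assumed to be a LightningModule compatible with this datamodule.
trainer.fit(model, datamodule=dm)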

get_output_filename_predict(index: int) str[source]

Returns the original filename of the document image. This should only be used during prediction.

Parameters:

index (int) – index of the sample

Returns:

original filename of the doc image

Return type:

str

get_output_filename_test(index: int) str[source]

Returns the original filename of the document image. This should only be used during testing.

Parameters:

index (int) – index of the sample

Returns:

original filename of the doc image

Return type:

str
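
A short sketch of using the filename helpers when writing results back to disk (the output directory is hypothetical, and the datamodule dm comes from the usage sketch above):

from pathlib import Path

# Hypothetical output directory for predicted segmentation maps.
out_dir = Path("predictions")
out_dir.mkdir(exist_ok=True)

# Recover the original document name of the first test sample and
# use it as the output filename for that sample's prediction.
doc_name = dm.get_output_filename_test(index=0)
out_path = out_dir / doc_name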

predict_dataloader() Union[DataLoader, List[DataLoader]][source]

Implement one or multiple PyTorch DataLoaders for prediction.

It’s recommended that all data downloads and preparation happen in prepare_data().

Note

Lightning adds the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.

Returns:

A torch.utils.data.DataLoader or a sequence of them specifying prediction samples.

Note

In the case where you return multiple prediction dataloaders, the predict_step() will have an argument dataloader_idx which matches the order here.
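
A sketch of running prediction through this hook (the file paths are hypothetical, and trainer and model are reused from the usage sketch above; trainer.predict() is the standard Lightning call):

# Hypothetical: predict on two new page images with a trained model.
dm_pred = DataModuleRGB(
    data_dir="/path/to/dataset",
    data_folder_name="data",
    gt_folder_name="gt",
    pred_file_path_list=["/path/to/new/page_1.png", "/path/to/new/page_2.png"],
)
predictions = trainer.predict(model, datamodule=dm_pred)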

setup(stage: Optional[str] = None)[source]

Called at the beginning of fit (train + validate), validate, test, or predict. This is a good hook when you need to build models dynamically or adjust something about them. This hook is called on every process when using DDP.

Parameters:

stage – either 'fit', 'validate', 'test', or 'predict'

Example:

class LitModel(...):
    def __init__(self):
        self.l1 = None

    def prepare_data(self):
        download_data()
        tokenize()

        # don't do this: state set here is not shared across processes
        self.something = ...

    def setup(self, stage):
        data = load_data(...)
        self.l1 = nn.Linear(28, data.num_classes)
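
Outside a Trainer, the hooks can also be called manually, for example to inspect the datasets; a sketch, assuming the datamodule dm from the usage sketch above (normally the Trainer invokes these hooks itself):

dm.prepare_data()
dm.setup(stage="fit")
train_loader = dm.train_dataloader()
print(len(train_loader))  # number of training batches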
test_dataloader(*args, **kwargs) Union[DataLoader, List[DataLoader]][source]

Implement one or multiple PyTorch DataLoaders for testing.

For data processing use the following pattern:

  • download in prepare_data()

  • process and split in setup()

However, the above are only necessary for distributed processing.

Warning

do not assign state in prepare_data

Note

Lightning adds the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.

Returns:

A torch.utils.data.DataLoader or a sequence of them specifying testing samples.

Example:

def test_dataloader(self):
    transform = transforms.Compose([transforms.ToTensor(),
                                    transforms.Normalize((0.5,), (1.0,))])
    dataset = MNIST(root='/path/to/mnist/', train=False, transform=transform,
                    download=True)
    loader = torch.utils.data.DataLoader(
        dataset=dataset,
        batch_size=self.batch_size,
        shuffle=False
    )

    return loader

# can also return multiple dataloaders
def test_dataloader(self):
    return [loader_a, loader_b, ..., loader_n]

Note

If you don’t need a test dataset and a test_step(), you don’t need to implement this method.

Note

In the case where you return multiple test dataloaders, the test_step() will have an argument dataloader_idx which matches the order here.
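
With this datamodule, the test split is typically evaluated through the Trainer (a sketch reusing the trainer, model, and dm from the sketches above):

# Run the model on the test split provided by the datamodule.
trainer.test(model, datamodule=dm)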

train_dataloader(*args, **kwargs) DataLoader[source]

Implement one or more PyTorch DataLoaders for training.

Returns:

A collection of torch.utils.data.DataLoader specifying training samples. In the case of multiple dataloaders, please refer to the PyTorch Lightning documentation on multiple dataloaders.

The dataloader you return will not be reloaded unless you set reload_dataloaders_every_n_epochs on the Trainer to a positive integer.

For data processing use the following pattern:

  • download in prepare_data()

  • process and split in setup()

However, the above are only necessary for distributed processing.

Warning

do not assign state in prepare_data

Note

Lightning adds the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.

Example:

# single dataloader
def train_dataloader(self):
    transform = transforms.Compose([transforms.ToTensor(),
                                    transforms.Normalize((0.5,), (1.0,))])
    dataset = MNIST(root='/path/to/mnist/', train=True, transform=transform,
                    download=True)
    loader = torch.utils.data.DataLoader(
        dataset=dataset,
        batch_size=self.batch_size,
        shuffle=True
    )
    return loader

# multiple dataloaders, return as list
def train_dataloader(self):
    mnist = MNIST(...)
    cifar = CIFAR(...)
    mnist_loader = torch.utils.data.DataLoader(
        dataset=mnist, batch_size=self.batch_size, shuffle=True
    )
    cifar_loader = torch.utils.data.DataLoader(
        dataset=cifar, batch_size=self.batch_size, shuffle=True
    )
    # each batch will be a list of tensors: [batch_mnist, batch_cifar]
    return [mnist_loader, cifar_loader]

# multiple dataloaders, return as dict
def train_dataloader(self):
    mnist = MNIST(...)
    cifar = CIFAR(...)
    mnist_loader = torch.utils.data.DataLoader(
        dataset=mnist, batch_size=self.batch_size, shuffle=True
    )
    cifar_loader = torch.utils.data.DataLoader(
        dataset=cifar, batch_size=self.batch_size, shuffle=True
    )
    # each batch will be a dict of tensors: {'mnist': batch_mnist, 'cifar': batch_cifar}
    return {'mnist': mnist_loader, 'cifar': cifar_loader}
val_dataloader(*args, **kwargs) Union[DataLoader, List[DataLoader]][source]

Implement one or multiple PyTorch DataLoaders for validation.

The dataloader you return will not be reloaded unless you set reload_dataloaders_every_n_epochs on the Trainer to a positive integer.

It’s recommended that all data downloads and preparation happen in prepare_data().

Related Trainer methods and datamodule hooks:

  • fit()

  • validate()

  • prepare_data()

  • setup()

Note

Lightning adds the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.

Returns:

A torch.utils.data.DataLoader or a sequence of them specifying validation samples.

Examples:

def val_dataloader(self):
    transform = transforms.Compose([transforms.ToTensor(),
                                    transforms.Normalize((0.5,), (1.0,))])
    dataset = MNIST(root='/path/to/mnist/', train=False,
                    transform=transform, download=True)
    loader = torch.utils.data.DataLoader(
        dataset=dataset,
        batch_size=self.batch_size,
        shuffle=False
    )

    return loader

# can also return multiple dataloaders
def val_dataloader(self):
    return [loader_a, loader_b, ..., loader_n]

Note

If you don’t need a validation dataset and a validation_step(), you don’t need to implement this method.

Note

In the case where you return multiple validation dataloaders, the validation_step() will have an argument dataloader_idx which matches the order here.

datamodules.RGB.datamodule_cropped module

class DataModuleCroppedRGB(data_dir: str, data_folder_name: str, gt_folder_name: str, train_folder_name: str = 'train', val_folder_name: str = 'val', test_folder_name: str = 'test', selection_train: Optional[Union[int, List[str]]] = None, selection_val: Optional[Union[int, List[str]]] = None, selection_test: Optional[Union[int, List[str]]] = None, crop_size: int = 256, num_workers: int = 4, batch_size: int = 8, shuffle: bool = True, drop_last: bool = True)[source]

Bases: AbstractDatamodule

The data module for a dataset where the classes of the ground truth are encoded as colors in the image. This data module expects cropped images with a specific structure. The cropping can be done with the script tools/generate_cropped_dataset.py; if you do not use the script, make sure that the images are cropped and named in the same way as the script does. If you want to work with uncropped images, use DataModuleRGB. A short usage sketch follows the parameter list below.

The structure of the folder should be as follows:

data_dir
├── data_folder_name
│   ├── train_folder_name
│   │   ├── original_image_name_1
│   │   │   ├── image_crop_1.png
│   │   │   ├── image_crop_2.png
│   │   │   ├── ...
│   │   │   └── image_crop_N.png
│   ├── val_folder_name
│   │   ├── original_image_name_1
│   │   │   ├── image_crop_1.png
│   │   │   ├── image_crop_2.png
│   │   │   ├── ...
│   │   │   └── image_crop_N.png
│   └── test_folder_name
│       ├── original_image_name_1
│       │   ├── image_crop_1.png
│       │   ├── image_crop_2.png
│       │   ├── ...
│       │   └── image_crop_N.png
└── gt_folder_name
    ├── train_folder_name
    │   ├── original_image_name_1
    │   │   ├── image_crop_1.png
    │   │   ├── image_crop_2.png
    │   │   ├── ...
    │   │   └── image_crop_N.png
    ├── val_folder_name
    │   ├── original_image_name_1
    │   │   ├── image_crop_1.png
    │   │   ├── image_crop_2.png
    │   │   ├── ...
    │   │   └── image_crop_N.png
    └── test_folder_name
        ├── original_image_name_1
        │   ├── image_crop_1.png
        │   ├── image_crop_2.png
        │   ├── ...
        │   └── image_crop_N.png
Parameters:
  • data_dir (str) – Path to the dataset folder.

  • data_folder_name (str) – Name of the folder where the images are stored.

  • gt_folder_name (str) – Name of the folder where the ground truth is stored.

  • train_folder_name (str) – Name of the folder where the training data is stored.

  • val_folder_name (str) – Name of the folder where the validation data is stored.

  • test_folder_name (str) – Name of the folder where the test data is stored.

  • selection_train (Union[int, List[str], None]) – Selection of the training data, given either as a number of files or as a list of file names.

  • selection_val (Union[int, List[str], None]) – Selection of the validation data, given either as a number of files or as a list of file names.

  • selection_test (Union[int, List[str], None]) – Selection of the test data, given either as a number of files or as a list of file names.

  • crop_size (int) – Size of the crops in pixels.

  • num_workers (int) – Number of workers for the dataloaders.

  • batch_size (int) – Batch size.

  • shuffle (bool) – Whether to shuffle the data.

  • drop_last (bool) – Whether to drop the last batch if it is smaller than the batch size.
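
A minimal usage sketch, assuming the cropped layout above was generated beforehand (the path and the folder names "data" and "gt" are hypothetical; trainer and model are reused from an existing setup):

# Hypothetical cropped dataset, e.g. produced by tools/generate_cropped_dataset.py.
dm_cropped = DataModuleCroppedRGB(
    data_dir="/path/to/cropped_dataset",
    data_folder_name="data",
    gt_folder_name="gt",
    crop_size=256,
    batch_size=16,
)
trainer.fit(model, datamodule=dm_cropped)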

get_img_name_coordinates(index: int)[source]

Returns the original filename of the crop and its coordinates for the given index. This should only be used during testing.

Parameters:

index (int) – index of the crop

Returns:

filename of the crop and its coordinate

Return type:

Tuple[str, Tuple[int, int, int, int]]
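
A sketch of recovering the provenance of a test crop (the four integers are assumed to describe the crop's position within the original image; dm_cropped comes from the usage sketch above):

# Hypothetical: find out which page and which position test crop 0 came from.
img_name, coordinates = dm_cropped.get_img_name_coordinates(index=0)
print(img_name, coordinates)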

setup(stage: Optional[str] = None)[source]

Called at the beginning of fit (train + validate), validate, test, or predict. This is a good hook when you need to build models dynamically or adjust something about them. This hook is called on every process when using DDP.

Parameters:

stage – either 'fit', 'validate', 'test', or 'predict'

Example:

class LitModel(...):
    def __init__(self):
        self.l1 = None

    def prepare_data(self):
        download_data()
        tokenize()

        # don't do this: state set here is not shared across processes
        self.something = ...

    def setup(self, stage):
        data = load_data(...)
        self.l1 = nn.Linear(28, data.num_classes)
test_dataloader(*args, **kwargs) Union[DataLoader, List[DataLoader]][source]

Implement one or multiple PyTorch DataLoaders for testing.

For data processing use the following pattern:

  • download in prepare_data()

  • process and split in setup()

However, the above are only necessary for distributed processing.

Warning

do not assign state in prepare_data

Note

Lightning adds the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.

Returns:

A torch.utils.data.DataLoader or a sequence of them specifying testing samples.

Example:

def test_dataloader(self):
    transform = transforms.Compose([transforms.ToTensor(),
                                    transforms.Normalize((0.5,), (1.0,))])
    dataset = MNIST(root='/path/to/mnist/', train=False, transform=transform,
                    download=True)
    loader = torch.utils.data.DataLoader(
        dataset=dataset,
        batch_size=self.batch_size,
        shuffle=False
    )

    return loader

# can also return multiple dataloaders
def test_dataloader(self):
    return [loader_a, loader_b, ..., loader_n]

Note

If you don’t need a test dataset and a test_step(), you don’t need to implement this method.

Note

In the case where you return multiple test dataloaders, the test_step() will have an argument dataloader_idx which matches the order here.

train_dataloader(*args, **kwargs) DataLoader[source]

Implement one or more PyTorch DataLoaders for training.

Returns:

A collection of torch.utils.data.DataLoader specifying training samples. In the case of multiple dataloaders, please refer to the PyTorch Lightning documentation on multiple dataloaders.

The dataloader you return will not be reloaded unless you set reload_dataloaders_every_n_epochs on the Trainer to a positive integer.

For data processing use the following pattern:

  • download in prepare_data()

  • process and split in setup()

However, the above are only necessary for distributed processing.

Warning

do not assign state in prepare_data

Note

Lightning adds the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.

Example:

# single dataloader
def train_dataloader(self):
    transform = transforms.Compose([transforms.ToTensor(),
                                    transforms.Normalize((0.5,), (1.0,))])
    dataset = MNIST(root='/path/to/mnist/', train=True, transform=transform,
                    download=True)
    loader = torch.utils.data.DataLoader(
        dataset=dataset,
        batch_size=self.batch_size,
        shuffle=True
    )
    return loader

# multiple dataloaders, return as list
def train_dataloader(self):
    mnist = MNIST(...)
    cifar = CIFAR(...)
    mnist_loader = torch.utils.data.DataLoader(
        dataset=mnist, batch_size=self.batch_size, shuffle=True
    )
    cifar_loader = torch.utils.data.DataLoader(
        dataset=cifar, batch_size=self.batch_size, shuffle=True
    )
    # each batch will be a list of tensors: [batch_mnist, batch_cifar]
    return [mnist_loader, cifar_loader]

# multiple dataloaders, return as dict
def train_dataloader(self):
    mnist = MNIST(...)
    cifar = CIFAR(...)
    mnist_loader = torch.utils.data.DataLoader(
        dataset=mnist, batch_size=self.batch_size, shuffle=True
    )
    cifar_loader = torch.utils.data.DataLoader(
        dataset=cifar, batch_size=self.batch_size, shuffle=True
    )
    # each batch will be a dict of tensors: {'mnist': batch_mnist, 'cifar': batch_cifar}
    return {'mnist': mnist_loader, 'cifar': cifar_loader}
val_dataloader(*args, **kwargs) Union[DataLoader, List[DataLoader]][source]

Implement one or multiple PyTorch DataLoaders for validation.

The dataloader you return will not be reloaded unless you set reload_dataloaders_every_n_epochs on the Trainer to a positive integer.

It’s recommended that all data downloads and preparation happen in prepare_data().

Related Trainer methods and datamodule hooks:

  • fit()

  • validate()

  • prepare_data()

  • setup()

Note

Lightning adds the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.

Returns:

A torch.utils.data.DataLoader or a sequence of them specifying validation samples.

Examples:

def val_dataloader(self):
    transform = transforms.Compose([transforms.ToTensor(),
                                    transforms.Normalize((0.5,), (1.0,))])
    dataset = MNIST(root='/path/to/mnist/', train=False,
                    transform=transform, download=True)
    loader = torch.utils.data.DataLoader(
        dataset=dataset,
        batch_size=self.batch_size,
        shuffle=False
    )

    return loader

# can also return multiple dataloaders
def val_dataloader(self):
    return [loader_a, loader_b, ..., loader_n]

Note

If you don’t need a validation dataset and a validation_step(), you don’t need to implement this method.

Note

In the case where you return multiple validation dataloaders, the validation_step() will have an argument dataloader_idx which matches the order here.

Module contents