datamodules.RGB package
Subpackages
Submodules
datamodules.RGB.datamodule module
- class DataModuleRGB(data_dir: str, data_folder_name: str, gt_folder_name: str, train_folder_name: str = 'train', val_folder_name: str = 'val', test_folder_name: str = 'test', pred_file_path_list: Optional[List[str]] = None, selection_train: Optional[Union[int, List[str]]] = None, selection_val: Optional[Union[int, List[str]]] = None, selection_test: Optional[Union[int, List[str]]] = None, num_workers: int = 4, batch_size: int = 8, shuffle: bool = True, drop_last: bool = True)[source]
Bases:
AbstractDatamodule
The data module for a dataset where the classes of the ground truth are encoded as colors in the image. This data module expects full-sized images. This does not mean that the images need to be in their original resolution, but they must not be cropped. If you want to work with cropped images, use DataModuleCroppedRGB.
The structure of the folder should be as follows:
data_dir
├── train_folder_name
│   ├── data_folder_name
│   │   ├── image1.png
│   │   ├── ...
│   │   └── imageN.png
│   └── gt_folder_name
│       ├── image1.png
│       ├── ...
│       └── imageN.png
├── val_folder_name
│   ├── data_folder_name
│   │   ├── image1.png
│   │   ├── ...
│   │   └── imageN.png
│   └── gt_folder_name
│       ├── image1.png
│       ├── ...
│       └── imageN.png
└── test_folder_name
    ├── data_folder_name
    │   ├── image1.png
    │   ├── ...
    │   └── imageN.png
    └── gt_folder_name
        ├── image1.png
        ├── ...
        └── imageN.png
- Parameters:
data_dir (str) – Path to the dataset folder.
data_folder_name (str) – Name of the folder where the images are stored.
gt_folder_name (str) – Name of the folder where the ground truth is stored.
train_folder_name (str) – Name of the folder where the training data is stored.
val_folder_name (str) – Name of the folder where the validation data is stored.
test_folder_name (str) – Name of the folder where the test data is stored.
pred_file_path_list (List[str]) – List of file paths to the images that should be predicted.
selection_train (Union[int, List[str], None]) – Selection of the training data: either a number of files or a list of file names.
selection_val (Union[int, List[str], None]) – Selection of the validation data: either a number of files or a list of file names.
selection_test (Union[int, List[str], None]) – Selection of the test data: either a number of files or a list of file names.
num_workers (int) – Number of workers for the dataloaders.
batch_size (int) – Batch size.
shuffle (bool) – Whether to shuffle the data.
drop_last (bool) – Whether to drop the last batch if it is smaller than the batch size.
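A minimal instantiation sketch, assuming the folder layout shown above; the dataset path and the folder names "data" and "gt" are illustrative, not defaults:

from datamodules.RGB.datamodule import DataModuleRGB

# Illustrative paths and folder names; adjust to your dataset layout.
data_module = DataModuleRGB(
    data_dir="/path/to/data_dir",
    data_folder_name="data",
    gt_folder_name="gt",
    batch_size=8,
    num_workers=4,
)
data_module.setup(stage="fit")
train_loader = data_module.train_dataloader()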
- get_output_filename_predict(index: int) str [source]
Returns the original filename of the doc image. This can only be used during testing.
- Parameters:
index (int) – index of the sample
- Returns:
original filename of the doc image
- Return type:
str
- get_output_filename_test(index: int) str [source]
Returns the original filename of the doc image. This can only be used during testing.
- Parameters:
index (int) – index of the sample
- Returns:
original filename of the doc image
- Return type:
str
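For illustration, a hedged sketch of how this lookup might be used when writing per-image results during testing; data_module refers to the instantiation sketch above, and the "predictions" output directory is hypothetical:

from pathlib import Path

# Sketch only: map a test-sample index back to its original document-image name
# and derive a hypothetical output path for the corresponding prediction.
data_module.setup(stage="test")
filename = data_module.get_output_filename_test(index=0)
output_path = Path("predictions") / filename  # "predictions" is an illustrative directory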
- predict_dataloader() Union[DataLoader, List[DataLoader]] [source]
Implement one or multiple PyTorch DataLoaders for prediction.
It’s recommended that all data downloads and preparation happen in prepare_data(). This dataloader is used during predict(); see also prepare_data().
Note
Lightning adds the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.
- Returns:
A torch.utils.data.DataLoader or a sequence of them specifying prediction samples.
Note
In the case where you return multiple prediction dataloaders, the predict_step() will have an argument dataloader_idx which matches the order here.
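For illustration, a hedged sketch of what such an override could look like; self.predict_dataset, self.batch_size, and self.num_workers are assumed attributes for this example and not part of the documented API:

import torch

# Sketch only: assumes `self.predict_dataset` was built in setup(stage="predict").
def predict_dataloader(self):
    return torch.utils.data.DataLoader(
        dataset=self.predict_dataset,
        batch_size=self.batch_size,
        shuffle=False,  # prediction data is never shuffled
        num_workers=self.num_workers,
    )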
- setup(stage: Optional[str] = None)[source]
Called at the beginning of fit (train + validate), validate, test, or predict. This is a good hook when you need to build models dynamically or adjust something about them. This hook is called on every process when using DDP.
- Parameters:
stage – either 'fit', 'validate', 'test', or 'predict'
Example:
class LitModel(...):
    def __init__(self):
        self.l1 = None

    def prepare_data(self):
        download_data()
        tokenize()

        # don't do this
        self.something = else

    def setup(self, stage):
        data = load_data(...)
        self.l1 = nn.Linear(28, data.num_classes)
- test_dataloader(*args, **kwargs) Union[DataLoader, List[DataLoader]] [source]
Implement one or multiple PyTorch DataLoaders for testing.
For data processing use the following pattern:
- download in prepare_data()
- process and split in setup()
However, the above are only necessary for distributed processing.
Warning
Do not assign state in prepare_data().
This dataloader is used during test(); see also prepare_data().
Note
Lightning adds the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.
- Returns:
A torch.utils.data.DataLoader or a sequence of them specifying testing samples.
Example:
def test_dataloader(self):
    transform = transforms.Compose([transforms.ToTensor(),
                                    transforms.Normalize((0.5,), (1.0,))])
    dataset = MNIST(root='/path/to/mnist/', train=False,
                    transform=transform, download=True)
    loader = torch.utils.data.DataLoader(
        dataset=dataset,
        batch_size=self.batch_size,
        shuffle=False
    )
    return loader

# can also return multiple dataloaders
def test_dataloader(self):
    return [loader_a, loader_b, ..., loader_n]
Note
If you don’t need a test dataset and a test_step(), you don’t need to implement this method.
Note
In the case where you return multiple test dataloaders, the test_step() will have an argument dataloader_idx which matches the order here.
- train_dataloader(*args, **kwargs) DataLoader [source]
Implement one or more PyTorch DataLoaders for training.
- Returns:
A collection of torch.utils.data.DataLoader specifying training samples. In the case of multiple dataloaders, see the examples below.
The dataloader you return will not be reloaded unless you set Trainer.reload_dataloaders_every_n_epochs to a positive integer.
For data processing use the following pattern:
- download in prepare_data()
- process and split in setup()
However, the above are only necessary for distributed processing.
Warning
Do not assign state in prepare_data().
This dataloader is used during fit(); see also prepare_data().
Note
Lightning adds the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.
Example:
# single dataloader
def train_dataloader(self):
    transform = transforms.Compose([transforms.ToTensor(),
                                    transforms.Normalize((0.5,), (1.0,))])
    dataset = MNIST(root='/path/to/mnist/', train=True,
                    transform=transform, download=True)
    loader = torch.utils.data.DataLoader(
        dataset=dataset,
        batch_size=self.batch_size,
        shuffle=True
    )
    return loader

# multiple dataloaders, return as list
def train_dataloader(self):
    mnist = MNIST(...)
    cifar = CIFAR(...)
    mnist_loader = torch.utils.data.DataLoader(
        dataset=mnist, batch_size=self.batch_size, shuffle=True
    )
    cifar_loader = torch.utils.data.DataLoader(
        dataset=cifar, batch_size=self.batch_size, shuffle=True
    )
    # each batch will be a list of tensors: [batch_mnist, batch_cifar]
    return [mnist_loader, cifar_loader]

# multiple dataloaders, return as dict
def train_dataloader(self):
    mnist = MNIST(...)
    cifar = CIFAR(...)
    mnist_loader = torch.utils.data.DataLoader(
        dataset=mnist, batch_size=self.batch_size, shuffle=True
    )
    cifar_loader = torch.utils.data.DataLoader(
        dataset=cifar, batch_size=self.batch_size, shuffle=True
    )
    # each batch will be a dict of tensors: {'mnist': batch_mnist, 'cifar': batch_cifar}
    return {'mnist': mnist_loader, 'cifar': cifar_loader}
- val_dataloader(*args, **kwargs) Union[DataLoader, List[DataLoader]] [source]
Implement one or multiple PyTorch DataLoaders for validation.
The dataloader you return will not be reloaded unless you set Trainer.reload_dataloaders_every_n_epochs to a positive integer.
It’s recommended that all data downloads and preparation happen in prepare_data(). This dataloader is used during fit() and validate(); see also prepare_data().
Note
Lightning adds the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.
- Returns:
A torch.utils.data.DataLoader or a sequence of them specifying validation samples.
Examples:
def val_dataloader(self):
    transform = transforms.Compose([transforms.ToTensor(),
                                    transforms.Normalize((0.5,), (1.0,))])
    dataset = MNIST(root='/path/to/mnist/', train=False,
                    transform=transform, download=True)
    loader = torch.utils.data.DataLoader(
        dataset=dataset,
        batch_size=self.batch_size,
        shuffle=False
    )
    return loader

# can also return multiple dataloaders
def val_dataloader(self):
    return [loader_a, loader_b, ..., loader_n]
Note
If you don’t need a validation dataset and a validation_step(), you don’t need to implement this method.
Note
In the case where you return multiple validation dataloaders, the validation_step() will have an argument dataloader_idx which matches the order here.
datamodules.RGB.datamodule_cropped module
- class DataModuleCroppedRGB(data_dir: str, data_folder_name: str, gt_folder_name: str, train_folder_name: str = 'train', val_folder_name: str = 'val', test_folder_name: str = 'test', selection_train: Optional[Union[int, List[str]]] = None, selection_val: Optional[Union[int, List[str]]] = None, selection_test: Optional[Union[int, List[str]]] = None, crop_size: int = 256, num_workers: int = 4, batch_size: int = 8, shuffle: bool = True, drop_last: bool = True)[source]
Bases:
AbstractDatamodule
The data module for a dataset where the classes of the ground truth are encoded as colors in the image. This data module expects cropped images with a specific folder structure. The cropping can be done with the script tools/generate_cropped_dataset.py. If you do not use the script, make sure that the images are cropped and named in the same way the script does it. If you want to work with uncropped images, use DataModuleRGB.
The structure of the folder should be as follows:
data_dir
├── data_folder_name
│   ├── train_folder_name
│   │   └── original_image_name_1
│   │       ├── image_crop_1.png
│   │       ├── image_crop_2.png
│   │       ├── ...
│   │       └── image_crop_N.png
│   ├── val_folder_name
│   │   └── original_image_name_1
│   │       ├── image1.png
│   │       ├── image2.png
│   │       ├── ...
│   │       └── imageN.png
│   └── test_folder_name
│       └── original_image_name_1
│           ├── image1.png
│           ├── image2.png
│           ├── ...
│           └── imageN.png
└── gt_folder_name
    ├── train_folder_name
    │   └── original_image_name_1
    │       ├── image1.png
    │       ├── image2.png
    │       ├── ...
    │       └── imageN.png
    ├── val_folder_name
    │   └── original_image_name_1
    │       ├── image1.png
    │       ├── image2.png
    │       ├── ...
    │       └── imageN.png
    └── test_folder_name
        └── original_image_name_1
            ├── image1.png
            ├── image2.png
            ├── ...
            └── imageN.png
- Parameters:
data_dir (str) – Path to the dataset folder.
data_folder_name (str) – Name of the folder where the images are stored.
gt_folder_name (str) – Name of the folder where the ground truth is stored.
train_folder_name (str) – Name of the folder where the training data is stored.
val_folder_name (str) – Name of the folder where the validation data is stored.
test_folder_name (str) – Name of the folder where the test data is stored.
selection_train (Union[int, List[str], None]) – Selection of the training data: either a number of files or a list of file names.
selection_val (Union[int, List[str], None]) – Selection of the validation data: either a number of files or a list of file names.
selection_test (Union[int, List[str], None]) – Selection of the test data: either a number of files or a list of file names.
crop_size (int) – Size of the crops.
num_workers (int) – Number of workers for the dataloaders.
batch_size (int) – Batch size.
shuffle (bool) – Whether to shuffle the data.
drop_last (bool) – Whether to drop the last batch if it is smaller than the batch size.
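A minimal instantiation sketch, mirroring the DataModuleRGB example above; the path and the folder names "data" and "gt" are illustrative, not defaults:

from datamodules.RGB.datamodule_cropped import DataModuleCroppedRGB

# Illustrative paths and folder names; adjust to your cropped dataset layout.
cropped_dm = DataModuleCroppedRGB(
    data_dir="/path/to/data_dir_cropped",
    data_folder_name="data",
    gt_folder_name="gt",
    crop_size=256,
    batch_size=8,
    num_workers=4,
)
cropped_dm.setup(stage="fit")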
- get_img_name_coordinates(index: int)[source]
Returns the original filename of the crop and its coordinates based on the index. This can only be used during testing.
- Parameters:
index (int) – index of the crop
- Returns:
filename of the crop and its coordinate
- Return type:
Tuple[str, Tuple[int, int, int, int]]
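For illustration, a hedged sketch of how the returned name and coordinates could be used to paste per-crop predictions back into full-size output maps; predictions, full_size_output, and the (x1, y1, x2, y2) coordinate order are assumptions for this example, and cropped_dm refers to the instantiation sketch above:

# Sketch only: reassemble per-crop predictions into full-size outputs.
for index, crop_prediction in enumerate(predictions):
    img_name, (x1, y1, x2, y2) = cropped_dm.get_img_name_coordinates(index)
    full_size_output[img_name][y1:y2, x1:x2] = crop_prediction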
- setup(stage: Optional[str] = None)[source]
Called at the beginning of fit (train + validate), validate, test, or predict. This is a good hook when you need to build models dynamically or adjust something about them. This hook is called on every process when using DDP.
- Parameters:
stage – either 'fit', 'validate', 'test', or 'predict'
Example:
class LitModel(...):
    def __init__(self):
        self.l1 = None

    def prepare_data(self):
        download_data()
        tokenize()

        # don't do this
        self.something = else

    def setup(self, stage):
        data = load_data(...)
        self.l1 = nn.Linear(28, data.num_classes)
- test_dataloader(*args, **kwargs) Union[DataLoader, List[DataLoader]] [source]
Implement one or multiple PyTorch DataLoaders for testing.
For data processing use the following pattern:
- download in prepare_data()
- process and split in setup()
However, the above are only necessary for distributed processing.
Warning
Do not assign state in prepare_data().
This dataloader is used during test(); see also prepare_data().
Note
Lightning adds the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.
- Returns:
A torch.utils.data.DataLoader or a sequence of them specifying testing samples.
Example:
def test_dataloader(self):
    transform = transforms.Compose([transforms.ToTensor(),
                                    transforms.Normalize((0.5,), (1.0,))])
    dataset = MNIST(root='/path/to/mnist/', train=False,
                    transform=transform, download=True)
    loader = torch.utils.data.DataLoader(
        dataset=dataset,
        batch_size=self.batch_size,
        shuffle=False
    )
    return loader

# can also return multiple dataloaders
def test_dataloader(self):
    return [loader_a, loader_b, ..., loader_n]
Note
If you don’t need a test dataset and a test_step(), you don’t need to implement this method.
Note
In the case where you return multiple test dataloaders, the test_step() will have an argument dataloader_idx which matches the order here.
- train_dataloader(*args, **kwargs) DataLoader [source]
Implement one or more PyTorch DataLoaders for training.
- Returns:
A collection of torch.utils.data.DataLoader specifying training samples. In the case of multiple dataloaders, see the examples below.
The dataloader you return will not be reloaded unless you set Trainer.reload_dataloaders_every_n_epochs to a positive integer.
For data processing use the following pattern:
- download in prepare_data()
- process and split in setup()
However, the above are only necessary for distributed processing.
Warning
Do not assign state in prepare_data().
This dataloader is used during fit(); see also prepare_data().
Note
Lightning adds the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.
Example:
# single dataloader
def train_dataloader(self):
    transform = transforms.Compose([transforms.ToTensor(),
                                    transforms.Normalize((0.5,), (1.0,))])
    dataset = MNIST(root='/path/to/mnist/', train=True,
                    transform=transform, download=True)
    loader = torch.utils.data.DataLoader(
        dataset=dataset,
        batch_size=self.batch_size,
        shuffle=True
    )
    return loader

# multiple dataloaders, return as list
def train_dataloader(self):
    mnist = MNIST(...)
    cifar = CIFAR(...)
    mnist_loader = torch.utils.data.DataLoader(
        dataset=mnist, batch_size=self.batch_size, shuffle=True
    )
    cifar_loader = torch.utils.data.DataLoader(
        dataset=cifar, batch_size=self.batch_size, shuffle=True
    )
    # each batch will be a list of tensors: [batch_mnist, batch_cifar]
    return [mnist_loader, cifar_loader]

# multiple dataloaders, return as dict
def train_dataloader(self):
    mnist = MNIST(...)
    cifar = CIFAR(...)
    mnist_loader = torch.utils.data.DataLoader(
        dataset=mnist, batch_size=self.batch_size, shuffle=True
    )
    cifar_loader = torch.utils.data.DataLoader(
        dataset=cifar, batch_size=self.batch_size, shuffle=True
    )
    # each batch will be a dict of tensors: {'mnist': batch_mnist, 'cifar': batch_cifar}
    return {'mnist': mnist_loader, 'cifar': cifar_loader}
- val_dataloader(*args, **kwargs) Union[DataLoader, List[DataLoader]] [source]
Implement one or multiple PyTorch DataLoaders for validation.
The dataloader you return will not be reloaded unless you set Trainer.reload_dataloaders_every_n_epochs to a positive integer.
It’s recommended that all data downloads and preparation happen in prepare_data(). This dataloader is used during fit() and validate(); see also prepare_data().
Note
Lightning adds the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.
- Returns:
A torch.utils.data.DataLoader or a sequence of them specifying validation samples.
Examples:
def val_dataloader(self):
    transform = transforms.Compose([transforms.ToTensor(),
                                    transforms.Normalize((0.5,), (1.0,))])
    dataset = MNIST(root='/path/to/mnist/', train=False,
                    transform=transform, download=True)
    loader = torch.utils.data.DataLoader(
        dataset=dataset,
        batch_size=self.batch_size,
        shuffle=False
    )
    return loader

# can also return multiple dataloaders
def val_dataloader(self):
    return [loader_a, loader_b, ..., loader_n]
Note
If you don’t need a validation dataset and a validation_step(), you don’t need to implement this method.
Note
In the case where you return multiple validation dataloaders, the validation_step() will have an argument dataloader_idx which matches the order here.