datamodules.RGB.datasets package

Submodules

datamodules.RGB.datasets.cropped_dataset module

Load a dataset of historic documents by specifying the folder where it is located.

class CroppedDatasetRGB(path: Path, data_folder_name: str, gt_folder_name: str, selection: Optional[Union[int, List[str]]] = None, is_test: bool = False, image_transform: Optional[callable] = None, target_transform: Optional[callable] = None, twin_transform: Optional[callable] = None)[source]

Bases: Dataset

A generic data loader where the images are arranged in this way:

path
├── data_folder_name
│   ├── original_image_name_1
│   │   ├── image_crop_1.png
│   │   ├── …
│   │   └── image_crop_N.png
│   └── original_image_name_N
│       ├── image_crop_1.png
│       ├── …
│       └── image_crop_N.png
└── gt_folder_name
    ├── original_image_name_1
    │   ├── image_crop_1.png
    │   ├── …
    │   └── image_crop_N.png
    └── original_image_name_N
        ├── image_crop_1.png
        ├── …
        └── image_crop_N.png

Parameters:
  • path (Path) – Path to dataset folder (train / val / test)

  • data_folder_name (str) – name of the folder that contains the data

  • gt_folder_name (str) – name of the folder that contains the ground truth

  • selection (Optional[Union[int, List[str]]], optional) – selection of the data, defaults to None

  • is_test (bool, optional) – flag to indicate if the dataset is used for testing, defaults to False

  • image_transform (callable, optional) – image transformation, defaults to None

  • target_transform (callable, optional) – target transformation, defaults to None

  • twin_transform (callable, optional) – twin transformation, defaults to None
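The three transform hooks are only named above, so the following is a minimal sketch of how such hooks are commonly combined, not the class's confirmed behaviour. It assumes that `twin_transform` receives the image and the ground truth together (so that random operations stay aligned) and that the other two hooks each act on one item. The helper `apply_transforms` and the toy `flip_both` are hypothetical names introduced for illustration.

```python
from typing import Any, Callable, Optional, Tuple


def apply_transforms(
    image: Any,
    target: Any,
    image_transform: Optional[Callable[[Any], Any]] = None,
    target_transform: Optional[Callable[[Any], Any]] = None,
    twin_transform: Optional[Callable[[Any, Any], Tuple[Any, Any]]] = None,
) -> Tuple[Any, Any]:
    """Hypothetical helper: apply the three optional hooks to one sample."""
    # Assumption: the twin transform sees both items, so a random flip or
    # crop is applied identically to the image and its ground truth.
    if twin_transform is not None:
        image, target = twin_transform(image, target)
    # Assumption: the remaining hooks each touch only their own item.
    if image_transform is not None:
        image = image_transform(image)
    if target_transform is not None:
        target = target_transform(target)
    return image, target


def flip_both(img, gt):
    """Toy twin transform: reverse both sequences in the same way."""
    return img[::-1], gt[::-1]


# Plain lists stand in for an image and its ground-truth mask.
sample = apply_transforms([1, 2, 3], [0, 0, 1], twin_transform=flip_both)
```

The order used here (twin first, then the per-item hooks) is a design choice of the sketch; the actual order inside CroppedDatasetRGB is not stated in this documentation.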

static get_gt_data_paths(directory: Path, data_folder_name: str, gt_folder_name: str, selection: Optional[Union[int, List[str]]] = None) → List[Tuple[Any, Any, str, Any]][source]

Returns a list of tuples, each holding the paths of an image and its matching ground-truth (gt) file.

Structure of the folder

directory/data/ORIGINAL_FILENAME/FILE_NAME_X_Y.png
directory/gt/ORIGINAL_FILENAME/FILE_NAME_X_Y.png

Parameters:
  • directory (Path) – Path to dataset folder (train / val / test)

  • data_folder_name (str) – name of the folder that contains the data

  • gt_folder_name (str) – name of the folder that contains the ground truth

  • selection (Optional[Union[int, List[str]]], optional) – selection of the data, defaults to None

Returns:

List of tuples that contain the path to the gt and image that belong together

Return type:

List[Tuple[Any, Any, str, Any]]
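To make the pairing concrete, here is a rough sketch of how crops and their ground truth could be matched by walking the folder structure described above. It is not the implementation of get_gt_data_paths: the helper `pair_cropped_files` is hypothetical, it ignores `selection`, and it returns three-element tuples (image path, gt path, original image name), whereas the documented return type has four elements whose meaning is not spelled out here.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple


def pair_cropped_files(
    directory: Path, data_folder_name: str, gt_folder_name: str
) -> List[Tuple[Path, Path, str]]:
    """Hypothetical helper, not the library's get_gt_data_paths."""
    data_root = directory / data_folder_name
    gt_root = directory / gt_folder_name
    pairs = []
    # One sub-folder per original image, each holding that image's crops.
    for page_dir in sorted(p for p in data_root.iterdir() if p.is_dir()):
        for img_path in sorted(page_dir.glob("*.png")):
            # The ground truth is expected under the same relative path.
            gt_path = gt_root / page_dir.name / img_path.name
            if not gt_path.is_file():
                raise FileNotFoundError(f"No ground truth for {img_path}")
            pairs.append((img_path, gt_path, page_dir.name))
    return pairs


# Build a throwaway dataset with empty files to exercise the helper.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for folder in ("data", "gt"):
        for page in ("page_a", "page_b"):
            (root / folder / page).mkdir(parents=True)
            for crop in ("crop_0_0.png", "crop_0_1.png"):
                (root / folder / page / crop).touch()
    demo_pairs = pair_cropped_files(root, "data", "gt")
    demo_summary = [
        (img.name, gt.parent.parent.name, page) for img, gt, page in demo_pairs
    ]
```

Raising on a missing ground-truth file is a choice made for the sketch; whether the real method raises, skips, or asserts in that case is not documented here.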

datamodules.RGB.datasets.full_page_dataset module

Load a dataset of historic documents by specifying the folder where it is located.

class DatasetRGB(path: Path, data_folder_name: str, gt_folder_name: str, image_dims: ImageDimensions, selection: Optional[Union[int, List[str]]] = None, is_test: bool = False, image_transform: Optional[callable] = None, target_transform: Optional[callable] = None, twin_transform: Optional[callable] = None, **kwargs)[source]

Bases: Dataset

A generic data loader where the images are arranged in this way:

root/gt/xxx.png
root/gt/xxy.png
root/gt/xxz.png

root/data/xxx.png
root/data/xxy.png
root/data/xxz.png

Parameters:
  • path (Path) – path to the dataset

  • data_folder_name (str) – name of the folder where the data is located

  • gt_folder_name (str) – name of the folder where the ground truth is located

  • image_dims (ImageDimensions) – dimensions of the image

  • selection (Optional[Union[int, List[str]]]) – selection of the data, can be an int or a list of strings

  • is_test (bool, optional) – flag to indicate if the dataset is used for testing

  • image_transform (callable, optional) – image transformation

  • target_transform (callable, optional) – target transformation

  • twin_transform (callable, optional) – twin transformation

static get_img_gt_path_list(directory: Path, data_folder_name: str, gt_folder_name: str, selection: Optional[Union[int, List[str]]] = None) → List[Tuple[Any, Any, Any]][source]

Returns a list of tuples, each holding the paths of an image and its matching ground-truth (gt) file.

Structure of the folder

directory/data/ORIGINAL_FILENAME/FILE_NAME_X_Y.png
directory/gt/ORIGINAL_FILENAME/FILE_NAME_X_Y.png

Parameters:
  • directory (Path) – Path to dataset folder (train / val / test)

  • data_folder_name (str) – name of the folder that contains the data

  • gt_folder_name (str) – name of the folder that contains the ground truth

  • selection (Optional[Union[int, List[str]]], optional) – selection of the data, defaults to None

Returns:

List of tuples that contain the path to the gt and image that belong together

Return type:

List[Tuple[Any, Any, Any]]
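As an illustration of full-page pairing, the sketch below matches files in the flat layout given in the DatasetRGB class description (one PNG per page in the data folder, with a same-named PNG in the gt folder). It is not the implementation of get_img_gt_path_list. The helper `pair_full_pages` is hypothetical, and the handling of `selection` is an assumption: an int is taken to mean "keep the first n pages" and a list to mean "keep the pages with these file stems". Neither reading is confirmed by this documentation.

```python
import tempfile
from pathlib import Path
from typing import List, Optional, Tuple, Union


def pair_full_pages(
    directory: Path,
    data_folder_name: str,
    gt_folder_name: str,
    selection: Optional[Union[int, List[str]]] = None,
) -> List[Tuple[Path, Path, str]]:
    """Hypothetical helper, not the library's get_img_gt_path_list."""
    images = sorted((directory / data_folder_name).glob("*.png"))
    # Assumed semantics: int -> first n pages, list -> pages with these stems.
    if isinstance(selection, int):
        if not 0 <= selection <= len(images):
            raise ValueError(f"selection must be between 0 and {len(images)}")
        images = images[:selection]
    elif selection is not None:
        wanted = set(selection)
        images = [p for p in images if p.stem in wanted]
    pairs = []
    for img_path in images:
        # The ground truth is expected to share the image's file name.
        gt_path = directory / gt_folder_name / img_path.name
        if not gt_path.is_file():
            raise FileNotFoundError(f"No ground truth for {img_path}")
        pairs.append((img_path, gt_path, img_path.stem))
    return pairs


# Build a throwaway dataset with empty files to exercise the helper.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for folder in ("data", "gt"):
        (root / folder).mkdir()
        for name in ("xxx.png", "xxy.png", "xxz.png"):
            (root / folder / name).touch()
    all_names = [s for _, _, s in pair_full_pages(root, "data", "gt")]
    first_two = [s for _, _, s in pair_full_pages(root, "data", "gt", selection=2)]
    by_name = [s for _, _, s in pair_full_pages(root, "data", "gt", selection=["xxz"])]
```

Sorting the file list before applying an integer selection keeps the chosen subset reproducible across runs and file systems.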

Module contents