datamodules.IndexedFormats.datasets package

Submodules

datamodules.IndexedFormats.datasets.full_page_dataset module

Load a dataset of historic documents by specifying the folder where it is located.

class DatasetIndexed(path: Path, data_folder_name: str, gt_folder_name: str, image_dims: ImageDimensions, is_test=False, selection: Optional[Union[int, List[str]]] = None, image_transform=None)

Bases: Dataset

A dataset where the images are arranged in this way:

root/gt/xxx.gif
root/gt/xxy.gif
root/gt/xxz.gif

root/data/xxx.png
root/data/xxy.png
root/data/xxz.png

The ground truth is stored in an indexed (palette-based) image format such as GIF, so each pixel stores a palette index rather than an RGB value.
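To illustrate what an indexed ground truth means: each pixel refers to an entry in a colour palette, and that entry's position can serve directly as a class label. The sketch below converts a tiny RGB mask into palette indices. The three-colour palette and the helper `rgb_to_index` are hypothetical and only for illustration; they are not part of this module.

```python
from typing import Dict, List, Tuple

RGB = Tuple[int, int, int]

# Hypothetical palette: the position in this list is the index stored per pixel.
PALETTE: List[RGB] = [
    (0, 0, 0),      # index 0, e.g. background
    (255, 0, 0),    # index 1
    (0, 0, 255),    # index 2
]


def rgb_to_index(mask: List[List[RGB]], palette: List[RGB]) -> List[List[int]]:
    """Replace every RGB pixel by its position in the palette."""
    lookup: Dict[RGB, int] = {colour: idx for idx, colour in enumerate(palette)}
    # A KeyError here means the mask uses a colour that is not in the palette.
    return [[lookup[pixel] for pixel in row] for row in mask]


mask = [
    [(0, 0, 0), (255, 0, 0)],
    [(0, 0, 255), (0, 0, 0)],
]
print(rgb_to_index(mask, PALETTE))  # [[0, 1], [2, 0]]
```

Storing the ground truth this way means no colour-to-class lookup is needed at load time: the pixel values already are the indices.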

Parameters:
  • path (Path) – Path to the dataset

  • data_folder_name (str) – Name of the folder where the data is located

  • gt_folder_name (str) – Name of the folder where the ground truth is located

  • image_dims (ImageDimensions) – Image dimensions of the dataset

  • is_test (bool) – Flag to indicate if the dataset is used for testing

  • selection (Optional[Union[int, List[str]]]) – Which part of the dataset to use; either an integer or a list of strings

  • image_transform (Optional[Callable]) – Transformations that are applied to the image
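The documentation above only states that `selection` may be an integer or a list of strings. One plausible reading, which is an assumption and not confirmed by this page, is that an integer keeps the first *n* samples in sorted order and a list keeps only the samples whose file stem appears in it. The hypothetical helper `apply_selection` sketches that reading:

```python
from typing import List, Optional, Union


def apply_selection(stems: List[str],
                    selection: Optional[Union[int, List[str]]] = None) -> List[str]:
    """Hypothetical filter for `selection` (assumed semantics, not the module's code).

    - None: keep every sample.
    - int n: keep the first n samples in sorted order.
    - list of str: keep only samples whose stem is listed.
    """
    stems = sorted(stems)
    if selection is None:
        return stems
    if isinstance(selection, int):
        if selection < 0:
            raise ValueError(f"selection must be non-negative, got {selection}")
        return stems[:selection]
    wanted = set(selection)
    return [s for s in stems if s in wanted]


print(apply_selection(["xxz", "xxx", "xxy"], 2))        # ['xxx', 'xxy']
print(apply_selection(["xxz", "xxx", "xxy"], ["xxz"]))  # ['xxz']
```

Sorting first makes an integer selection deterministic regardless of the order in which the file system lists the files.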

static get_img_gt_path_list(directory: Path, data_folder_name: str, gt_folder_name: str, selection: Optional[Union[int, List[str]]] = None) -> List[Tuple[Path, Path]]

Expected folder structure (data and gt stand for the folders named by data_folder_name and gt_folder_name):

directory/data/FILE_NAME.png
directory/gt/FILE_NAME.gif

Parameters:
  • directory (Path) – Path to the dataset

  • data_folder_name (str) – Name of the folder where the data is located

  • gt_folder_name (str) – Name of the folder where the ground truth is located

  • selection (Optional[Union[int, List[str]]]) – Which part of the dataset to use; either an integer or a list of strings

Returns:

List of (data image path, ground-truth path) tuples, one per sample

Return type:

List[Tuple[Path, Path]]

Raises:

ValueError – If the data folder or the ground-truth folder is not found in directory
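The pairing described for get_img_gt_path_list can be sketched with pathlib alone. This is a minimal, hypothetical re-implementation (`pair_img_gt` is not the module's code). It assumes that data images are .png files, that ground-truth images are .gif files, and that a data file is matched to the ground-truth file with the same stem. It omits `selection`, and the real method may treat unmatched files differently.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple


def pair_img_gt(directory: Path, data_folder_name: str,
                gt_folder_name: str) -> List[Tuple[Path, Path]]:
    """Pair directory/<data>/NAME.png with directory/<gt>/NAME.gif by file stem."""
    data_dir = directory / data_folder_name
    gt_dir = directory / gt_folder_name
    # Mirror the documented behaviour: fail if either folder is missing.
    if not data_dir.is_dir() or not gt_dir.is_dir():
        raise ValueError(f"Folder {data_folder_name!r} or {gt_folder_name!r} "
                         f"not found in {directory}")
    pairs = []
    for img_path in sorted(data_dir.glob("*.png")):
        gt_path = gt_dir / (img_path.stem + ".gif")
        # Assumption: an image without a matching ground truth is an error.
        if not gt_path.is_file():
            raise ValueError(f"No ground truth for {img_path.name} in {gt_dir}")
        pairs.append((img_path, gt_path))
    return pairs


# Usage: build a throw-away tree with the documented layout and pair it.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "data").mkdir()
    (root / "gt").mkdir()
    for name in ("xxx", "xxy"):
        (root / "data" / f"{name}.png").touch()
        (root / "gt" / f"{name}.gif").touch()
    for img, gt in pair_img_gt(root, "data", "gt"):
        print(img.name, gt.name)  # xxx.png xxx.gif, then xxy.png xxy.gif
```

Matching by stem instead of by list position keeps each image aligned with its own ground truth even if one folder contains extra or differently ordered files.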

Module contents