datamodules.IndexedFormats.datasets package
Submodules
datamodules.IndexedFormats.datasets.full_page_dataset module
Load a dataset of historic documents by specifying the folder where it is located.
- class DatasetIndexed(path: Path, data_folder_name: str, gt_folder_name: str, image_dims: ImageDimensions, is_test=False, selection: Optional[Union[int, List[str]]] = None, image_transform=None)
Bases: Dataset
A dataset where the images are arranged in this way:
root/gt/xxx.gif
root/gt/xxy.gif
root/gt/xxz.gif
root/data/xxx.png
root/data/xxy.png
root/data/xxz.png
The ground truth is stored in an indexed (palette-based) format such as GIF.
- Parameters:
path (Path) – Path to the dataset
data_folder_name (str) – Name of the folder where the data is located
gt_folder_name (str) – Name of the folder where the ground truth is located
image_dims (ImageDimensions) – Image dimensions of the dataset
is_test (bool) – Flag to indicate if the dataset is used for testing
selection (Optional[Union[int, List[str]]]) – Selection of the dataset, can be an integer or a list of strings
image_transform (Optional[Callable]) – Transformations that are applied to the image
- static get_img_gt_path_list(directory: Path, data_folder_name: str, gt_folder_name: str, selection: Optional[Union[int, List[str]]] = None) -> List[Tuple[Path, Path]]
Expected folder structure:
directory/data/FILE_NAME.png
directory/gt/FILE_NAME.gif
- Parameters:
directory (Path) – Path to the dataset
data_folder_name (str) – Name of the folder where the data is located
gt_folder_name (str) – Name of the folder where the ground truth is located
selection (Optional[Union[int, List[str]]]) – Selection of the dataset, can be an integer or a list of strings
- Returns:
List of tuples with the path to the data and the ground truth
- Return type:
List[Tuple[Path, Path]]
- Raises:
ValueError – If the data folder or the ground truth folder is not found in the directory
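The pairing described above can be illustrated with a minimal standard-library sketch. It is not the package's implementation: the function name pair_img_gt_paths is hypothetical, files are matched by stem (data/xxx.png with gt/xxx.gif), and the handling of selection is an assumption, since the documentation only says it can be an integer or a list of strings. Here an integer keeps the first N pairs in sorted order and a list keeps the pairs whose file stems are listed.

```python
from pathlib import Path
from typing import List, Optional, Tuple, Union


def pair_img_gt_paths(
    directory: Path,
    data_folder_name: str,
    gt_folder_name: str,
    selection: Optional[Union[int, List[str]]] = None,
) -> List[Tuple[Path, Path]]:
    """Hypothetical stand-in for get_img_gt_path_list, for illustration only."""
    data_dir = directory / data_folder_name
    gt_dir = directory / gt_folder_name

    # Documented behaviour: a missing data or gt folder raises ValueError.
    if not data_dir.is_dir() or not gt_dir.is_dir():
        raise ValueError(
            f"Expected folders {data_folder_name!r} and {gt_folder_name!r} in {directory}"
        )

    # Index ground-truth files by stem so data/xxx.png pairs with gt/xxx.gif.
    gt_by_stem = {p.stem: p for p in gt_dir.iterdir() if p.is_file()}
    pairs = [
        (img, gt_by_stem[img.stem])
        for img in sorted(data_dir.iterdir())
        if img.is_file() and img.stem in gt_by_stem
    ]

    # ASSUMPTION: an int keeps the first N pairs, a list keeps matching stems.
    if isinstance(selection, int):
        pairs = pairs[:selection]
    elif selection is not None:
        wanted = set(selection)
        pairs = [pair for pair in pairs if pair[0].stem in wanted]
    return pairs
```

Calling pair_img_gt_paths(Path("root"), "data", "gt") on the layout shown for DatasetIndexed would return one (image, ground truth) tuple per file name, sorted by the image path.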