GTRefiner.BuildingTools.Visitors package

Submodules

GTRefiner.BuildingTools.Visitors.Colorer module

class GTRefiner.BuildingTools.Visitors.Colorer.Colorer(color_table: Optional[ColorTable] = None)

Bases: Visitor

Default implementation of the Colorer. It sets the colors of the vector objects and of the different levels of the pixel ground truth. Using a color table in JSON format, a client can assign individual colors to the various vector objects of the VectorGT or layers of the PixelGT. The Colorer takes responsibility for reading this information and assigning it to the target elements. It supports three strategies: paint all TextElements of a layout class the same color, alternate the colors of the TextElements of a layout class, or give every element a different color. In principle, any other coloring strategy is conceivable. To do this, the colors in the color palette must be selected accordingly. If specific groups should have a certain color (or sequence of colors), a new Colorer can be written to introduce the desired rules.

Parameters:

color_table (ColorTable) – the colors of the color table define the colors of the different page elements (PageElement), layouts (Layout) and regions (Region).

visit_page(page: Page)

Set the colors of the ground truth information of the page. Supported strategies are alternating, all the same, all different, or any other combination of colors given in the color table. This Colorer simply iterates over the given colors.

Parameters:

page (Page) – the page to be colored according to the color table of this instance.
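
Iterating over the colors of a table can be sketched as follows. This is a minimal illustration, not GTRefiner's code: the function color_elements, the plain dictionaries and the example palette are hypothetical stand-ins, assuming a color table maps a layout class name to a list of RGB tuples.

```python
from itertools import cycle
from typing import Dict, List, Tuple

RGB = Tuple[int, int, int]


def color_elements(elements: List[dict], color_table: Dict[str, List[RGB]]) -> None:
    """Assign colors in place by cycling through each layout class's palette."""
    # One independent color iterator per layout class (hypothetical structure).
    palettes = {name: cycle(colors) for name, colors in color_table.items()}
    for element in elements:
        element["color"] = next(palettes[element["layout_class"]])


# One color per class -> all the same; two colors -> alternating;
# as many colors as elements -> all different.
table = {"main_text": [(10, 20, 255), (255, 20, 255)], "comment": [(20, 255, 255)]}
lines = [{"layout_class": "main_text"} for _ in range(3)] + [{"layout_class": "comment"}]
color_elements(lines, table)
print([line["color"] for line in lines])
```

Under this assumption the strategy is chosen purely by the number of colors listed per layout class, so no code changes are needed to switch between them.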

GTRefiner.BuildingTools.Visitors.Cropper module

class GTRefiner.BuildingTools.Visitors.Cropper.Cropper(target_dim: ImageDimension)

Bases: Visitor

The Cropping Visitor crops the image to a desired dimension. It is possible to specify whether the page is left- or right-bound. The current use of the cropping function is to get rid of useless edge pixels, but it could also be used for illustration purposes or targeted cropping of text regions. The behavior is inherited through the Croppable interface.

visit_page(page: Page)

Default implementation of the Cropper. Crops all elements of the page to the given target dimension. A page can be cut on the left or on the right; which side applies is determined algorithmically.

Parameters:

page (Page) – the page whose elements are cropped.
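
How a crop region could be derived from a target dimension is sketched below. The function crop_box, its cut_left flag and the example dimensions are hypothetical; the sketch also assumes that the horizontal surplus is removed from one side only and that the vertical surplus is split evenly, which the documentation does not confirm.

```python
from typing import Tuple


def crop_box(source: Tuple[int, int], target: Tuple[int, int],
             cut_left: bool) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of the region to keep."""
    src_w, src_h = source
    tgt_w, tgt_h = target
    if tgt_w > src_w or tgt_h > src_h:
        raise ValueError("target dimension must fit inside the source image")
    # Assumption: the horizontal surplus is removed from one side only.
    left = src_w - tgt_w if cut_left else 0
    # Assumption: the vertical surplus is split evenly between top and bottom.
    top = (src_h - tgt_h) // 2
    return left, top, left + tgt_w, top + tgt_h


# Arbitrary example dimensions (width, height) in pixels.
print(crop_box((4992, 6668), (4872, 6496), cut_left=True))
print(crop_box((4992, 6668), (4872, 6496), cut_left=False))
```

Vector objects would have to be shifted by the same (left, top) offset so that they stay aligned with the cropped image.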

GTRefiner.BuildingTools.Visitors.Grouper module

class GTRefiner.BuildingTools.Visitors.Grouper.BlockGrouper(layout_class: LayoutClasses)

Bases: Grouper

Groups the elements of the given Layout into different blocks depending on their minimal x-coordinate. Uses np.histogram to group the elements into bins.

Parameters:

layout_class (LayoutClasses) – layout class to be grouped.

group(region: TextRegion, bins: int = 6) TextRegion

Detect clusters based on the x value of the page elements.

Parameters:

region (TextRegion) – region to be sorted.

bins (int) – number of x-oriented bins, defaults to 6.

Returns:

the given number of bins (or fewer); see the NumPy documentation.

Return type:

List[PageElement]
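
The binning idea can be illustrated with a small standalone sketch. It is written without NumPy so that it runs on its own, and it imitates the equal-width binning that np.histogram performs for an integer bins argument. The function group_by_min_x and its list-of-indices result are hypothetical, not the BlockGrouper implementation.

```python
from typing import Dict, List


def group_by_min_x(min_xs: List[int], bins: int = 6) -> List[List[int]]:
    """Group element indices into equal-width bins over their minimal x-coordinate."""
    lo, hi = min(min_xs), max(min_xs)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all x values are equal
    groups: Dict[int, List[int]] = {}
    for index, x in enumerate(min_xs):
        # The right-most edge belongs to the last bin, as in np.histogram.
        bin_id = min(int((x - lo) / width), bins - 1)
        groups.setdefault(bin_id, []).append(index)
    # Empty bins are dropped, so fewer than `bins` groups may be returned.
    return [groups[b] for b in sorted(groups)]


xs = [100, 104, 98, 620, 615, 1200]  # minimal x-coordinates of six elements
print(group_by_min_x(xs, bins=6))
```

Elements that start at roughly the same x position end up in the same group, which is why fewer groups than bins can come back.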

visit_page(page: Page)

Groups all the elements of this instance's layout class (LayoutClasses).

class GTRefiner.BuildingTools.Visitors.Grouper.Grouper

Bases: Visitor

The Grouping tool works at the level of text regions. It creates, divides and combines the text elements into logical (sub)groups depending on the algorithm. Currently, the Grouper module supports clustering text elements based on their smallest x-coordinate (BlockGrouper) and subdividing them into blocks of close, adjacent text elements (ThresholdGrouper).

abstract group(region: TextRegion) TextRegion

Groups all layouts (Layout) within a region (TextRegion).

Parameters:

region (TextRegion) – region to be (re-)grouped.

Returns:

grouped region.

Return type:

TextRegion

abstract visit_page(page: Page)

No default implementation available for the Grouper.
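
A new grouping strategy is added by subclassing and implementing both abstract methods. The following minimal sketch shows the pattern with stand-in classes: Page, its accept method, the list-based regions and PairGrouper are all hypothetical and only mirror the names used in this package.

```python
from abc import ABC, abstractmethod
from typing import List


class Visitor(ABC):
    @abstractmethod
    def visit_page(self, page: "Page") -> None:
        """Apply this visitor's behaviour to the page."""


class Page:
    """Stand-in page: a list of regions, each a list of element ids."""

    def __init__(self, regions: List[List[int]]):
        self.regions = regions

    def accept(self, visitor: Visitor) -> None:
        visitor.visit_page(self)


class Grouper(Visitor):
    @abstractmethod
    def group(self, region: List[int]) -> List[List[int]]:
        """Split one region into groups."""


class PairGrouper(Grouper):
    """Toy strategy: put every two consecutive elements into one group."""

    def group(self, region: List[int]) -> List[List[int]]:
        return [region[i:i + 2] for i in range(0, len(region), 2)]

    def visit_page(self, page: Page) -> None:
        page.regions = [g for region in page.regions for g in self.group(region)]


page = Page([[1, 2, 3], [4, 5]])
page.accept(PairGrouper())
print(page.regions)
```

Because both methods are abstract, instantiating Grouper directly raises a TypeError; only concrete subclasses can visit a page.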

class GTRefiner.BuildingTools.Visitors.Grouper.ThresholdGrouper(x_threshold: int, y_threshold: int, layout_class: LayoutClasses)

Bases: Grouper

The ThresholdGrouper splits regions if their elements are too far apart (only if both the x and the y threshold are exceeded).

Parameters:

x_threshold (int) – x threshold in pixels that determines whether an element should be split off.

y_threshold (int) – y threshold in pixels that determines whether an element should be split off.

layout_class (LayoutClasses) – layout class to be grouped.

group(region: TextRegion) TextRegion

Splits regions if their elements are too far apart (only if both the x and the y threshold are exceeded). Warning: make sure the regions are sorted in ascending or descending order, following the way the text is read.

Parameters:

region (TextRegion) – region to be grouped.
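
The splitting rule can be sketched as follows. This is an illustration under assumptions: elements are reduced to (min_x, min_y) points, distances are measured to the previous element only, and the function split_by_threshold is hypothetical.

```python
from typing import List, Tuple

Point = Tuple[int, int]  # (min_x, min_y) of a text element, a stand-in for real elements


def split_by_threshold(elements: List[Point], x_threshold: int,
                       y_threshold: int) -> List[List[Point]]:
    """Start a new group whenever both gaps to the previous element exceed their threshold."""
    groups: List[List[Point]] = []
    for element in elements:
        if groups:
            prev_x, prev_y = groups[-1][-1]
            too_far = (abs(element[0] - prev_x) > x_threshold
                       and abs(element[1] - prev_y) > y_threshold)
        else:
            too_far = True  # the first element always opens a group
        if too_far:
            groups.append([element])
        else:
            groups[-1].append(element)
    return groups


# Elements are assumed to be sorted in reading order beforehand.
lines = [(100, 50), (102, 90), (105, 130), (900, 600), (905, 640)]
print(split_by_threshold(lines, x_threshold=200, y_threshold=100))
```

Since each element is compared only with its predecessor, unsorted input would produce arbitrary splits, which is the reason for the sorting warning.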

visit_page(page: Page)

Groups all the elements of this instance's layout class (LayoutClasses).

Parameters:

page (Page) – page to be grouped.

GTRefiner.BuildingTools.Visitors.IllustratorVisitor module

class GTRefiner.BuildingTools.Visitors.IllustratorVisitor.Illustrator(background: PIL.Image = None, color_table: ColorTable = None, vis_table: VisibilityTable = None, outline: Tuple = None)

Bases: Visitor

The Illustrator serves to visualize processes.

Parameters:

background (Image) – image on which the vector GT is drawn, if a background is desired.

color_table (ColorTable) – color table to use instead of the quick-and-dirty one specified in this class.

vis_table (VisibilityTable) – visibility table to use instead of the quick-and-dirty one specified in this class.

outline (tuple) – outline color; if None is given, no outline is drawn.

color_table = <GTRefiner.GTRepresentation.Table.ColorTable object>

comment_color = [(255, 20, 255)]

decoration_color = [(20, 255, 255)]

main_text_color = [(10, 20, 255)]

vis_table = <GTRefiner.GTRepresentation.Table.VisibilityTable object>

visit_page(page: Page)

Illustrate the page.

Parameters:

page (Page) – page to illustrate.

Returns:

the illustration; if a background is given, it is blended with the illustration.

Return type:

Image

GTRefiner.BuildingTools.Visitors.Layerer module

class GTRefiner.BuildingTools.Visitors.Layerer.Layerer

Bases: Visitor

The Layerer Visitor is used to combine the two ground truths (vector gt and pixel-based gt). It paints the desired vector objects of the vector GT onto the layers of the pixel-level GT and combines them to form an RGB image. In doing so, it overlays the vector GT as a binary image on top of the combined layers of the pixel GT, keeping only pixels that are visible in both the vector GT and the pixel-level GT.

classmethod visit_page(page: Page)

The default implementation of the Layerer combines the vector GT and the pixel GT by drawing the vector objects on the corresponding layer of the levels within the pixel GT. It uses the Layerable interface.

Parameters:

page (Page) – page whose vector GT and pixel GT are combined.
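
Keeping only pixels that are visible in both ground truths amounts to a logical AND of two binary masks. The sketch below shows this on nested lists; the function layer, the mask layout and the colors are hypothetical and do not reflect the Layerable interface.

```python
from typing import List, Tuple

Mask = List[List[bool]]
RGB = Tuple[int, int, int]


def layer(vector_mask: Mask, pixel_mask: Mask, color: RGB,
          background: RGB = (0, 0, 0)) -> List[List[RGB]]:
    """Color a pixel only if it is visible in both masks (logical AND)."""
    return [
        [color if v and p else background for v, p in zip(vector_row, pixel_row)]
        for vector_row, pixel_row in zip(vector_mask, pixel_mask)
    ]


vector_mask = [[True, True, False],
               [True, True, False]]   # rasterized polygon of one vector object
pixel_mask = [[False, True, True],
              [True, False, True]]    # visible text pixels of one pixel-GT layer
image = layer(vector_mask, pixel_mask, color=(10, 20, 255))
print(image)
```

A pixel inside the polygon but without text, or a text pixel outside the polygon, stays at the background color.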

GTRefiner.BuildingTools.Visitors.Resizer module

class GTRefiner.BuildingTools.Visitors.Resizer.Resizer(target_dim: ImageDimension)

Bases: Visitor

Resize a page (and all its ground-truth information, including the original image) to a target dimension. The default implementation scales the PixelGT in four steps. As in the last presented strategy, all relevant text pixels are first set to visible (True) and all others to invisible (False). In the next step, the image is blurred with a Gaussian filter, so the ground-truth image is now in grayscale. Finally, the blurred image is bicubically interpolated and binarized again (according to Otsu, Niblack or Sauvola). Blurring thickens the text elements: the more blurring is applied, the more the text elements merge into each other.

Parameters:

target_dim (ImageDimension) – target dimension.
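
The final binarization step can be illustrated with a small pure-Python version of Otsu's method. This sketch covers only that step: blurring and bicubic interpolation are left to an image library, and the functions otsu_threshold and binarize as well as the sample values are hypothetical.

```python
from typing import List


def otsu_threshold(gray: List[int]) -> int:
    """Return the threshold t in 0..255 that maximizes the between-class variance."""
    hist = [0] * 256
    for value in gray:
        hist[value] += 1
    total = len(gray)
    sum_all = sum(i * count for i, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    weight_bg, sum_bg = 0, 0
    for t in range(256):
        weight_bg += hist[t]            # number of pixels with value <= t
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg   # number of pixels with value > t
        if weight_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        between_var = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def binarize(gray: List[int]) -> List[bool]:
    """Mark a pixel as visible (True) if it is brighter than the Otsu threshold."""
    t = otsu_threshold(gray)
    return [value > t for value in gray]


# Grayscale values as they might appear after blurring and interpolating a binary mask.
blurred = [0, 10, 20, 30, 200, 220, 240, 255]
print(binarize(blurred))
```

Niblack and Sauvola differ in that they compute a local threshold per neighbourhood instead of one global value.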

visit_page(page: Page)

Resize a page (and all its ground-truth information, including the original image) to the target dimension of this instance.

Parameters:

page (Page) – page to be resized.

GTRefiner.BuildingTools.Visitors.Sorter module

class GTRefiner.BuildingTools.Visitors.Sorter.AscendingSorter

Bases: Sorter

visit_page(page: Page)

Sort all elements of a region in ascending order (ascending = lowest y value first).

Parameters:

page (Page) – page to be sorted.

class GTRefiner.BuildingTools.Visitors.Sorter.DescendingSorter

Bases: Sorter

visit_page(page: Page)

Sort all elements of a region in descending order (descending = highest y value first).

Parameters:

page (Page) – page to be sorted.

class GTRefiner.BuildingTools.Visitors.Sorter.Sorter

Bases: Visitor

Sort a given container of objects. The text elements of the vectorized ground truth of the DIVA-HisDB are not consistently sorted, which is why the Sorter should always be used if the order of the text elements matters. This is implemented by having the Layout and TextRegion classes both override the __lt__() method of Python's object. They thus provide an interface for efficient sorting of text elements and regions with Python's built-in sorting algorithms. The Sorter tool can be used to invoke, extend and modify this behavior as desired. The Sorter goes hand in hand with the Grouper tool (see module Grouper) and the alternating Colorer (see module Colorer).
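
Overriding __lt__() is enough for Python's built-in sorted() and list.sort() to order custom objects. The sketch below shows the idea with a hypothetical TextElement class; it assumes that elements are compared by their smallest y-coordinate, which may differ from the actual comparison in Layout and TextRegion.

```python
from typing import List, Tuple


class TextElement:
    """Hypothetical stand-in for a layout element with a polygon."""

    def __init__(self, name: str, polygon: List[Tuple[int, int]]):
        self.name = name
        self.polygon = polygon

    @property
    def min_y(self) -> int:
        return min(y for _, y in self.polygon)

    def __lt__(self, other: "TextElement") -> bool:
        # Assumption: elements are ordered by their smallest y-coordinate.
        return self.min_y < other.min_y


elements = [
    TextElement("third", [(0, 300), (50, 340)]),
    TextElement("first", [(0, 20), (50, 60)]),
    TextElement("second", [(0, 150), (50, 190)]),
]
ascending = sorted(elements)                 # lowest y value first
descending = sorted(elements, reverse=True)  # highest y value first
print([e.name for e in ascending])
print([e.name for e in descending])
```

A concrete Sorter then only has to call sorted() on the elements of each region, with or without reverse=True.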

abstract visit_page(page: Page)

Visit the page and apply the behaviour of the concrete implementation of this Visitor.

GTRefiner.BuildingTools.Visitors.TextLineDecorator module

class GTRefiner.BuildingTools.Visitors.TextLineDecorator.AscenderDescenderDecorator(x_height: int)

Bases: TextLineDecorator

Parameters:

x_height (int) – based on this value and the baseline provided by the TextLine element, the ascender, descender and x-height rectangles are calculated.

visit_page(page: Page)

Decorate all TextLine instances of the page.
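
One way such rectangles could be derived is sketched below. The geometry is an assumption, not the documented algorithm: the text line is reduced to its bounding box, y grows downwards as in image coordinates, and the function split_text_line is hypothetical.

```python
from typing import Dict, Tuple

Rect = Tuple[int, int, int, int]  # (left, top, right, bottom), y grows downwards


def split_text_line(bbox: Rect, baseline_y: int, x_height: int) -> Dict[str, Rect]:
    """Split a text line's bounding box into ascender, x-height and descender bands."""
    left, top, right, bottom = bbox
    x_top = baseline_y - x_height  # upper edge of the x-height band
    if not top <= x_top <= baseline_y <= bottom:
        raise ValueError("baseline and x-height must lie inside the bounding box")
    return {
        "ascender": (left, top, right, x_top),
        "x_height": (left, x_top, right, baseline_y),
        "descender": (left, baseline_y, right, bottom),
    }


parts = split_text_line(bbox=(100, 200, 900, 260), baseline_y=245, x_height=25)
for name, rect in parts.items():
    print(name, rect)
```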

class GTRefiner.BuildingTools.Visitors.TextLineDecorator.HeadAndTailDecorator

Bases: TextLineDecorator

“Example of another Decorator Class”

class GTRefiner.BuildingTools.Visitors.TextLineDecorator.HistogramDecorator

Bases: TextLineDecorator

“Example of another Decorator Class”

class GTRefiner.BuildingTools.Visitors.TextLineDecorator.TextLineDecorator

Bases: Visitor

abstract classmethod visit_page(page: Page)

Decorate textline elements of page.

GTRefiner.BuildingTools.Visitors.VisibilityVisitor module

class GTRefiner.BuildingTools.Visitors.VisibilityVisitor.VisibilityVisitor(vis_table: Optional[VisibilityTable] = None)

Bases: Visitor

Based on a visibility table, set all elements in the vector ground truth (VectorGT) and all layers of the pixel-level ground truth (PixelLevelGT) to the specified boolean value. Analogous to the Colorer, the VisibilityVisitor reads a visibility table that defines whether a layout class should be visible or not. If the user decides that only individual text regions or text elements are of interest, a new visitor can be written that implements the desired functionality.

Parameters:

vis_table (VisibilityTable) – visibility table.

visit_page(page: Page)

Based on a visibility table, set all elements in the vector ground truth (VectorGT) and all layers of the pixel-level ground truth (PixelLevelGT) to the specified boolean value.

Parameters:

page (Page) – page whose elements are set visible according to the visibility table provided within the page, or according to a custom visibility table provided by this instance (at instantiation).
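
The table lookup can be sketched as follows. The function apply_visibility and the dictionary-based tables are hypothetical; the sketch assumes that a custom table takes precedence over the page's own table and that layout classes missing from the table are hidden.

```python
from typing import Dict, List, Optional


def apply_visibility(elements: List[dict],
                     page_table: Dict[str, bool],
                     custom_table: Optional[Dict[str, bool]] = None) -> None:
    """Set each element's 'visible' flag from the applicable table, in place."""
    # Assumption: a table passed at instantiation overrides the page's own table.
    table = custom_table if custom_table is not None else page_table
    for element in elements:
        # Assumption: layout classes missing from the table are hidden.
        element["visible"] = table.get(element["layout_class"], False)


elements = [{"layout_class": "main_text"},
            {"layout_class": "comment"},
            {"layout_class": "decoration"}]
page_table = {"main_text": True, "comment": True, "decoration": True}
apply_visibility(elements, page_table, custom_table={"main_text": True, "comment": False})
print([e["visible"] for e in elements])
```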

Module contents