Julien Lerouge

Senior Data Scientist @ QuickSign
  • Deep learning
  • Image processing
  • Document analysis & understanding (classification, OCR, NLP)

Publication

A structural signature based on texture for digitized historical book page categorization

1 L3I Laboratory, University of La Rochelle, av M. Crépeau, 17042 La Rochelle Cedex 1, France
2 Normandie Université, LITIS EA 4108, University of Rouen, 76801, Saint-Etienne du Rouvray, France

Abstract:

This article presents a texture-based structural signature for characterizing and categorizing digitized historical book pages. The proposed signature assumes no a priori knowledge of page layout or content and is therefore applicable to a large variety of ancient books. By combining low-level features (e.g. texture) that characterize the different page components (i.e. different text fonts or graphic regions) with structural information describing the page layout, the signature provides a rich and holistic description of the layout and content of the analyzed book pages. More precisely, the signature-based characterization approach consists of two stages. The first stage automatically extracts homogeneous regions. The second stage builds a graph-based page signature from the extracted regions, reflecting the page's layout and content. This signature supports numerous applications for effectively managing a corpus or collection of books (e.g. information retrieval in digital libraries according to several criteria, or page categorization). To illustrate the effectiveness of the proposed page signature, a detailed experimental evaluation assesses two possible categorization applications: unsupervised page classification and page stream segmentation.