Dataset of Pages from Early Printed Books with Multiple Font Groups Mathias Seuret∗ Saskia Limbach∗ Nikolaus Weichselbaumer∗ Pattern Recognition Lab, Gutenberg-Institut für Gutenberg-Institut für Friedrich-Alexander-Universität Weltliteratur und schriftorientierte Weltliteratur und schriftorientierte Erlangen-Nürnberg Medien Abteilung Medien Abteilung Erlangen, Germany Buchwissenschaft Buchwissenschaft
[email protected] Mainz, Germany Mainz, Germany
[email protected] [email protected] Andreas Maier Vincent Christlein Pattern Recognition Lab, Pattern Recognition Lab, Friedrich-Alexander-Universität Friedrich-Alexander-Universität Erlangen-Nürnberg Erlangen-Nürnberg Erlangen, Germany Erlangen, Germany
[email protected] [email protected] ABSTRACT ACM Reference Format: Based on contemporary scripts, early printers developed a Mathias Seuret, Saskia Limbach, Nikolaus Weichselbaumer, An- dreas Maier, and Vincent Christlein. 2019. Dataset of Pages from large variety of different fonts. While fonts may slightly differ Early Printed Books with Multiple Font Groups. In HIP’19 : 5th from one printer to another, they can be divided into font International Workshop on Historical Document Imaging and Pro- groups, such as Textura, Antiqua, or Fraktur. The recognition cessing, Sydney, Australia. ACM, New York, NY, USA, 6 pages. of font groups is important for computer scientists to select https://doi.org/10.1145/3352631.3352640 adequate OCR models, and of high interest to humanities scholars studying early printed books and the history of fonts. 1 INTRODUCTION In this paper, we introduce a new, public dataset for the The dataset presented in this paper1 is meant to aid the recognition of font groups in early printed books, and evaluate automatic recognition of font groups in scans of early modern2 several state-of-the-art CNNs for the font group recognition books (see Sec.