Article OCR4all - An Open-Source Tool Providing a (Semi-)Automatic OCR Workflow for Historical Printings Christian Reul 1, Dennis Christ 1, Alexander Hartelt 1, Nico Balbach 1, Maximilian Wehner 1, Uwe Springmann 2, Christoph Wick 1, Christine Grundig 3, Andreas Büttner 4, and Frank Puppe 1 1 Chair for Artificial Intelligence and Applied Computer Science, University of Würzburg, 97074 Würzburg, Germany 2 Center for Information and Language Processing, LMU Munich, 80538 Munich, Germany 3 Institute for Modern Art History, University of Zurich, 8006 Zurich, Switzerland 4 Institute for Philosophy, University of Würzburg, 97074 Würzburg, Germany * Correspondence:
[email protected] Version September 11, 2019 submitted to Appl. Sci. Abstract: Optical Character Recognition (OCR) on historical printings is a challenging task mainly due to the complexity of the layout and the highly variant typography. Nevertheless, in the last few years great progress has been made in the area of historical OCR, resulting in several powerful open-source tools for preprocessing, layout recognition and segmentation, character recognition and post-processing. The drawback of these tools often is their limited applicability by non-technical users like humanist scholars and in particular the combined use of several tools in a workflow. In this paper we present an open-source OCR software called OCR4all, which combines state-of-the-art OCR components and continuous model training into a comprehensive workflow. A comfortable GUI allows error corrections not only in the final output, but already in early stages to minimize error propagations. Further on, extensive configuration capabilities are provided to set the degree of automation of the workflow and to make adaptations to the carefully selected default parameters for specific printings, if necessary.