
Optical Recognition

Rossum Reading Group, 2018-08-16

About us

Auxiliary Historical Sciences (Archival science)

Outline

1. The evolution of scripts in modern history
2. Archivists and written source processing
3. Obstacles in reading and rewriting
4. OCR obstacles from the archivists' point of view

The evolution of scripts in modern history

Novogothic (“German”) script

Humanistic (“”) script

Novogothic script

Frakturschrift

Kanzleischrift

Kurrentschrift

And what it actually looks like...

Humanistic script

Rotunda humanistica

Humanistic semi-cursive

Archivists and written source processing

Paleography (PVH)

Why learn reading/rewriting?

research (students, historians, genealogists…)

archivists

publish source editions

How to rewrite?

transliteration

transcription
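
The two rewriting modes can be illustrated with a small sketch: transliteration keeps the source spelling letter for letter, while transcription normalizes it toward modern orthography. The mapping below is an assumption, not something the slides spell out: it reads the correspondences listed later under Grammar ("j = g/y, i = j, v = w") as "modern letter = historical letter", and the function names and sample words are hypothetical. A real edition would also need context-dependent rules (for example, a genuine "g" would be wrongly changed here).

```python
# Hypothetical letter correspondences (historical letter -> modern letter).
# Assumption: the slide's "j = g/y, i = j, v = w" means modern = historical.
SOURCE_TO_MODERN = str.maketrans({"g": "j", "j": "í", "w": "v"})


def transliterate(text: str) -> str:
    """Transliteration: reproduce the source spelling letter for letter."""
    return text


def transcribe(text: str) -> str:
    """Transcription: map historical letters to modern ones.

    str.translate replaces every character in a single pass, so the
    g -> j and j -> í rules cannot interfere with each other.
    """
    return text.translate(SOURCE_TO_MODERN)


if __name__ == "__main__":
    for word in ["geho", "wjra"]:  # illustrative sample spellings
        print(word, "->", transliterate(word), "/", transcribe(word))
```

A character-level table is the simplest possible design; digraphs and abbreviations would need multi-character rules applied before it.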

Obstacles in reading and rewriting

Multiple languages

Numerals and abbreviations

Special shapes

Grammar

digraphs

different letters: j = g/y, i = j, v = w...

different grammatical habits

OCR obstacles from the archivists' POV

Letter variations

Manuscript variations

Current state of digitization

Optical Character Recognition

OCR

Preprocess

I/OCR

Errors

Layout

Proofread

Google Keep baseline (GKB)

What about HTR?

Optical Kur... Recognition

Con

Pro

OKR state of the art

OKR state of the art under the hood

OKR state of the art “References”

● Dropbox

● Line segmentation with FCN
● Text alignment using HMM
● Query text
● DTW
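
DTW (dynamic time warping) appears in the list above without detail. The slides do not say how it is applied, so the following is only a generic sketch of the classic algorithm on two 1-D numeric sequences; in handwriting work the sequences would typically be per-column features of two word images, which is an assumption here. The function name `dtw_distance` is hypothetical.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences.

    cost[i][j] holds the cheapest cumulative cost of aligning the first
    i items of `a` with the first j items of `b`.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # pointwise distance
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] matched again (stretch b)
                cost[i][j - 1],      # b[j-1] matched again (stretch a)
                cost[i - 1][j - 1],  # advance in both sequences
            )
    return cost[n][m]


if __name__ == "__main__":
    # The same shape written "slower" aligns at zero cost.
    print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

Because the warping path may repeat elements, two renderings of the same word at different widths can still align cheaply, which is why DTW suits query-by-example search.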

● Mass transcription of modern English
● Comparison of CRNN to LSTM, HMM in historical HTR
● Older case study for old

Actually ...

ECCO

IAM

EEBO

Bentham

GWP

ICFR2018

DAS(y mod 4)

Thank you

http://opticalkurrentrecognition.jdem.cz/