Restoring and Mining the Records of the Joseon Dynasty via Neural Language Modeling and Machine Translation Kyeongpil Kang Kyohoon Jin Soyoung Yang Scatter Lab Chung-Ang University KAIST Seoul, South Korea Seoul, South Korea Daejeon, South Korea
[email protected] [email protected] [email protected] Soojin Jang Jaegul Choo Youngbin Kim Chung-Ang University KAIST Chung-Ang University Seoul, South Korea Daejeon, South Korea Seoul, South Korea
[email protected] [email protected] [email protected] Abstract records in a digital form for long-term preservation. A representative example is the Google Books Li- Understanding voluminous historical records brary Project1. However, despite the importance of provides clues on the past in various aspects, such as social and political issues and even the historical records, it has been challenging to natural science facts. However, it is generally properly utilize the records for the following rea- difficult to fully utilize the historical records, sons. First, the nontrivial amounts of the documents since most of the documents are not written are partially damaged and unrecognizable due to in a modern language and part of the contents unfortunate historical events or environments, such are damaged over time. As a result, restoring as wars and disasters, as well as the weak durability the damaged or unrecognizable parts as well as of paper documents. These factors result in difficul- translating the records into modern languages are crucial tasks. In response, we present a ties to translate and understand the records. Second, multi-task learning approach to restore and as most of the records are written in ancient and out- translate historical documents based on a self- dated languages, non-experts are difficult to read attention mechanism, specifically utilizing two and understand them.