JHTI (Japanese Historical Text Initiative) Project: A Design and Implementation of the Full Text Coordinated Retrieval System of Japanese Historical Resources using XML

Ikuo Oketani, Osaka International University
Delmer Brown, University of California, Berkeley
Yuko Okubo, University of California, Berkeley

1 The purposes of this full-text retrieval system (1)

The system mainly supports research on Japanese history and Japanese literature:

● Research on ancient Japanese history
● Research on Japanese literature
● Research on Japanese spiritual life
● Assisting foreign researchers in studying Japanese classics
● Assisting general studies in Japanese culture
● Promotion of international collaboration

2 The purposes of this full-text retrieval system (2)

● Project name: JHTI (Japanese Historical Text Initiative)

The final goal of the JHTI is to digitize twenty-five Japanese classical texts (listed later) and to build databases of them.

3 Texts we plan to digitize in the near future (*: searchable, **: work in progress)

Text 1: Kojiki (古事記) *
Text 2: Nihon Shoki (日本書紀) *
Text 3: Shoku Nihongi (続日本紀) *
Text 4: Izumo Fudoki (出雲風土記) *
Text 5: Kogoshui (古語拾遺)
Text 6: Engi Shiki (延喜式) *
Text 7: Eiga Monogatari (栄華物語) **
Text 8: Okagami (大鏡) *
Text 9: Azuma Kagami (吾妻鏡) **
Text 10: Gukansho (愚管抄) *
Text 11: Jinno Shotoki (神皇正統記) *
Text 12: Taiheiki (太平記) *
Text 13: Daijingu Jin'iki (大神宮神威記)
Text 14: Dokushi Yoron (読史余論) **
Text 15: Meiji igo Shukyo kankei Horei (明治以降神社関係法令史料) *
Text 16: Kokutai no Hongi (国体の本義) **
Text 17: Tenri-kyo (天理教)
Text 18: Kurozumi-kyo (黒住教)
Text 19: Konko-kyo (金光教)
Text 20: Omoto-kyo (大本教)
Text 21: Itto-en (一燈園) **
Text 22: Tensho Kotai Jingu-kyo (天照皇太神宮教)
Text 23: Rissho Kosei-kai (立正佼成会)
Text 24: Tsubaki Ookami Yashiro (椿大神社)
Text 25: Manyousyu (万葉集) **

4 The purposes of the JHTI (Japanese Historical Text Initiative) project

Three points of our presentation

1. The purposes of this full-text retrieval system
2. The characteristics of our full-text retrieval system
   - Database Management System (OpenText)
   - Search Engine (PAT70)
   - XML-tagged text
   - UTF-8 (Unicode)
3. The future expansion of this full-text retrieval system
   - Correlation with the GIS

5 Summary of the Japanese Classical Texts (1) - Nihon-shoki -

We take the Nihon-shoki as an example.

About the Nihon-shoki
- The Imperial Chronicle of Japan
- The chronicle consists of 30 volumes
- The chronicle was edited in A.D. 720
- The chronicle covers the period from the mythological age to the end of the reign of Empress Jito in 697

6 Summary of the Japanese Classical Texts (2) - Shoku-Nihongi -

We take the Shoku-Nihongi (The Chronicles of Japan, Continued, covering 697 to 791 A.D.) as a second example.

About the Shoku-Nihongi
- The Imperial Chronicle of Japan
- The chronicle consists of 40 volumes
- The chronicle was edited in A.D. 797
- The chronicle covers the period from Emperor Monmu (697) to Emperor Kanmu (791)

7 The Development of the Retrieval System using XML

The features of XML

● New tags and attributes can be defined as needed
● Structural information in a document, such as chapters and paragraphs, can be defined
● Independence from particular machines and applications
● Adoption of UTF-8 (Unicode)
  - Gaiji (non-standard kanji characters): %ufxxx;
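The first two features above can be illustrated with a minimal Python sketch. The element and attribute names used here (text, volume, chapter, p, n, id) and the sample paragraphs are assumptions made for illustration only; they are not taken from the JHTI DTD or from the texts themselves.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: tag and attribute names are illustrative only
# and are NOT the ones defined in the JHTI DTD.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<text title="日本書紀">
  <volume n="1">
    <chapter n="1">
      <p id="v1c1p1">First sample paragraph.</p>
      <p id="v1c1p2">Second sample paragraph.</p>
    </chapter>
  </volume>
  <volume n="2">
    <chapter n="1">
      <p id="v2c1p1">Third sample paragraph.</p>
    </chapter>
  </volume>
</text>"""


def list_paragraphs(xml_source):
    """Return (volume, chapter, paragraph id, text) for every paragraph."""
    # Parse from UTF-8 bytes so the encoding declaration is honoured.
    root = ET.fromstring(xml_source.encode("utf-8"))
    rows = []
    for volume in root.findall("volume"):
        for chapter in volume.findall("chapter"):
            for p in chapter.findall("p"):
                rows.append((volume.get("n"), chapter.get("n"), p.get("id"), p.text))
    return rows


if __name__ == "__main__":
    for row in list_paragraphs(SAMPLE):
        print(row)
```

Because the structure is explicit in the markup, a retrieval result can be reported by volume, chapter, and paragraph rather than by byte offset.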

8 An example of DTD (Document Type Definition) of XML

9 An example of DTD (Document Type Definition) of XML

10 The Outline of our Full Text Coordinated Retrieval System

● Database Management System: OpenText
  - Search algorithm: the Patricia (Practical Algorithm To Retrieve Information Coded In Alphanumeric) tree method
  - Enables high-speed full-text search
● Search Engine: PAT70
  - High-speed search engine
  - Can be run in command mode
● CGI (Common Gateway Interface)
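The idea behind this kind of index can be sketched in a few lines of Python. This is not the OpenText or PAT70 implementation, and it does not build a Patricia tree: it swaps in a sorted suffix array, a simpler structure that supports the same kind of lookup (every position in the text is indexed, so any substring can be found by binary search). The function names and the sample string are hypothetical.

```python
from bisect import bisect_left, bisect_right


def build_suffix_index(text):
    """Return the start positions of all suffixes, sorted by suffix content.

    Stand-in for a Patricia tree over the text: both index every
    position so that any substring can be located without a full scan.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_all(text, index, keyword):
    """Return the sorted start positions of every occurrence of keyword."""
    if not keyword:
        return []
    # Truncating sorted suffixes to len(keyword) keeps them in sorted order,
    # so a binary search over these prefixes is valid. Building the list is
    # linear per query; a production index would compare in place instead.
    prefixes = [text[i:i + len(keyword)] for i in index]
    lo = bisect_left(prefixes, keyword)
    hi = bisect_right(prefixes, keyword)
    return sorted(index[lo:hi])


if __name__ == "__main__":
    sample = "heijo capital; heijo palace"  # illustrative string only
    idx = build_suffix_index(sample)
    print(find_all(sample, idx, "heijo"))
```

Python strings are sequences of Unicode code points, so the same sketch works unchanged on kanji keywords.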

11 The Mechanism of our Full Text Coordinated Retrieval System

12 The Characteristics of our Full Text Coordinated Retrieval System

Our Full Text Retrieval System includes the following four types of documents:

● English document
  W. G. Aston's NIHONGI: Chronicles of Japan from the Earliest Times to A.D. 720 (translated from the original Chinese and Japanese), Vols. 1 and 2. Printed by the Japan Society (1896).
● Japanese (Chinese writing style) document
  The original text of 「書紀集解」 (Shoki Shuge) by 河村秀根 (Kawamura, Hidene), 1785, 30 volumes.
● Image of the Japanese Nihonshoki document
  The original text of 「書紀集解」 (Shoki Shuge) by 河村秀根 (Kawamura, Hidene), 1785, 30 volumes (stored in the UCB East Asian Library).
● Romanized document
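One way to picture how the four document types are coordinated is to store them as aligned paragraphs that share an identifier, so that a hit in one version brings back the others. The sketch below is an assumption about the data model, not a description of the actual JHTI implementation; the class, field names, identifiers, and sample strings are all hypothetical placeholders and are not quotations from Aston or the Shoki Shuge.

```python
from dataclasses import dataclass


@dataclass
class AlignedParagraph:
    """One paragraph in all four versions (hypothetical data model)."""
    para_id: str
    english: str
    japanese: str
    romanized: str
    image_ref: str  # pointer to the page image, e.g. a file name


# Placeholder data for illustration only.
PARAGRAPHS = [
    AlignedParagraph("p1", "Sample English paragraph about the eastern expedition.",
                     "（サンプル）東征に関する段落", "(sample) tosei ni kansuru danraku",
                     "page-image-001"),
    AlignedParagraph("p2", "Sample English paragraph about a shrine.",
                     "（サンプル）神社に関する段落", "(sample) jinja ni kansuru danraku",
                     "page-image-002"),
]


def coordinated_search(paragraphs, keyword, field):
    """Search one version and return the aligned paragraphs in full."""
    return [p for p in paragraphs if keyword in getattr(p, field)]


if __name__ == "__main__":
    for hit in coordinated_search(PARAGRAPHS, "東征", "japanese"):
        print(hit.para_id, hit.english, hit.image_ref)
```

Keying every version on the same paragraph identifier is what lets matching paragraphs be displayed side by side in English and Japanese.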

13 Page Image of Shoki Shuge

14 The Functions of our Full Text Coordinated Retrieval System

Three retrieval methods of this system

(1) Keyword retrieval (described later)
(2) Subject retrieval
    For example, paragraphs can be retrieved by typing the name of a god, a place, a ritual, or a shrine.
(3) Browsing function
    We have also built a browsing function, which we expect to be very useful for beginners in the study of Japanese history.
    + English Translation Assisting Program
    + Frequency of appearance of vocabulary

15 Diagram 2. Home page of the JHTI project

16 Diagram 3. A sample of entering the keyword 「東征」 on the screen

17 Shoku-Nihongi

A sample of entering the keyword 「平城」 on the screen

18 Gaiji (non-standard kanji characters) list that appears in the “Nihon Shoki”

Adoption of UTF-8 code

                   Nihon-shoki    Shoku-nihongi
Number of Gaiji    969 → 131      145 → 11
Kinds of Gaiji     305 → 75       54 → 7

19 Diagram 4. A screen display of the retrieval result of this system

20 Diagram 5. Matching paragraphs displayed in English and in Japanese

22 The Future Expansion of our Full-Text Coordinated Retrieval System

Our Immediate Tasks
- To develop the texts listed in Table 1.
- Correlation with the GIS (Geographic Information System)
  We will correlate the present system with the GIS so that information on the geographic distribution of Shinto shrines can be used more efficiently in our system.

The system is currently running experimentally at the following websites:
URL : http://sunsite.berkeley.edu/jhti/
URL : http://pnc-ecai.oiu.ac.jp/
URL : http://sunsite.berkeley.edu/jhti/nihongi.html

24 The texts we have already created and developed for the retrieval system

(1) Kojiki (古事記)
(2) Nihon Shoki (日本書紀)
(3) Shoku Nihongi (続日本紀)
(4) Izumo Fudoki (出雲風土記)
(5) Engi Shiki (延喜式)
(6) Okagami (大鏡)
(7) Gukansho (愚管抄)
(8) Jinno Shotoki (神皇正統記)
(9) Taiheiki (太平記)
(10) Meiji igo Shukyo kankei Horei (明治以降神社関係法令史料)

25 Appendix

26 Appendix (1)
http://sunsite.berkeley.edu/jhti/
http://pnc-ecai.oiu.ac.jp/
http://sunsite.berkeley.edu/jhti/nihongi.html
