Writing Styles Which Make It Impossible to Segment Lines Into Words Without Using Lexical/Syntactic Knowledge¹
Most (or all) writing systems originated and then evolved as a means of rendering speech into some tangible form that could be easily disseminated and preserved for future reference. We call "text" the result of such a rendering process. Since text was mainly aimed at mimicking the speech events being rendered, it is no surprise that most forms of early writing paid no attention to separating the text into words. Many writing styles continued in this way over the millennia. Scriptio continua (writing without spaces or other marks between words) is still in use in Thai, as well as in other Southeast Asian alphasyllabaries and in languages that use Chinese characters.

Nowadays, there are millions of preserved documents written in styles where word separation is nonexistent or highly inconsistent. Clearly, the only way to read and understand this kind of document (for humans and machines alike) is to make use of lexical and syntactic knowledge about the language in which the text was written.

In this Annex we show a few examples of some of the most important of these writing styles, which were widely used in Europe at least until the XVI century.

"Escritura cortesana" was widely used in edicts and other documents in the Castilian courts of Spain during the XV and XVI centuries. See examples in Figure 1.

"Procesal encadenada" was a Spanish writing style, also widely used in the XVI century, in notarial practice and by the Audience scribes. It was typically executed very quickly, linking letters and words without lifting the quill from the paper (see examples in Figure 2).

Merovingian is another example of writing where it is very difficult to isolate words.
It was a medieval script developed in France during the Merovingian dynasty and was mainly used during the VII and VIII centuries, before the development of the Carolingian minuscule style. Examples of this writing style are shown in Figure 3.

Carolingian minuscule is another script in which isolating the words is problematic. This style began around the VIII century, when scribes developed a minuscule script that effectively became the standard script for manuscripts from the IX through the XI centuries. See examples of this style in Figure 4.

Insular minuscule is yet another style that exhibits word segmentation problems. This was a medieval script system originally used in Ireland that spread throughout continental Europe under the influence of Irish Christianity. It was developed in the VII century and was used as late as the XII century, though it flourished most between 600 and 850. See examples of this style in Figure 5.

The list could go on. See a few additional examples in Figures 6-8. In general, all writing styles that use cursive lowercase tend to be difficult to segment into words.

See also:
www.ayto-toledo.org/archivo/exposiciones/letracortesana/letracortesana.asp
http://medievalwriting.50megs.com/scripts/scrindex.htm
http://en.wikipedia.org/wiki/Scriptio_continua
http://employees.oneonta.edu/farberas/arth/arth212/carolingian_culture/carolingian_scripts.html

¹ Thanks are due to palaeographer Celio Hernández-Tornero for his collaboration in writing this Annex.

Figure 1: Example of "escritura cortesana" writing style.
Figure 2: Example of "procesal encadenada" writing style.
Figure 3: Examples of Merovingian writing style.
Figure 4: Example of Carolingian minuscule writing style.
Figure 5: Example of Insular minuscule writing style.
Figure 6: Example of Roman style (IV century).
Figure 7: Example of Greek cursive writing style (VI century).
Figure 8: Example of Latin cursive writing style (VI century).