Presentation of the Possibilities Offered by IFAO-Grec Unicode in Greek and Coptic, Especially in the PUA
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Assessment of Options for Handling Full Unicode Character Encodings in MARC21 a Study for the Library of Congress
1 Assessment of Options for Handling Full Unicode Character Encodings in MARC21 A Study for the Library of Congress Part 1: New Scripts Jack Cain Senior Consultant Trylus Computing, Toronto 1 Purpose This assessment intends to study the issues and make recommendations on the possible expansion of the character set repertoire for bibliographic records in MARC21 format. 1.1 “Encoding Scheme” vs. “Repertoire” An encoding scheme contains codes by which characters are represented in computer memory. These codes are organized according to a certain methodology called an encoding scheme. The list of all characters so encoded is referred to as the “repertoire” of characters in the given encoding schemes. For example, ASCII is one encoding scheme, perhaps the one best known to the average non-technical person in North America. “A”, “B”, & “C” are three characters in the repertoire of this encoding scheme. These three characters are assigned encodings 41, 42 & 43 in ASCII (expressed here in hexadecimal). 1.2 MARC8 "MARC8" is the term commonly used to refer both to the encoding scheme and its repertoire as used in MARC records up to 1998. The ‘8’ refers to the fact that, unlike Unicode which is a multi-byte per character code set, the MARC8 encoding scheme is principally made up of multiple one byte tables in which each character is encoded using a single 8 bit byte. (It also includes the EACC set which actually uses fixed length 3 bytes per character.) (For details on MARC8 and its specifications see: http://www.loc.gov/marc/.) MARC8 was introduced around 1968 and was initially limited to essentially Latin script only. -
Typing in Greek Sarah Abowitz Smith College Classics Department
Typing in Greek Sarah Abowitz Smith College Classics Department Windows 1. Down at the lower right corner of the screen, click the letters ENG, then select Language Preferences in the pop-up menu. If these letters are not present at the lower right corner of the screen, open Settings, click on Time & Language, then select Region & Language in the sidebar to get to the proper screen for step 2. 2. When this window opens, check if Ελληνικά/Greek is in the list of keyboards on your computer under Languages. If so, go to step 3. Otherwise, click Add A New Language. Clicking Add A New Language will take you to this window. Look for Ελληνικά/Greek and click it. When you click Ελληνικά/Greek, the language will be added and you will return to the previous screen. 3. Now that Ελληνικά is listed in your computer’s languages, click it and then click Options. 4. Click Add A Keyboard and add the Greek Polytonic option. If you started this tutorial without the pictured keyboard menu in step 1, it should be in the lower right corner of your screen now. 5. To start typing in Greek, click the letters ENG next to the clock in the lower right corner of the screen. Choose “Greek Polytonic keyboard” to start typing in greek, and click “US keyboard” again to go back to English. Mac 1. Click the apple button in the top left corner of your screen. From the drop-down menu, choose System Preferences. When the window below appears, click the “Keyboard” icon. -
The Greek Alphabet & Pronunciation
Lesson 1 tHe Greek aLPHaBet & Pronunciation n this lesson, we learn how to identify and pronounce the letters of I the Greek alphabet. We also distinguish smooth and rough breathing marks and learn the sounds of Greek diphthongs. Finally, we practice reading a few Greek words, such as Ἀχαιός, ἴφθιμος, and προϊάπτω. The classical Greek alphabet has 24 letters (plus two archaic letters that help explain older forms of Greek). Greek Latin Greek Latin Letter Equivalents Sound Name Transcription a as in father (when short, as Α, α A, a ἄλφα alpha in aha) Β, β B, b b as in bite βῆτα beta always g as in get (never soft, Γ, γ G, g γάμμα gamma as in gym) Δ, δ D, d d as in deal δέλτα delta Ε, ε E, e e as in red ἒ ψιλόν epsilon zd as in Mazda (many also pronounce this dz or simply z, Ζ, ζ Z, z because these are simpler to ζῆτα zeta pronounce for native English speakers) long a as in gate or as in Η, η E, e ἦτα eta (French) fête Θ, θ th th as in thick θῆτα theta long e as in feet and police or , ι I, i ἰῶτα iota short i as in hit 2 , κ K, k or C, c k as in kill κάππα kappa , λ L, l l as in language λάμβδα lambda , μ M, m m as in man μῦ mu , ν N, n n as in never νῦ nu , ξ X, x x as in box ξῖ xi o as in ought, but shorter (that is, a “closed” o), or as , ο O, o ὂ μικρόν omicron in the British pronunciation of pot , π P, p p as in pie πῖ pi a trilled r (as in continental , ρ R, r ῥῶ rho European languages) Σ, σ, ς S, s s as in sing σίγμα sigma Τ, τ T, t t as in tip ταῦ tau u as in (French) tu or U, u or (German) Müller, but the u in Υ, υ ὖ ψιλόν upsilon -
Polytonic Transliteration of Ancient Greek (P-TAG) 16.06.2006 / 30.11.2010 / 01.12.2013, W.S
Polytonic Transliteration of Ancient Greek (P-TAG) 16.06.2006 / 30.11.2010 / 01.12.2013, W.S. There seems to be no transliteration scheme for Ancient Greek that is at the same time - precise in representing data that are phonologically or grammatically relevant; - economical, dispensing with signs that have lost their significance (final sigma, smooth breathing/coronis); - comfortable in data entry and robust in transmission, avoiding uncommon characters; - easily understood, being based on common ways of spelling Greek loan words. Current transliteration schemes either demand special training (e.g. Beta Code) or do not support polytonic pronunciation (see e.g. The Perseus Project) or utilize rare signs that are not easy to enter and not always preserved in data transmission. P-TAG is expected to fulfil these requirements. It seems to be especially appropriate for rendering chunks of Greek that occur in Roman script documents. As our scheme ignores a few niceties of Greek orthography, if converted, it will yield no 100% Beta Code transcript or Unicode Greek script, but the result will surely be intelligible. So far, however, conversion has not been tested. We used the new transliteration scheme in the ITALI section when rendering Greek terms contained in headings or index entries: e.g. { anagnó:risis }. The present proposal is limited to the typographical signs that occur in standard editions of classical Greek literature, excluding special signs that are used in editions of inscriptions, very early texts and dialect texts (other than those written in the major "literary dialects"), as well as editorial signs used in the apparatus criticus or in editions of fragments. -
The Unicode Standard 5.0 Code Charts
Greek Extended Range: 1F00–1FFF This file contains an excerpt from the character code tables and list of character names for The Unicode Standard, Version 5.0. This file may be changed at any time without notice to reflect errata or other updates to the Unicode Standard. See http://www.unicode.org/errata/ for an up-to-date list of errata. See http://www.unicode.org/charts/ for access to a complete list of the latest character code charts. See http://www.unicode.org/charts/PDF/Unicode-5.0/ for charts showing only the characters added in Unicode 5.0. See http://www.unicode.org/Public/5.0.0/charts/ for a complete archived file of character code charts for Unicode 5.0. Disclaimer These charts are provided as the on-line reference to the character contents of the Unicode Standard, Version 5.0 but do not provide all the information needed to fully support individual scripts using the Unicode Standard. For a complete understanding of the use of the characters contained in this file, please consult the appropriate sections of The Unicode Standard, Version 5.0 (ISBN 0-321-48091-0), online at http://www.unicode.org/versions/Unicode5.0.0/, as well as Unicode Standard Annexes #9, #11, #14, #15, #24, #29, #31, and #34, the other Unicode Technical Reports and Standards, and the Unicode Character Database, which are available on-line. See http://www.unicode.org/ucd/ and http://www.unicode.org/reports/ A thorough understanding of the information contained in these additional sources is required for a successful implementation. -
The Brill Typeface User Guide & Complete List of Characters
The Brill Typeface User Guide & Complete List of Characters Version 2.06, October 31, 2014 Pim Rietbroek Preamble Few typefaces – if any – allow the user to access every Latin character, every IPA character, every diacritic, and to have these combine in a typographically satisfactory manner, in a range of styles (roman, italic, and more); even fewer add full support for Greek, both modern and ancient, with specialised characters that papyrologists and epigraphers need; not to mention coverage of the Slavic languages in the Cyrillic range. The Brill typeface aims to do just that, and to be a tool for all scholars in the humanities; for Brill’s authors and editors; for Brill’s staff and service providers; and finally, for anyone in need of this tool, as long as it is not used for any commercial gain.* There are several fonts in different styles, each of which has the same set of characters as all the others. The Unicode Standard is rigorously adhered to: there is no dependence on the Private Use Area (PUA), as it happens frequently in other fonts with regard to characters carrying rare diacritics or combinations of diacritics. Instead, all alphabetic characters can carry any diacritic or combination of diacritics, even stacked, with automatic correct positioning. This is made possible by the inclusion of all of Unicode’s combining characters and by the application of extensive OpenType Glyph Positioning programming. Credits The Brill fonts are an original design by John Hudson of Tiro Typeworks. Alice Savoie contributed to Brill bold and bold italic. The black-letter (‘Fraktur’) range of characters was made by Karsten Lücke. -
Recognition of Greek Polytonic on Historical Degraded Texts Using Hmms
2016 12th IAPR Workshop on Document Analysis Systems Recognition of Greek Polytonic on Historical Degraded Texts using HMMs Vassilis Katsouros1, Vassilis Papavassiliou1, Fotini Simistira3,1, and Basilis Gatos2 1Institute for Language and Speech Processing (ILSP) Athena Research and Innovation Center, Athens, Greece Email: {vpapa, fotini, vsk}@ilsp.gr 2Computational Intelligence Laboratory, Institute of Informatics and Telecommunications 1DWLRQDO&HQWHUIRU6FLHQWLILF5HVHDUFK³'HPRNULWRV´$WKHQV*UHHFH [email protected] 3DIVA Research Group University of Fribourg, Switzerland Email: [email protected] Abstract² Optical Character Recognition (OCR) of ancient SODFHGDERYHWKHYRZHOVWKHDFXWHDFFHQWHJȐWRPDUNKLgh Greek polytonic scripts is a challenging task due to the large pitch on a short vowel, the grave accent, e.g. , to mark normal number of character classes, resulting from variations of or low pitch, and the circumflex, e.g. ઼, to mark high and diacritical marks on the vowel letters. Classical OCR systems falling pitch within one syllable; the second category involves require a character segmentation phase, which in the case of two breathings placed also above vowels, the rough breathing, Greek polytonic scripts is the main source of errors that finally e.g. ਖ, to indicate a voiceless glottal fricative (/h/) before a affects the overall OCR performance. This paper suggests a vowel and the smooth breathing, e.g. ਕ, to indicate the absence character segmentation free HMM-based recognition system and RIKWKHGLDHUHVLVDSSHDUVRQWKHOHWWHUVȚDQGȣHJȧDQGȨ -
Information, Characters, Unicode
Information, Characters, Unicode Unicode © 24 August 2021 1 / 107 Hidden Moral Small mistakes can be catastrophic! Style Care about every character of your program. Tip: printf Care about every character in the program’s output. (Be reasonably tolerant and defensive about the input. “Fail early” and clearly.) Unicode © 24 August 2021 2 / 107 Imperative Thou shalt care about every Ěaracter in your program. Unicode © 24 August 2021 3 / 107 Imperative Thou shalt know every Ěaracter in the input. Thou shalt care about every Ěaracter in your output. Unicode © 24 August 2021 4 / 107 Information – Characters In modern computing, natural-language text is very important information. (“number-crunching” is less important.) Characters of text are represented in several different ways and a known character encoding is necessary to exchange text information. For many years an important encoding standard for characters has been US ASCII–a 7-bit encoding. Since 7 does not divide 32, the ubiquitous word size of computers, 8-bit encodings are more common. Very common is ISO 8859-1 aka “Latin-1,” and other 8-bit encodings of characters sets for languages other than English. Currently, a very large multi-lingual character repertoire known as Unicode is gaining importance. Unicode © 24 August 2021 5 / 107 US ASCII (7-bit), or bottom half of Latin1 NUL SOH STX ETX EOT ENQ ACK BEL BS HT LF VT FF CR SS SI DLE DC1 DC2 DC3 DC4 NAK SYN ETP CAN EM SUB ESC FS GS RS US !"#$%&’()*+,-./ 0123456789:;<=>? @ABCDEFGHIJKLMNO PQRSTUVWXYZ[\]^_ `abcdefghijklmno pqrstuvwxyz{|}~ DEL Unicode Character Sets © 24 August 2021 6 / 107 Control Characters Notice that the first twos rows are filled with so-called control characters. -
Romanization of Greek 1 Romanization of Greek
Romanization of Greek 1 Romanization of Greek Romanization of Greek is the representation of Greek language texts, that are usually written in the Greek alphabet, with the Latin alphabet, or a system for doing so. There are several methods for the romanization of Greek, especially depending on whether the language written with Greek letters is Ancient Greek or Modern Greek and whether a phonetic transcription or a graphemic transliteration is intended. The conventional rendering of classical Greek names in English originates in the way Latin represented Greek loanwords in antiquity. The ⟨κ⟩ is replaced with ⟨c⟩, the diphthongs ⟨αι⟩ and ⟨οι⟩ are rendered as ⟨ae⟩ and ⟨oe⟩ (or ⟨æ, œ⟩); and ⟨ει⟩ and ⟨ου⟩ are simplified to ⟨i⟩ and ⟨u⟩. In modern scholarly transliteration of Ancient Greek, ⟨κ⟩ will instead be rendered as ⟨k⟩, and the vowel combinations ⟨αι, οι, ει, ου⟩ as ⟨ai, oi, ei, ou⟩ respectively. The letters ⟨θ⟩ and ⟨φ⟩ are generally rendered as ⟨th⟩ and ⟨ph⟩; ⟨χ⟩ as either ⟨ch⟩ or ⟨kh⟩; and word-initial ⟨ρ⟩ as ⟨rh⟩. For Modern Greek, there are multiple different transcription conventions. They differ widely, depending on their purpose, on how close they stay to the conventional letter correspondences of Ancient Greek–based transcription systems, and to what degree they attempt either an exact letter-by-letter transliteration or rather a phonetically based transcription. Standardized formal transcription systems have been defined by the International Organization for Standardization (as ISO 843), by the United Nations Group of Experts on Geographical Names, by the Library of Congress, and others. The different systems can create confusion. -
Greek Polytonic – Easy Accent Overview
Greek Polytonic – Easy Accent This document contains four sections: 1 Overview 2 Layout of the Easy Accent Keyboard 3 Shift State Graphics 4 Installation Instructions Overview Easy Accent is a custom Greek Polytonic keyboard that John Schwandt (Senior Fellow New Saint Andrews College) created to facilitate typing in ancient Greek while using the modern national Greek keyboard layout. Microsoft comes with a Greek Polytonic keyboard that anyone can add to their Windows system. Once added all you need to do is press Alt+Shift to change from typing in English to typing in Greek. Since this is done in the operating system it works in any program, even in system programs such as naming files and directories. It has all of the letters in the same location as the modern national system but this doesn’t leave much room for all of the possible diacritic marking combinations. If you have tried to access these marking in this keyboard you understand how difficult and unintuitive it is. There had to be an easier way. Since Greek is generally written either in uncials (upper case) or miniscules (lower case) forms without frequent shifting between the two, the Shift Key seemed to be better used for one of the far more frequent accent marks. This lead to the development of all accents being shift states rather than dead keys. The trick was to find three shift states that would still allow for all case form and not conflict with the “control” or “alt” shift states which many programs use for hotkeys. The Easy Accent keyboard uses Shift, AltGr (right Alt), and AltGr+Shift for these shift states. -
Part Two the First Problem with Word-Processing in Ancient Greek Is
Word Processing in Greek - Part Two The first problem with word-processing in ancient Greek is finding a font with Greek characters. (see Part 1) When you have a Unicode font with the polytonic Greek characters, the next problem is how to access them from the keyboard. Problem # 2 - Accessing the Greek characters . In order to do word-processing in ancient Greek, one not only needs a font with Greek characters, as well as the Latin characters used for English, one also needs to be able to access them easily by means of a keyboard. There are programs available, some for free download, which give a "virtual keyboard" which enables one to use a single keyboard, but switch between character sets. Some include the ability to switch between Hebrew, Russian, even simple Chinese. However, I have not found one which is completely reliable. One of the best, which I used for some time, works very well up to a point - then it seems to run out of memory and freezes the word-processor or even crashes the computer. However, I have found a simpler method, using MSWord, which involves a bit of time to set up - but less time than installing and getting used to a virtual keyboard program. If you don't have MSWord, the word-processing program you do have will probably have a similar capability. RTLM - Read the lovely manual and find out how to "insert symbol" and "allocate short cuts". Using MSWord, and a Unicode font, it is possible to set up a series of "short cuts" - keystrokes which will access characters outside the "Basic Latin" range associated with the normal keys. -
Hyphenation Patterns for Ancient and Modern Greek
Hyphenation Patterns for Ancient and Modern Greek Dimitrios Filippou Kato Gatzea GR-385 00 Volos Greece [email protected] Abstract Several files with Greek hyphenation patterns for TEX can be found on CTAN. However, most of these patterns are for use with Modern Greek texts only. Some of these patterns contain mistakes or are incomplete. Other patterns are suitable only for some now-outdated “Greek TEX” packages. In 2000, after having exam- ined the patterns that existed already, the author made new sets of hyphenation patterns for typesetting Ancient and Modern Greek texts with the greek option of the babel package or with Dryllerakis’ GreeKTEX package. Lately, these pat- terns have found their way even into the ibycus package, which can be used with the Thesaurus Linguae Graecae, and into Ω with the antomega package. The new hyphenation patterns, while not exhaustive, do respect the gram- matical and phonetic rules of three distinct Greek writing systems. In general, all Greek words are hyphenated after a vowel and before a consonant. However, for typesetting Ancient Greek texts, the hyphenation patterns follow the rules estab- lished in 1939 by the Academy of Athens, which allow for breaking up compound words between the last consonant of the first constituent word and the first letter of the second constituent word, provided that the first constituent word has not been changed by elision. For typesetting polytonic (multi-accent) Modern Greek texts, the hyphenation rules distinguish between the nasal and the non-nasal double consonants µπ, ντ, and γκ. In accordance with the latest Greek grammar rules, in monotonic (uni-accent) Modern Greek texts, these double consonants are not split.