The Adoption of Punctuation 1N Japanese Script
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
On Translation a Short Introduction to the Japanese Language Will
On Translation A short introduction to the Japanese language will illustrate the kind of difficulties one encounters in translating Japanese into the European languages, and vice versa. Linguistically, Japanese is an isolated language. It has no relation to Chinese. It must have had some relation to Korean, another isolated language, but the two went into different directions thousands of years ago. Some linguists claim that the Japanese language, along with the Korean, belongs to the Ural-Altaic family, yet the claim remains hypothetical. The Japanese language features some characteristics that would seem most strange to those who are only familiar with the European languages. For example, a grammatical subject is unnecessary in Japanese to construct a grammatically complete sentence. “淋しい” (Sabishii) means (someone is) lonely. It is a complete sentence, but there is no subject. The sentence may mean ‘I’m lonely,’ ‘you are lonely,’ ‘he/she is lonely,’ ‘the rock is lonely,’ ‘all human beings are lonely,’ etc, depending on the context. It may furthermore refer to a vague sense of loneliness which needn’t be specified. It is true that in some European languages, such as Italian, a grammatically complete sentence is possible without a named subject. But the subject can always be determined by the inflection of the verb (and often also by the changes in the articles, adjectives and nouns): “Sono sola,” “Sei solo.” A Japanese sentence may be very long and still be without a subject. Tale of Genji , for instance, might contain a sequence of three long sentences without subjects, yet in each a different subject would be implied. -
Title Classical Japanese in Linguistic and Cross-Cultural Perspective Sub
Title Classical Japanese in linguistic and cross-cultural perspective Sub Title Author De Wolf, Charles Publisher 慶應義塾大学日吉紀要刊行委員会 Publication 2020 year Jtitle 慶應義塾大学日吉紀要. 英語英米文学 (The Keio University Hiyoshi review of English studies). Vol.73, No.2020 (9. ) ,p.69- 87 Abstract Notes Genre Departmental Bulletin Paper URL https://koara.lib.keio.ac.jp/xoonips/modules/xoonips/detail.php?ko ara_id=AN10030060-20200930-0069 慶應義塾大学学術情報リポジトリ(KOARA)に掲載されているコンテンツの著作権は、それぞれの著作者、学会または 出版社/発行者に帰属し、その権利は著作権法によって保護されています。引用にあたっては、著作権法を遵守して ご利用ください。 The copyrights of content available on the KeiO Associated Repository of Academic resources (KOARA) belong to the respective authors, academic societies, or publishers/issuers, and these rights are protected by the Japanese Copyright Act. When quoting the content, please follow the Japanese copyright act. Powered by TCPDF (www.tcpdf.org) Classical Japanese in Linguistic and Cross-Cultural Perspective1) Charles De Wolf In the preface to his famous A Dictionary of the English Language (1755), Samuel Johnson notes: “When we see men grow old and die at a certain time one after another, from century to century, we laugh at the elixir that promises to prolong life to a thousand years; and with equal justice may the lexicographer be derided, who being able to produce no example of a nation that has preserved their words and phrases from mutability, shall imagine that his dictionary can embalm his language, and secure it from corruption and decay, that it is in his power to change sublunary nature, or clear the world at once from folly, vanity, and affectation.” I cite this not only to show that, though no modern linguist, Johnson was quite aware that “mutability” applies to human language as well as all else that is “sublunary,” but also to note that, as learned as he was, Johnson knew far less about the history of the English language than anyone with curiosity and access to Wikipedia can learn, in a matter of minutes or at most hours. -
ISO Basic Latin Alphabet
ISO basic Latin alphabet The ISO basic Latin alphabet is a Latin-script alphabet and consists of two sets of 26 letters, codified in[1] various national and international standards and used widely in international communication. The two sets contain the following 26 letters each:[1][2] ISO basic Latin alphabet Uppercase Latin A B C D E F G H I J K L M N O P Q R S T U V W X Y Z alphabet Lowercase Latin a b c d e f g h i j k l m n o p q r s t u v w x y z alphabet Contents History Terminology Name for Unicode block that contains all letters Names for the two subsets Names for the letters Timeline for encoding standards Timeline for widely used computer codes supporting the alphabet Representation Usage Alphabets containing the same set of letters Column numbering See also References History By the 1960s it became apparent to thecomputer and telecommunications industries in the First World that a non-proprietary method of encoding characters was needed. The International Organization for Standardization (ISO) encapsulated the Latin script in their (ISO/IEC 646) 7-bit character-encoding standard. To achieve widespread acceptance, this encapsulation was based on popular usage. The standard was based on the already published American Standard Code for Information Interchange, better known as ASCII, which included in the character set the 26 × 2 letters of the English alphabet. Later standards issued by the ISO, for example ISO/IEC 8859 (8-bit character encoding) and ISO/IEC 10646 (Unicode Latin), have continued to define the 26 × 2 letters of the English alphabet as the basic Latin script with extensions to handle other letters in other languages.[1] Terminology Name for Unicode block that contains all letters The Unicode block that contains the alphabet is called "C0 Controls and Basic Latin". -
Assessment of Options for Handling Full Unicode Character Encodings in MARC21 a Study for the Library of Congress
1 Assessment of Options for Handling Full Unicode Character Encodings in MARC21 A Study for the Library of Congress Part 1: New Scripts Jack Cain Senior Consultant Trylus Computing, Toronto 1 Purpose This assessment intends to study the issues and make recommendations on the possible expansion of the character set repertoire for bibliographic records in MARC21 format. 1.1 “Encoding Scheme” vs. “Repertoire” An encoding scheme contains codes by which characters are represented in computer memory. These codes are organized according to a certain methodology called an encoding scheme. The list of all characters so encoded is referred to as the “repertoire” of characters in the given encoding schemes. For example, ASCII is one encoding scheme, perhaps the one best known to the average non-technical person in North America. “A”, “B”, & “C” are three characters in the repertoire of this encoding scheme. These three characters are assigned encodings 41, 42 & 43 in ASCII (expressed here in hexadecimal). 1.2 MARC8 "MARC8" is the term commonly used to refer both to the encoding scheme and its repertoire as used in MARC records up to 1998. The ‘8’ refers to the fact that, unlike Unicode which is a multi-byte per character code set, the MARC8 encoding scheme is principally made up of multiple one byte tables in which each character is encoded using a single 8 bit byte. (It also includes the EACC set which actually uses fixed length 3 bytes per character.) (For details on MARC8 and its specifications see: http://www.loc.gov/marc/.) MARC8 was introduced around 1968 and was initially limited to essentially Latin script only. -
Threshold Digraphs
Volume 119 (2014) http://dx.doi.org/10.6028/jres.119.007 Journal of Research of the National Institute of Standards and Technology Threshold Digraphs Brian Cloteaux1, M. Drew LaMar2, Elizabeth Moseman1, and James Shook1 1National Institute of Standards and Technology, Gaithersburg, MD 20899 2The College of William and Mary, Williamsburg, VA 23185 [email protected] [email protected] [email protected] [email protected] A digraph whose degree sequence has a unique vertex labeled realization is called threshold. In this paper we present several characterizations of threshold digraphs and their degree sequences, and show these characterizations to be equivalent. Using this result, we obtain a new, short proof of the Fulkerson-Chen theorem on degree sequences of general digraphs. Key words: digraph realizations; Fulkerson-Chen theorem; threshold digraphs. Accepted: May 6, 2014 Published: May 20, 2014 http://dx.doi.org/10.6028/jres.119.007 1. Introduction Creating models of real-world networks, such as social and biological interactions, is a central task for understanding and measuring the behavior of these networks. A usual first step in this type of model creation is to construct a digraph with a given degree sequence. We examine the extreme case of digraph construction where for a given degree sequence there is exactly one digraph that can be created. What follows is a brief introduction to the notation used in the paper. For notation not otherwise defined, see Diestel [1]. We let G=( VE , ) be a digraph where E is a set of ordered pairs called arcs. If (,vw )∈ E, then we say w is an out-neighbor of v and v is an in-neighbor of w. -
Source-Based Translation and Foreignization: a Japanese Case
Source-Based Translation and Foreignization Source-Based Translation and Foreignization: A Japanese Case Yukari Fukuchi Meldrum Modern Languages and Cultural Studies, University of Alberta Introduction Foreignization, as currently understood in Translation Studies, is a concept that is charged with “more emphasis on the ideological pressure against the target-language culture than on the faithfulness to the original text” (Tamaki, 2005: 239). In other words, it is a conscious operation of bringing a foreign flavor into translations in order to counteract the effects of domestication, claimed by Venuti (1995) to be the cause of invisibility of translation and translators. Tamaki, in her 2005 paper, also cautions that the concept of foreignization should not be confused with a literal method of translation. Literal translation does not involve ideological intentions and is a mere translation method. In this paper, I will attempt to provide a supporting view that source-based translation, often seen in Japanese translation, needs to be understood outside of foreignization in the above sense. Specifically, I will illustrate that Japanese readers, in premodern times, had to gain specific knowledge and adapt to what was required in order to read and interpret texts in a satisfactory manner. This could have been a factor for the source-orientedness of Japanese translations still observed in a certain form today. By examining this background of Japanese text culture, the more source-based translation is shown to be merely a translation carried out by a literal method without any political or ideological intentions. Therefore, the concept of foreignization does not have a place in Japanese translation. -
BBJ1RBR '(]~ F'o £.A2WB IJ.Boibiollb, •
801 80 mua1. , .. DDl&IOI o• OOLU•••• . I I:~ I BBJ1RBR '(]~ f'O £.A2WB IJ.BOIBIOllB, • •• • I r. • • Not Current- 1803 , 0 TIUI t ohn Tock r, q., of o ton, be i an k ep hi office at th eat of the o nbt practi , ither a an attorn y or a 0 &&all ontinue to b clerk of the ame. hat (until furth r orde ) it hall be om or cou ello to practi e in th· uch for thr e ea pa t in th uprem p cti ly b long, and that their pri ate r to be fair. &. t coun llors hall not pract• e a ........ ello , in h · court. 6. , That they hall re pecti el take the ------, do olemnly ear, that I ill or coun ellor of the court) uprightly, and upport the o titution of the nited &. t (unle and until it hall othe · e of · court hall be in the name of the That he cou ello and attorne , hall ake either an oath, or, in proper cribe b the rule of this court on that r•••, 1790, iz.: '' , , do olemnly be) that will demean m elf, as an attor .....· oo rt uprightl , an according to la , and that tion of he nited ta •'' hi f ju ice, in a er to the motion of the info him and the bar, that thi court of king' bench, and of chancery, in Eng ice of th• court; and that they will, tio therein circumstances may ren- x_...vu•• Not Current- 1803 0 '191, ebraary 4. -
Whenever There Is a Line Or Tilde Over a Vowel, It Is an Abbreviation for the Letters M Or N. in This Case, the Full Word Is Quadam
Whenever there is a line or tilde over a vowel, it is an abbreviation for the letters m or n. In this case, the full word is quadam. In this instance, the line over the e represents an omission of the letter n, giving the word debent when written in full. Whenever a word in Latin had an instance of the letter i in succession, the second i was always written as a j. This applies to all words wherein ii is present and also to the Roman numeral ii, which was written as ij. Therefore, cambij = cambii, negotijs = negotiis. The e with a cedilla was an abbreviation for the letters ae. This word spelled out is quae. Following the letter q, a period or semicolon was used as an abbreviation for the letters ue. The first example reads denique, the second one is quacunque. In the seventeenth century, the letter u and v were interchangeable and not yet viewed as two distinct letters. While not universal, a general rule is that if u or v began a word, it was written as a v; if a u or v occurred in the middle or end of a word, it was written as a u. In the 17th century, a lowercase s had two forms- one long and one short. As a general rule, when a lowercase s occurred as the first letter of the word or in any other place in the word except as the final letter, it was written with a long s, which resembles the letter f. -
Patterns Identification for Hitting Adjacent Key Errors Correction Using Neural Network Models
PATTERNS IDENTIFICATION FOR HITTING ADJACENT KEY ERRORS CORRECTION USING NEURAL NETWORK MODELS Jun Li Department of Land Economy, University of Cambridge, 19 Silver Street, Cambridge, U.K. Karim Ouazzane, Sajid Afzal, Hassan Kazemian Faculty of Computing, London Metropolitan University, London, U.K. Keywords: QWERTY Keyboard, Probabilistic Neural Network, Backpropagation, Key Distance, Time Gap, Error Margin Distance. Abstract: People with Parkinson diseases or motor disability miss-stroke keys. It appears that keyboard layout, key distance, time gap are affecting this group of people’s typing performance. This paper studies these features based on neural network learning algorithms to identify the typing patterns, further to correct the typing mistakes. A specific user typing performance, i.e. Hitting Adjacent Key Errors, is simulated to pilot this research. In this paper, a Time Gap and a Prediction using Time Gap model based on BackPropagation Neural Network, and a Distance, Angle and Time Gap model based on the use of Probabilistic Neural Network are developed respectively for this particular behaviour. Results demonstrate a high performance of the designed model, about 70% of all tests score above Basic Correction Rate, and simulation also shows a very unstable trend of user’s ‘Hitting Adjacent Key Errors’ behaviour with this specific datasets. 1 INTRODUCTION technologies to tackle the typing errors as a whole, while there is no specific solution and convincing Computer users with disabilities or some elderly results for solving the ‘Miss-stroke’ error. people may have difficulties in accurately In the field of computer science, a neural manipulating the QWERTY keyboard. For example, network is a mathematical model or computational motor disability can cause significant typing model that is inspired by the structure and/or mistakes. -
The Dot Product
The Dot Product In this section, we will now concentrate on the vector operation called the dot product. The dot product of two vectors will produce a scalar instead of a vector as in the other operations that we examined in the previous section. The dot product is equal to the sum of the product of the horizontal components and the product of the vertical components. If v = a1 i + b1 j and w = a2 i + b2 j are vectors then their dot product is given by: v · w = a1 a2 + b1 b2 Properties of the Dot Product If u, v, and w are vectors and c is a scalar then: u · v = v · u u · (v + w) = u · v + u · w 0 · v = 0 v · v = || v || 2 (cu) · v = c(u · v) = u · (cv) Example 1: If v = 5i + 2j and w = 3i – 7j then find v · w. Solution: v · w = a1 a2 + b1 b2 v · w = (5)(3) + (2)(-7) v · w = 15 – 14 v · w = 1 Example 2: If u = –i + 3j, v = 7i – 4j and w = 2i + j then find (3u) · (v + w). Solution: Find 3u 3u = 3(–i + 3j) 3u = –3i + 9j Find v + w v + w = (7i – 4j) + (2i + j) v + w = (7 + 2) i + (–4 + 1) j v + w = 9i – 3j Example 2 (Continued): Find the dot product between (3u) and (v + w) (3u) · (v + w) = (–3i + 9j) · (9i – 3j) (3u) · (v + w) = (–3)(9) + (9)(-3) (3u) · (v + w) = –27 – 27 (3u) · (v + w) = –54 An alternate formula for the dot product is available by using the angle between the two vectors. -
The Selnolig Package: Selective Suppression of Typographic Ligatures*
The selnolig package: Selective suppression of typographic ligatures* Mico Loretan† 2015/10/26 Abstract The selnolig package suppresses typographic ligatures selectively, i.e., based on predefined search patterns. The search patterns focus on ligatures deemed inappropriate because they span morpheme boundaries. For example, the word shelfful, which is mentioned in the TEXbook as a word for which the ff ligature might be inappropriate, is automatically typeset as shelfful rather than as shelfful. For English and German language documents, the selnolig package provides extensive rules for the selective suppression of so-called “common” ligatures. These comprise the ff, fi, fl, ffi, and ffl ligatures as well as the ft and fft ligatures. Other f-ligatures, such as fb, fh, fj and fk, are suppressed globally, while making exceptions for names and words of non-English/German origin, such as Kafka and fjord. For English language documents, the package further provides ligature suppression rules for a number of so-called “discretionary” or “rare” ligatures, such as ct, st, and sp. The selnolig package requires use of the LuaLATEX format provided by a recent TEX distribution, e.g., TEXLive 2013 and MiKTEX 2.9. Contents 1 Introduction ........................................... 1 2 I’m in a hurry! How do I start using this package? . 3 2.1 How do I load the selnolig package? . 3 2.2 Any hints on how to get started with LuaLATEX?...................... 4 2.3 Anything else I need to do or know? . 5 3 The selnolig package’s approach to breaking up ligatures . 6 3.1 Free, derivational, and inflectional morphemes . -
Unicode Encoding the TITUS Project
ALCTS "Library Catalogs and Non- 2004 ALA Annual Conference Orlando Roman Scripts" Program Unicode Encoding and Online Data Access Ralf Gehrke / Jost Gippert The TITUS Project („Thesaurus indogermanischer Text- und Sprachmaterialien“) (since 1987/1993) www.ala.org/alcts 1 ALCTS "Library Catalogs and Non- 2004 ALA Annual Conference Orlando Roman Scripts" Program Scope of the TITUS project: • Electronic retrieval engine covering the textual heritage of all ancient Indo-European languages • Present retrieval task: – Documentation of the usage of all word forms occurring in the texts, in their resp. contexts • Survey of the parts of the text database: – http://titus.uni-frankfurt.de/texte/texte2.htm Data formats (since 1995): • Text formats: – WordCruncher Text format (8-Bit) – HTML (UTF-8 Unicode 4.0) – (Plain 7-bit ASCII format) • Database format: – MS Access (relational, Unicode-based) – Retrieval via SQL www.ala.org/alcts 2 ALCTS "Library Catalogs and Non- 2004 ALA Annual Conference Orlando Roman Scripts" Program Original Scripts Covered: • Latin (with all kinds of diacritics), incl. variants* • Greek • Slavic (Cyrillic and Glagolitic*) • Armenian • Georgian • Devangar • Other Brhm scripts (Tocharian, Khotanese)* • Avestan* • Middle Persian (Pahlav)* • Manichean* • Arabic (incl. Persian) • Runic • Ogham • and many more * not yet encodable (as such) in Unicode Example 1a: Donelaitis (Lithuanian: formatted text incl. diacritics: 8-bit version) www.ala.org/alcts 3 ALCTS "Library Catalogs and Non- 2004 ALA Annual Conference Orlando Roman Scripts" Program Example 1b: Donelaitis (Lithuanian: formatted text incl. diacritics: Unicode version) Example 2a: Catechism (Old Prussian: formatted text incl. diacritics: 8-bit version, special TITUS font) www.ala.org/alcts 4 ALCTS "Library Catalogs and Non- 2004 ALA Annual Conference Orlando Roman Scripts" Program Example 2b: Catechism (Old Prussian: formatted text incl.