A Linguistically Universal Knowledge Representation 3 4 Anonymous Author(S) 5 6 Abstract Table 1
Total Page:16
File Type:pdf, Size:1020Kb
1 2 Pyash: A Linguistically Universal Knowledge Representation 3 4 Anonymous Author(s) 5 6 Abstract Table 1. Glossary 7 A knowledge representation that can precisely store all human 8 knowledge is essential to Artificial General Intelligence being a Natural Language a language that developed naturally through 9 continuation of humanity as post-humanity. By using Linguistic usage by humans, for example English and Chinese. 10 Universals as a basis combined with an orthogonal vocabulary, en- Controlled Natural Language (CNL) is a langauge based on a 11 coded with high density, it can be accomplished with database natural language with certain restictions in vocabulary and 12 level organization. Pyash is such a language that already has tool- grammar, for example Special English and Attempto Con- 13 ing available. In this paper will cover a brief introduction to Pyash trolled English. 14 and it’s usability as a knowledge representation format. Artifical Language is a language that has it’s own grammar and 15 vocabulary even if inspired from human languages, for ex- 16 CCS Concepts •Software and its engineering →General pro- ample Esperanto and Lojban. 17 gramming languages; •Human-centered computing →Natu- 18 ral language interfaces; 19 Keywords grammar, programming language, pivot language Storing all information in the density of text or higher maybe de- 20 sireable for electronic minds as the decline of storage cost is slow- 21 ACM Reference format: Anonymous Author(s). 2017. Pyash: A Linguistically Universal Knowl- ing down[27]. While this maybe because the current PMR/SMR 22 edge Representation. In Proceedings of , Canada, , 7 pages. storage technology is reaching capacity. It is as yet unknown if 23 DOI: 10.1145/nnnnnnn.nnnnnnn MAMR drive technology will help the price decline substantially, 24 even though it can triple density[37]. 25 26 1 Introduction 27 Knowledge has been represented as human language text for thou- 2 Human Fluent 28 sands of years. As we transition to electronic bodies and minds, we Some may be familiar with terms such as “human readable” or pos- 29 can maintain continuity with our linguistic heritage. Though with sibly even “human speakable”, there is a surprisingly large varia- 30 electronic minds, everything will ideally be accessible in a declar- tions of what that may constitute. To make the meaning more clear 31 ative fashion, including that which in human minds is tied up in the term “human fluent” is being used as a language that a human 32 nondeclarative memory. can gain fluency in, the same way they would any natural human 33 Much of the focus on knowledge representation in the field has language. It being a machine readable knowledge representation 34 been on first-order-logic – which is good for semantic knowledge. and programming language allows for meeting the machine half- 35 While predicate logic can be used as a Turing complete program- way, as was desired but not accomplished by Pyash’s predecessor 36 ming language [28], such as with Prolog[63] and SQL[56], only Lojban[57][19]. Since the language is very easy to parse, encode 37 roughly 2% of programming is done in declarative languages [55]. and process by machines it is also machine-fluent. 38 Declarative code is most akin to specification knowledge, whereas 39 imperative code is most akin to implementation knowledge. Both 2.1 Comparison to Lojban 40 are necessary to have a complete knowledge base. In 2017, Google Lojban was a language designed to be very different from human 41 trends showed that ”How”, as in how-to do something[24], was languages, to study the Sapir-Whorf hypothesis, it was based on 42 one of the most common queries. This was humans searching for predicate logic instead of human language. Lojban was never meant 43 imperative or implementation knowledge. to be easy to use (comprehensibility), nor was it meant for trans- 44 Additionally, humans also store episodic knowledge in text, and lation⁇, it was only meant as a highly expressive speakable and 45 there has even been some cursory work on formalizing it[49]. writeable formal language. 46 The amount of knowledge stored in text is ever growing, with Pyash however is meant and designed for translation, compre- 47 the Library Genesis project archiving 75TB of non-fiction books hensibility and as a formal representation. Wheras Lojban has a 48 and scientific papers[51], which is mostly text in various formats. naturalness of 2⁇, making it about as accessible as predicate cal- 49 On the web as a whole as of 2014, archive.org had just over 18PB of culus off of which it is based, Pyash has a naturalness of 5,sinceit 50 archives[4]. Though much of it is images and other media, itcan is based on Linguistic Universals of human languages. 51 be described or captioned with text by neural nets[14][68][64]. 52 2.2 Comparison to Esperanto 53 Permission to make digital or hard copies of all or part of this work for personal or 54 classroom use is granted without fee provided that copies are not made or distributed While Pyash and Esperanto are both meant for comprehensibility for profit or commercial advantage and that copies bear this notice and the fullcita- and translation, Esperanto is not meant for formal representation, 55 tion on the first page. Copyrights for components of this work owned by others than 56 ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or re- and is difficult to parse due to it’s agglutinative nature, with some- publish, to post on servers or to redistribute to lists, requires prior specific permission 57 times ambigious suffixes and compound words. Esperanto having and/or a fee. Request permissions from [email protected]. a very variable word length is difficult to encode in fixed width 58 , Canada 59 © 2017 ACM. 978-x-xxxx-xxxx-x/YY/MM…$15.00 fashion. Because Esperanto is difficult to parse, encode and pro- DOI: 10.1145/nnnnnnn.nnnnnnn 60 cess it is not machine-fluent while Pyash is. 61 1 , , Canada Anon. 1 Table 2. Formal Grammar 2 3 4 5 ‘’ denotes letter [] denotes perhaps (optional) grammar grammar words start with 6 & denotes and-or <> symbol surrounded by diagonal brack- lower-case letter 7 | denotes exclusive-or ets. … denotes others excluded for brevity 8 , denotes sequence Base base words start with upper-case + denotes one or more of the preceding 9 ::= denotes copula letter symbol 10 <Initial> ::= ‘b’|‘c’|‘d’|‘f’|‘g’|‘k’|‘l’|‘m’|‘n’|‘p’|‘q’|‘r’|‘s’|‘t’|‘v’|‘w’|‘x’|‘y’|‘z’|‘1’|‘8’ 11 <second> ::= ‘l’|‘y’|‘w’|‘r’|‘x’|‘f’|‘s’|‘c’|‘v’|‘z’|‘j’ 12 <vowel> ::= ‘a’|‘e’|‘i’|‘o’|‘u’|‘6’ 13 <tone> ::= ‘’|‘2’|‘7’ 14 <final> ::= ‘p’|‘t’|‘k’|‘f’|‘s’|‘c’|‘m’|‘n’ 15 <short-grammar-word> := <initial>, <vowel>, <tone> 16 <long-grammar-word> ::= <initial>, <second>, <vowel>, <tone>,‘h’ 17 <short-baseword> ::= ‘h’, <initial>, <vowel>, <tone>, <final> 18 <long-baseword> ::= <initial>, <second>, <vowel>, <tone>, <final> 19 <base-word> ::= <short-base-word> | <long-base-word> 20 <boundary-word> ::= <base-word> (which is not found in contents) 21 <encoding-word> ::= Number (for C-style number literals) | Letter (for UTF-8) 22 <quote> ::= quoted , <encoding-word> , [<boundary-word>], ‘_’, (contents) ‘_’, [<boundary-word>], <encoding-word>, quoted 23 <complex-word> ::= <base-word>+ | <quote> 24 <grammar-word> ::= <short-grammar-word> | <long-grammar-word> 25 <word> ::= <grammar-word> | <base-word> 26 <gender> ::= masculine-gender | feminine-gender | anthropic-gender | zoic-gender | … 27 <name> ::= <complex-word>, name, [<gender>] 28 <pronoun> ::= <name> & <gender> & it 29 <simple-number> ::= Zero & One & Two … <complex-word> (where <base-words> within numerical base for locale) 30 <complex-number> ::= <simple-number>, [floating-point <simple-number>] [negatory-quantifier] 31 <sort-width> ::= paucal-number (8bit) | number (16bit, default) | plural-number (32bit) | multal-number (64bit) 32 <simple-sort> ::= [<sort-width>], letter | word | number (unsigned) | integer | floating-point-number | referential 33 <unit-prefix> ::= (numberic-base to the power of) <simple-number>, [negatory-quantifer], prefix 34 <SI-sort> ::= [<unit-prefix>], metre | celsius | kilogram | hertz | radian |mole… 35 <extended-sort> ::= <base-word>, sort 36 <complex-sort> ::= <extended-sort> | <simple-sort> | <SI-sort> 37 <vector-long> ::= One | Two | Three | Four | Eight | Sixteen 38 <vector-sort> ::= <complex-sort>, <vector-long>, vector 39 <sequential-long> ::= <simple-number> 40 <sequential-sort> ::= <complex-sort>, <sequential-long>, vector 41 <sort> ::= <sequential-sort> | <vector-sort> | <complex-sort> 42 <grammatical-case> ::= accusative-case | nominative-case | instrumental-case | ablative-case | benefactive-case | … 43 <extended-connective> ::= <complex-word>, connective-particle 44 <connective> ::= <extended-connective>, and-or, exclusive-or, and 45 <sort-phrase> ::= <complex-word> | <pronoun>, [<sort>] 46 <adjective> ::= <complex-word>, adjective 47 <verb> ::= [<adjective], <complex-word>, [<negation>] 48 <dependent-clause> ::= [clause-tail], <noun-phrase>+, <verb>, [dependent-clause] 49 <genitive-phrase> ::= <dependent-clause> | <complex-word>, genitive 50 <noun-phrase> ::= [<sort-phrase>], [<genitive-phrase>+], [<adjective>+], [<complex-word>], <grammatical-case>, [<connective>] 51 | [<dependent-clause>], <grammatical-case> 52 <verb-phrase> ::= [<verb>], [<tense>], [<aspect>], [<evidential>], <mood> , [<connective>] 53 <independent-clause> ::= (one-or-more) <noun-phrase>, <verb-phrase> 54 <nominal-sentence> ::= <name>, nominative-case, <noun-phrase>