Unicode Compression: Does Size Really Matter? TR CS-2002-11
Total pages: 16
File type: PDF, size: 1020 KB
Recommended publications
Unicode/IDN Security Mark Davis President, Unicode Consortium Chief SW Globalization Arch., IBM the Unicode Consortium
Unicode/IDN Security. Mark Davis, President, Unicode Consortium; Chief SW Globalization Arch., IBM. The Unicode Consortium: Software globalization standards: define properties and behavior for every character in every script. Unicode Standard: a unique code for every character. Common Locale Data Repository: LDML format plus repository for required locale data. Collation, line breaking, regex, charset mapping, … Used by every major modern operating system, browser, office software, email client, … Core of XML, HTML, Java, C#, C (with ICU), Javascript, … Security ~ Identity: System A: X = x; System B: X ≠ x. IDN: You get an email about your paypal.com account, click on the link… You carefully examine your browser's address box to make sure that it is actually going to http://paypal.com/ … But actually it is going to a spoof site: “paypal.com” with the Cyrillic letter “р”. You (System A) think that they are the same; DNS (System B) thinks they are different. Examples: Letters. Cross-Script: p in Latin vs р in Cyrillic. In-Script Sequences: rn may appear at display sizes like m; ! + ा typically looks identical to #; so̷s looks like søs. Rendering Support: ä with two umlauts may look the same as ä with one; el is actually e + l. Examples: Numbers. Western: 0 1 2 3 4 5 6 7 8 9. Bengali: ০ ১ ২ ৩ ৪ ৫ ৬ ৭ ৮ ৯. Oriya: ୦ ୧ ୨ ୩ ୪ ୫ ୬ ୭ ୮ ୯. Thus ৪୨ = 42. Syntax Spoofing: http://example.org/1234/not.mydomain.com vs http://example.org⁄1234⁄not.mydomain.com (⁄ = fraction slash). Also possible without Unicode: http://example.org--long-and-obscure-list-of-characters.mydomain.com UTR
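The cross-script example in the excerpt above (Latin p versus Cyrillic р) can be checked mechanically. The following is a minimal Python sketch, not a method taken from the slides: the helper name `scripts_in` is hypothetical, and it approximates a character's script by the first word of its Unicode character name, which is a rough heuristic rather than the real Script property.

```python
import unicodedata


def scripts_in(label):
    """Approximate the set of scripts in a label.

    Assumption: the first word of a character's Unicode name
    (e.g. 'LATIN', 'CYRILLIC') is a good-enough stand-in for its script.
    """
    return {unicodedata.name(ch).split()[0] for ch in label if ch.isalpha()}


latin = "paypal"
spoof = "\u0440aypal"  # first letter is U+0440 CYRILLIC SMALL LETTER ER

# The two strings render almost identically but compare as different,
# which is exactly the "System A" versus "System B" mismatch.
print(latin == spoof)
print(sorted(scripts_in(latin)))
print(sorted(scripts_in(spoof)))
```

A label whose letters come from more than one script is not necessarily malicious, so a check like this is better treated as a reason to warn than as a reason to reject.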
Network Working Group D. Goldsmith Request for Comments: 1641 M
Network Working Group D. Goldsmith Request for Comments: 1641 M. Davis Category: Experimental Taligent, Inc. July 1994 Using Unicode with MIME Status of this Memo This memo defines an Experimental Protocol for the Internet community. This memo does not specify an Internet standard of any kind. Distribution of this memo is unlimited. Abstract The Unicode Standard, version 1.1, and ISO/IEC 10646-1:1993(E) jointly define a 16 bit character set (hereafter referred to as Unicode) which encompasses most of the world's writing systems. However, Internet mail (STD 11, RFC 822) currently supports only 7-bit US ASCII as a character set. MIME (RFC 1521 and RFC 1522) extends Internet mail to support different media types and character sets, and thus could support Unicode in mail messages. MIME neither defines Unicode as a permitted character set nor specifies how it would be encoded, although it does provide for the registration of additional character sets over time. This document specifies the usage of Unicode within MIME. Motivation Since Unicode is starting to see widespread commercial adoption, users will want a way to transmit information in this character set in mail messages and other Internet media. Since MIME was expressly designed to allow such extensions and is on the standards track for the Internet, it is the most appropriate means for encoding Unicode. RFC 1521 and RFC 1522 do not define Unicode as an allowed character set, but allow registration of additional character sets. In addition to allowing use of Unicode within MIME bodies, another goal is to specify a way of using Unicode that allows text which consists largely, but not entirely, of US-ASCII characters to be represented in a way that can be read by mail clients who do not understand Unicode.
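The last goal in the excerpt above, keeping mostly-ASCII text readable on a 7-bit channel, can be illustrated with a short Python sketch. The excerpt does not name the transformation, so this assumes it is UTF-7 and uses Python's built-in "utf-7" codec, which follows a later revision of that format than the 1994 memo; the sample string is invented for illustration.

```python
# Assumption: UTF-7 is the ASCII-friendly form the memo has in mind.
text = "Hi Mom \u263a"       # mostly US-ASCII plus one non-ASCII character
wire = text.encode("utf-7")  # bytes intended for a 7-bit mail transport

# Every byte fits in 7 bits, so the message survives a 7-bit channel.
print(all(b < 0x80 for b in wire))

# The ASCII run is passed through unchanged and stays human-readable.
print(wire.startswith(b"Hi Mom "))

# Decoding restores the original text, so nothing is lost.
print(wire.decode("utf-7") == text)
```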
HAIL: an Algorithm for the Hardware Accelerated Identification of Languages, Master's Thesis, May 2006
Washington University in St. Louis, Washington University Open Scholarship. All Computer Science and Engineering Research, Computer Science and Engineering. Report Number: WUCSE-2006-36, 2006-01-01. HAIL: An Algorithm for the Hardware Accelerated Identification of Languages, Master's Thesis, May 2006. Charles M. Kastner. This thesis examines in detail the Hardware-Accelerated Identification of Languages (HAIL) project. The goal of HAIL is to provide an accurate means to identify the language and encoding used in streaming content, such as documents passed over a high-speed network. HAIL has been implemented on the Field-programmable Port eXtender (FPX), an open hardware platform developed at Washington University in St. Louis. HAIL can accurately identify the primary languages and encodings used in text at rates much higher than what can be achieved by software algorithms running on microprocessors. Recommended Citation: Kastner, Charles M., "HAIL: An Algorithm for the Hardware Accelerated Identification of Languages, Master's Thesis, May 2006" Report Number: WUCSE-2006-36 (2006). All Computer Science and Engineering Research. https://openscholarship.wustl.edu/cse_research/187 Department of Computer Science & Engineering - Washington University in St. Louis, Campus Box 1045 - St. Louis, MO - 63130 - ph: (314) 935-6160. Corresponding Author: [email protected] Web Page: http://www.arl.wustl.edu/projects/fpx/reconfig.htm
NILES 66O YARTON BILER IL 6Y714 Èditiofl ; 4 (RJW9663900
Niles Public Library, Niles, IL. Vol. 39, No. 23, Thursday, November 30, 1995. 50 cents per copy. Parkway Bank, Pontarelli condos get green light, by Kathleen [surname illegible]. Voting unanimously, the Niles Village Board has finally laid to rest the controversy surrounding Parkway Bank and its soon to be neighbors, by approving the rezoning petition permitting the bank to begin construction. Parkway Bank has been attempting to purchase the property at 7601 N. Milwaukee Avenue (currently, the Niles Village Hall) and establish a [...] would generate increased traffic congestion on their streets which could create unsafe conditions for the area's many children. Of particular concern to the residents was the exit proposed to be developed by the bank onto Dobsen Street. The Niles Village Board heard the matter at its October 24 meeting, but tabled their vote until a Public Hearing could be conducted. Car crashes, three police departments used to apprehend thief: Wild police chase ends in thief's arrest in Niles, by Rosemary Tirio. Niles and Morton Grove police joined Glenview police in a high speed chase around 1 p.m. Nov. 27 that ended at Shermer Road and Carol Avenue in Niles. The incident began in the parking lot of the Oleobrook Market Place shopping center at Willow [...] About the time Mulvaney reached East Lake Avenue, a Glenview detective picked up on the dispatch and joined the chase. Mulvaney continued at high speed. Traveling at speeds between 65 mph and 70 mph, Mulvaney [...] to Greenwood and then proceeding south on Greenwood to Shermer Road and Carol Avenue in Niles. Around 1:30 p.m., the two offenders exited their moving vehicle at Shermer and Carol and a foot chase ensued.
THE MEANING of “EQUALS” Charles Darr Take a Look at the Following Mathematics Problem
PROFESSIONAL DEVELOPMENT. THE MEANING OF “EQUALS”. Charles Darr. Take a look at the following mathematics problem. Write the number in the square below that you think best completes the equation. 4 + 5 = □ + 3. How difficult do you consider this equation to be? At what age do you think a student should be able to complete it correctly? I recently presented this problem to over 300 Year 7 and 8 students at a large intermediate school. More than half answered it incorrectly. Moreover, the vast majority of the students who wrote something other than 6 were inclined to write the missing number as “9”. This result is not just an intermediate school phenomenon, however. Smaller numbers of students in other Years were also tested. In Years 4, 5 and 6, even larger percentages wrote incorrect responses, while at Years 9 and 10, more than 20 percent of the students were still not writing a [...] has been explored by mathematics education researchers internationally (Falkner, Levi and Carpenter, 1999; Kieran, 1981, 1992; MacGregor and Stacey, 1997; Saenz-Ludlow and Walgamuth, 1998; Stacey and MacGregor, 1999; Wright, 1999). However, the fact that so many students appear to have this misconception highlights some important considerations that we, as mathematics educators, can learn from and begin to address. In what follows I look at two of these considerations. I then go on to discuss some strategies we can employ in the classroom to help our students gain a richer sense of what the equals sign represents. The meaning of “equals”. Firstly, it is worth considering what the equals sign means in a mathematical sense, and why so many children seem to struggle to develop this understanding. From a mathematical point of view, the equals sign is not a command to do something.
VnTeX: Typesetting Vietnamese. Hàn Thế Thành, Reinhard Kotucha
VnTeX: Typesetting Vietnamese. Hàn Thế Thành, Reinhard Kotucha.
Abstract: VnTeX is an extension to Donald Knuth’s TeX typesetting system which provides support for typesetting Vietnamese. The primary site of VnTeX is http://vntex.sf.net.
1 Where to get Help
The current maintainers of VnTeX are:
- Hàn Thế Thành [email protected]
- Reinhard Kotucha [email protected]
- Werner Lemberg [email protected]
There is a mailing list (very low traffic) for questions about VnTeX and typesetting Vietnamese. To subscribe to the list, visit: http://lists.sourceforge.net/lists/listinfo/vntex-users
There is also a Wiki: http://vntex.info
2 Related Documents
The following files are part of the VnTeX distribution:
- Hàn Thế Thành, Hỗ trợ tiếng Việt cho TeX
- Hàn Thế Thành, Minimal steps to typeset Vietnamese
- Hàn Thế Thành và Thái Phú Khánh Hòa, Dùng font với VnTeX
The following files are not part of VnTeX but might be part of the TeX distribution you are using:
- The American Mathematical Society, Hướng dẫn sử dụng gói amsmath, http://ctan.org/tex-archive/info/amslatex/vietnamese/amsldoc-vi.pdf http://ctan.org/tex-archive/info/amslatex/vietnamese/amsldoc-print-vi.pdf
- H. Partl, E. Schlegl, I. Hyna, T. Oetiker, Một tài liệu ngắn gọn giới thiệu về LaTeX 2ε. Translated by Nguyễn Tân Khoa. http://ctan.org/tex-archive/info/lshort/vietnamese/lshort-vi.pdf
- Wolfgang May, Andreas Schlechte, Mở rộng môi trường định lý. Translated by Huỳnh Kỳ Anh. http://ctan.org/tex-archive/info/translations/vn/ntheorem-doc-vn.pdf
3 Typesetting Vietnamese
In order to typeset Vietnamese, you need a text editor which supports Vietnamese.
Basis Technology Unicode-Compatible Library Spec Sheet: Character Encodings and Other Names
Basis Technology Unicode-compatible library spec sheet
Character encoding: Other names
Adobe-Standard-Encoding:
Adobe-Symbol-Encoding: csHPPSMath
Adobe-Zapf-Dingbats-Encoding: csZapfDingbats
Arabic: ISO-8859-6, csISOLatinArabic, iso-ir-127, ECMA-114, ASMO-708
ASCII: US-ASCII, ANSI_X3.4-1968, iso-ir-6, ANSI_X3.4-1986, ISO646-US, us, IBM367, csASCI
big-endian: ISO-10646-UCS-2, BigEndian, 68k, PowerPC, Mac, Macintosh
Big5: csBig5, cn-big5, x-x-big5
Big5Plus: Big5+, csBig5Plus
BMP: ISO-10646-UCS-2, BMPstring
CCSID-1027: csCCSID1027, IBM1027
CCSID-1047: csCCSID1047, IBM1047
CCSID-290: csCCSID290, CCSID290, IBM290
CCSID-300: csCCSID300, CCSID300, IBM300
CCSID-930: csCCSID930, CCSID930, IBM930
CCSID-935: csCCSID935, CCSID935, IBM935
CCSID-937: csCCSID937, CCSID937, IBM937
CCSID-939: csCCSID939, CCSID939, IBM939
CCSID-942: csCCSID942, CCSID942, IBM942
ChineseAutoDetect: csChineseAutoDetect (candidate encodings: GB2312, Big5, GB18030, UTF32:UTF8, UCS2, UTF32)
CNS-11643: EUC-H, csCNS11643EUC, EUC-TW, TW-EUC, H-EUC, CNS-11643-1992, EUC-H-1992, csCNS11643-1992-EUC, EUC-TW-1992, TW-EUC-1992, H-EUC-1992
CNS-11643-1986: EUC-H-1986, csCNS11643_1986_EUC, EUC-TW-1986, TW-EUC-1986, H-EUC-1986
CP10000: csCP10000, windows-10000
CP10001: csCP10001, windows-10001
CP10002: csCP10002, windows-10002
CP10003: csCP10003, windows-10003
CP10004: csCP10004, windows-10004
CP10005: csCP10005, windows-10005
CP10006: csCP10006, windows-10006
CP10007: csCP10007, windows-10007
CP10008: csCP10008, windows-10008
CP10010: csCP10010, windows-10010
CP10017: csCP10017, windows-10017
CP10029: csCP10029, windows-10029
CP10079: csCP10079, windows-10079
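An alias table like the one above means that several names resolve to one encoding. As a rough illustration, here is a Python sketch using the standard `codecs` module instead of the Basis Technology library, whose API is not shown in the excerpt. The helper `same_codec` is hypothetical, and only aliases that Python's own registry is known to accept are used; other names from the sheet may raise `LookupError`.

```python
import codecs


def same_codec(name_a, name_b):
    """Return True if both names resolve to the same Python codec."""
    return codecs.lookup(name_a).name == codecs.lookup(name_b).name


# A subset of the aliases listed in the sheet for ASCII and Arabic (ISO-8859-6).
ascii_names = ["US-ASCII", "ANSI_X3.4-1968", "iso-ir-6", "ISO646-US", "us", "IBM367"]
arabic_names = ["ISO-8859-6", "csISOLatinArabic", "iso-ir-127", "ECMA-114", "ASMO-708"]

print(all(same_codec("ascii", n) for n in ascii_names))
print(all(same_codec("iso8859-6", n) for n in arabic_names))
print(same_codec("ascii", "ISO-8859-6"))
```

Comparing the canonical `name` of the looked-up codec avoids hard-coding how a particular registry spells its canonical names.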
New Mexico Daily Lobo, Volume 087, No 48, 10/27/1982
University of New Mexico, UNM Digital Repository. 1982, The Daily Lobo 1981 - 1985. 10-27-1982. New Mexico Daily Lobo, Volume 087, No 48, 10/27/1982. University of New Mexico. Follow this and additional works at: https://digitalrepository.unm.edu/daily_lobo_1982 Recommended Citation: University of New Mexico. "New Mexico Daily Lobo, Volume 087, No 48, 10/27/1982." 87, 48 (1982). https://digitalrepository.unm.edu/daily_lobo_1982/130 This Newspaper is brought to you for free and open access by the The Daily Lobo 1981 - 1985 at UNM Digital Repository. It has been accepted for inclusion in 1982 by an authorized administrator of UNM Digital Repository. For more information, please contact [email protected]. Vol. 87 No. 48, Wednesday, October 27, 1982. Rumors abound in presidential search. Craig Chrissinger. With the third presidential finalist to visit campus scheduled for next week, rumors are circulating that Interim President John Perovich is being considered as a possible seventh candidate for the post. Perovich denied Monday that the UNM Board of Regents had approached him. He declined to comment about whether he would allow his name to be considered, if asked to be a candidate. [...] Bernalillo County, said he had not heard of the Anaya rumor. He labeled that rumor as false, adding Anaya has not promised anyone jobs, including his campaign workers. Perovich was nominated as a candidate, but removed his name from consideration in August when the Presidential Search Committee still was screening the candidates. Regents Chairman Henry Jaramillo has said the Regents still are
It's the Symbol You Put Before the Answer
It’s the symbol you put before the answer. Laura Swithinbank discovers that pupils often ‘see’ things differently. “It’s the symbol you put before the answer.” Have you ever heard a pupil say this in response to being asked what the equals sign is? Having heard this on more than one occasion in my school, I carried out a brief investigation with 161 pupils, Year 3 to Year 6, in order to find out more about the way pupils interpret the equals sign. The response above was common; however, what really interested me was the variety of other answers I received. Essentially, it was clear that pupils did not have a consistent understanding of the equals sign across the school. This finding spurred me on to consider how I presented the equals sign to my pupils, and to deliberate on the following questions: • How can we define the equals sign? • What are the key issues and misconceptions with regard to the equals sign in primary mathematics? [...] 3. A focus on both representing and solving a problem rather than on merely solving it; 4. A focus on both numbers and letters, rather than on numbers alone (…) 5. A refocusing of the meaning of the equals sign. (Kieran, 2004:140-141) Whilst reflecting on the Kieran summary, some elements immediately struck a chord with me with regard to my own school context, particularly the concept of ‘refocusing the meaning of the equals sign’. The investigation I carried out clearly echoed points raised by Kieran. For example, when asked to explain the meaning of the ‘=’ sign, pupils at my school offered a wide range of answers including: ‘the symbol you put before the answer’; ‘the total of the number’; ‘the amount’; ‘altogether’; ‘makes’; and ‘you reveal the answer after you’ve done the maths question’.
Perl II Operators, Truth, Control Structures, Functions, and Processing the Command Line
Perl II: Operators, truth, control structures, functions, and processing the command line. Dave Messina, v3, 2012.

Math
1 + 2 = 3 # kindergarten
x = 1 + 2 # algebra
my $x = 1 + 2; # Perl
What are the differences between the algebra version and the Perl version?

Math
my $x = 5;
my $y = 2;
my $z = $x + $y;

Math
my $sum = $x + $y;
my $difference = $x - $y;
my $product = $x * $y;
my $quotient = $x / $y;
my $remainder = $x % $y;

Math
my $x = 5;
my $y = 2;
my $sum = $x + $y;
my $product = $x - $y;
Variable names are arbitrary. Pick good ones!

What are these called?
my $sum = $x + $y;
my $difference = $x - $y;
my $product = $x * $y;
my $quotient = $x / $y;
my $remainder = $x % $y;

Numeric operators
Operator  Meaning
+         add 2 numbers
-         subtract right number from left number
*         multiply 2 numbers
/         divide left number by right number
%         divide left by right and take remainder
**        take left number to the power of the right number

Numeric comparison operators
Operator  Meaning
<         Is left number smaller than right number?
>         Is left number bigger than right number?
<=        Is left number smaller or equal to right?
>=        Is left number bigger or equal to right?
==        Is left number equal to right number?
!=        Is left number not equal to right number?
Why == ?

Comparison operators are yes or no questions or, put another way, true or false questions
True or false: > Is left number larger than right number?
2 > 1 # true
1 > 3 # false

Comparison operators are true or false questions
5 > 3
-1 <= 4
5 == 5
7 != 4

What is truth?
0 the number 0 is false
"0"
New Emoji Requests from Twitter Users: When, Where, Why, and What We Can Do About Them
New Emoji Requests from Twitter Users: When, Where, Why, and What We Can Do About Them. YUNHE FENG, University of Tennessee, Knoxville, USA; ZHENG LU, University of Tennessee, Knoxville, USA; WENJUN ZHOU, University of Tennessee, Knoxville, USA; ZHIBO WANG, Wuhan University, China; QING CAO∗, University of Tennessee, Knoxville, USA. As emojis become prevalent in personal communications, people are always looking for new, interesting emojis to express emotions, show attitudes, or simply visualize texts. In this study, we collected more than thirty million tweets mentioning the word “emoji” in a one-year period to study emoji requests on Twitter. First, we filtered out bot-generated tweets and extracted emoji requests from the raw tweets using a comprehensive list of linguistic patterns. To our surprise, some extant emojis, such as fire and hijab, were still frequently requested by many users. A large number of non-existing emojis were also requested, which were classified into one of eight emoji categories by Unicode Standard. We then examined patterns of new emoji requests by exploring their time, location, and context. Eagerness and frustration of not having these emojis were evidenced by our sentiment analysis, and we summarize users’ advocacy channels. Focusing on typical patterns of co-mentioned emojis, we also identified expressions of equity, diversity, and fairness issues due to unreleased but expected emojis, and summarized the significance of new emojis on society. Finally, time-continuity sensitive strategies at multiple time granularity levels were proposed to rank petitioned emojis by the eagerness, and a real-time monitoring system to track new emoji requests is implemented.
Package 'fontMPlus'
Package ‘fontMPlus’, February 27, 2017
Title: Additional 'ggplot2' Themes Using 'M+' Fonts
Version: 0.1.1
Description: Provides 'ggplot2' themes based on the 'M+' fonts. The 'M+' fonts are a font family under a free license. The font family provides multilingual glyphs. The fonts provide 'Kana', over 5,000 'Kanji', Basic Latin, Latin-1 Supplement, Latin Extended-A, and 'IPA' Extensions glyphs. Most of the Greek, Cyrillic, Vietnamese, and extended glyphs and symbols are included too. So the fonts are in conformity with ISO-8859-1, 2, 3, 4, 5, 7, 9, 10, 13, 14, 15, 16, Windows-1252, T1, and VISCII encoding. More information about the fonts can be found at <http://mplus-fonts.osdn.jp/about-en.html>.
Depends: R (>= 3.0.0)
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: true
RoxygenNote: 6.0.1
Imports: hrbrthemes, extrafont, ggplot2
Suggests: stringr, knitr, rmarkdown
URL: https://github.com/bhaskarvk/fontMPlus
BugReports: https://github.com/bhaskarvk/fontMPlus/issues
VignetteBuilder: knitr
NeedsCompilation: no
Author: Bhaskar Karambelkar [aut, cre], MPlus [cph]
Maintainer: Bhaskar Karambelkar <[email protected]>
Repository: CRAN
Date/Publication: 2017-02-27 08:15:30
R topics documented: fontMPlus, import_mplus, mplus.fontfamilies, mplus.fonttable, theme_ipsum_mplus_c1, theme_ipsum_mplus_c2, theme_ipsum_mplus_m1, theme_ipsum_mplus_m2, theme_ipsum_mplus_mn1, theme_ipsum_mplus_p1, theme_ipsum_mplus_p2, Index
fontMPlus: Additional ggplot2 themes using M+ fonts.
Description: This is an add-on package for the hrbrthemes package. It provides seven ggplot2 themes based on M+ fonts.
M+ FONTS: The M+ FONTS are a font family under the Free license. You can use, copy, and distribute them, with or without modification, either commercially or noncommercially.