Tilde at WMT 2020: News Task Systems

Total Page:16

File Type:pdf, Size:1020Kb

Tilde at WMT 2020: News Task Systems Tilde at WMT 2020: News Task Systems Rihards Krislauksˇ yz and Marcis¯ Pinnisyz yTilde / Vienibas gatve 75A, Riga, Latvia zFaculty of Computing, University of Latvia / Rain¸a bulv. 19, Riga, Latvia [email protected] Abstract The paper is further structured as follows: Sec- tion2 describes the data used to train our NMT This paper describes Tilde’s submission to the systems, Section3 describes our efforts to identify WMT2020 shared task on news translation the best-performing recipes for training of our final for both directions of the English$Polish lan- systems, Section5 summarises the results of our guage pair in both the constrained and the un- final systems, and Section6 concludes the paper. constrained tracks. We follow our submissions form the previous years and build our base- 2 Data line systems to be morphologically motivated sub-word unit-based Transformer base models For training of the constrained NMT systems, we that we train using the Marian machine trans- used data from the WMT 2020 shared task on lation toolkit. Additionally, we experiment news translation1. For unconstrained systems, we with different parallel and monolingual data 2 selection schemes, as well as sampled back- used data from the Tilde Data Library . The 10 translation. Our final models are ensembles of largest publicly available datasets that were used Transformer base and Transformer big models to train the unconstrained systems were Open Sub- which feature right-to-left re-ranking. titles from the Opus corpus (Tiedemann, 2016), ParaCrawl (Banon´ et al., 2020) (although it was 1 Introduction discarded due to noise found in the corpus), DGT Translation Memories (Steinberger et al., 2012), This year, we developed both constrained and un- Microsoft Translation and User Interface Strings constrained NMT systems for the English$Polish Glossaries3 from multiple releases up to 2018, the language pair. We base our methods on the sub- Tilde MODEL corpus (Rozis and Skadin¸sˇ, 2017), missions of the previous years (Pinnis et al., 2017b, WikiMatrix (Schwenk et al., 2019), Digital Corpus 2018, 2019) including methods for parallel data of the European Parliament (Hajlaoui et al., 2014), filtering from Pinnis(2018). Specifically, we lean JRC-Acquis (Steinberger et al., 2006), Europarl on Pinnis(2018) and Junczys-Dowmunt(2018) for (Koehn, 2005), and the QCRI Educational Domain data selection and filtering, (Pinnis et al., 2017b) Corpus (Abdelali et al., 2014). for morphologically motivated sub-word units and synthetic data generation, Edunov et al.(2018) for 2.1 Data Filtering and Pre-Processing sampled back-translation and finally Morishita et al. First, we filtered data using Tilde’s parallel data fil- (2018) for re-ranking with right-to-left models. We tering methods (Pinnis, 2018) that allow discarding use the Marian toolkit (Junczys-Dowmunt et al., sentence pairs that are corrupted, have low content 2018) to train models of Transformer architecture overlap, feature wrong language content, feature (Vaswani et al., 2017). too high non-letter ratio, etc. The exact filter con- Although document level NMT as showcased by figuration is defined in the paper by (Pinnis, 2018). (Junczys-Dowmunt, 2019) have yielded promising Then, we pre-processed all data using Tilde’s results for the English-German language pair, we parallel data pre-processing workflow that nor- were not able to collect sufficient document level 1 data for the English-Polish language pair. As a http://www.statmt.org/wmt20/translation-task.html 2https://www.tilde.com/products-and-services/data- result, all our systems this year translate individual library sentences. 3https://www.microsoft.com/en-us/language/translations 175 Proceedings of the 5th Conference on Machine Translation (WMT), pages 175–180 Online, November 19–20, 2020. c 2020 Association for Computational Linguistics Lang. Filtered the corpora, therefore, this was not done for the Scenario Raw pair Tilde +DCCEF datasets that were used for the experiments docu- En ! Pl 4.3M (c) 10.8M 6.5M mented in Section3. ! Pl En 4.3M For backtranslation experiments, we used all En ! Pl 23.3M (u) 55.4M 31.5M available monolingual data from the WMT shared Pl ! En 24.1M task on news translation. In order to make use of En ! Pl 21.6M (u) w/o PC 48.8M 27.0M Pl ! En 21.3M the Polish CommonCrawl corpus, we scored sen- tences using the in-domain language models and Table 1: Parallel data statistics before and after filter- selected top-scoring sentences as additional mono- ing. (c) - constrained, (u) - unconstrained, “w/o PC” - lingual data for back-translation. “without ParaCrawl”. Many of the data processing steps were sped up via parallelization with GNU Parallel (Tange, 2011). malizes punctuation (quotation marks, apostro- phes, decodes HTML entities, etc.), identifies non- 3 Experiments translatable entities and replaces them with place- holders (e.g., e-mail addresses, Web site addresses, In this section, we describe the details of the meth- XML tags, etc.), tokenises the text using Tilde’s reg- ods used and experiments performed to identify ular expression-based tokeniser, and applies true- the best-performing recipes for training of Tilde’s casing. NMT systems for the WMT 2020 shared task on In preliminary experiments, we identified also news translation. All experiments that are de- that morphology-driven word splitting (Pinnis et al., scribed in this section were carried out on the con- 2017a) for English$Polish allowed us to increase strained datasets unless specifically indicated that translation quality by approximately 1 BLEU point. also unconstrained datasets were used. The finding complies with our findings from previ- ous years (Pinnis et al., 2018, 2017b). Therefore, 3.1 NMT architecture we applied morphology-driven word splitting also All NMT systems that are described further have for this year’s experiments. the Transformer architecture (Vaswani et al., 2017). Then, we trained baseline NMT models (see We trained the systems using the Marian toolkit Section 3.2) and language models, which are nec- (Junczys-Dowmunt et al., 2018). The Transformer essary for dual conditional cross-entropy filtering base model configuration was used throughout the (DCCEF) (Junczys-Dowmunt, 2018) in order to experiments except for the experiments with the big select parallel data that is more similar to the news model configuration that are described in Section5. domain (for usefulness of DCCEF, refer to Sec- We used gradient accumulation over multiple physi- tion 3.3). For in-domain (i.e., news) and out-of- cal batches (the --optimizer-delay parameter in Mar- domain language model training, we used four ian) to increase the effective batch size to around monolingual datasets of 3.7M and 10.6M sen- 1600 sentences in the base model experiments and tences4 for the constrained and unconstrained sce- 1000 sentences in big model experiments. The narios respectively. Once the models were trained, Adam optimizer with a learning rate of 0.0005 and We filtered parallel data using DCCEF. The parallel with 1600 warm-up update steps (i.e., the learning data statistics before and after filtering are given in rate linearly rises during warm-up; afterwards de- Table1. cays proportionally to the inverse of the square root For our final systems, we also generated syn- of the step number) was used. For language model thetic data by randomly replacing one to three con- training, a learning rate of 0.0003 was used. tent words on both source and target sides with unknown token identifiers. This has shown to in- 3.2 Baseline models crease robustness of NMT systems when dealing We trained baseline models using the Transformer with rare or unknown phenomena (Pinnis et al., base configuration as defined in Section 3.1. The 2017a). This process almost doubles the size of validation results for the baseline NMT systems are provided in Table2. As we noticed last year that 4The sizes correspond to the smallest monolingual in- domain dataset, which in both cases were news in Polish. the ParaCrawl corpus contained a large proportion For other datasets, random sub-sets were selected. (by our estimates up to 50%) (Pinnis et al., 2019) 176 System En ! Pl Pl ! En strategies for data filtering in the preparation of the back-translated data. Ng et al.(2019) have de- Constrained scribed a method for domain data extraction from Baseline 21.67 32.69 general domain monolingual data using domain +DCCEF 22.19 33.45 and out-of-domain language models. We compared Unconstrained said method with a simpler alternative of using only Baseline 21.86 33.08 an in-domain language model for in-domain data +DCCEF 22.51 30.86 scoring. We sorted the monolingual data according Baseline w/o ParaCrawl 24.29 29.47 to the scores produced by the in-domain language +DCCEF 22.60 28.59 model or by the combination of in-domain and out-of-domain language model scores and experi- Table 2: Comparison of baseline NMT systems trained mented with different cutoff points when selecting on data that were prepared with and without DCCEF. data for back-translation. Considering the above, we carried out experi- of machine translated content, we trained baseline ments along two dimensions – 1) monolingual data systems with and without ParaCrawl. It can be seen selection strategy, which was either combined or that when training the En ! Pl unconstrained sys- in-domain, signifying whether the combined score tem using ParaCrawl, we loose over 2 BLEU points. of both language models or just the score from This is because most machine translated content the in-domain language model was used, respec- is on the non-English (in this case Polish) side. tively, and 2) the bitext and synthetic data mixture For the Pl ! En direction, the machine-translated selection strategy, which was one of: content acts as back-translated data and, therefore, does not result in quality degradation.
Recommended publications
  • Ffontiau Cymraeg
    This publication is available in other languages and formats on request. Mae'r cyhoeddiad hwn ar gael mewn ieithoedd a fformatau eraill ar gais. [email protected] www.caerphilly.gov.uk/equalities How to type Accented Characters This guidance document has been produced to provide practical help when typing letters or circulars, or when designing posters or flyers so that getting accents on various letters when typing is made easier. The guide should be used alongside the Council’s Guidance on Equalities in Designing and Printing. Please note this is for PCs only and will not work on Macs. Firstly, on your keyboard make sure the Num Lock is switched on, or the codes shown in this document won’t work (this button is found above the numeric keypad on the right of your keyboard). By pressing the ALT key (to the left of the space bar), holding it down and then entering a certain sequence of numbers on the numeric keypad, it's very easy to get almost any accented character you want. For example, to get the letter “ô”, press and hold the ALT key, type in the code 0 2 4 4, then release the ALT key. The number sequences shown from page 3 onwards work in most fonts in order to get an accent over “a, e, i, o, u”, the vowels in the English alphabet. In other languages, for example in French, the letter "c" can be accented and in Spanish, "n" can be accented too. Many other languages have accents on consonants as well as vowels.
    [Show full text]
  • Basis Technology Unicode対応ライブラリ スペックシート 文字コード その他の名称 Adobe-Standard-Encoding A
    Basis Technology Unicode対応ライブラリ スペックシート 文字コード その他の名称 Adobe-Standard-Encoding Adobe-Symbol-Encoding csHPPSMath Adobe-Zapf-Dingbats-Encoding csZapfDingbats Arabic ISO-8859-6, csISOLatinArabic, iso-ir-127, ECMA-114, ASMO-708 ASCII US-ASCII, ANSI_X3.4-1968, iso-ir-6, ANSI_X3.4-1986, ISO646-US, us, IBM367, csASCI big-endian ISO-10646-UCS-2, BigEndian, 68k, PowerPC, Mac, Macintosh Big5 csBig5, cn-big5, x-x-big5 Big5Plus Big5+, csBig5Plus BMP ISO-10646-UCS-2, BMPstring CCSID-1027 csCCSID1027, IBM1027 CCSID-1047 csCCSID1047, IBM1047 CCSID-290 csCCSID290, CCSID290, IBM290 CCSID-300 csCCSID300, CCSID300, IBM300 CCSID-930 csCCSID930, CCSID930, IBM930 CCSID-935 csCCSID935, CCSID935, IBM935 CCSID-937 csCCSID937, CCSID937, IBM937 CCSID-939 csCCSID939, CCSID939, IBM939 CCSID-942 csCCSID942, CCSID942, IBM942 ChineseAutoDetect csChineseAutoDetect: Candidate encodings: GB2312, Big5, GB18030, UTF32:UTF8, UCS2, UTF32 EUC-H, csCNS11643EUC, EUC-TW, TW-EUC, H-EUC, CNS-11643-1992, EUC-H-1992, csCNS11643-1992-EUC, EUC-TW-1992, CNS-11643 TW-EUC-1992, H-EUC-1992 CNS-11643-1986 EUC-H-1986, csCNS11643_1986_EUC, EUC-TW-1986, TW-EUC-1986, H-EUC-1986 CP10000 csCP10000, windows-10000 CP10001 csCP10001, windows-10001 CP10002 csCP10002, windows-10002 CP10003 csCP10003, windows-10003 CP10004 csCP10004, windows-10004 CP10005 csCP10005, windows-10005 CP10006 csCP10006, windows-10006 CP10007 csCP10007, windows-10007 CP10008 csCP10008, windows-10008 CP10010 csCP10010, windows-10010 CP10017 csCP10017, windows-10017 CP10029 csCP10029, windows-10029 CP10079 csCP10079, windows-10079
    [Show full text]
  • Supplemental Punctuation Range: 2E00–2E7F
    Supplemental Punctuation Range: 2E00–2E7F This file contains an excerpt from the character code tables and list of character names for The Unicode Standard, Version 14.0 This file may be changed at any time without notice to reflect errata or other updates to the Unicode Standard. See https://www.unicode.org/errata/ for an up-to-date list of errata. See https://www.unicode.org/charts/ for access to a complete list of the latest character code charts. See https://www.unicode.org/charts/PDF/Unicode-14.0/ for charts showing only the characters added in Unicode 14.0. See https://www.unicode.org/Public/14.0.0/charts/ for a complete archived file of character code charts for Unicode 14.0. Disclaimer These charts are provided as the online reference to the character contents of the Unicode Standard, Version 14.0 but do not provide all the information needed to fully support individual scripts using the Unicode Standard. For a complete understanding of the use of the characters contained in this file, please consult the appropriate sections of The Unicode Standard, Version 14.0, online at https://www.unicode.org/versions/Unicode14.0.0/, as well as Unicode Standard Annexes #9, #11, #14, #15, #24, #29, #31, #34, #38, #41, #42, #44, #45, and #50, the other Unicode Technical Reports and Standards, and the Unicode Character Database, which are available online. See https://www.unicode.org/ucd/ and https://www.unicode.org/reports/ A thorough understanding of the information contained in these additional sources is required for a successful implementation.
    [Show full text]
  • Diacritics-ELL.Pdf
    Diacritics J.C. Wells, University College London Dkadvkxkdw avf ekwxkrhykwjkrh qavow axxadjfe xs pfxxfvw sg xjf aptjacfx, gsv f|aqtpf xjf adyxf addfrx sr xjf ‘ kr dag‘. M swx parhyahf svxjshvatjkfw cawfe sr xjf Laxkr aptjacfx qaof wsqf ywf sg ekadvkxkdw, aw kreffe es xjswf cawfe sr sxjfv aptjacfxw are {vkxkrh w}wxfqw. Tjf gsdyw sg xjkw avxkdpf kw sr xjf vspf sg ekadvkxkdw kr xjf svxjshvatj} sg parhyahfw {vkxxfr {kxj xjf Laxkr aptjacfx. Ireffe, xjf svkhkr sg wsqf pfxxfvw xjax avf rs{ a wxareave tavx sg xjf aptjacfx pkfw kr xjf ywf sg ekadvkxkdw. Tjf pfxxfv G {aw krzfrxfe kr Rsqar xkqfw aw a zavkarx sg C, ekwxkrhykwjfe c} xjf dvswwcav sr xjf ytwxvsof. Tjf pfxxfv J {aw rsx ekwxkrhykwjfe gvsq I, rsv U gvsq V, yrxkp xjf 16xj dfrxyv} (Saqtwsr 1985: 110). Tjf rf{ pfxxfv 1 kw sczksywp} a zavkarx sr r are ws dsype cf wffr aw krdsvtsvaxkrh a ekadvkxkd xakp. Dkadvkxkdw tvstfv, xjsyhj, avf wffr aw qavow axxadjfe xs a cawf pfxxfv. Ir xjkw wfrwf, m y 1 es rsx krzspzf ekadvkxkdw. Tjf f|xfrwkzf ywf sg ekadvkxkdw xs wyttpfqfrx xjf Laxkr aptjacfx kr dawfw {jfvf kx {aw wffr aw kraefuyaxf gsv xjf wsyrew sg sxjfv parhyahfw kw hfrfvapp} axxvkcyxfe xs xjf vfpkhksyw vfgsvqfv Jar Hyw (1369-1415), {js efzkwfe a vfgsvqfe svxjshvatj} gsv C~fdj krdsvtsvaxkrh 9addfrxfe: pfxxfvw wydj aw ˛ ¹ = > ?. M swx ekadvkxkdw avf tpadfe acszf xjf cawf pfxxfv {kxj {jkdj xjf} avf awwsdkaxfe. A gf{, js{fzfv, avf tpadfe cfps{ kx (aw “) sv xjvsyhj kx (aw B). 1 Laxkr pfxxfvw dsqf kr ps{fv-dawf are yttfv-dawf zfvwksrw.
    [Show full text]
  • List of Approved Special Characters
    List of Approved Special Characters The following list represents the Graduate Division's approved character list for display of dissertation titles in the Hooding Booklet. Please note these characters will not display when your dissertation is published on ProQuest's site. To insert a special character, simply hold the ALT key on your keyboard and enter in the corresponding code. This is only for entering in a special character for your title or your name. The abstract section has different requirements. See abstract for more details. Special Character Alt+ Description 0032 Space ! 0033 Exclamation mark '" 0034 Double quotes (or speech marks) # 0035 Number $ 0036 Dollar % 0037 Procenttecken & 0038 Ampersand '' 0039 Single quote ( 0040 Open parenthesis (or open bracket) ) 0041 Close parenthesis (or close bracket) * 0042 Asterisk + 0043 Plus , 0044 Comma ‐ 0045 Hyphen . 0046 Period, dot or full stop / 0047 Slash or divide 0 0048 Zero 1 0049 One 2 0050 Two 3 0051 Three 4 0052 Four 5 0053 Five 6 0054 Six 7 0055 Seven 8 0056 Eight 9 0057 Nine : 0058 Colon ; 0059 Semicolon < 0060 Less than (or open angled bracket) = 0061 Equals > 0062 Greater than (or close angled bracket) ? 0063 Question mark @ 0064 At symbol A 0065 Uppercase A B 0066 Uppercase B C 0067 Uppercase C D 0068 Uppercase D E 0069 Uppercase E List of Approved Special Characters F 0070 Uppercase F G 0071 Uppercase G H 0072 Uppercase H I 0073 Uppercase I J 0074 Uppercase J K 0075 Uppercase K L 0076 Uppercase L M 0077 Uppercase M N 0078 Uppercase N O 0079 Uppercase O P 0080 Uppercase
    [Show full text]
  • Toward a Historically Faithful Performance of the Piano Works of Anton´Inqweˇrt´Y
    15 Toward a historically faithful performance of the piano works of Anton´ınQweˇrt´y William Gunther Brian Kell Google, Inc. Google, Inc. [email protected] [email protected] SIGBOVIK ’18 Carnegie Mellon University April −2, 2018 Concrete The great Czech composer Anton´ın Dvoˇr´ak (1841–1904) wrote many pieces for the piano, including the famous Humoresque No. 7 in G-flat Ma- jor [2]. Unfortunately, typical performances of these works today sound nothing like what the composer intended because most modern pianos are configured with a different keyboard layout. Through painstaking histor- ical research, we have reconstructed the original Dvoˇr´ak piano keyboard layout. We have applied this discovery by transposing the Humoresque so that it is playable on a modern piano, enabling the first historically faithful performance of this piece in over a century. 1 92 Figure 1: A Dvorak keyboard with the original or “classic” layout [3]. There are several variants of the Dvorak layout, but Dvoˇr´akwas a classical composer, so this is almost certainly the one he used. Furthermore, this layout has 44 white keys (not counting the spacebar, which is clearly used only for rests). That is exactly half of the number of keys on a piano. Thus we may confidently conclude that the left half of Dvoˇr´ak’s piano layout was just these 44 keys, while the right half was the same keys again with the Shift key held down. Figure 2: A modern QWERTY keyboard with the United States layout [4]. This layout has 47 white keys (not counting the spacebar), but obviouslythree of them are useless: nobody really needs the characters ‘~]}\| [1].
    [Show full text]
  • 1 Symbols (2286)
    1 Symbols (2286) USV Symbol Macro(s) Description 0009 \textHT <control> 000A \textLF <control> 000D \textCR <control> 0022 ” \textquotedbl QUOTATION MARK 0023 # \texthash NUMBER SIGN \textnumbersign 0024 $ \textdollar DOLLAR SIGN 0025 % \textpercent PERCENT SIGN 0026 & \textampersand AMPERSAND 0027 ’ \textquotesingle APOSTROPHE 0028 ( \textparenleft LEFT PARENTHESIS 0029 ) \textparenright RIGHT PARENTHESIS 002A * \textasteriskcentered ASTERISK 002B + \textMVPlus PLUS SIGN 002C , \textMVComma COMMA 002D - \textMVMinus HYPHEN-MINUS 002E . \textMVPeriod FULL STOP 002F / \textMVDivision SOLIDUS 0030 0 \textMVZero DIGIT ZERO 0031 1 \textMVOne DIGIT ONE 0032 2 \textMVTwo DIGIT TWO 0033 3 \textMVThree DIGIT THREE 0034 4 \textMVFour DIGIT FOUR 0035 5 \textMVFive DIGIT FIVE 0036 6 \textMVSix DIGIT SIX 0037 7 \textMVSeven DIGIT SEVEN 0038 8 \textMVEight DIGIT EIGHT 0039 9 \textMVNine DIGIT NINE 003C < \textless LESS-THAN SIGN 003D = \textequals EQUALS SIGN 003E > \textgreater GREATER-THAN SIGN 0040 @ \textMVAt COMMERCIAL AT 005C \ \textbackslash REVERSE SOLIDUS 005E ^ \textasciicircum CIRCUMFLEX ACCENT 005F _ \textunderscore LOW LINE 0060 ‘ \textasciigrave GRAVE ACCENT 0067 g \textg LATIN SMALL LETTER G 007B { \textbraceleft LEFT CURLY BRACKET 007C | \textbar VERTICAL LINE 007D } \textbraceright RIGHT CURLY BRACKET 007E ~ \textasciitilde TILDE 00A0 \nobreakspace NO-BREAK SPACE 00A1 ¡ \textexclamdown INVERTED EXCLAMATION MARK 00A2 ¢ \textcent CENT SIGN 00A3 £ \textsterling POUND SIGN 00A4 ¤ \textcurrency CURRENCY SIGN 00A5 ¥ \textyen YEN SIGN 00A6
    [Show full text]
  • Spanish and UEB
    AUSTRALIAN BRAILLE AUTHORITY A subcommittee of the Round Table on Information Access for People with Print Disabilities Inc. www.brailleaustralia.org email: [email protected] Spanish and UEB Introduction This document should be read in conjunction with the Australian Braille Authority, Guidelines for Foreign Language Material, 2019 which can be found on the brailleaustralia.org website. Follow these guidelines when transcribing Spanish for educational purposes, such as Spanish language textbooks, examination papers, grammar and phrase books, bilingual dictionaries, etc. Spanish within an English context such as a novel should use the guidelines given in 1.1 of the Australian Braille Authority, Guidelines for Foreign Language Material, 2019. A more complete Spanish code may be requested for higher education or by a native reader. This is not covered in detail in this document. Relevant DBT codes are however given in the section on DBT. Kathy Riessen, Editor May 2019 Contractions Transcribe Spanish text uncontracted using the accents as listed below. Punctuation Use UEB punctuation, indicators and numeral conventions except for the exclamation and question marks. Spanish places inverted question and exclamation marks at the beginning of questions and exclamatory sentences and the standard marks at the end of the sentence. The UEB symbols for the inverted question and exclamation marks are 3 cells, so the Spanish conventions are used for these punctuation signs. 1 ABA: Spanish and UEB May 2019 Quotation signs in Spanish may sometimes be shown as the standard quotation signs as used in English and sometimes as guillemets (angled quotation signs). Generally these should be transcribed as the UEB single cell non-specific quotation signs.
    [Show full text]
  • The TILDE File Naming Scheme
    View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by Purdue E-Pubs Purdue University Purdue e-Pubs Department of Computer Science Technical Reports Department of Computer Science 1985 The TILDE File Naming Scheme Douglas E. Comer Purdue University, [email protected] Thomas P. Murtagh Report Number: 85-507 Comer, Douglas E. and Murtagh, Thomas P., "The TILDE File Naming Scheme" (1985). Department of Computer Science Technical Reports. Paper 428. https://docs.lib.purdue.edu/cstech/428 This document has been made available through Purdue e-Pubs, a service of the Purdue University Libraries. Please contact [email protected] for additional information. The TLide File Naming Scheme Douglas Comer Thomas P. Murtagh CSD-TR-507 Department of Computer Sciences Purdue University West Lafayette, IN 479fJ7 ABSTRACf Hierarchical directory systems permit users of a computer system to organize files into meaningful sets. A key part of the directory's usefulness lies in the way it partitions names. A hierarchical directory system supports short mnemonic names by keeping the names of files in one directory independent of those in other directories. Most hierarchical directory systems, which were designed for single-processor time sharing systems, place files in a sin­ gle directory tree. The disadvantages of the single tree scheme become apparent when two such systems are interconnected by a computer network. or when soft\\·are is transported from one machine to another. This report proposes an alternative to the single-tree naming scheme in which files are organized in a forest of tree-structwed directories and each user chases his own view of the forest.
    [Show full text]
  • Proposal for "Swedish International" Keyboard
    ISO/IEC JTC 1/SC 35 N 0748 DATE: 2005-01-31 ISO/IEC JTC 1/SC 35 User Interfaces Secretariat: Association Française de Normalisation (AFNOR) TITLE: Proposal for "Swedish International" keyboard SOURCE: Karl Ivar Larsson, Swedish Expert STATUS: FYI DATE: 2005-01-31 DISTRIBUTION: P and O members of JTC1/SC35 MEDIUM: E NO. OF PAGES: 8 Secretariat ISO/IEC JTC 1/SC 35 – Nathalie Cappel-Souquet – 11 Avenue Francis de Pressensé 93571 St Denis La Plaine Cedex France Telephone: + 33 1 41 62 82 55; Facsimile: + 33 1 49 17 91 29 e-mail: [email protected] LWP Consulting R 04/0-3 Notes: 1. This document was handed out in the SC35 Stockholm meeting 2004-11-24. 2. The proposal contained in the document relates to Swedish standardization, and at present not to any SC35 activities. Contents 1 Scope ...................................................................................................................................................................3 2 Characters added ...............................................................................................................................................3 2.1 Diacritical marks.................................................................................................................................................3 2.2 Special-shape letters..........................................................................................................................................3 2.3 Other characters.................................................................................................................................................3
    [Show full text]
  • Accents Over Spanish Letters
    Accents Over Spanish Letters Wendel snuggles his teslas ensure likewise or flimsily after Kip diabolizes and guided lot, hyetographic and proportional. Hank Grecizing extemporarily if rhizogenic Georg overruling or encapsulated. Bigger Julie sometimes rebaptizing his opportunity homologically and halloos so synchronously! Do the faroese accented one of the way a few benefits to another point at spanish accents And are then open trunk like in start date is a closed A gone in attention The spirit just a nasal closed A. It often a glyph generally placed above them under certain characters of an alphabet Thus post can inside that accents marks are orthographic symbols used on letters that. The mark go the n means that secret letter text be pronounced nya like. What is a spanish letters on over vowels form of tasks. How everything Make Spanish Accents Pronto Spanish Services LLC. If someone are confused about when doctor put accents on Spanish words this lesson. Translations Spanish Classes Cultural Consulting Voice or Learn Spanish. Enter its national boundaries between vowels. What wrongdoing the accents on letters mean? How spanish letters are speaking differ from other letter you are. Has timed out on over time more convincing and letter. Spanish alphabet SpanishDict. Over plenty of surgery other options you lost use local type characters with Spanish. Each tax in Spanish contains an accent a spine that is stressed but these don't. Spanish Accent Marks Tildes & More Basic Rules. FAQ Item The Chicago Manual of Style. In Spanish is an accented letter pronounced just its way a repair Both and a hollow like create The accent indicates the stressed syllable in words with irregular.
    [Show full text]
  • LNCS 7381, Pp
    Multi-Tilde-Bar Derivatives Pascal Caron, Jean-Marc Champarnaud, and Ludovic Mignot LITIS, Universit´e de Rouen, 76801 Saint-Etienne´ du Rouvray Cedex, France {pascal.caron,jean-marc.champarnaud,ludovic.mignot}@univ-rouen.fr Abstract. Multi-tilde-bar operators allow us to extend regular expres- sions. The associated extended expressions are compatible with the structure of Glushkov automata and they provide a more succinct repre- sentation than standard expressions. The aim of this paper is to examine the derivation of multi-tilde-bar expressions. Two types of computation are investigated: Brzozowski derivation and Antimirov derivation, as well as the construction of the associated automata. 1 Introduction Regular expression word derivatives have been introduced in [5] by Brzozowski in order to compute language quotients via expression derivatives: for any word w, the language denoted by the derivative of a regular expression E w.r.t. w is the left quotient of the language denoted by E w.r.t. w. Regular expression derivation plays a fundamental role in theory of automata. In particular, under the assumption that the set D of all the derivatives of a regular expression E is finite, it is possible to construct a FA (finite automaton) with D as a set of states that recognizes the language denoted by E. Word derivatives handle unrestricted regular expressions; they are themselves expressions and they provide a DFA (deterministic finite automaton), as far as the ACI (associativity, commutativity and idempotence) properties of the sum of two expressions are used. Alternative types of derivation have been designed since Brzozowski’s seminal work.
    [Show full text]