Text Analysis with LingPipe 4

Bob Carpenter
Breck Baldwin

LingPipe Publishing
New York
2011

© LingPipe Publishing

Pending: Library of Congress Cataloging-in-Publication Data
Carpenter, Bob.
Text Analysis with LingPipe 4.0 / Bob Carpenter, Breck Baldwin
p. cm.
Includes bibliographical references and index.
ISBN X-XXX-XXXXX-X (pbk.)
1. Natural Language Processing 2. Java (computer program language) I. Carpenter, Bob II. Baldwin, Breck III. Title
QAXXX.XXXXXXXX 2011
XXX/.XX’X-xxXX XXXXXXXX

All rights reserved. This book, or parts thereof, may not be reproduced in any form without permission of the publishers.

Contents

1 Getting Started
    1.1 Tools of the Trade
    1.2 Hello World Example
    1.3 Introduction to Ant
2 Handlers, Parsers, and Corpora
    2.1 Handlers and Object Handlers
    2.2 Parsers
    2.3 Corpora
    2.4 Cross Validation
3 Tokenization
    3.1 Tokenizers and Tokenizer Factories
    3.2 LingPipe’s Base Tokenizer Factories
    3.3 LingPipe’s Filtered Tokenizers
    3.4 Morphology, Stemming, and Lemmatization
    3.5 Soundex: Pronunciation-Based Tokens
    3.6 Character Normalizing Tokenizer Filters
    3.7 Penn Treebank Tokenization
    3.8 Adapting to and From Lucene Analyzers
    3.9 Tokenizations as Objects
4 Suffix Arrays
    4.1 What is a Suffix Array?
    4.2 Character Suffix Arrays
    4.3 Token Suffix Arrays
    4.4 Document Collections as Suffix Arrays
    4.5 Implementation Details
5 Symbol Tables
    5.1 The SymbolTable Interface
    5.2 The MapSymbolTable Class
    5.3 The SymbolTableCompiler Class
6 Character Language Models
    6.1 Applications of Language Models
    6.2 The Basics of N-Gram Language Models
    6.3 Character-Level Language Models and Unicode
    6.4 Language Model Interfaces
    6.5 Process Character Language Models
    6.6 Sequence Character Language Models
    6.7 Tuning Language Model Smoothing
    6.8 Underlying Sequence Counter
    6.9 Learning Curve Evaluation
    6.10 Pruning Counts
    6.11 Compiling and Serializing Character LMs
    6.12 Thread Safety
    6.13 The Mathematical Model
7 Tokenized Language Models
    7.1 Applications of Tokenized Language Models
    7.2 Token Language Model Interface
8 Spelling Correction
9 Classifiers and Evaluation
    9.1 What is a Classifier?
    9.2 Kinds of Classifiers
    9.3 Gold Standards, Annotation, and Reference Data
    9.4 Confusion Matrices
    9.5 Precision-Recall Evaluation
    9.6 Micro- and Macro-Averaged Statistics
    9.7 Scored Precision-Recall Evaluations
    9.8 Contingency Tables and Derived Statistics
    9.9 Bias Correction
    9.10 Post-Stratification
10 Naive Bayes Classifiers
    10.1 Introduction to Naive Bayes
    10.2 Getting Started with Naive Bayes
    10.3 Independence, Overdispersion and Probability Attenuation
    10.4 Tokens, Counts and Sufficient Statistics
    10.5 Unbalanced Category Probabilities
    10.6 Maximum Likelihood Estimation and Smoothing
    10.7 Item-Weighted Training
    10.8 Document Length Normalization
    10.9 Serialization and Compilation
    10.10 Training and Testing with a Corpus
    10.11 Cross-Validating a Classifier
    10.12 Formalizing Naive Bayes
11 Tagging
    11.1 Taggings
    11.2 Tag Lattices
    11.3 Taggers
    11.4 Tagger Evaluators
12 Tagging with Hidden Markov Models
13 Conditional Random Fields
14 Latent Dirichlet Allocation
    14.1 Corpora, Documents, and Tokens
    14.2 LDA Parameter Estimation
    14.3 Interpreting LDA Output
    14.4 LDA’s Gibbs Samples
    14.5 Handling Gibbs Samples
    14.6 Scalability of LDA
    14.7 Understanding the LDA Model Parameters
    14.8 LDA Instances for Multi-Topic Classification
    14.9 Comparing Documents with LDA
    14.10 Stability of Samples
    14.11 The LDA Model
15 Singular Value Decomposition
16 Sentence Boundary Detection
A Mathematics
    A.1 Basic Notation
    A.2 Useful Functions
B Statistics
    B.1 Discrete Probability Distributions
    B.2 Continuous Probability Distributions
    B.3 Maximum Likelihood Estimation
    B.4 Maximum a Posteriori Estimation
    B.5 Information Theory
C Java Basics
    C.1 Generating Random Numbers
D Corpora
    D.1 Canterbury Corpus
    D.2 20 Newsgroups
    D.3 MedTag
    D.4 WormBase MEDLINE Citations
E Further Reading
    E.1 Algorithms
    E.2 Probability and Statistics
    E.3 Machine Learning
    E.4 Linguistics
    E.5 Natural Language Processing
F Licenses
    F.1 LingPipe License
    F.2 Java Licenses
    F.3 Apache License 2.0
    F.4 Common Public License 1.0
    F.5 X License
    F.6 Creative Commons Attribution-Sharealike 3.0 Unported License

Preface

LingPipe is a software library for natural language processing implemented in Java.
This book explains the tools that are available in LingPipe and provides examples of how they can be used to build natural language processing (NLP) applications for multiple languages and genres, and for many kinds of applications. LingPipe’s application programming interface (API) is tailored to abstract over low-level implementation details to enable components such as tokenizers, feature extractors, or classifiers to be swapped in a plug-and-play fashion. LingPipe contains a mixture of heuristic rule-based components and statistical components, often implementing the same interfaces, such as chunking or tokenization.

The presentation here will be hands on. You should be comfortable reading short and relatively simple Java programs. Java programming idioms like loop boundaries being inclusive/exclusive and higher-level design patterns like visitors will also be presupposed. More specific aspects of Java coding relating to text processing, such as streaming I/O, character decoding, string representations, and regular expression processing will be discussed in more depth. We will also go into some detail on collections, XML/HTML parsing with SAX, and serialization patterns.

We do not presuppose any knowledge of linguistics beyond a simple understanding of the terms used in dictionaries such as words, syllables, pronunciations, and parts of speech such as noun and preposition. We will spend considerable time introducing linguistic concepts, such as word senses or noun phrase chunks, as they relate to natural language processing modules in LingPipe.

We will do our best to introduce LingPipe’s modules and their application from a hands-on practical API perspective rather than a theoretical one.
In most cases, such as for logistic regression classifiers and conditional random field (CRF) taggers and chunkers, it’s possible to learn how to effectively fit complex and useful models without fully understanding the mathematical basis of LingPipe’s estimation and optimization algorithms. In other cases, such as naive Bayes classifiers, hierarchical clusterers and hidden Markov models (HMM), the models are simpler, estimation is a matter of counting, and there is almost no hand-tuning required.

Deeper understanding of LingPipe’s algorithms and statistical models requires familiarity with computational complexity analysis and basic probability theory including information theory. We provide suggested readings in algorithms, statistics, machine learning, and linguistics in Appendix