Spell Checker in CET Designer

Linköping University | Department of Computer Science
Bachelor thesis, 16 ECTS | Computer Engineering (Datateknik)
2016 | LIU-IDA/LITH-EX-G--16/069--SE

Rasmus Hedin

Supervisor: Amir Aminifar
Examiner: Zebo Peng

Linköpings universitet, SE–581 83 Linköping, +46 13 28 10 00, www.liu.se

Copyright

The publishers will keep this document online on the Internet, or its possible replacement, for a period of 25 years starting from the date of publication barring exceptional circumstances. The online availability of the document implies permanent permission for anyone to read, to download, or to print out single copies for his/her own use and to use it unchanged for non-commercial research and educational purposes. Subsequent transfers of copyright cannot revoke this permission.
All other uses of the document are conditional upon the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility. According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement. For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/.

© Rasmus Hedin

Abstract

A common feature of text input tools is spell checking. It exists in search engines, email clients and of course in word processors like Microsoft Word. With a spell checker available while typing, users can work more efficiently than if they had to check the spelling with a separate proofing tool. Spell checking is a common request from the users of the room planning software CET Designer®, which is developed by Configura. In this thesis, the Windows spell checking API is evaluated and compared to alternative spell checkers. A prototype of a spell checker integrated in the CET Designer® text tool is then implemented with the Windows spell checking API.

Contents

Abstract
Acknowledgments
Contents
List of Figures
List of Tables
1 Introduction
  1.1 Motivation
  1.2 Aim
  1.3 Research questions
  1.4 Scope
  1.5 Background
2 Theory
  2.1 Spelling error detection
  2.2 Spelling error correction
  2.3 Evaluation metrics
3 Method
  3.1 Pre-study
  3.2 Requirement specification
  3.3 Implementation
4 Results
  4.1 Pre-study
  4.2 Implementation
5 Discussion
  5.1 Results
  5.2 Method
  5.3 The work in a wider context
6 Conclusion
  6.1 Research questions
  6.2 Requirement analysis
  6.3 Future work
Bibliography

List of Figures

3.1 Communication between C++ dll, C# exe and CM
3.2 Windows language bar. Current input language is English. The language can be switched by clicking it or by pressing Windows key+Space.
4.1 Words/second for test file 1 and 2
4.2 Overall harmonic mean for test file 1 and 2
4.3 Misspelled words marked with wavy underline
4.4 Right-click drop down menu

List of Tables

3.1 Specs of computer used for evaluation
4.1 Spell checkers performance and accuracy
4.2 Spell checkers performance and accuracy

1 Introduction

1.1 Motivation

CET Designer is a tool for room planning where the end user often adds informational text and labels to the reports that they create. The text tool does not have a spell checker, and this is a common request from the end users. A spell checker would make text handling easier and faster for end users. Today, almost every program in which users input text can check the spelling. The resulting text feels much more professional than text containing spelling errors, and appearing professional is important for making clients feel that they can trust a company.

If a user of the current version of CET Designer wants to spell check a text, a separate program must be used. When the text has been checked, it can then be inserted into the CET Designer reports. This method takes some time and effort from the user. If users feel that it is not worth the time and effort to check the spelling in a separate program, unchecked texts may be inserted into the reports. If a spell checker is integrated into CET Designer, the texts can easily be checked. This will lead to more efficient spell checking compared to using a separate spell checking program.

1.2 Aim

The goal of this thesis is to investigate whether the Windows built-in spell checker is good enough for CET Designer or whether another suitable spell checker exists.
The most suitable alternative will then be used for the spell checker that is integrated in the CET Designer text tool.

1.3 Research questions

With this thesis work, we would also like to answer the following research questions:

• Can the Windows spell checking API be used to check spelling in CET Designer's text tool?
• What alternatives exist for checking the spelling of a text in CET Designer's text tool?
• Which alternative gives the best performance and accuracy?

1.4 Scope

For this thesis we will only implement a prototype of an integrated spell checker. Only the performance of the Windows spell checking API and other suitable solutions will be evaluated, not the techniques they use to check spelling.

1.5 Background

1.5.1 Configura

Configura is a company operating globally with its headquarters in Linköping, Sweden and commercial operations in Grand Rapids, Michigan, USA and Kuala Lumpur, Malaysia. The company, which is privately owned, was founded in 1990 and has over 110 employees worldwide. Configura creates Parametric Graphical Configuration (PGC) software solutions for leading international industries. PGC is a development framework for implementation of fast, efficient and intuitive software for graphical configuration. The software is suitable for office furniture but can also be used for kitchen & bath, material handling and industrial machinery. Software developed by Configura includes CET Designer®, Configura® (the original software platform) and InstantPlanner®.

1.5.2 CET Designer

CET Designer® is space-planning software that makes it easier to specify and sell products in a variety of industries. The software is a complete solution that handles every step of the sales and order process quickly and accurately. In the 2D and 3D virtual environments one can simply drag and drop components, and the software calculates pricing to prevent the user from making calculation mistakes.
1.5.3 CM

Dissatisfaction with C++, and with the long work cycles caused by having to recompile and restart the application after every source code change, motivated the development of CM (Configura Magic).

CM is an object-oriented programming language that supports extensible syntax and incremental development. Memory allocated for objects and values is automatically reclaimed by a garbage collector. Source files are compiled to machine code, but the compilation is just-in-time (JIT). This means that the compiler translates code into machine code only when it can no longer delay doing so. Code can be changed and added while the application is running, since compilation is interleaved with execution. This leads to shorter development cycles without losing performance, and developers get faster feedback on the code they are working on.

CET Designer is programmed in CM, and therefore CM will be used for the integration of a spell checker. CM integrates well with DLLs and the Windows API.

2 Theory

In this chapter we will introduce the concept of spell checkers and spell correctors. We will also describe some metrics that can be used to evaluate spell checkers.

2.1 Spelling error detection

Liang [7] and Peterson [11] define two types of spelling programs: spell checkers and spell correctors. Spell correctors will be explained in the next section. A spell checker is simply given an input text and detects the words that are incorrectly spelt in some given language. A word is defined as a continuous string of characters that exists in the given language. A non-word, on the other hand, is defined as a string that is not found in a given word list or dictionary, or whose word form is incorrect.

Liang [7] describes two main techniques that are used for spell checking: dictionary lookup and n-gram analysis. Spell checkers can combine these techniques to check the spelling of words.
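To make the two detection techniques concrete, the following is a minimal sketch in Python, not the implementation used in this thesis. The word list, the function names and the choice of character trigrams with `#` as a boundary marker are all assumptions made for illustration. A real spell checker would load a full dictionary and could use n-gram frequencies instead of a simple seen/unseen test.

```python
import re

# Hypothetical toy word list; a real checker would load a full dictionary.
WORD_LIST = {"a", "spell", "checker", "detects", "misspelled", "words", "in", "text"}


def tokenize(text):
    """Split text into lowercase runs of letters (apostrophes allowed inside)."""
    return re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)*", text.lower())


def find_non_words(text, word_list=WORD_LIST):
    """Dictionary lookup: return the tokens that are absent from the word list."""
    return [token for token in tokenize(text) if token not in word_list]


def char_ngrams(word, n=3):
    """Return the character n-grams of a word padded with '#' boundary markers."""
    padded = "#" + word + "#"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def build_ngram_table(word_list, n=3):
    """Collect every character n-gram that occurs in a known word."""
    return {gram for word in word_list for gram in char_ngrams(word, n)}


def has_unseen_ngram(word, table, n=3):
    """N-gram analysis: flag a word containing an n-gram never seen in the word list."""
    return any(gram not in table for gram in char_ngrams(word, n))


if __name__ == "__main__":
    table = build_ngram_table(WORD_LIST)
    for token in find_non_words("A spell checkr detects mispelled words"):
        print(token, has_unseen_ngram(token, table))
```

Dictionary lookup gives an exact answer for the words it knows, while the n-gram test needs no complete dictionary but only flags letter sequences that never occur in the known words, so it can miss misspellings built entirely from familiar n-grams. This is one reason a checker may combine the two.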
