A collaborative system for building and maintaining

Tomasz Naskr˛et G4.19 Research Group, Department of Computational Intelligence Wrocław University of Science and Technology, Wrocław, Poland [email protected]

Abstract types are placed radially around the central ele- ment, e.g. sloWTool or WordTies (Pedersen et al., A collaborative system for con- 2012). GernEdiT (Henrich and Hinrichs, 2010) struction and maintenance is presented. Its offers visualisation of the wordnet structure in the key modules include WordnetLoom editor, range selected by the user, but it is hierarchical and Wordnet Tracker and JavaScript Graph. focused mainly on hypernymy. Moreover, the vi- They offer a number of functionalities that sual presentation does not allow for direct editing allow solving problems on every stage of of the structures. WordnetLoom stands out of the building, editing and aligning wordnets by remaining tools, because it offers a graph-based teams of lexicographers working in par- visualisation of wordnet data and provides entirely allel. The experience collected in recent different workflow based on the direct interaction years has allowed us to refine applications with graph nodes. In this paper, we will present and add new modules to provide the best the most recent development of WordnetLoom and user experience in a reliable and easily progress in relation to earlier releases. We have maintainable way. improved the graphic design for better user expe- rience and implemented the lexical unit graph vi- 1 Introduction sualisation. Wordnet is not yet another electronic . Both dictionary making and wordnet building It is a complex lexico-semantic network. Its con- are usually carried out by teams of lexicographers struction, especially when done manually by a and/or developers. Collaborative work, especially team of lexicographers, and its further editing in distributed teams working from , requires con- and/or aligning with other resources requires very trol tools to provide quality assurance and devel- advanced and flexible tools. Such tools should of- opment progress. In-built auditing/change back- fer the possibility of simultaneous work of many log feature is often absent in these systems and team members on the same (a wordnet for data versioning is handled by external VCS1 soft- a particular language), simultaneous work of dif- ware or done manually. The newest version of ferent teams on different , and the sub- WordnetLoom is interconnected with the Wordnet sequent manual or semi-automated linking of the Tracker module which provides additional feed- constructed resources. back channel for lexicographers to enrich their Dictionary compiling tools are mostly designed workflow. Every activity of each lexicographer is as complex XML editors such as, for instance, registered and can be monitored by a senior lexi- Lexonomy (Mechura,ˇ 2017). This approach may cographer. This paper will showcase how auditing not be beneficial in working with graph-like struc- and monitoring can be handled. tures. Therefore, several dedicated tools have been We will also present a new web-related mod- designed and are currently used by different - ule, namely JavaScript Graph. JavaScript Graph net teams e.g DEBVisDic (Horák et al., 2006), module is an answer to user needs and provides sloWTool (Fišer and Novak, 2011). Visualisa- the possibility of embedding graph visualization tion of wordnet graphs in most tools follows a to existing websites or applications. radial pattern: a synset in focus is presented in the middle and all links, irrespectively of their 1VCS - Version Control System e.g. Git, Subversion In this work, we will present the key modules itor level, the user with administrator privileges that are part of a collaborative system for word- can modify and add elements to dictionary enti- net construction and maintenance including Word- ties such as: part of speech, domain, register (see netLoom editor, Wordnet Tracker and JavaScript editing Fig. 1c) and adding or edit- Graph. ing types of semantic relations (see editing rela- tion types Fig. 1d). The main advantage of the 2 WordnetLoom Demo application is the possibility of working with vi- sualization in the form of a graph, which provides Up to version 1.68, WordnetLoom was a stan- quick and easy navigation and simplifies the cre- dalone java fat client application directly connect- ative process. Due to the fact that the Editor has ing to its database with all logic contained on the recently undergone a major architectural transfor- client side. Such approach ensures that scaling mation, it has allowed for even simpler modifica- of the application could only be possible by scal- tions and easier addition of new components, such ing the database server, in this case MySQL2. In as in the case of implementing the extended se- order to meet the growing numbers of users and mantic description panel for Dictionary of Polish challenges in providing dedicated endpoints not Borrowings in Yiddish7 (see Fig. 1e). Also within only for the client editor application, but also for this project we have created a graph visualization other external applications, web pages or mobile of lexical units which has become the part of a core applications, all business logic was extracted to a application. separate application built on top of JEE83 frame- work. The application is responsible for data val- 3 Wordnet Tracker Demo idation, data auditing, user activity monitoring, user management and data processing. It provides An important aspect of the process of building a communication channel via REST API (Field- and maintaining wordnet is the ability to moni- ing, 2000) in the form of Siren4-like hypermedia tor changes made by team members. It is made specification. Scaling of the application itself is possible by the Wordnet Tracker module which done by docker-compose5 replicas, while database provides tracking of user activity (see Wordnet scaling can be achieved by replication configu- Tracker dashboard Fig. 2) in terms of the num- rations where at least two databases are avail- ber of lexical units, synsets and semantic relations able. Master database configuration is optimized entered (see Fig. 3 for synset relation changes). for writing and slave databases have configuration Through this application, the lexicographer has optimized for high performance reading. Further also access to the full history of changes that have scaling can be ensured by introducing new slave been made within a given lexical unit (see Fig. 4 database nodes for each distinct consumer such as for current changes of lexical units). All changes a mobile application or a web page. in the synset structure are presented in Fig. 5 The main consumer of the API is a thick client where the left side column displays the current in the form of WordnetLoom Editor java appli- synset state, while the right side column shows all cation (main application workspace presented at changes in the synset elements. The user as well as Fig. 1a) which has been slimmed down and does the coordinator have access to current changes in not contain essential business logic which reduces real time for constant monitoring. This function- it to the role of a simple REST client. It enables ality turned out to be particularly valuable when advanced search functionalities and basic CRUD6 working with new, inexperienced lexicographers. operations on typical core objects being part of The application administrator has the possibility to the semantic structure such as synset, sense (see create diagnostic queries within lexicons or even sense editing properties Fig. 1b), sense relation, within the entire dataset, as well as to create statis- synset relation, and relation type. From the ed- tic queries. In both cases the generated query re- sults are available for download in the form of 2 https://www.mysql.com/ files. Wordnet Tracker also provides basic user 3Java Enterprise Edition 8 specification https://javaee.github.io/javaee-spec/ management panel where the privileged user can 4https://github.com/kevinswiber/siren add new users, reset passwords or restrict user ac- 5https://docs.docker.com/compose/overview/ cess to chosen lexicons. 6CRUD are four basic functions of persistent storage (such as create, read, update and delete) 7https://polonjid.wn.uw.edu.pl/?lang=en 4 JavaScript Graph Module Demo the Polish Ministry of Science and Higher Educa- tion and the project funded by the National Sci- Presenting work results in the form of a graph vi- ence Centre, Poland under the grant agreement No sualization outside WordNet Loom editor environ- UMO-2015/18/M/HS2/00100. ment is possible now by a created javascript mod- ule. The module tries to faithfully preserve the References navigation functions as in the WordNet editor, but at the same time gives the possibility to adjust the Roy Thomas Fielding. Architectural Styles and color scheme and nodes style to the host appli- the Design of Network-based Archi- cation/page design. The presentation data model tectures. PhD thesis, 2000. AAI9980887. is fetched from the WordnetLoom server via the Darja Fišer and Jernej Novak. Visu- REST endpoint and the D3.js8 library with custom alizing sloWNet. In Proceedings of modifications handles graph visualization and user eLex, pages 76–82, 2011. URL http: interaction. The module is constructed in such a //elex2011.trojina.si/Vsebine/ way so as to allow easy embedding in other ap- proceedings/eLex2011-9.pdf. plications, such as a mobile application or a web- Verena Henrich and Erhard Hinrichs. GernEdiT – site. A very good example can be the main page the GermaNet editing tool. In Nicoletta Calzo- 9 of plWordNet where the module is used in the lari (Conference Chair), Khalid Choukri, Bente form of a pop-up window or as a full scale cen- Maegaard, Joseph Mariani, Jan Odijk, Ste- tral element of the website presented at the online lios Piperidis, Mike Rosner, and Daniel Tapias, 10 Dictionary of Polish Borrowings in Yiddish (see editors, Proceedings of the Seventh Interna- Fig. 6). Simple library import and basic configura- tional Conference on Language Resources and tion will allow to present wordnet lexicon as graph Evaluation (LREC’10), Valletta, Malta, May visualization on every platform where JavaScript 2010. European Language Resources Associa- is supported. tion (ELRA). ISBN 2-9517408-6-7. 5 Conclusions and Further Works Aleš Horák, Karel Pala, Adam Rambousek, and Martin Povolný. DEBVisDic — first version of This concludes our brief description of each new client-server wordnet browsing and editing module. We have seen that the combination of tool. In Proceedings of the Third International presented tools offers solutions to common tasks WordNet Conference — GWC 2006, pages 325– and problems encountered while building word- 328. Masaryk University, 2006. nets particularly by distributed teams. We will M. B Mechura.ˇ Introducing Lexonomy: an continue to be open-source software licensed un- open-source dictionary writing and publishing der GNU LGPL 3.0. The source code is hosted in system. In Electronic in the GitHub repository(https://github.com/CLARIN- 21st Century: Lexicography from Scratch. Pro- PL/wordnetloom). ceedings of the eLex 2017 conference, 19- We will continue to actively develop presented 21 September 2017, Leiden, The Netherlands., tools over the next years focused on adding new 2017. URL https://www.lexonomy. functionalities based on the needs of users. We eu/docs/elex2017.pdf. will also direct our development towards a reli- able, fully-flagged web-based system and we will B.S. Pedersen, L. Borin, M. Forsberg, K. Lindén, strive to continue to simplify system deployment H. Orav, and E. Rögnvalssson. Linking and val- by an extensive use of docker11 containers. idating nordic and baltic wordnets – a multilin- gual action in META-NORD. In Proceedings of Acknowledgment 6th International Global Wordnet Conference, pages 254–260., Matsue, Japan., 2012. The work co-financed as part of the investment in the CLARIN-PL research infrastructure funded by

8D3.js is a JavaScript library for manipulating documents based on data 9http://plwordnet.pwr.edu.pl/wordnet/ 10http://polonjid-dictionary.clarin-pl.eu 11https://www.docker.com/ Figure 1: Key windows in WordnetLoom

(a) Application main workspace.

(b) Sense properties window.

(c) Dictionaries window.

(e) Extended semantic description for Polish Borrowings in Yiddish dictionary.

(d) Relation types window. Figure 2: Tracker dashboard.

Figure 3: Synset relations changes history view.

Figure 4: Senses changes view. Figure 5: Selected synset history view.

Figure 6: Example of embedded java script visualization module.