
Originally published in: Lüngen, Harald / Kupietz, Marc / Bański, Piotr / Barbaresi, Adrien / Clematide, Simon / Pisetta, Ines (Eds.): Proceedings of the Workshop on Challenges in the Management of Large Corpora (CMLC), Limerick, July (Online Event). Mannheim: Leibniz-Institut für Deutsche Sprache. DOI: https://doi.org/10.14618/ids-pub-10467

Lessons learned in Quality Management for Online Research Software Tools in Linguistics

Nils Diewald and Eliza Margaretha and Marc Kupietz
Leibniz Institute for the German Language

Abstract

Reference Corpus DeReKo (Kupietz et al., 2010, 2018) for a broad spectrum of corpus lin

This work is licensed under a Creative Commons Attribution International License. DOI: https://doi.org/10.14618/ids-pub-10469

bundle with the research findings.

An additional aspect when providing online tools for (scientific) work is that operation must be guaranteed to be secure, uninterrupted and failure-free (in the best case), since on the one hand users depend on them for their research and on the other hand the validity of the results depends on their correctness.

3 Reproducibility

In order to make scientific studies with computer-assisted methods reproducible, not only do the study design and the underlying data have to fulfil certain prerequisites, but the software should also be designed and developed in such a way that it can be run on other systems in the form used for the initial study. This poses numerous challenges (Ivie and Thain, 2018), especially for online tools, but first and foremost, it requires

1. licensing that is as open as feasible,
2. transparent versioning of the software, and
3. a high degree of portability, so the software can be run independent of its environment and time.

It should be noted, of course, that reproducibility can basically only be achieved according to the best effort principle. Full control over and complete documentation of the environment (i.e. the hardware, the operating system, the compiler or interpreter used, etc.) can rarely be guaranteed.

3.1 Licensing

To enable autonomous reproduction of a study using computer-assisted methods, the software used must, in the best case, be accessible to everyone without restriction. In order for the implementation to be completely transparent, publication as open source is essential (Hasselbring et al., 2020). This not only helps with reproducibility, but can also reveal problems in the analysis that originate from errors in the software used (Goldacre et al., 2019).

The decisive criterion in the selection of software licences for the publication of KorAP modules was to restrict their use as little as possible and not without reason. Therefore, we have published most of the KorAP modules under the very liberal BSD 2-clause license⁴ as open-source software on our Gerrit server⁵ and on GitHub⁶. The biggest concern we had with our license choice was that the BSD licence does not exclude the subsequent removal of externally developed code. Our compromise solution consisted of pointing out in the licence notes (certainly not completely legally secure) that externally contributed code would also automatically fall under the BSD licence. For the case that substantial code parts were contributed by external developers, we also planned to introduce Contribution License Agreements (CLA) independently via corresponding hooks in GitHub and Gerrit. So far, we have not had any bad experiences with our licensing policy.

The decision on Cosmas II licensing was made in the mid-1990s, against opening up the source code. Therefore all aspects of reproducibility were in the responsibility of the project owner.

3.2 Versioning

As software continues to be developed, differences arise that may disrupt reproducibility. At times backwards incompatibility is also inevitable. This is where versioning plays an important role. Versioning ensures consistent behaviour of a software by identifying and recording its immutable states (i.e. versions) over time. By using version control systems, older versions of a software can be rebuilt. We use Git for versioning KorAP and SVN for Cosmas II. Moreover, we use GitHub for hosting the KorAP Git repository.

A version number or hierarchy (e.g. “1.5.2”) is often used to communicate changes between states. While the different levels of a version number can indicate different forms of change (compare with semantic versioning⁷), it is crucial that the behaviour of a released version of software is immutable, and that it can be restored at any time. In the case of KorAP, different components are in play (Diewald et al., 2016), which are operated and released independently of each other. In order to provide users with information about the whole software stack in use, a central API endpoint was designed that returns the individual version numbers of the components involved.

³ We follow the terminology by the Association for Computing Machinery (2020). Please note that the definitions for “reproducibility” and “replicability” were revised in version 1.1; see Plesser (2018) for an overview on the terminology.
⁴ Also called Simplified BSD License or FreeBSD License
⁵ https://korap.ids-mannheim.de/gerrit/admin/repos
⁶ https://github.com/KorAP
⁷ https://semver.org/
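To make the role of multi-level version numbers in Section 3.2 concrete, the following is a minimal sketch of how the version numbers of several independently released components could be compared between the time of a study and a later rerun. The component names, the version numbers and the helper functions are hypothetical and chosen for illustration only; they do not describe the actual KorAP endpoint or its response format.

```python
from typing import Dict, NamedTuple


class Version(NamedTuple):
    """A three-level version number: major.minor.patch."""
    major: int
    minor: int
    patch: int


def parse_version(text: str) -> Version:
    """Parse a string such as "1.5.2" into a comparable Version."""
    major, minor, patch = (int(part) for part in text.split("."))
    return Version(major, minor, patch)


def change_level(old: str, new: str) -> str:
    """Name the highest version level that differs between two releases."""
    a, b = parse_version(old), parse_version(new)
    if a.major != b.major:
        return "major"
    if a.minor != b.minor:
        return "minor"
    if a.patch != b.patch:
        return "patch"
    return "none"


def stack_diff(recorded: Dict[str, str], current: Dict[str, str]) -> Dict[str, str]:
    """Report, per component, which version level changed between the stack
    recorded for a study and the stack reported now (unchanged ones omitted)."""
    return {
        name: change_level(version, current[name])
        for name, version in recorded.items()
        if name in current and change_level(version, current[name]) != "none"
    }


# Hypothetical component names and version numbers, for illustration only.
recorded = {"frontend": "1.5.2", "backend": "0.9.4", "index": "2.1.0"}
current = {"frontend": "1.6.0", "backend": "0.9.4", "index": "3.0.0"}
print(stack_diff(recorded, current))
```

Parsing into integer tuples rather than comparing strings matters here: as strings, "1.10.0" would sort before "1.5.2", whereas the tuple comparison orders them correctly.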

Hashing and tagging can be used for identifying and naming particular changes respectively (Ivie and Thain, 2018). Git uses a hash function to create a unique identifier for each change or commit (Chacon and Straub, 2014). An accurate versioning system would involve the transparent communication of Git commit hashes. Tags on these commits are often used to mark releases, but they are not necessarily unique. In KorAP, we take advantage of Git commit hashes as commit references, and of the Gerrit change-id (see sec. 4.1) to group commits belonging to the same review. Moreover, we use tags to mark releases both in KorAP and Cosmas II.

Releases can be made citable by archiving them in Zenodo, an open access repository for depositing research resources, as supported by GitHub⁸. Zenodo issues a Digital Object Identifier (DOI) for each release in a GitHub repository connected to it. We have published recent KorAP releases in Zenodo.

Besides software versioning, it is important to maintain API versioning to support clients using older APIs, especially when there are breaking changes in the newer APIs, such as changes in the request or response formats and types. API versioning is commonly achieved by including the API version number in the service URL (i.e. URI versioning), by adding a custom header, or by using the Accept header to indicate the API version number. For the KorAP API, we support API versioning by including the API version number within its service URL path.

3.3 Portability

Exact repetition of a computer-assisted scientific study would require “building the same program with the same compiler running on the same hardware and the same operating system” (Ivie and Thain, 2018, p. 63:4), which is rarely possible⁹. In the case of online tools, this is further complicated because the server architecture (both hardware and software) is seldom communicated, in order to make attacks more difficult (see sec. 4). This requirement is even more ambitious to meet if a study is to be reproduced far in the future, when common hardware and software have changed to a great extent.

Therefore we instead aim at a high degree of portability of the system while ensuring equivalence of the final result. As a single criterion for equivalent behaviour, we consider the error-free run of all test suites of the system, including their dependencies (see sec. 4.3). To test the error-free operation in different environments, we use Continuous Integration for some components via GitHub (see sec. 4.3). To facilitate full building of the overall system locally, we provide both Vagrant¹⁰ and Docker¹¹ files as our way to enable a virtualization approach (Howe, 2012).

At the beginning of the development of COSMAS II (Bodmer, 1996), the only requirement in terms of portability was the use of GNU-C, so it was usually necessary to access the existing environment to reproduce behaviour.¹²

3.4 Replicability

To additionally enable replicability of a computer-assisted study (i.e. re-implementation of the design by a different team using methods developed independently; see Footnote 3), further detailed documentation of the methods used is necessary. In the case of research software tools, this is sometimes part of the official documentation and thus does not require repeated explanation in publications. In many cases, such as the use of collocation measures, independently implemented methods already allow for the replicability of results (at least from the software point of view).

In KorAP, to facilitate replicability of studies, different query languages were implemented to allow comparison of results across multiple corpus analysis platforms. The use of virtual corpora enables the replicability of studies with different data bases (for example, considering comparable corpora; Kupietz et al., 2020); APIs, URL-encoded queries and various client libraries should help facilitate this. The functionality of KorAP and Cosmas II is documented in scientific publications, in manuals, in GitHub Readmes and Wikis, and in comments in the code.

⁸ https://guides.github.com/activities/citable-code/
⁹ This may still lead to different results in case of non-deterministic behaviour.
¹⁰ https://github.com/KorAP/KorAP-Vagrant
¹¹ https://hub.docker.com/u/korap
¹² Only due to multiple migrations of the software, for instance from Solaris to a modern 64-bit Linux architecture, did the aspect of portability come to the front, albeit not in the context of facilitating reproducibility.
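Section 3.2 notes that Git derives a unique identifier from a hash function. As a minimal sketch of why such identifiers pin down an immutable state, the following reproduces how Git computes the object id of a file (a "blob"), assuming the default SHA-1 object format. Commit ids are computed in the same manner over a commit object; the blob is shown because it is the simplest case. The sample file contents are made up for illustration.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object id Git assigns to a file with this content.

    Git hashes a short header ("blob", the content length in bytes, and a
    NUL byte) followed by the raw content.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Made-up file contents, for illustration only.
original = b"query = [base=Baum]\n"
modified = b"query = [base=Wald]\n"

# Identical content always yields the same id; any change yields another id.
print(git_blob_id(original))
print(git_blob_id(modified))
print(git_blob_id(original) == git_blob_id(b"query = [base=Baum]\n"))
```

The result can be cross-checked against `git hash-object`; for an empty file both give `e69de29bb2d1d6434b8b29ae775ad8c2e48c5391`. Because the id depends only on the content, communicating a hash identifies a state more precisely than a tag, which can be moved or reused.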

4 Dependability and Security

Avizienis et al. (2004) provide a taxonomy of dependable and secure computing, whose individual parts can be seen as cornerstones of quality management in the provision of software. One definition of dependability is “the ability of a system to avoid service failures that are more frequent or more severe than is acceptable”, attributed with the Availability, Reliability, Safety, Integrity and Maintainability of the system. When taking security concerns into account, the confidentiality of the system is another important attribute (Avizienis et al., 2004, ch. 2.3).¹³

4.1 Availability and Reliability

Availability is defined as the “readiness for correct service” and reliability as its “continuity”. With respect to security, this means a limitation to authorized actions only.

In order to keep the availability of KorAP at a high level, we use the service monitoring tool Icinga¹⁴, which monitors the availability of the web services themselves and the status of the servers involved, so that emerging problems can be recognised early on. To indicate planned downtimes, we currently use the start page of KorAP’s web interface. In order to also be able to notify API users in the future, a corresponding message is planned for the functions for establishing connections in KorAP’s client libraries. A fail-safe server structure with load-balancing and automatic switching between servers is not yet implemented for KorAP. This is also because with limited resources and in the context of research tools, we do not prioritise availability over reliability.

Concerning reliability in corpus linguistic research in particular, it is a commonplace that interesting corpus findings are often initially artefacts of corpus composition and that corpus sampling and analysis cycles should therefore be regarded as an iterative process (Kupietz, 2016). One could add that the findings that remain after the elimination of confounders may still not represent true properties of the language domain under study, but may instead be the result of software bugs.

A proven means of reducing software errors is the use of code review (McIntosh et al., 2016), which in the context of research tools can also often take on the role of a classic peer review, at a fine granularity level. Among assisting systems, Gerrit Code Review¹⁵ has become particularly well established in recent years.¹⁶ Gerrit is an open-source team collaboration tool that is typically operated via a web interface. Developers can use it to review others’ changes to their source code and comment on, improve, augment, approve or reject those changes. It is tightly integrated with Git and can be considered an interface layer on top of Git.

The multiple-eyes principle not only helps to avoid errors and to make design decisions collaboratively, but also ensures that code knowledge is distributed among several people, even if only individuals are responsible for a code base. In this way, the loss or absence of staff does not necessarily lead to serious disruptions in operations, maintenance and development.

Admittedly, the use of Gerrit raises the entry threshold, especially if the users are not yet very experienced in dealing with Git either. In addition, the review effort is certainly not to be neglected and the maintenance of Gerrit also involves a certain additional effort. Nevertheless, in view of the direct comparison with projects running in parallel without code review and the advantages already mentioned above, we are convinced that in the case of research tools, the use of a code reviewing system is worthwhile at least from a project size of 2-3 people. The initial and recurring costs incurred are more than made up for by the avoidance of errors and the distributed code knowledge. In our experience, another positive side effect of code reviews in terms of reliability is that commits are typically smaller and that parallel strands of development can more often be combined without conflicting merges. This also increases the readability and traceability of the commit history.

To prevent unauthorized activities, we integrate the OAuth 2.0 framework (Hardt, 2012), allowing users to grant their applications access to the KorAP APIs¹⁷. These applications may thus perform operations within the scope of their grants, e.g. searching and retrieving annotations, on behalf of the users.

¹³ The following definition quotes are taken from Avizienis et al. (2004).
¹⁴ https://icinga.com/
¹⁵ https://www.gerritcodereview.com/
¹⁶ Gerrit is used by several prominent companies and projects, such as Android, SAP, LibreOffice, Volvo, Skia, TYPO3, ARM, and Wikimedia.
¹⁷ https://github.com/KorAP/Kustvakt/wiki
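The idea that an application may only act "within the scope of its grants" can be sketched as a simple set comparison. In OAuth 2.0 (RFC 6749), a scope value is a space-delimited list of case-sensitive strings; an operation is permitted if every scope it requires is among those granted. The scope names and operations below are hypothetical and are not taken from the KorAP APIs.

```python
from typing import Set


def parse_scope(scope: str) -> Set[str]:
    """Split an OAuth 2.0 scope value (space-delimited, case-sensitive)."""
    return set(scope.split())


def is_permitted(granted_scope: str, required_scope: str) -> bool:
    """True if every scope an operation requires was granted to the client."""
    return parse_scope(required_scope) <= parse_scope(granted_scope)


# Hypothetical scope names and operations, for illustration only.
REQUIRED = {
    "search": "search",
    "retrieve_annotations": "search match_info",
    "delete_virtual_corpus": "vc_write",
}

granted = "search match_info"
for operation, required in REQUIRED.items():
    print(operation, is_permitted(granted, required))
```

In this sketch the client holding `search match_info` may search and retrieve annotations but may not delete a virtual corpus. A real service would perform such a check on the server side, after validating the access token itself.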

4.2 Safety, Confidentiality, and Integrity

Safety is defined as the “absence of catastrophic consequences on the user(s) and the environment”, confidentiality (as an attribute of security only) as “the absence of unauthorized disclosure of information” and integrity as the “absence of improper system alterations”, which in regard to security includes unauthorized alterations.

Such security risks are in particular a threat to unmaintained online tools and can bring a very quick end to their operation. When potentially serious security vulnerabilities of an online tool become known and there are no longer any responsible parties, an academic institution usually has no choice but to take the tool offline immediately, especially since, unlike research data management plans, research software management plans are probably a rarity. But even with tools that are still in development or maintenance, it is not obvious how to identify security problems with reasonable effort.

In the case of KorAP, however, the publication of the source code on GitHub already helped us a lot without further ado. GitHub has an integrated security scanner that is enabled by default for public repositories. It detects so-called Common Vulnerabilities and Exposures (CVE) in the dependencies used and notifies the repository owners. Similar code scanner plugins for IDEs can serve as a supplement or alternative. There also seem to be open source approaches for such scanners, but we have no experience with them.

Once a notification about a CVE in a library has been received and an update that resolves the vulnerability is already available, a common problem is that this update often requires updating other libraries as well, which in turn depend on further library updates, and so on. This can mean that fixing a security vulnerability in one library ultimately requires significant work to adapt the code to all the interface and behaviour changes of all the libraries that had to be updated (see sec. 4.3). Doing this quickly without taking the tool offline in the meantime can be a major challenge, even for software that is still under active maintenance.

Unless the use of external libraries is dropped, which however causes other costs and issues (see following section), the only secure option is to update library dependencies regularly and to factor this in from the outset as permanent maintenance costs for the operation of the online tool.

Tools that can help with the continuous updating of library dependencies have proven useful for certain projects (Mirhosseini and Parnin, 2017; Wessel et al., 2018). Dependabot¹⁸, a so-called dependency scanner, not only informs about updated dependencies, but also automatically creates merge requests to update, in the case of Java¹⁹, Maven or Gradle project files. Following GitHub’s acquisition of Dependabot in May 2019, the feature was added natively to GitHub. In the meantime, there is also an open-source project based on the original Dependabot core that makes Dependabot available for GitLab.

We have been using Dependabot with GitHub since July 2020 for the KorAP component Kustvakt and have since received an average of 15 update pull requests per month. These update requests are particularly useful and easy to handle with corresponding continuous integration workflows, which can be used to automatically check whether the software is still buildable and operational with the updated library (see sec. 4.3 below).

4.3 Maintainability

Maintainability is the “ability to undergo modifications, and repairs” of a system. Continuous maintenance is necessary not only to fix bugs, to accommodate changes in demand and to address security issues (sec. 4.2), but also to accommodate changes in the behaviour of client or server environments.

Modularity has proved to be useful to simplify a complex system by breaking it down into smaller independent modules or components. Smaller modules are easier to maintain and reuse than a complex system, since they are easier to understand, test and restructure independently of others. KorAP is composed of small independent components, both in the service (Diewald et al., 2016) and in the preprocessing pipeline.

Due to the increased complexity of the system, the maintenance of software dependency trees requires a great deal of effort, so that a constant trade-off must be made between reuse and reimplementation of functions (i.e. re-inventing the wheel vs dependency hell; see Abdalkareem et al., 2020). In KorAP, we decided to use both approaches, namely reusing functions and libraries as far as possible (as long as indirect dependencies are manageable) and re-implementing only when necessary (e.g. when existing functions are not adequate to cope with new requirements).

¹⁸ https://dependabot.com/
¹⁹ Besides Java, various other programming languages are also supported, such as JavaScript and Python.

All KorAP components are equipped with comprehensive test suites. The test suites help us on the one hand to check new functions for proper behaviour, and on the other hand to automatically ensure that program changes do not alter any previous behaviour (cf. Rafi et al., 2012). They are also important for checking whether updated dependencies break or modify the system flow. We also use the automatic detection of test coverage to identify deficiencies and gaps in the test suites. It should be noted that the maintenance of the test suites involves significant extra costs. In some areas, we have recently begun to employ additional fuzzing techniques (Miller et al., 1990), which test unexpected input, to address shortcomings in manually crafted tests.

An important automatable instrument for checking the functionality of software as a final safeguard is continuous integration (CI) testing. We now apply such CI test workflows to the production branches of almost all KorAP components, using GitHub Actions²⁰. The workflows check whether the tool can be built and run all its available tests, partly in different operating system environments (see sec. 3.3). Our CI workflows are usually configured in such a way that they are also automatically applied to merge requests submitted via GitHub, so that it is immediately apparent if these affect the functionality of the software.²¹

5 Discussion

Developing, maintaining, and operating online research software tools presents numerous challenges, including ensuring reproducibility, dependability and security, as it requires knowledge and skills in many different areas. In fact, however, like research software in general, these tools are predominantly developed by individuals from academia, and less often by people with a background in software development (Cohen et al., 2021). In our case, this background is in linguistics.

Cohen et al. (2021) introduce a model of four pillars considered essential to develop sustainable research software in such an environment, with software development being only one of the pillars. As additional pillars, they introduce community involvement for collaborative problem solving, training of researchers in software development techniques, and policy development for institutional support. We consider these points to be essential for the development of online research software tools as well, whereby we would add another pillar for maintaining and operating the system.

While the perception of the importance of software for scientific work is growing, development, maintenance, and operation are rarely associated with gaining scientific merit: “many activities are software maintenance – new functionalities or endless bug fixing – and hardly publishable” (Goble, 2014, p. 6). Anzt et al. (2021) therefore propose to accept contributions to open source projects (so-called “pull requests”) as a new form of academic contributions to conferences²², in order to increase the motivation to participate in the development of research software tools, which is beneficial for the wider scientific community.

Our remarks regarding quality management to ensure reproducibility, dependability and security of online research software tools should in no way be misunderstood as best practices. They are only meant to reflect our choices and experiences in these areas in running the corpus analysis platforms KorAP and Cosmas II, whereby all these decisions were based on a cost-benefit calculation. Especially when creating research software, the effort required for practicable quality management is often not compatible with the circumstances. We are however convinced that the aforementioned aspects are worth considering in the development, operation, and maintenance (maybe even in the planning) of online research software tools in general.

²⁰ https://docs.github.com/en/actions
²¹ Consisting of multiple components developed in various programming languages and frameworks having distinct formats and structures, KorAP is too complex and not suitable for uniform code and comment styles and conventions.
²² With a focus on High Performance Computing.

Acknowledgements

We thank our four anonymous reviewers for helpful comments on earlier drafts of the manuscript.

References

Rabe Abdalkareem, Vinicius Oda, Suhaib Mujahid, and Emad Shihab. 2020. On the impact of using trivial packages: an empirical case study on npm and PyPI. Empirical Software Engineering (EMSE), 25:1168–1204.

Hartwig Anzt, Eileen Kuehn, and Goran Flegar. 2021. Crediting pull requests to open source research software as an academic contribution. Journal of Computational Science, 49:101278.

Association for Computing Machinery. 2020. Artifact review and badging version 1.1. Technical report, Association for Computing Machinery.

Algirdas Avizienis, Jean-Claude Laprie, Brian Randell, and Carl Landwehr. 2004. Basic concepts and taxonomy of dependable and secure computing. IEEE Trans. Dependable Secur. Comput., 1(1):11–33.

Franck Bodmer. 1996. Aspekte der Abfragekomponente von COSMAS II. LDV-INFO, 8:142–155.

Scott Chacon and Ben Straub. 2014. Pro Git, 2nd edition. Apress, USA.

Jeremy Cohen, Daniel S. Katz, Michelle Barker, Neil Chue Hong, Robert Haines, and Caroline Jay. 2021. The four pillars of research software engineering. IEEE Software, 38(1):97–105.

Nils Diewald, Michael Hanl, Eliza Margaretha, Joachim Bingel, Marc Kupietz, Piotr Banski, and Andreas Witt. 2016. KorAP architecture – Diving in the deep sea of corpus data. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16), pages 3586–3591, Portorož/Paris. European Language Resources Association (ELRA). http://www.lrec-conf.org/proceedings/lrec2016/pdf/243_Paper.pdf

Carole Goble. 2014. Better Software, Better Research. IEEE Internet Computing, 18(5):4–8.

Ben Goldacre, Caroline E. Morton, and Nicholas J. DeVito. 2019. Why researchers should share their analytic code. BMJ (Clinical research ed.), 367:l6365.

Dick Hardt. 2012. The OAuth 2.0 Authorization Framework. RFC 6749.

Wilhelm Hasselbring, Leslie Carr, Simon Hettrick, Heather Packer, and Thanassis Tiropanis. 2020. Open source research software. Computer, pages 84–88.

Bill Howe. 2012. Virtual appliances, cloud computing, and reproducible research. Computing in Science & Engineering, 14(4):36–41.

Peter Ivie and Douglas Thain. 2018. Reproducibility in scientific computing. ACM Computing Surveys, 51:1–36.

Marc Kupietz. 2016. Constructing a corpus. In Philip Durkin, editor, The Oxford Handbook of Lexicography, pages 62–75. Oxford University Press, Oxford.

Marc Kupietz, Cyril Belica, Holger Keibel, and Andreas Witt. 2010. The German Reference Corpus DeReKo: A Primordial Sample for Linguistic Research. In Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC’10), pages 1848–1854, Valletta/Paris. European Language Resources Association (ELRA). http://www.lrec-conf.org/proceedings/lrec2010/pdf/414_Paper.pdf

Marc Kupietz, Nils Diewald, Beata Trawiński, Ruxandra Cosma, Dan Cristea, Dan Tufiş, Tamás Váradi, and Angelika Wöllstein. 2020. Recent developments in the European Reference Corpus (EuReCo). In Translating and Comparing Languages: Corpus-based Insights, Corpora and Language in Use, pages 257–273, Louvain-la-Neuve. Presses universitaires de Louvain.

Marc Kupietz, Harald Lüngen, Paweł Kamocki, and Andreas Witt. 2018. The German Reference Corpus DeReKo: New Developments – New Opportunities. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC’18), pages 4353–4360, Miyazaki/Paris. European Language Resources Association (ELRA). http://www.lrec-conf.org/proceedings/lrec2018/pdf/737.pdf

Shane McIntosh, Yasutaka Kamei, Bram Adams, and Ahmed E Hassan. 2016. An empirical study of the impact of modern code review practices on software quality. Empirical Software Engineering, 21(5):2146–2189.

Barton P. Miller, Louis Fredriksen, and Bryan So. 1990. An empirical study of the reliability of UNIX utilities. Communications of the ACM, 33(12):32–44.

Samim Mirhosseini and Chris Parnin. 2017. Can automated pull requests encourage software developers to upgrade out-of-date dependencies? In 2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE), pages 84–94.

Hans Plesser. 2018. Reproducibility vs. replicability: A brief history of a confused terminology. Frontiers in Neuroinformatics, 11.

Dudekula Mohammad Rafi, Katam Reddy Kiran Moses, Kai Petersen, and Mika V. Mäntylä. 2012. Benefits and limitations of automated software testing: Systematic literature review and practitioner survey. In 2012 7th International Workshop on Automation of Software Test (AST), pages 36–42.

Victoria Stodden and Sheila Miguez. 2014. Best practices for computational science: Software infrastructure and environments for reproducible and extensible research. Journal of Open Research Software, 2:1–6.

Mairieli Wessel, Bruno Mendes de Souza, Igor Steinmacher, Igor S. Wiese, Ivanilton Polato, Ana Paula Chaves, and Marco A. Gerosa. 2018. The power of bots: Characterizing and understanding bots in OSS projects. Proceedings of the ACM on Human-Computer Interaction, 2(CSCW).
