MASARYK UNIVERSITY

FACULTY OF INFORMATICS

Information system for Proteomics laboratory

DIPLOMA THESIS

Martin Bednařík

Brno, 2015

DECLARATION

Hereby I declare that this paper is my original authorial work, which I have worked out on my own. All sources, references and literature used or excerpted during the elaboration of this work are properly cited and listed, with complete reference to the due source.

Martin Bednařík

Advisor: Mgr. Aleš Křenek, Ph.D.

ACKNOWLEDGEMENT

I would like to thank my advisor, Mgr. Aleš Křenek, Ph.D., for his patience, guidance and all the practical advice. I would also like to thank doc. RNDr. Zbyněk Zdráhal, Dr., Mgr. David Potěšil, Ph.D. and Mgr. Michal Obořil for their time, support and willingness to cooperate.

KEYWORDS

LabKey Server, Proteomics laboratory, Laboratory information and management system, Perun

ABSTRACT

The main goal of this thesis is to analyze the Proteomics laboratory's requirements for a laboratory information and management system, choose an available open-source framework, and design and implement a working prototype for this laboratory to use. Another goal is to study the options for external authentication and its usability with the chosen framework. This thesis also discusses a possible connection to external data storage.

CONTENTS

1. Introduction
2. Proteomics Core Facility
   2.1 Introduction
   2.2 Involved people and roles
   2.3 Request processing workflow
   2.4 Laboratory requirements
      2.4.1 Expected user roles
      2.4.2 System requirements
3. Choice of technology
   3.1 Laboratory Information and Management Systems
      3.1.1 Proteomics laboratory's reasons for LIMS
      3.1.2 LIMS packages
   3.2 LabKey Server
      3.2.1 About
      3.2.2 Main features
   3.3 Perun
      3.3.1 Introduction
      3.3.2 Used and supported technologies
      3.3.3 Features
   3.4 Data Storage
4. Design and implementation
   4.1 Introduction
   4.2 Data model
      4.2.1 Introduction
      4.2.2 Schemas and queries
      4.2.3 Used data model
   4.3 Site structure
      4.3.1 Homepage
      4.3.2 Projects vs. folders
   4.4 Users and groups
   4.5 Authentication
   4.6 File storing
   4.7 Collaboration system
   4.8 Applications
      4.8.1 Introduction
      4.8.2 General issues
      4.8.3 Types of applications
5. Interaction with implemented system
   5.1 Introduction
   5.2 Typical system use case scenarios
6. Conclusion
Bibliography
Attachment

1. INTRODUCTION

Proteomics Core Facility is a successful facility operating under CEITEC. It receives a large number of orders from various customers, its employees successfully publish in scientific journals, and it maintains a very good reputation in the academic community.

However, this facility could work even more effectively.

Keeping track of laboratory work requires non-trivial effort. Tens or even hundreds of experiments are performed every year, and each one requires documentation of who requested the measurement, what exactly was requested, how the measurement was performed, who was responsible for what, what problems came up during the whole process, etc.

Besides paperwork, there are machines that produce data which also need to be put into context, preferably in a way that allows effective lookup in case of need. Employees need to know the state of active orders and be able to quickly check their own responsibilities.

Doing all this without an information system makes things rather slow and hard to manage.

The idea of deploying an information system to speed things up has been around for years. However, the heavy workload, the small number of employees, and the lack of anyone with both enough IT experience and the time to invest in developing such a system meant the idea was never realized.

The main goal of this thesis is to take the first steps and build a foundation for such a system, because deploying something this complex is expected to take years.

In the second chapter we introduce the Proteomics Core Facility. We describe the way they work and then present the results of our meetings where we discussed their requirements for a laboratory information and management system.

The third chapter is more theoretical. We describe laboratory information and management systems in general, list available open-source packages, and present the main features of the package (LabKey Server) we decided to use. The next section of this chapter is about Perun, an identity and access management system that we decided to use to test possibilities for authentication to the LabKey Server. Finally, we present brief information about the Proteomics laboratory's external data storage solution.

In the fourth chapter we write about design and implementation. We compare the features of LabKey Server with the particular requirements of the Proteomics laboratory. We discuss which existing LabKey Server features are sufficient for our purposes and what needed further customization and development. In the end we present technical issues of developing applications that use LabKey Server's JavaScript API, along with examples of the created applications.

The fifth chapter contains use case scenarios to show how some basic tasks are performed.

Finally, we summarize our achievements and discuss the direction in which work on the implementation could continue.


2. PROTEOMICS CORE FACILITY

2.1 Introduction

Proteomics1 is one of the core facilities belonging to the Central European Institute of Technology (CEITEC).

It is an academic community consisting of professional employees with access to high-quality instrumentation. It participates in several research projects and is capable of processing the demands of the research community.

It focuses mainly on mass spectrometry-based proteomics, covering all steps of proteomic analysis including separation and quantification of protein mixtures, characterization of proteins and their modifications by mass spectrometry, or bioinformatic data processing.

2.2 Involved people and roles

There is a small group of core facility employees, led by a manager, ready to accept orders.

Orders are mostly submitted by external customers, but some are for internal purposes only (publishing activities).

Customers are aggregated into workgroups and each workgroup has a group leader. The group leader usually formulates the requirements, and any member of a customer group can provide samples for evaluation.

Not all core facility members have identical functions. Employees who have access to laboratory machines work with samples and run measurements. There are also specialists who focus mainly on report creation or administrative work. The overall people-and-roles schema is depicted in figure 1.

1 http://www.ceitec.eu/ceitec-mu/proteomics-core-facility/z8

Figure 1: People and roles

2.3 Request processing workflow

Proteomics Core Facility workers have established a workflow for the tasks they perform. This workflow is the same for all submitted orders. A visualization of this workflow can be found in figure 2.

1. Request form filling

Customers specify their requests and requirements by filling in a form which contains prescribed inputs such as contact details or the type of experiment, along with a space for commentary. Although requests usually have a lot in common, they vary in details, which is why the processing method might not be clear from the beginning and why the free-form commentary is important.


2. Processing method defining

After the form is submitted, the information it contains is discussed in the research group. The group takes several aspects into consideration; the most important are the type of measurement, the capabilities of the laboratory, and the experience of the employees. The result should be a consensus on the processing method.

3. Sample preprocessing

A given sample needs to be prepared for measurement. This preprocessing is of a chemical character. There are several basic types of processing; if a new type is required, it is similar to the default ones, only with slight alterations.

4. Measurement

The measurement itself takes place in the laboratory on the machine chosen in step 2. As a result, we are given raw Mass Spectrometry (MS) data.

5. Evaluation/analysis

The raw MS data are used as input for sample analysis. The data are loaded into the application Proteome Discoverer2, where they are automatically processed according to a predefined plan. The plan is defined by employees directly in the Proteome Discoverer application and includes (among other things):

- extracting the MS data from the raw MS data file,
- value filtration of chosen MS fields,
- processing the filtered MS data with Mascot3 (the search engine), which includes an assignment of peptides to individual MS/MS spectra according to the values in a chosen protein database,
- logging the results of a search and further processing them for statistics,
- possibly deriving additional information about the identified peptides from the raw data.

2 http://www.thermoscientific.com/en/product/proteome-discoverer-software.html
3 http://www.matrixscience.com/server.html

After the evaluation, Proteome Discoverer saves the results to a .msf4 file that also contains information relevant to the processing method, which makes it possible to repeat the method (provided that the same conditions are preserved) or to backtrack the operations. However, it is not always possible to reconstruct the conditions (e.g. deleting a record from the Mascot server's protein database might make the process unrepeatable).

The .msf file is then loaded into another Proteome Discoverer module for reporting. The loaded file contains an excessive amount of information, which might indicate which displayed items are not important and could therefore be filtered out. For example, it could be desirable to display only peptides that are statistically relevant. Proteome Discoverer then builds probable proteins from the identified peptides.

The results are filtered further, depending on what we consider important and appropriate to reveal in a final report.

6. Visualization

The chosen information (a list of proteins present in a given sample) is exported to a TSV file and subsequently to an .xlsx file, which represents the final report. Proteome Discoverer doesn't provide exporting to .xlsx to the needed extent.

Keeping track of the completion of individual steps is done manually; this is unnecessary work that slows down the laboratory's activities.

4 A file format for storing mass spectrometry data used by Proteome Discoverer
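The manual bookkeeping described above is exactly the kind of thing an information system can automate: each order only needs to record which of the six workflow steps are finished. A minimal sketch of such tracking follows; the step names are taken from the workflow above, while the data structure and function names are purely illustrative and not part of any existing system:

```javascript
// The six workflow steps, in order (taken from section 2.3).
const WORKFLOW_STEPS = [
  'Request form filling',
  'Processing method defining',
  'Sample preprocessing',
  'Measurement',
  'Evaluation/analysis',
  'Visualization'
];

// Given a set of completed step names, report the next pending step
// (or null once the whole order is finished).
function nextStep(completed) {
  return WORKFLOW_STEPS.find(step => !completed.has(step)) || null;
}

// A hypothetical order that has passed preprocessing but not measurement.
const order = { id: 'ORD-1', completed: new Set(WORKFLOW_STEPS.slice(0, 3)) };
console.log(nextStep(order.completed)); // 'Measurement'
```

With per-step state recorded like this, questions such as "which orders are idle?" or "what am I assigned to?" become simple queries instead of manual checks.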

Figure 2: Workflow

2.4 Laboratory requirements

Proteomics Core Facility employees require a web-based information system that will facilitate and speed up work and bring automation to most of the administrative tasks.

However, the human factor is still important in this area of work, and we need to decide which tasks can be covered by the system. There are some pieces of software facility members already use that they want to keep internal, not stored outside of core facility machines. For example, when creating final reports for customers, employees use their own software that produces .xlsx files. The methods this software uses are considered sensitive information, and it is required that the laboratory has full control over it and does not upload it to an external server.

Therefore, the integration of these items that are already in use might not be done to full extent, and the system shouldn't take care of everything relevant to the core facility workflow. It is expected that users will still perform some actions manually and use the system only to store the results of these actions.

Security and time-efficient usage are important requirements.

2.4.1 Expected user roles

When discussing the information system user roles, we used information about the current situation, the actions individual employees perform, and the things they take care of.

A) Core facility users

• Administrator – the administrative role for the information system. His main responsibility is configuration and maintenance, but this role is also expected to take care of developing new features.

• Manager – a single-user role with access to unique applications, all built around full read access and summary features. The manager is able to display the state of active orders and a summary showing which users are assigned to what (this comes with the ability to assign a content administrator to a particular order and to change the priorities of current tasks). He should also have permission to confirm the validity and admission of requests submitted by customers.

• Content administrator – a supervising role for selected core facility workers. The content administrator creates and administrates customer workgroups, customer groups and orders, and grants permissions to these orders. He takes care of externally authenticated users and confirms their authorization. In general, it is also his responsibility to repair the mistakes of base users.

• Base user – a role representing a core facility worker who is assigned to particular orders. A base user has an overview to-do list, collaborates with other workers via message boards, and is also responsible for publishing results. His access is restricted to the orders assigned to him.

• Base user trusted – a special variant of the base user who shares the same functions but has access to all orders under selected workgroups (not only the orders he is involved in).

B) External users

• Customer base user – the role for customers. He has access to his customer group and the corresponding orders, is able to view published files or submit new order requests, and can actively discuss all his orders.

• Customer group leader – a specific role for a customer group member who has read access to all orders within the customer group. If granted, he should also have the right to administrate the members of his group, invite new users, and revoke access from expired users.

2.4.2 System requirements

This section describes the functional requirements for a LIMS. It reflects the current actions of both core facility users and customers, but also desired features users picked from other web-based systems.

Figure 3 depicts the overall system diagram based on the requirements.


A) System features

• Data organization – there are supposed to be two layers: the first represents a customer workgroup, and under each workgroup individual orders are located. Each order will have a private part for core users to work in and a shared part for the customer to view. The complete solution can be found in section 4.3.

• Authentication – there is a demand for users from Masaryk University to have the option to log in with their ID number and university password. It would be welcome for users from other institutions to be able to log in using the credentials they use inside the institution they belong to. More about this can be found in section 4.5.

• Data storage – access is required to an external storage system based under Masaryk University where the data organization corresponds to the organization in the LIMS. Currently, huge raw biological data are not stored internally but on external data storage units. These data are connected to orders and should be available from the information system. More detailed information about this issue is in section 4.6.

B) User-friendly interaction

It is generally expected that users mostly have little experience with web-based laboratory information systems, so the system should be easy to use and offer only the options the users actually need. If the system is highly customizable and offers a lot of options, it might be confusing. The goal is to trim the displayed content and interaction buttons to the needed minimum.

• Form submitting – customers should have access to a form for new request submission. A brand new customer needs to wait for the request to be accepted by the manager; trusted customers should be able to proceed automatically. After filling in the form, an email is sent to the manager, who revises the request and, if he decides to accept it, a new order is created.

• Summary features – a core facility user should be able to review the most recent activity and view a summary of current issues, easily selecting what is yet to be done, what he is assigned to, and how long the order processing takes.

• Collaboration system – it would be helpful to have a collaboration system for research groups to discuss current problems and particular orders. There should be one aggregating message board for each workgroup, bringing together commentaries from all underlying orders.

• Labelling – a customer should be able to label his orders for better searching and view-filtering options; core users will have an overview of labels and use them for summary and revision purposes and to aggregate information about similar biological problems.

• Customer view – a customer should have permission to view reports only if they are related to his orders.

• Automatic notifications – the system is requested to send notification emails when an update on a problem is posted or when an order stays idle for a long time. The manager should receive an email if a new order request is placed.


Figure 3: System interaction


3. CHOICE OF TECHNOLOGY

3.1 Laboratory Information and Management Systems

A Laboratory Information and Management System (LIMS) is, in general, a piece of software that helps laboratories manage the data they work with. However, laboratories differ significantly from one another, which makes defining a LIMS tricky. A LIMS should reflect the setting and needs of a particular laboratory, so there is an implicit need for flexibility in architecture to support diverse data tracking needs, the ability to rapidly implement detailed workflows, features and functions supporting use in regulated environments, and interfaces for data import and export [1].

The basic reasons for creating such software tools have emerged from the needs of large research organizations executing complex scientific studies. Researchers demand bringing together different types of data from varied sources at various stages of research. To achieve these goals it is necessary to use software tools for data management, data integration, analysis and sharing.

Among its basic operations, a LIMS should be able to take care of sample tracking, data storing, issue assignment, approval of data operations, application integration, and others.

Using a LIMS should mainly improve efficiency, productivity, accuracy, security and quality control [9].

3.1.1 Proteomics laboratory’s reasons for LIMS

Currently, the laboratory uses information technologies mainly during the analysis and reporting phases, in the form of separate pieces of software. However, the growing number of orders makes it necessary to automate administrative operations. Many of these must currently be done manually, using mainly paperwork and email communication, which makes it difficult to organize information.

There is currently no unifying system that brings together summary information about active and past orders, worker assignment, and order status.


There is no single data storage system for raw scientific data and measurement output files accessible to all core users, along with final reports accessible also to the corresponding customers.

Communication between workers is mainly in person or via email, so it's not easy to effectively look back at a biological problem from the past, review what could have been done better, and use it for a similar current problem.

The auditing of processing conditions associated with specific samples is currently not centralized. Reconstructing them takes additional effort.

To sum up, the main goal is to deploy a suitable LIMS: in the first place to facilitate administrative functions, but with the prospect of later adding functions specific to proteomics, such as sample management or integration of a peptide identification search engine.

3.1.2 LIMS packages

Nowadays, the list of available LIMS packages is huge and there are many things to consider when choosing one for your laboratory [4]. While there are many commercial products available, we wanted something open source, something we could test and customize ourselves. We didn't want to be dependent on outside support.

The first option to consider was LabKey Server. Workers of the facility had tried it before and it looked promising, although they never took the time to study its features deeply.

They also had good references from their partner laboratory in Vienna, which has been using LabKey Server for years.

We took some time to study its capabilities and tried it ourselves on some basic operations. Then we researched available alternatives to LabKey Server and made a list of the considered open source LIMS alternatives, which includes:

• CAISIS – an open source, web-based patient data management system that integrates research with cancer patient care and improves the efficiency of processes necessary to document and summarize patient histories [12].
http://www.caisis.org/

• i2b2 – a framework and software suite for clinical researchers to use existing clinical data for discovery research and to facilitate the design of targeted therapies for patients with diseases [15].
https://www.i2b2.org/

• SIMBioMS – a customizable, web-based open source software system capable of collecting, storing, managing, importing and exporting data and information in biomedical studies [13].

• ISA – the open source ISA metadata tracking tools for managing environmental and biomedical experiments, providing support for the description of experimental metadata with the goal of making the resulting data reproducible and reusable [18].
http://www.isa-tools.org/index.html

• InterMine – an extensible biological data warehouse that integrates and makes use of the diversity and volume of current biological data, makes them accessible through web querying tools, and opens them to analysis [14].
http://intermine.github.io/intermine.org/

• MISO – a LIMS specially designed for tracking next-generation sequencing experiments, offering all the basic features including user-centric access control, monitoring and reporting of analytical processes, data visualization, and notifications of status changes.
http://www.tgac.ac.uk/miso/

• Bika – an open source solution that combines web content management with workflow processing.
http://www.bikalabs.com

An illustrative comparison of the functionality of some of these platforms can be found at [3].


These alternatives did not seem to offer anything beyond the capabilities of LabKey Server. Furthermore, one of the great advantages of LabKey Server is the availability of documentation that takes into consideration the needs of developers, not only end users [2]. The more specific characteristics of LabKey Server are described in the next section.

3.2 LabKey Server5

3.2.1 About

LabKey Server is a Laboratory Information and Management System web application implemented in Java that runs on the Apache Tomcat web server and stores its data in a relational database engine. It is freely available open source software, with documentation and source code available under the Apache 2.0 license, and is supported on computers running various operating systems, including Microsoft Windows, Mac OS X and Linux.

While existing software has limitations (e.g. difficult usage outside of the organizations that designed it, limited extensibility, or the requirement of commercial licenses) and usually lacks key features such as role-based permissions, document sharing, full-text search, dynamic interaction with external data sources, and integration with analysis tools, it is welcome to have software that can be customized to meet the needs of different organizations. LabKey Server provides APIs both for customizing interfaces and for querying data, which makes it highly customizable.

Many organizations have adopted LabKey Server. To give an example, one of the largest and most significant installations, called Atlas6, is managed by the Statistical Center for HIV/AIDS Research and Prevention (SCHARP) at the Fred Hutchinson Cancer Research Center.

There is a list of other institutions (mainly research centers and universities) that use LabKey Server available at [6], and links to showcase installations are available at [7].

5 This section uses information from a general use article [16].
6 https://atlas.scharp.org/cpas/project/home/begin.view

3.2.2 Main features

LabKey was primarily designed to integrate, analyze and share biomedical research data.

It provides a secure data repository and allows web-based querying, reporting and collaborating, and it also supports operations related to common processes of research laboratories.

Requesting
Users are able to submit requests for specimens or for experiment execution via forms, and then track operations (performed by them or by other users who are automatically informed by email) relevant to their initial request.

Customizable data types and data integration
The diversity of scientific data types presents a challenge for integration. LabKey Server deals with this by combining the rich metadata capabilities of RDF (the semantic web's Resource Description Framework7) with the query mechanisms of a SQL database.

Resources in a semantic web are interconnected, individually described by a set of property values, and uniquely identified by a URI (Uniform Resource Identifier). Data items are stored in LabKey Server according to the semantic web model.

There is a set of basic data types (lists, assays, study datasets, specimens) that can be extended with newly defined fields or properties related to experimental aspects of internal operations, often modified during the evolution of a project, all via a graphical assay design tool. Fields may be standard data types but can also be associated with out-of-range values, indicators for missing values, validators, or hyperlinks to resources. Administrators can define lookup properties that allow joining of related data (similarly to foreign keys). Data type extension allows easier quality control and visualization.

Query service and visualization
Visualization options are rich. The LabKey Server query service can be called via LabKey Server's APIs or the web-based interface, and it allows users to browse, sort and filter stored data, create custom data views, and use built-in tools for charts and views. The query service supports the execution of SQL queries and the export of tabular data into formats for further analysis with external tools or for easier reporting. The query service reflects the permissions of logged-in users.

7 http://www.w3.org/standards/techs/rdf
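As a sketch of how a client calls the query service, the following shows the shape of a LABKEY.Query.executeSql request. The real LABKEY object is provided to pages by the server; a tiny stand-in stub is included here only so the sketch is self-contained, and the schema, table and column names are hypothetical:

```javascript
// Stand-in stub for the LABKEY client object (normally supplied by the
// server-rendered page); it only mimics the asynchronous success callback.
const LABKEY = {
  Query: {
    executeSql(config) {
      // A real call sends the SQL to the server, which executes it under
      // the permissions of the logged-in user and returns the result rows.
      config.success({ rows: [{ Customer: 'Group A', OrderCount: 3 }] });
    }
  }
};

// Shape of a query-service call: LabKey SQL over a user-visible schema.
let result;
LABKEY.Query.executeSql({
  schemaName: 'lists',   // hypothetical schema
  sql: 'SELECT Customer, COUNT(*) AS OrderCount FROM Orders GROUP BY Customer',
  success(data) { result = data.rows; }
});
console.log(result); // [{ Customer: 'Group A', OrderCount: 3 }]
```

The same result set could be exported to a tabular format for reporting; the point here is only the call pattern, with the SQL passed as a string and the rows delivered to a callback.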

Search and collaboration tools
Full-text search is supported for most types of data, with an optional configuration to display searches from external websites. Useful web-based collaboration tools are implemented, including message boards, an issue tracker and wiki pages.

Role-based security model
Access to sensitive data is controlled by a role-based security model. Users are assigned permissions, which can be modified outside of their permission groups, allowing detailed control of data. Sensitive information is protected no matter how it is accessed, whether by full-text search or via the LabKey Server API. Updates to settings and other actions are logged. Authentication of users is done either by the core authentication system or by external authentication mechanisms. It is possible to connect to an LDAP (Lightweight Directory Access Protocol) server to authenticate users from within an organization, or to authenticate users from a partner website through OpenSSO (Single Sign-On).

Folder structure
Data organization on LabKey Server is based on a tree folder hierarchy. The root level represents the whole site. Then follows the project level, and under each project multiple folders and subfolders are allowed. The security model uses this folder structure to grant permissions.

Projects have different settings options than folders. Projects might be looked at as websites, while the folders they contain resemble webpages.

External datasets
External data repositories can be dynamically accessed directly from LabKey Server, so that users work with distant data sources in the same way they work with any other data on the server. Changes in external datasets are immediately viewable in an associated LabKey Server, and for most data sources (including PostgreSQL and MySQL) changes can also be made directly using the LabKey interface.


Shared datasets are viewable within LabKey Server in a grid user interface and views can be customized, filtered or sorted as usual.

Extensibility
As shown in figure 4, LabKey Server consists of core services (data storage, file management, security…) and specialized modules intended for specific scenarios. Modules are units of add-on functionality containing a characteristic set of data tables and user interface elements.

Figure 4: LabKey Server modular architecture [5]

Modules can be added, upgraded or removed independently, and can be kept private within the institution or contributed to the open source project. They add new functionality and support for new data types, and provide integration (encapsulation) of application logic, user interfaces and data. It is possible to share data across modules.

Software exception reports are automatically sent to LabKey Server developers, without the need to report problems manually. There is also a live community forum with active users.


Available APIs
There are client APIs available for developing scripts or programs and for extending an existing installation of LabKey Server.

The Python and R APIs offer only basic query commands (insertRows, selectRows, updateRows, deleteRows), while the Java and JavaScript APIs are larger and give developers more options.

The Java API is intended for more complicated applications, generally applications that run on the server rather than the client.

For basic querying and viewing tasks, developers are expected to mostly use the JavaScript API, which contains vast functionality and, among the supported languages, is documented with the most precision. A lot of commented JavaScript examples and tutorials can be found on the main documentation page. The JavaScript API serves the purpose of interacting with LabKey Server through HTML pages embedded with JavaScript functionality. For base users this is the most easily understandable way of using the LabKey Server web application.

All applications using API functions run under the security level of the currently logged-in user, which in practice means that a user can access only the data he has permissions for, even though the application is written in a general way.
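A typical embedded application is an HTML page that calls LABKEY.Query.selectRows and renders the result. The sketch below shows the call pattern; the real LABKEY object is supplied by the server page (a stand-in stub is included here only so the fragment is self-contained), and the list and column names are hypothetical:

```javascript
// Stand-in stub for the LABKEY client object; a real page gets it from
// the server. The stub mimics selectRows returning only the rows the
// current user is allowed to see.
const LABKEY = {
  Query: {
    selectRows(config) {
      config.success({ rows: [{ OrderId: 17, Status: 'Measurement' }] });
    }
  },
  Filter: {
    create: (column, value) => ({ column, value })   // simplified
  }
};

let visibleOrders;
LABKEY.Query.selectRows({
  schemaName: 'lists',      // schema exposing user-defined lists
  queryName: 'Orders',      // hypothetical list holding order records
  filterArray: [LABKEY.Filter.create('Status', 'Measurement')],
  success(data) {
    // Only rows the logged-in user may read arrive here; the page
    // would typically render them into an HTML table.
    visibleOrders = data.rows;
  }
});
console.log(visibleOrders);
```

Because the server applies the user's permissions before returning rows, the same page can be shown to a manager and to a base user, and each sees only the orders he is entitled to.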

There is more to LabKey Server. Its features specific to our solution, and the problems with their usage, can be found in chapter 4, where they are discussed in the context of the functional requirements.

3.3 Perun8

3.3.1 Introduction

Perun9 is an identity and access management system that manages users and groups in virtual organizations. It was created as an answer to the need for control over access to services, and to the need for assurance that users who are given rights to use services are managed by a trusted issuer of access rights.

8 This section uses information from article [17].
9 http://perun.cesnet.cz/web/

One of its main advantages over other identity management systems is that Perun does not manage identities only; it also manages users' access rights to services.

It was designed to be integrated with existing systems, whether they have no access management or some already in place: to cooperate with their identity management, reuse existing data about users, groups and their rights, keep them synchronized, and provide an interface through which responsible users can manage access rights.

This system is developed at Masaryk University and has been successfully deployed in over 50 organizations [8]. It currently supports external authentication mechanisms for members of Masaryk University and many other Czech universities, and it is also capable of handling users from the commercial sphere.

3.3.2 Used and supported technologies

Perun is a Java application that uses the Spring10 framework for building simple and flexible JVM-based applications, Google Web Toolkit11 for building the web user interface, and Jenkins12 for internal monitoring and continuous project building and testing [10].

As a backend it uses an SQL database engine (Oracle and PostgreSQL are supported).

It is capable of synchronizing users and groups with external sources that use LDAP servers (or various other propagation mechanisms).

3.3.3 Features

Perun is mature and advanced in terms of functionality, yet it remains a relatively simple-to-use general tool.

10 spring.io
11 http://www.gwtproject.org/
12 http://jenkins-ci.org/

Virtual organizations Perun does not operate with user identity alone; it gathers additional user information and uses it to aggregate users into groups or virtual organizations (VOs). These organizations consist of users and an assigned VO manager, and they have a set of rules that users must meet to be members of a particular VO.

The manager takes care of assigning rights to use services and serves as a representative for all VO members. This way, when users are interested in using a service, they do not need to communicate with the service providers, because the managers take care of it for them. Perun is able to manage an unlimited number of VOs with thousands of members and services in use.

Rights delegation Different manager roles are supported; each manager takes care of the area he is assigned to (introduces users, removes them, assigns lower-level managers) and thereby reduces the responsibilities of higher-level managers.

The Perun administrator creates VOs and appoints VO managers. Every VO has a VO manager, groups inside VOs have group managers, and there are also facility managers who take care of services. This delegation of rights makes it possible for Perun to work on an international level.

User life cycle One of the tasks not usually supported by simple identity management services is taking care of users from a long-term perspective. Perun has a mechanism that watches user status from enrollment until account deletion. It can notify users after a defined period of time and request confirmation of active membership in a VO, or demand a password change.

Immediate removal of users who are no longer active members but do not necessarily break the rules of group membership is the responsibility of the group manager.

Push mechanism Perun implements a push mechanism that is used to deliver new configurations to the end services. Each service keeps its own copy of the configuration file locally, and a new one is delivered only when a change occurs. This means that services are not interrupted if Perun itself happens to be down.
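A minimal sketch of this behavior follows. The version field and the object shapes are invented for illustration and stand in for whatever change detection Perun actually performs.

```javascript
// Hedged sketch of the push mechanism: a service keeps its local copy of the
// configuration and replaces it only when a genuinely new one is pushed, so
// an unreachable Perun (pushed === null) leaves the service undisturbed.
function applyPushedConfig(service, pushed) {
    if (pushed === null) {
        return service;                       // Perun is down: keep running as-is
    }
    if (pushed.version === service.config.version) {
        return service;                       // nothing changed: no delivery
    }
    return { name: service.name, config: pushed };  // deliver new configuration
}

var storage = { name: 'data-storage', config: { version: 3, groups: ['labA'] } };
var updated = applyPushedConfig(storage, { version: 4, groups: ['labA', 'labB'] });
var unchanged = applyPushedConfig(storage, null);
```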


Perun should offer us all the functionality we need, and it is developed natively at Masaryk University. We can communicate with its developers rather comfortably, so we decided to try to use it for our solution. The problems that came up are discussed in section 4.5.

3.4 Data storage

For raw mass spectrometry data, machine output and report files, the Proteomics laboratory uses medium-sized data storage provided by the Institute of Computer Science.

The capacity of this data unit is hundreds of TB. Physically, it is located in a server room in UKB, which is situated near the Proteomics laboratory, so the storage is accessible over a rather reliable and fast network.

It is implemented as a set of disk arrays with a cluster of three frontends on a GPFS file system.

User and machine access to the data is provided via the CIFS protocol. User identity and group management is done by Perun. For an end user, this storage behaves as a disk mounted on his computer and is very easy to use.


4. DESIGN AND IMPLEMENTATION

4.1 Introduction

When considering which way to go, whether to build a completely new system from scratch or to use an existing framework, we decided that there was no capacity to create a new system in a reasonable time.

LabKey Server seemed sufficiently documented and open to developers, with a long history of development and numerous documented success stories.

It was also important that members of the LabKey community forum always reacted to our questions and provided rather thorough answers, usually with multiple solutions and links to official documentation pages or, if something completely new was discussed, with ideas on how to approach the problem. They also recognized when something couldn't be done.

What seemed overwhelming was the range of functionality. We knew from the start that we wouldn't need most of the features, because the idea was to use the server mainly for management purposes; the specialization of LabKey Server in the proteomics area is a promise for the future. After studying the documentation we came to the conclusion that most basic laboratory actions are already supported at a basic level, so there is a good chance that if we decide to include them in our LIMS and modify them for our needs, it won't be hard to do.

What turned out to be a subject of discussion is the extent to which tasks should be automated, what should be logged, and whether automation actually speeds up the laboratory process. The need to spend a large amount of time in front of a computer screen clicking and filling in forms seemed unnecessary, because much of the required information would be used mostly for archiving and statistical purposes.

LabKey Server (version 14.2) is installed on a virtual machine at Masaryk University. It runs on Ubuntu (an operating system fully supported by the Institute of Computer Science), which brought the need to install the server manually. There is an installation package available for the Windows operating system, but under Linux all the parts must be installed and configured separately. It is also important that LabKey doesn't support all versions of Java or Apache Tomcat, and (unlike the Windows version) it doesn't come preinstalled with third-party components (R, Graphviz).

The LabKey configuration file needed to be adjusted to enable LabKey to send emails via an SMTP server, and SSL needed to be set up. We currently use only a self-signed certificate, because it is unclear what aliases will be used (in general, what the structure of the Proteomics Core Facility server machines will be).
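For reference, SMTP under Tomcat is typically configured as a mail session resource in the labkey.xml context file. The fragment below is a hedged sketch: the host, port and paths are placeholders, and attribute details may differ between LabKey and Tomcat versions.

```xml
<!-- Fragment of labkey.xml (Tomcat context file); values are placeholders. -->
<Context path="/labkey" docBase="/usr/local/labkey/labkeywebapp">
    <Resource name="mail/Session" auth="Container"
              type="javax.mail.Session"
              mail.smtp.host="smtp.example.org"
              mail.smtp.port="25" />
</Context>
```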

Running LabKey Server on a virtual machine brings several advantages. For example, it is relatively well protected against crashes, because virtual servers can be moved to a different piece of physical hardware and run there. If needed, hardware resources can be increased. The virtualization layer also offers backup (whole-disk-partition backup is supported) in addition to standard file backup using the operating system.

4.2 Data model

4.2.1 Introduction

LabKey Server offers various predefined datatypes for storing data. The only problem is that the more advanced ones are specialized for certain tasks, and using them for our needs would be unnecessarily complicated. For example, assays are designed for saving information from runs on machines; this is a part of the proteomics laboratory workflow, but it takes place outside of LabKey Server, and including these results in LabKey is not expected at this early stage of server deployment.

Another predefined datatype, the study, is designed for collaboration between various labs and for keeping track of the changing states of subjects in time, which is again something we don't really need.

The best datatype in terms of customizability is the basic list. We can use lists in a similar fashion to database tables; they are ideal for storing simple information and are easy to update, modify, and connect to other lists with lookup values (references to the primary key of another list).


One of the problems we ran into is that after defining a list design it is impossible to fully edit it, even after deleting all table rows. It is possible to add validators or format options.

However, to redefine a column it is necessary to delete it and add it again with the desired properties. Column redefinition is not a common operation, but it is necessary if we decide to change a column into a lookup column (to restrict the accepted values).

Otherwise, manipulating lists via the web interface or the JavaScript API is quite simple, and lists turned out to be the best place to store our data.
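Adding a record to a list via the JavaScript API can be sketched as follows. This is a hedged, offline sketch: the LABKEY stub replaces a live server, and the list and column names are illustrative rather than our real design.

```javascript
// Offline sketch of adding a record to a user-defined list via the
// JavaScript API's insertRows call shape. The LABKEY stub replaces a live
// server; list and column names are illustrative.
var LABKEY = {
    Query: {
        insertRows: function (config) {
            // A real server validates the row and assigns the list's key;
            // the stub just echoes the rows back to the success callback.
            config.success({ rows: config.rows });
        }
    }
};

var inserted = null;
LABKEY.Query.insertRows({
    schemaName: 'lists',
    queryName: 'Orders',                               // illustrative list
    rows: [{ Leader: 'worker@example.org', Status: 'Open' }],
    success: function (data) { inserted = data.rows; }
});
```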

4.2.2 Schemas and queries

LabKey Server comes with predefined sets of lists (in LabKey terminology: queries) that are aggregated into schemas. These are important for running the whole application. For example, there is a core schema that includes a users query (a list of users) and a groups query; an announcements schema with an announcement query (which stores messages); etc. These predefined queries can't be redefined (with few exceptions, we are only able to add columns to these queries, never to modify or delete existing ones).

The most interesting for developers is the lists schema, which includes nothing by default but is open to new list definitions. This is the schema where we defined our own lists.

The visibility of schemas and queries is a specific feature of LabKey Server. It works with the project and folder structure (see section 3.2.2): each folder has its own schemas and queries (depending on the folder type), meaning that a query defined in folder A might not even be visible in folder B or subfolder A1 unless explicitly allowed (even if they all belong to the same project). There is a lists schema in folder A and a lists schema in folder B, but some query X is defined only in folder A.

As a developer, you need to decide where to put your queries: if you put a query into a folder with sensitive information, users who have access only to other folders might not even be able to read it, and your applications and data views might not work correctly.
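The folder-scoped behavior described above can be sketched as a small lookup. Folder paths and query names are invented; in a real JavaScript API call the folder is addressed by passing a containerPath in the configuration object.

```javascript
// Hedged sketch of folder-scoped query visibility: the same query name can
// resolve in one folder and be missing in another. Folder and query names
// are invented for illustration.
var definedQueries = {
    '/Proteomics/A': ['X'],       // query X is defined only in folder A
    '/Proteomics/B': []           // folder B has its own, empty lists schema
};

function isQueryVisible(containerPath, queryName) {
    var queries = definedQueries[containerPath] || [];
    return queries.indexOf(queryName) !== -1;
}
```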

The location of our queries is presented in section 4.3.


4.2.3 Used data model

This section describes the queries we explicitly use in our applications and for storing data. For most basic usage purposes, the predefined queries are sufficient; users, groups, announcements, logs and some others are used as they are.

We needed to create lists for customer requests that hold information about customer contact details and the desired type of procedure. There are three procedure types available for customers to request: Electrophoresis, Digestion and MS Analysis.

Each of these types has its own query, which holds more detailed information about the procedure and is connected to the initial request.

Next we created a query for orders, which are connected to customer requests and are created after a request is accepted. Apart from that, orders contain information about the order leader, samples, machines, deadlines, the current status of the order, and more.

Figure 5 shows ERD of requests and orders.

To provide data integrity, we turned some query columns into 'lookup columns' (for example the Used Machine column) so that users cannot insert an invalid option into a row. Each of these lookup columns is backed by its own query. This way, it is easy to maintain the available options simply by modifying the query a lookup column points to.
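The integrity check a lookup column provides amounts to the following. This is a hedged sketch with invented list contents; on the server the restriction is enforced by LabKey itself.

```javascript
// Hedged sketch of lookup-column integrity: an inserted value must match the
// primary key of a row in the lookup's target query. List contents invented.
function isValidLookupValue(value, lookupRows, keyColumn) {
    return lookupRows.some(function (row) { return row[keyColumn] === value; });
}

var machines = [{ Name: 'Machine A' }, { Name: 'Machine B' }];    // lookup target
var accepted = isValidLookupValue('Machine A', machines, 'Name');
var rejected = isValidLookupValue('Typo', machines, 'Name');
```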


Figure 5: Requests and orders data model

4.3 Site structure

This section is about the structure of LabKey Server web application from the point of view of a logged user.

4.3.1 Homepage

The homepage is available to all users. Like any other folder on LabKey Server, it contains web parts and folder tabs (see figure 6). However, the content shown depends on the group the logged-in user belongs to. All unnecessary features (the available lists web part, the uploaded files web part, etc.) are hidden from customers for better orientation within the page (see figure 7).


The homepage serves as a portal for announcements and overall discussion, a summary of the most recent messages, and the application through which customers request new orders.

It also stores submitted requests, to which core users have full read access, while customers are able to see only the requests they submitted.

Figure 6: Homepage view for administrator


Figure 7: Homepage view for customers

The whole project includes two subfolders:

1. Administration – a section available only to core facility users. It contains a summary of pending requests and a summary of accepted orders. It is also the folder where internal core facility discussion takes place. Finally, the manager applications are situated here, and orders are stored here.

2. Collaboration – a section containing customer group subfolders and order subfolders (see figure 8), granting customers access to their corresponding folders. Customers are thereby given the option to discuss their orders and view report files published by core facility workers. In an order folder, core facility users also see links to the data linked to the particular order and are allowed to discuss orders.


Figure 8: Collaboration folder organization

4.3.2 Projects vs. folders

LabKey Server differentiates between projects and folders, and the main thing to keep in mind is that data in one project aren't meant to be available in another project. After testing we came to the conclusion that cross-project data are available only to site administrators, which makes them practically unusable in our situation. Another thing is that each project has its own project groups, which are likewise not visible outside of it.

In general, data in one project are accessible from all the folders of that project (if not specified otherwise). Project settings focus mainly on visual aspects, while folder settings focus on permissions, available modules and web parts.

We considered making Administration and Collaboration each into a separate project (mostly for data safety reasons and for look-and-feel settings), but the separation of administration and customer data and groups turned out to be a big disadvantage, because the two are interconnected.

Sensitive data can be stored in a folder not accessible by customers (in our case the Administration folder). This works because, from the point of view of anyone who isn't a core facility user, the Administration folder doesn't exist.

There is also a predefined Shared project available to all users, so we thought about storing all data there, but we dismissed this idea for safety reasons (some files are not meant to be read by customers at all). Only the JavaScript libraries used by applications that are not packed as modules are stored there.

4.4 Users and groups

As mentioned in section 2.4.1, there are two basic groups of users: core facility users and customers. Core facility users are a relatively stable set whose membership doesn't change very often. Customers, on the other hand, come and go: a contract with a new group might be established, members of particular groups might end their activity, and new members join existing groups. We expect tens of customer groups. Each customer group should have a group leader who is responsible for managing his group, which mainly means granting access to the customer group folder or to a specific order folder. The responsibility for granting permissions to customer data is thus delegated from core facility users to the customer group leader.

LabKey Server differentiates groups on three levels:

- Global groups: Site administrator, Developer, Site user, Guest;

- Site groups: any defined group on a site level;

- Project groups: groups visible only inside corresponding project.

This model works well; it allows us to create all the groups with all the rights we need. Rights and group memberships can also be checked via the JavaScript API, so we are able to write an HTML page whose sections are hidden or not loaded for users who don't have the rights to see them.
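The permission-dependent page behavior can be sketched as follows. The flag names mimic the kind of properties LabKey exposes about the current user (isAdmin, isGuest), but the section names and rules here are invented for illustration.

```javascript
// Hedged sketch of permission-dependent page content: decide which sections
// to render from flags like those describing the current LabKey user.
function visibleSections(user) {
    var sections = ['announcements'];        // everyone sees announcements
    if (!user.isGuest) {
        sections.push('myRequests', 'discussion');
    }
    if (user.isAdmin) {
        sections.push('pendingRequests', 'orderManagement');
    }
    return sections;
}
```

A page would call such a function once on load and skip rendering (or even loading) the hidden sections.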

We found out it is necessary to keep track of external users' membership in customer groups. This led to the only case where we added a column to a query in the core schema: specifically, we added a 'customer group' column to the users query, because we expect an external user to belong to at most one customer group.


4.5 Authentication

The number of LabKey Server users is not final. It is expected that new customers will come, either to join existing workgroups or to create new ones. Students might want to log in to the system as observers just for study reasons. Therefore it is good to have an external authentication system that is capable of identifying users based on their existing accounts, taking away the necessity to add each one manually.

We expect to give this option to users from Masaryk University, other Czech universities, the Academy of Sciences of the Czech Republic, European universities, and also users from the commercial sphere.

The idea is that new users are automatically given membership in the New External Users group. For security reasons no further automated action is performed; it is the responsibility of a content administrator to subsequently grant permissions to these users. To do this, additional information about the users is required, at least their membership in groups of existing institutions. We do not expect this action can be automated, because there usually are no rules by which to decide where an external user belongs. A person who actually works in a firm we registered as a customer group on the LabKey Server does not necessarily belong to the corresponding customer group on our server; he usually does, but there is no rule.

To start with, only users from Masaryk University are given the option of external authentication.

This solution, which puts the responsibility for group management on content administrators, is secure but isn't ideal, because content administrators need to take care of each externally logged-in user individually.

We decided to delegate customer group management to customer group leaders. They alone will be responsible for whom they give access to their data (they have only read access to their folders anyway, so no modification is possible), and the content administrator will still have the right to intervene, though he isn't expected to.

This is the better way to go, but new problems arise with it. We created a simple application that uses the JavaScript API to let customer group leaders manage their groups.


The API functions we used are createUser(), createGroup() and removeGroupMembers(), and as it turned out, only administrators can use them. This solution is therefore not usable, for a simple reason: we cannot give administrator privileges to outside users, even if they are customer group leaders.

The next idea is based on the fact that we can use Perun and that LabKey Server offers an option of external authentication using an LDAP server. As mentioned in section 3.3, identity and access management with Perun is quite mature and should offer us everything we need.

The solution is designed like this: user group management is taken care of on the Perun side, and new configurations are pushed to the LabKey Server for synchronization.

To be more specific, every customer group leader is responsible for granting permissions to his coworkers and does this on the Perun side via the Perun web interface. He decides to whom he exposes the data that belong to the customer group, whom he invites to his group and whom he removes from it. Which side takes care of what is displayed in table 1.

Perun:
- LabKey represents a single virtual organization,
- to accept a user into LabKey, his name, email and home organization are required,
- user authentication is done via eduID.cz13 or social identities,
- every user needs to set up a username and password for logging into LabKey Server.

LabKey:
- authentication is done via the Perun LDAP,
- a slave script needs to be set up; this script takes care of creating, modifying and deleting users and groups in LabKey.

Table 1: Collaboration of Perun and LabKey on the authentication process

13 eduID.cz is an academic identity federation that provides its users a framework for mutual identity usage for network service access while respecting personal information protection.

The problem with this solution is that LabKey doesn't provide an API for remote user or group creation. The only way that seemed usable was to use the Python API functions insertRows(), deleteRows() and updateRows() to manipulate the core schema queries (users and groups) manually, but LabKey has these queries locked in read-only mode and allows manipulation only via its own API functions. According to a consultation with one of the LabKey Server developers, manipulating the users table from Python is possible, but not via the functions mentioned. It is possible if we copy the pattern in which the Java and JavaScript API functions for creating users, updating group memberships, etc. are implemented: they communicate with LabKey Server by passing JSON objects. This option seems doable and worth testing.
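The pattern described above can be sketched as building the JSON request such a helper would send. This is a hedged sketch only: the endpoint path and field names are assumptions made for illustration, not confirmed API details.

```javascript
// Hedged sketch of mimicking the JSON-passing pattern of the Java/JavaScript
// user-creation helpers. The endpoint path and field names are assumptions.
function buildCreateUserRequest(baseUrl, email) {
    return {
        method: 'POST',
        url: baseUrl + '/security/createNewUser.api',   // assumed endpoint
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ email: email, sendEmail: false })
    };
}

var req = buildCreateUserRequest('https://labkey.example.org', 'new.user@example.org');
```

A Perun slave script could build such requests in any language and send them over HTTPS with the pushed group configuration.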

The next option would be to rewrite parts of the LabKey source code written in Java. The source code is publicly available, along with instructions, but this is a very complicated process that would take a long time to set up. It should be considered a last resort.

To sum up, at the start external users must be dealt with individually, but we believe that managing users and groups on the Perun side and pushing changes to LabKey Server with newly designed functions written in Python will require only a minor development effort.

4.6 File storing

Laboratory machines produce large raw data files, and these data are saved on the data storage device in a poorly organized style, with a large number of folders and files and practically no access rights management. There is currently no way to access these data from LabKey Server's web user interface.

The idea is to synchronize the structure of the stored data with the Collaboration folder structure on LabKey Server (see figure 8). Users would be able to access the data from LabKey Server via a link while all the data stay stored externally. This concerns the large raw biological data.

There is a possibility that these data will be examined some time in the future, so they should be stored for at least several years. Report files may stay saved on the server's file system; this depends on their amount and size in the near future.


LabKey supports the integration of externally-defined schemas, provided they are managed on a database server (MySQL, PostgreSQL, ...). Synchronization with an external source shouldn't be a problem.

By default, files are saved in the LabKey installation folder according to their absolute site location, but the destination can be overridden to another location on the file system or externally to mounted disks.

There are several ways to upload files to LabKey Server. The most common is uploading via the graphical user interface. Files can be uploaded individually, but there is also the option of dragging and dropping multiple files onto a target area.

LabKey Server also supports WebDAV file transfer after synchronizing a local repository with LabKey Server's repository. To do this, you need third-party software or, after proper configuration, certain file browsers such as Windows Explorer or Mac Finder.

The last option is to set up a data processing pipeline. This option is meant for large or remote data stores. The pipeline can be pointed at a desired location, where it scans for changes and uploads new content if there is any. It is also capable of automatically importing files (usually produced by machines).

There is a difference between stored files and imported ones. Stored files can be viewed, deleted or downloaded, but their content cannot be manipulated; to do that, the user needs to import them first. LabKey is able to import .xlsx, TSV or CSV files into its internal database, display them in a grid, and make them accessible via API functions.

This is useful when loading a large amount of data, for example when moving to LabKey Server for the first time with all the data gathered over the years.

4.7 Collaboration system

Communication between users is realized by LabKey modules: message boards, issue lists, and in some cases wiki pages.


Wiki pages are good mostly for one-time announcements or reports because of their rather static character.

Message boards are simple but effective for creating discussion threads and inserting new messages. They are customizable: a user can define whether he wants to receive email notifications about new messages, and can set priority, a mailing list, and formatting.

Issues are similar to messages but designed with priority and responsible-user indicators, a history of operations, and statuses showing whether the issue is open or resolved. However, there is a limited number of user-defined fields, and these fields cannot be used as lists (no lookup is available), so we cannot link them to existing lists.

We decided to include priority, status and responsible user fields in our orders list. This is very similar to the system that worked in the proteomics facility before. The history of operations on individual lists (in our case the orders list) is logged, and we only need to extract this information from the audit table with a JavaScript application.
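Extracting that history amounts to a selectRows call against the audit schema. The sketch below is hedged and offline: the LABKEY object is a stub, the 'auditLog' schema name follows LabKey conventions, and the query and column names are assumptions for illustration.

```javascript
// Offline sketch of pulling an order's history from the audit log via the
// selectRows call shape; LABKEY is a stub, names are illustrative.
var LABKEY = {
    Query: {
        selectRows: function (config) {
            config.success({ rows: [
                { Comment: 'Row inserted', Date: '2015-03-01' },
                { Comment: 'Status changed to Open', Date: '2015-03-02' }
            ]});
        }
    }
};

var history = null;
LABKEY.Query.selectRows({
    schemaName: 'auditLog',          // audit schema per LabKey conventions
    queryName: 'ListAuditEvent',     // assumed audit query name
    success: function (data) { history = data.rows; }
});
```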

We developed an application (depicted in figure 9) that gathers information about discussions across multiple folders and aggregates them into one view grid with links to the folders where the messages were posted. This works very well as long as the user doesn't discuss lists, list items or pages: in that case (a feature of LabKey Server) the link doesn't point to the corresponding item but includes a unique entity id, which needs to be further resolved by our application.

Figure 9: Most recent messages


We also implemented an email notification system in some JavaScript applications, so we are able to send emails after events, e.g. send a mail to the manager after a new request is placed.

4.8 Applications

4.8.1 Introduction

When browsing LabKey Server folders via the graphical user interface, the user sees web parts that offer him the desired functionality.

As written in section 3.2.2, LabKey Server provides a well-documented JavaScript API. We decided to use it to build applications in the form of enhanced HTML webpages. This is sufficient, because the API functions are so well designed that we can create various types of applications this way. These webpages are then displayed as web parts, and there is also an option to include a chosen web part in a webpage. One great advantage is that permissions within webpages can be controlled with API functions, which means that pages behave dynamically depending on the logged-in user. LabKey Server itself only supports permission settings on whole web parts and simply hides or shows a web part depending on user rights, with no other option. With our approach, we can display text explaining why the desired functionality cannot be provided, or offer the user something else (e.g. a trimmed version of the application).

To get even more developer and user comfort we also included the jQuery library and, for some special functionality (a datepicker widget popup when the user chooses a deadline for an order), the jQuery UI library.

The second way we went when creating applications was to pack them into modules. These modules in most cases also contain webpages plus the needed libraries or other metadata. Applications packed as modules are more portable than ordinary HTML pages, but for our purposes there isn't a great difference; we don't expect to port our applications. We simply put all the needed libraries into the homepage folder of our project, where they are accessible. If porting turns out to be necessary, conversion to modules is rather simple. Modules can be activated as web parts, so for an end user there is no visible difference between using enhanced HTML pages and modules.

4.8.2 General issues

Developing in JavaScript brought some problems. The biggest was the fact that after writing code that uses LabKey's API there was no comfortable way to run the code or check whether it works.

There is an internal editor available as part of the web user interface, but its features are poor, so we used third-party software for development. Because there is no way we know of to connect to LabKey remotely, code was written in an external editor while it had to be run in the internal one.

Another complication was that many API functions involve asynchronous calls, a JavaScript concept that is not easy to get used to. A series of asynchronous calls executed in a row makes the code hard to read.
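One way to tame such chains is to flatten the nesting with a small sequencing helper. The sketch below is hedged: the steps are synchronous stand-ins for real API calls, and the helper itself is an illustration rather than part of our library.

```javascript
// Hedged sketch of flattening a chain of callback-style API calls: instead
// of nesting each success callback inside the previous one, run an array of
// steps in sequence. Each step gets the previous result and a continuation.
function runInSequence(steps, done, result) {
    if (steps.length === 0) {
        return done(result);
    }
    steps[0](result, function (next) {
        runInSequence(steps.slice(1), done, next);
    });
}

var log = [];
runInSequence([
    function (r, cb) { log.push('request saved'); cb(1); },
    function (r, cb) { log.push('email sent'); cb(r + 1); },
    function (r, cb) { log.push('page redirected'); cb(r + 1); }
], function (finalResult) { log.push('done:' + finalResult); });
```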

4.8.3 Types of applications

Forms

For defined lists, LabKey Server provides an interface for adding new records. The user fills in the columns and, if he enters an incorrect value, gets validation errors. Successful record addition or error reporting are the only actions performed after submitting the desired values.

What if we want to send an email with information about the added record to the user responsible for managing the particular list? What if we want to hide some columns, or fill selected columns automatically with a value that depends on the logged-in user (e.g. his email address)? To provide such enhanced functionality it was necessary to build form applications.

A) Entry form

The entry form is the form a customer needs to fill in if he wants to send an official request to the Proteomics laboratory. As depicted in figure 10, the fields containing the user name and email are prefilled.


The user needs to choose the type of procedure he wants the laboratory to perform. After clicking the checkbox with the procedure type (in this case, the user chose digestion), a new form with additional information appears, and the 'Types' field is automatically populated according to the user's choices.

The submit button doesn't only submit the filled-in information; it also sends an email about the new request submission to the manager and redirects the user to a confirmation page where he is notified about the success of his actions.
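The prefilling step can be sketched as follows. The field names and the user object shape are illustrative; on a real page the values would come from LabKey's current-user information.

```javascript
// Hedged sketch of prefilling form fields from the logged-in user, as the
// entry form does for name and email. Field and property names invented.
function prefillFields(fields, user) {
    return fields.map(function (field) {
        if (field.name === 'Name')  return { name: field.name, value: user.displayName };
        if (field.name === 'Email') return { name: field.name, value: user.email };
        return field;                               // other fields left untouched
    });
}

var filled = prefillFields(
    [{ name: 'Name', value: '' }, { name: 'Email', value: '' }, { name: 'Types', value: '' }],
    { displayName: 'Jane Customer', email: 'jane@example.org' }
);
```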

More detailed order request use case can be found in section 5.2.

Figure 10: Entry form


B) New order placement

New order placement is a form intended for managers. It is used after accepting a customer request, when the manager wants to create a new order, assign it to a core facility user, and set a deadline or other details (see figure 11).

After an order is placed, it becomes open, and the assigned user gets an email notification so that he knows he should start working on it.

Figure 11: Place new order


For working with forms we have developed a JavaScript library that uses LabKey Server's JavaScript API and supports various operations. We are able to hide or prepopulate inputs, attach a jQuery UI datepicker to inputs of a date type, extract input fields from user-defined lists, add records to these lists, or update existing records.

Folder types

There are several folder types available out of the box. A folder type defines what web parts and how many tabs will be prepared after folder creation, and what modules will be available for use in that folder.

For example, one of the predefined folder types is called MS2. In figure 12 we can see the web parts included in the MS2 folder type.

Figure 12: MS2 folder type

When creating a ready-to-use folder programmatically, it is not desirable to create a folder of a predefined type and then modify it manually; the folder should come out of creation already in the form we want.

For example, if we want to create a folder for a new order, we want to include a web part for discussion and a web part for tracking current problems. Figure 13 depicts what the user should see after navigating to a folder that was created after accepting order number 7.


Figure 13: Order folder type

Folder types must be defined as modules. Folder type modules do not use HTML; the most important file in the module is a formatted XML file that defines the folder type itself: which web parts are to be present, which of them cannot be removed, and which other modules are to be available for that folder type.
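In outline, such a definition looks like the following sketch. The element names follow LabKey's folder type XML schema; the folder type name and the chosen web parts are illustrative and do not reproduce our actual module files:

```xml
<folderType xmlns="http://labkey.org/data/xml/folderType">
    <name>Order</name>
    <description>Folder for a single accepted order</description>
    <!-- web parts that are always present and cannot be removed -->
    <requiredWebParts>
        <webPart>
            <name>Messages</name>
            <location>body</location>
        </webPart>
    </requiredWebParts>
    <!-- web parts added on creation but removable by the user -->
    <preferredWebParts>
        <webPart>
            <name>Wiki</name>
            <location>body</location>
        </webPart>
    </preferredWebParts>
    <!-- modules enabled in folders of this type -->
    <modules>
        <moduleName>Wiki</moduleName>
    </modules>
</folderType>
```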

Overviews

When a user wants a tabular view of a list, he usually works with a query view web part. LabKey Server offers a predefined module for query viewing that displays data in a grid with the typical buttons for row manipulation (inserting, deleting or updating rows) and with filtering and sorting options. This is sufficient only for basic use.

If we want a custom button to which we attach our own function, capable of anything the JavaScript API offers, we need to create our own grid view with the buttons we define. Thanks to the rich API, creating user-defined query views is not complicated.
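A custom grid can be configured roughly as below. The configuration shape (a `buttonBar` with custom `items`) is the real input of `LABKEY.QueryWebPart`; the schema/query names and the handler are illustrative placeholders:

```javascript
// Sketch of a grid view with custom buttons instead of the standard ones.
// The handler body is a placeholder for the real acceptance sequence
// (check status, update the row, send an email, refresh the grid).
function onAccept(dataRegion) {
  // acceptance sequence would run here
}

var requestGridConfig = {
  renderTo: 'requestGrid',            // id of the target <div> on the page
  schemaName: 'lists',
  queryName: 'Requests',              // hypothetical list name
  buttonBar: {
    includeStandardButtons: false,    // hide insert/update/delete buttons
    items: [
      { text: 'Accept', handler: onAccept },
      { text: 'Reject', handler: onAccept /* separate handler in practice */ }
    ]
  }
};
// In the page: new LABKEY.QueryWebPart(requestGridConfig);
```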

There is a set of applications available only to the manager. These are mostly various overview applications (for request acceptance/rejection, order assignment, order priority updates, etc.).


A) Manage pending requests

The pending requests overview application (see figure 14) defines buttons for request acceptance and rejection; selecting a row with a request and accepting or rejecting it executes a sequence of steps.

In the case of request rejection, the status of the selected request is first checked to make sure it has not already been rejected. If not, its status is set to 'rejected'. Next, an email is sent to the email address linked to the selected request. Then a pop-up alert about successful rejection appears. Finally, the data grid refreshes and the new state of the request is displayed.
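The first step of that sequence, the status check, can be sketched as a small pure function (the status values are illustrative); the actual row update would then go through `LABKEY.Query.updateRows`:

```javascript
// Sketch: decide whether a request may still be rejected.
// Returns the new status to write, or null when the sequence should stop
// with an alert because the request was already rejected.
function nextStatusOnReject(currentStatus) {
  if (currentStatus === 'rejected') {
    return null;
  }
  return 'rejected';
}
```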

Request acceptance is described in more detail in section 5.2.

Figure 14: Manage pending requests

B) Manage orders

This is another manager application (depicted in figure 15), used for manipulating orders.

The manager can see a list of current orders, send an alert email about an upcoming deadline to the user responsible for a particular order, change the priority of an order, display the history of operations with orders, and initiate a new order by completing the order automatically generated after request acceptance.


Figure 15: Manage orders

The Most recent messages summary mentioned in section 4.7 is an example of an overview with no custom buttons. Still, it is a summary we could not have put together with a predefined query view web part alone: we needed to define the folder from which the extraction of messages starts, and in this particular case it is not even a folder inside the current project, which a query view web part cannot handle since its scope is limited to the current project.

Removing all standard buttons from an overview and using our own is a good way to keep the displayed data secure and to keep all operations under control (thanks to the attached functions we define).


5. INTERACTION WITH IMPLEMENTED SYSTEM

5.1 Introduction

Users interact with LabKey Server via a web application. After successful login they are redirected to the homepage; from there they can perform various actions depending on their role in the system. Figure 16 shows a diagram of user interaction with LabKey Server. All of these actions take place inside LabKey Server.

Figure 16: Use case diagram


5.2 Typical system use case scenarios

A) Request order

Object: Customer wants to submit a request

Primary actor: User (customer)

Precondition: User has an account to log in to LabKey Server

Postcondition: New request record is created

Basic flow of events:

1. User logs in to LabKey Server

2. User is redirected to homepage

3. User navigates to the project folder

4. User selects ‘Request an Order’ tab

5. User is redirected to page with entry form

6. System prefills identification fields (name, email) based on the logged user

7. User fills in basic info (address, phone)

8. User chooses the type of procedure (multiple choice allowed)

9. System shows additional input fields based on user’s choice

10. User fills in additional fields. E.g. for digestion these fields include concentration, hydration, temperature, etc. (see figure 10)

11. User clicks on submit button

12. System creates a request record in database

13. System sends email to manager

14. User is redirected to ‘Your order was accepted’ page

Alternate flows

Condition: At least one required field is not filled OR at least one field is filled incorrectly

11.1 System alerts the user about the field that did not pass validation

11.2 User stays on the page and the input fields retain the values he entered

Condition: Record creation failed


12.1 System alerts about the problem

12.2 User stays on the page and the input fields retain the values he entered

Condition: Email sending failed

13.1 System alerts about the problem

13.2 User stays on the page and the input fields retain the values he entered
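The validation behind alternate flow 11.x can be sketched as a pure function (the field names are illustrative): it returns the names of the required fields that are empty, so the form can alert the user while keeping his input in place.

```javascript
// Sketch: find required fields the user left empty before submission.
// Whitespace-only values count as empty.
function missingRequiredFields(values, required) {
  return required.filter(function (name) {
    var v = values[name];
    return v === undefined || v === null || String(v).trim() === '';
  });
}
```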

B) Accept request

Object: Manager wants to accept a request

Primary actor: User (manager)

Precondition: User needs to be manager

Postcondition: Request is accepted and new order is created

Basic flow of events

1. User logs in to LabKey Server

2. User is redirected to homepage

3. User navigates to the project folder

4. User clicks on Administration subfolder shortcut

5. User is redirected to Administration subfolder

6. User clicks on ‘Manage Pending Requests’ item from list of pages (other pages represent different manager applications)

7. User is redirected to the page with a list of requests (requests that customers submitted)

8. System displays only requests with pending status

9. User selects a row with a request

10. User clicks on a button labelled ‘Accept’

11. System sends an email to the email address linked to the request

12. System creates a customer group folder based on the user linked to the request (system gets his customer group membership)

13. System creates an order subfolder (the name of the subfolder equals the order number)


14. System changes status of request to ‘Accepted’

15. System alerts the user about successful acceptance with a pop-up window

Alternate flows

Condition: Request has already been accepted

10.1 System alerts user about the problem

10.2 System unselects selected row

10.3 System reverts to state in step 8

Condition: Order subfolder already exists

13.1 System alerts user about the problem

13.2 System reverts to state in step 8
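Steps 12 and 13 of this scenario can be sketched as follows. The path layout and names are illustrative; the folder itself would then be created via `LABKEY.Security.createContainer`:

```javascript
// Sketch: build the path of the order subfolder created on acceptance.
// The subfolder is named by the order number and lives under the folder of
// the customer group the requesting user belongs to.
function orderFolderPath(projectName, customerGroup, orderNumber) {
  return '/' + projectName + '/' + customerGroup + '/' + orderNumber;
}
```

The alternate flow 13.x corresponds to this creation failing because a folder with that name already exists.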


6. CONCLUSION

Deploying a laboratory information and management system is very complex work. A lot of time was spent in discussions with different groups of people. We held meetings with members of the Proteomics laboratory to establish their situation and their expectations. We also maintained contact with members of the Institute of Computer Science about setting up the virtual machine where LabKey Server now runs. We talked with the developers of Perun about the possibilities of external authentication. All this resulted in a vision of a system the laboratory would be content with.

We analyzed the needs of the Proteomics laboratory and produced a detailed report of the required functionality and expected user roles. Then we described the technology needed to fulfill those requirements. We compiled a list of usable laboratory information and management software solutions and focused on the one we considered most suitable, LabKey Server.

We studied its rich feature set, used it for testing purposes and presented design decisions that would make it even more suitable for our situation. These decisions cover the data model, application development and system customization.

LabKey Server was installed on a virtual machine and then configured according to the results of the analysis. We implemented a number of applications with additional functionality to facilitate work with LabKey Server.

We then studied the possibilities of user and group management and determined what we consider the best way for the integration with an identity and access management system to proceed. The next step is to continue the collaboration with the Perun developers.

Integration with external data storage, aligning its folder structure with the folder structure we designed on LabKey Server, requires cooperation with the workers responsible for the laboratory machines. Access management for this data can be resolved with Perun. We expect Perun to be integrated for authentication purposes, so once that is done, synchronizing user rights with the external data storage should be a matter of rather simple configuration.


The applications we created to facilitate work with LabKey Server were presented, and the problems that arose during development with the JavaScript API were discussed. We presented a use case diagram for the common actions users would perform, and use case scenarios describing examples of such actions in more detail.

LabKey Server supports the integration of search engines such as Mascot or Sequest [11]. Considering that the Proteomics laboratory already uses Mascot, it would be interesting to discuss its integration into the information system.

It would also be interesting to consider implementing a sample evidence mechanism, or to study another of LabKey Server's features, the Data Processing Pipeline, which automatically uploads and imports files of a unified format to LabKey Server.

There is still work to be done, but we believe we have made a successful first step in deploying a system that has the potential to be of great help in future achievements.


BIBLIOGRAPHY

[1] 2011 Laboratory information management: So what is a LIMS? [online]. [Cited: 27 Dec, 2014]. Available at: http://sapiosciences.blogspot.com/2010/07/so-what-is-lims.html.

[2] Development [online]. [Cited: 2 Jan, 2015]. Available at: https://www.labkey.org/wiki/home/Documentation/page.view?name=dev.

[3] Feature tradeoffs between platforms [image]. [Cited: 10 Nov, 2014]. Available at: http://www.biomedcentral.com/1471-2105/12/71/table/T1.

[4] How Do I Find the Right LIMS — And How Much Will It Cost? [online]. [Cited: 27 Dec, 2014]. Available at: http://sapiosciences.blogspot.com/2010/07/so-what-is-lims.html.

[5] LabKey Server's modular architecture [image]. [Cited: 10 Nov, 2014]. Available at: http://www.biomedcentral.com/1471-2105/12/71/figure/F2.

[6] Platform Users [online]. [Cited: 10 Nov, 2014]. Available at: https://www.LabKey.org/wiki/home/Documentation/page.view?name=LabKeyServerUsers.

[7] Showcase of Installations [online]. [Cited: 10 Nov, 2014]. Available at: https://www.LabKey.org/wiki/home/Documentation/page.view?name=showcase.

[8] Success stories [online]. [Cited: 16 Dec, 2014]. Available at: http://perun.cesnet.cz/web/success-stories.shtml.

[9] Summary [online]. [Cited: 6 Jan, 2015]. Available at: http://www.bikalabs.com/whylims/summary.

[10] Technical details [online]. [Cited: 30 Dec, 2014]. Available at: http://perun.cesnet.cz/web/techdocs.shtml.

[11] Eckels, J., Hussey, P., Nelson, E. K., Myers, T., Rauch, A., Bellew, M., Connolly, B., Law, W., Eng, J. K., Katz, J., McIntosh, M., Mallick, P. and Igra, M. 2011. Installation and Use of LabKey Server for Proteomics. Current Protocols in Bioinformatics. 36:13.5.1–13.5.25.

[12] Fearn P, Sculli F. The CAISIS Research Data System. In: Biomedical Informatics for Cancer Research. Springer US; 2010:215-225.

[13] Krestyaninova M, Zarins A, Viksna J, Kurbatova N, Rucevskis P, Neogi SG, Gostev M, Perheentupa T, Knuuttila J, Barrett A, Lappalainen I, Rung J, Podnieks K, Sarkans U, McCarthy MI, Brazma A. A System for Information Management in BioMedical Studies—SIMBioMS. Bioinformatics 2009, 25:2768-2769.

[14] Lyne R, Smith R, Rutherford K, Wakeling M, Varley A, Guillier F, Janssens H, Ji W, Mclaren P, North P, Rana D, Riley T, Sullivan J, Watkins X, Woodbridge M, Lilley K, Russell S, Ashburner M, Mizuguchi K, Micklem G: FlyMine: an integrated database for Drosophila and Anopheles genomics. Genome Biol 2007, 8:R129.

[15] Murphy SN, Mendis M, Hackett K, Kuttan R, Pan W, Phillips LC, Gainer V, Berkowicz D, Glaser JP, Kohane I, Chueh HC. Architecture of the Open-source Clinical Research Chart from Informatics for Integrating Biology and the Bedside. AMIA Annu Symp Proc 2007, 2007:548-552.

[16] Nelson EK, Piehler B, Eckels J, Rauch A, Bellew M, Hussey P, Ramsay S, Nathe C, Lum K, Krouse K, Stearns D, Connolly B, Skillman T, Igra M. LabKey Server: An open source platform for scientific data integration, analysis and collaboration. BMC Bioinformatics 2011 Mar 9; 12(1): 71.

[17] Procházka M, Licehammer S, Matyska L. Perun – Modern Approach for User and Service Management. Mauritius: IIMC International Information Management Corporation Ltd, 2014.

[18] Rocca-Serra P, Brandizi M, Maguire E, Sklyar N, Taylor C, Begley K, Field D, Harris S, Hide W, Hofmann O, Neumann S, Sterk P, Tong W, Sansone S. ISA software suite: supporting standards-compliant experimental annotation and enabling curation at the community level. Bioinformatics 2010, 26:2354-2356.


ATTACHMENT

Online archive

The online archive consists of:

- the diploma thesis in .pdf,
- readme.txt – a text file with instructions for using the attached source code files,
- source code files:
  - .js libraries,
  - LabKey Server modules,
  - HTML pages enhanced with JavaScript.
