(12) United States Patent
Zhang et al.
(10) Patent No.: US 8,856,169 B2
(45) Date of Patent: Oct. 7, 2014

(54) MULTI-MODALITY, MULTI-RESOURCE, INFORMATION INTEGRATION ENVIRONMENT

(75) Inventors: Guo-Qiang Zhang, Orange, OH (US); Remo Sebastian Wolfgang Mueller, Cambridge, MA (US); Jacek Szymanski, Brooklyn, OH (US); Adam Troy, Bothell, WA (US); David L. Wilson, Cleveland Heights, OH (US); Chris A. Flask, Avon Lake, OH (US); Raymond F. Muzic, Jr., Mentor, OH (US)

(73) Assignee: Case Western Reserve University, Cleveland, OH (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 71 days.

(21) Appl. No.: 13/548,752

(22) Filed: Jul. 13, 2012

(65) Prior Publication Data
US 2013/0091170 A1 Apr. 11, 2013

Related U.S. Application Data
(60) Provisional application No. 61/507,408, filed on Jul. 13, 2011.

(51) Int. Cl.
G06F 7/30 (2006.01)
G06F 21/30 (2013.01)

(52) U.S. Cl.
CPC G06F 21/30 (2013.01)
USPC 707/769; 707/772; 707/770; 707/783; 707/784; 707/709; 705/2; 705/3; 706/47; 706/50

(58) Field of Classification Search
CPC G06F 17/30643; G06F 17/30873; G06F 19/32; G06F 19/3487
USPC 707/622, 732-734, 783, 784, 941; 705/2, 3; 706/47, 50
See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS
7,006,881 B1* 2/2006 Hoffberg et al. 700/83
7,054,823 B1* 5/2006 Briegs et al. 705/2
7,493,265 B2* 2/2009 Fagan et al. 705/3
8,332,517 B2* 12/2012 Russell 709/226
8,516,266 B2* 8/2013 Hoffberg et al. 713/189
2002/0007294 A1* 1/2002 Bradbury et al. 705/7
(Continued)

OTHER PUBLICATIONS
Szymanski, Jacek, Troy, Adam, Zhang, G-Q, MIMI: Integrated Multi-modality Information Management for Biomedical Cores, Case Western Reserve University, p. 1.
(Continued)

Primary Examiner: Frantz Coby
(74) Attorney, Agent, or Firm: McDonald Hopkins LLC

(57) ABSTRACT
A multi-modality, multi-resource, information integration environment system is disclosed that comprises: (a) at least one computer readable medium capable of securely storing and archiving system data; (b) at least one computer system, or program thereon, designed to permit and facilitate web-based access of the at least one computer readable medium containing the secured and archived system data; (c) at least one computer system, or program thereon, designed to permit and facilitate resource scheduling or management; (d) at least one computer system, or program thereon, designed to monitor the overall resource usage of a core facility; and (e) at least one computer system, or program thereon, designed to track regulatory and operational qualifications.

28 Claims, 12 Drawing Sheets
(56) References Cited

U.S. PATENT DOCUMENTS
2006/0184493 A1* 8/2006 Shiffman et al. 706/47
2008/0133270 A1* 6/2008 Michelson et al. 705/2
2010/0228699 A1* 9/2010 Webber et al. 707/622
2011/0225114 A1* 9/2011 Gotthardt 706/50
2013/0185096 A1* 7/2013 Giusti et al. 705/3
2013/0304878 A1* 11/2013 Russell 709/220
2014/0089241 A1* 3/2014 Hoffberg et al. 706/14

OTHER PUBLICATIONS
Mueller, Remo, Tran, Van Anh, Zhang, Guo-Qiang, A Scalable Parametric-RBAC Architecture, Case Western Reserve University, p. 1.
Sandberg, Neil, Implementation of the Clinical Research Query Interface Visage, May 2010, pp. 1-138, Case Western Reserve University.
Mueller, Remo, Sahoo, Satya, Dong, Xiao, Redline, Susan, Arabandi, Sivaram, Luo, Lingyun, Zhang, GQ, Mapping multi-institution data sources to domain ontology for data federation: the PhysioMIMI approach, pp. 1-2, Division of Medical Informatics, Case Western Reserve University, Division of Sleep Medicine, Harvard University.
Mueller, Remo Sebastian, Ontology-Driven Data Integration for Clinical Sleep Research, pp. 1-196, Case Western Reserve University.
Zhang, GQ, Mueller, Remo, Johnson, Nate, Luo, L., Dong, Xiao, Redline, Susan, Sahoo, Satya; pp. 1-2, Division of Medical Informatics, Case Western Reserve University, Division of Sleep Medicine, Harvard University.
Szymanski, Jacek, An Integrated Data Management System for Biomedical Core Facilities, Jul. 27, 2006, pp. 1-20, Case Western Reserve University.

* cited by examiner

U.S. Patent Oct. 7, 2014 Sheet 1 of 12 US 8,856,169 B2
FIGURE 1 (Sheet 1 of 12; activity diagram. Recoverable labels: search for profile; profile exists; request pending profile; display profile; profile; pending profile; criteria met; criteria not met)
FIGURE 2 (Sheet 2 of 12)
FIGURE 3 (Sheet 3 of 12; activity diagram. Recoverable labels: no selection; request pending project; project selected; criteria met; approve project; criteria not met)
FIGURE 4 (Sheet 4 of 12; activity diagram. Recoverable labels: completed sessions; scheduled sessions; compile statistics; display schedule; schedule session; empty time-slot; valid session; fill out billing details; invalid session; cancel session; session)
FIGURE 5 (Sheet 5 of 12; data-flow diagram. Recoverable labels: Internet PC; Data Server; Workstation PC; a path schematic with year, month, day, system, and time components)
FIGURE 6 (Sheet 6 of 12. Recoverable labels: Clinical Investigator; Data Analyst; Data Manager; Database)
FIGURE 7 (Sheet 7 of 12; no legible text recovered)
FIGURE 8 (Sheet 8 of 12; no legible text recovered)
         VISAGE                     i2b2 Web Client
Query    # of clicks  time (sec.)   # of clicks  time (sec.)
1        5            13            14           59
2        6            16            25           119
3        20           52            37           16[final digit illegible]

FIGURE 9 (Sheet 9 of 12)
FIGURE 10 (Sheet 10 of 12; architecture diagram. Recoverable labels: Sleep Researcher; Domain Expert; Informatician; Query Builder; workspace; Query Manager; DB-Ontology Mapper; Query Explorer; Login; Secure)
FIGURE 11 (Sheet 11 of 12; no legible text recovered)
// Attach one delegated click handler once the DOM is ready (Prototype.js).
document.observe("dom:loaded", function () {
  var container = $(document.body)
  if (container) {
    container.observe('click', function (e) {
      var el = e.element()
      // Only intercept clicks on pagination links.
      if (el.match('.pagination-ajax a')) {
        new Ajax.Request(el.href, {
          method: 'post',
          parameters: $('search-form').serialize()
        })
        e.stop()
      }
    })
  }
})
FIGURE 12 (Sheet 12 of 12)

MULTI-MODALITY, MULTI-RESOURCE, INFORMATION INTEGRATION ENVIRONMENT

RELATED APPLICATION DATA

The present application claims priority to U.S. Provisional Patent Application No. 61/507,408, filed Jul. 13, 2011, the entirety of which is hereby incorporated by reference herein.

FIELD OF THE INVENTION

This application relates to a multi-modality, multi-resource, information integration environment.

BACKGROUND OF THE INVENTION

Modern biomedical research is inherently multi-leveled and multi-disciplinary. To facilitate this research, core facilities bring the latest imaging and scanning technologies to the research community and support many projects simultaneously. However, they often do so in the midst of significant information management challenges unforeseen at their inception, such as: (a) effective and efficient distribution of acquired scientific data from a core facility to its investigators; (b) timely sharing of raw, primary, and curated data for collaborative activities; (c) optimized scheduling and resource usage; (d) management of experimental workflow, e.g., multiple related steps in one-time or longitudinal studies; (e) management of administrative workflow, such as tracking of material cost, staff times spent on sample preparation and data acquisition, and billing and accounting; (f) monitoring of the overall resource usage of a core facility, by compiling, e.g., a profile of usage statistics of equipment and types of involved projects; and (g) coherent and common access point for data analysis workflow, linking raw data and/or primary data with results from analyses, reports, images, and references, and comparing with related results from existing databases and literature.

There are currently no comprehensive software systems addressing these challenges as a whole (Siemens' MIPortal focuses on improving the management of experimental workflow for proteomics research and does not address administrative issues). Deficiencies with the existing infrastructure are often manifested in: (i) Substantial administrative and personnel overhead. This exists in pen-and-paper-based record keeping aided by disconnected spreadsheet programs, manual management of scheduling on a common off-the-shelf calendar system that operates in isolation, using portable media for data transport, and relying on e-mail communication to gather a variety of project related information. Some centers operate under an information technology (IT) infrastructure resulting from adopting/adapting existing open-source/in-house/commercial software for managing a variety of data, although this only reduces the problem to the equally, if not more, challenging issues of information integration, interoperability, and resource for IT personnel support; (ii) lack of support for collaboration among researchers. The disintegration of administrative and scientific data makes it difficult to access data and find information about related prior studies. Collaborating researchers must then rely on ad hoc mechanisms such as email communication to share data and results. This not only makes the bookkeeping of data a chore, but it also lacks a uniformly enforceable standard for the safety of valuable data and results from analyses; (iii) significant amount of redundant, disintegrated, and inconsistent data. When data are kept in disconnected systems, information such as a principal investigator's profile and projects may have to be reentered multiple times to multiple systems, making it difficult to maintain and update. Repetition in data entry not only requires additional effort, but it also opens more room for errors and inconsistencies: the same entities may have been entered using different names in different systems, and changes made in one system may not automatically propagate to other systems; and (iv) lack of support for the integration of information from disparate resources. Access to data and knowledge is often labor-intensive, repetitive, disorganized, and burdensome; project management and data analyses are tasks relegated to individual investigators without a common framework or standard for record keeping or for sharing and collaboration using intermediate results.

The root cause for these deficiencies can be summarized as a lack of a holistic approach to infrastructure support. Given the challenges encountered by imaging and other kinds of core facilities, an approach that captures a vision for a long term solution and addresses some of the immediate needs is desirable. The present multi-modality multi-resource information integration environment ("MIMI") not only addresses some of the needs and provides a flexible and expandable solution to the challenges mentioned above, but also provides a foundation for a more advanced system that substantially integrates existing knowledge with analyses and curation of experimental data.

The query interface is increasingly recognized as a bottleneck for the rate of return for investments and innovations in clinical research. Improving query interfaces to clinical databases can only result from an approach that centers around the work requirements and cognitive characteristics of the end user, not the structure of the data. To date, few interfaces are usable directly by clinical investigators, with the i2b2 web client a possible exception. Aspects of query interface design that facilitate its use by investigators include query-by-example, tree-based construction, being database structure agnostic, obtaining counts in real time before the query is finished and executed, and saving queries for reuse.

Unlike previous art, Physio-MIMI develops informatics tools to be used directly by researchers to facilitate data access in a federated model for the purposes of hypothesis testing, cohort identification, data mining, and clinical research training. In order to accomplish this goal a new approach to the query interface was necessary.

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments of the present disclosure are described herein with reference to the drawings wherein:
FIG. 1 illustrates a user profile model;
FIG. 2 illustrates a UML object diagram of Plone's objects and their inheritance relationships;
FIG. 3 illustrates a project model;
FIG. 4 illustrates a session model;
FIG. 5 illustrates a summary of the data flow process;
FIG. 6 illustrates the evolution of the data access paradigm;
FIG. 7 illustrates an example of a Query Builder interface;
FIG. 8 illustrates an example of a Query Explorer interface;
FIG. 9 illustrates a table detailing a preliminary evaluation performed on the efficiency of VISAGE for query construction;
FIG. 10 illustrates one embodiment of the conceptual architecture of Physio-MIMI;
FIG. 11 illustrates a branching strategy for production environment; and
FIG. 12 illustrates an example of how the use of Unobtrusive JavaScript creates cleaner HTML documents.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Reference will now be made in detail to embodiments of the invention, examples of which are illustrated in the accompanying drawings. It is to be understood that other embodiments may be utilized and structural and functional changes may be made without departing from the respective scope of the invention.

The rapid expansion of biomedical research has brought substantial scientific and administrative data management challenges to modern core facilities. Scientifically, a core facility must be able to manage experimental workflow and the corresponding set of large and complex scientific data. It must also disseminate experimental data to relevant researchers in a secure and expedient manner that facilitates collaboration and provides support for data interpretation and analysis. Administratively, a core facility must be able to manage the scheduling of its equipment and to maintain a flexible and effective billing system to track material, resource, and personnel costs and charge for services to sustain its operation. It must also have the ability to regularly monitor the usage and performance of its equipment and to provide summary statistics on resources spent on different categories of research. To address these informatics challenges, we introduce a comprehensive system called MIMI (multi-modality, multi-resource, information integration environment) that integrates the administrative and scientific support of a core facility into a single web-based environment. In one embodiment the design, development, and deployment of a baseline MIMI system may be used at an imaging core facility. In addition, the general applicability of the system may be used in a variety of other types of core facilities. MIMI is a unique, cost-effective approach to addressing the informatics infrastructure needs of core facilities and similar research laboratories.

The present multi-modality multi-resource information integration environment ("MIMI") not only addresses some of the needs and provides a flexible and expandable solution to the challenges mentioned above, but also provides a foundation for a more advanced system that substantially integrates existing knowledge with analyses and curation of experimental data. The MIMI system comprises: (a) effective, efficient and secure data storage and archiving of a variety of imaging data (e.g., digital imaging and communication in medicine); (b) web-based access of acquired imaging data by researchers unconstrained by time and location; (c) sharing of raw and primary imaging data among collaborators; (d) resource scheduling and management; (e) monitoring of the overall resource usage of a core facility, by compiling, e.g., a profile of usage statistics of equipment and types of supported projects; and (f) built-in mechanism for tracking regulatory and operational qualifications, e.g., Institutional Animal Care and Use Committee (IACUC).

In one embodiment, the MIMI system comes with a web-based interface to support core membership and project information management. It features an expandable and modifiable framework that can adapt to the needs of imaging and other kinds of core facilities.

In one embodiment MIMI adheres to the following set of guiding principles: it uses an open-source environment for development; it fully integrates the end-user into the developmental team; it maintains a uniformly web-based, menu-driven, friendly user interface; it decentralizes data and information management tasks with role-based access control, semiautomated data flow, and resource scheduling to minimize overhead after deployment; and it employs the latest methodologies and tools in IT and software engineering in software development.

The choice of an appropriate open-source developmental environment not only saves developmental cost, but also ensures that the system is modifiable and expandable without proprietary restrictions. The potential downside of a steeper learning curve and the stability of the supporting community may be overcome by a careful scrutiny of the available open source packages and suitable training of the programmers. In one embodiment, MIMI uses Plone, which is an open-source content management system, as its main developmental environment, but other similar open-source content management systems that meet the desired requirements may be used such as, but not limited to, Ruby on Rails.

In one embodiment, Plone is chosen for its web-based interface for development and its built-in web-server incorporating the latest techniques for content-management, such as version control and cascading style sheets (CSS). Plone's object-oriented framework allows rapid development through code reuse and extension of proven functional modules. The object-oriented paradigm allows objects placed inside other objects (such as folders) to inherit and reuse their parents' attributes, contents, and functions. Plone's object-oriented framework extends to the storage level, allowing developers to conceptually organize information in a logical manner that in turn speeds up development. The Plone distribution is available for major operating systems such as Mac OS, Windows, and Linux, so a developer can select a preferred environment for development. In another embodiment, Ruby on Rails is used in place of Plone.

Fully integrating the end-user into the developmental team ensures usability, relevance, and impact to the targeted application domain. Although neither consciously nor strictly following the extreme programming practice, we find it extremely important to engage the end-user into all steps in the software development process. The engagement of the end-user helps realize two of the core values of extreme programming immediately: communication and feedback. Through regular meetings, ongoing changes to loosely specified requirements occur as a natural process. The adaptability to changing requirements is a more realistic and better approach than attempting to define all requirements at the beginning of a project, because the developer and the end user rarely have complete foresight of the desired end product at its inception. Rather, the ongoing discussions become a cooperative activity that helps define, refine, and deepen the understanding of what is desired. However, discussions alone without a concrete system would not be effective.

This leads to the second aspect related to extreme programming: test-driven development. Although the goal of test-driven development is to make sure that current code meets requirements, we use these informal tests as a way to demonstrate the features and functionalities of the system to generate in-depth, timely, and specific feedback to the developer. Of course, any unusual behavior of the system will show as bugs or defects to be corrected for the next iteration of demonstration. Depending on the workload and available manpower, these live demos of partial working systems can happen on a weekly or monthly basis.

The remaining three principles of web-interface, decentralized content management, and employing the latest technology are: the web interface provides uniform and wide accessibility; menu-driven interaction provides more control over data input, output, and presentation; and decentralized content management reduces the overall management overhead after the system is deployed.
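As an illustration of the role-based, decentralized management principle stated above, the following is a minimal sketch, not MIMI's actual implementation. The role names (PI, CI, Operator, Manager) are the ones the description uses; the permission table, the action names, and the helper functions `can` and `canEditCollaborators` are hypothetical and introduced only for this example.

```javascript
"use strict";

// Hypothetical permission table: which actions each role may perform.
// The role names come from the description; the actions are assumptions.
const PERMISSIONS = new Map([
  ["Manager", new Set(["approveProfile", "approveProject", "editCollaborators", "viewData", "scheduleSession"])],
  ["PI", new Set(["editCollaborators", "viewData", "scheduleSession"])],
  ["CI", new Set(["viewData"])],
  ["Operator", new Set(["scheduleSession"])],
]);

// A user may hold several roles; any one role granting the action suffices.
function can(user, action) {
  return user.roles.some((role) => {
    const allowed = PERMISSIONS.get(role);
    return allowed !== undefined && allowed.has(action);
  });
}

// Decentralized management: a project's own PI maintains its collaborator
// list, so a manager does not have to; managers keep full access.
function canEditCollaborators(user, project) {
  if (user.roles.includes("Manager")) {
    return true;
  }
  return can(user, "editCollaborators") && project.pi === user.loginId;
}

// Usage with made-up users and a made-up project.
const alice = { loginId: "alice", roles: ["PI", "Operator"] };
const bob = { loginId: "bob", roles: ["CI"] };
const pilotStudy = { name: "Pilot study", pi: "alice", cis: ["bob"] };

console.log(canEditCollaborators(alice, pilotStudy)); // true
console.log(canEditCollaborators(bob, pilotStudy));   // false
```

Keeping the ownership check separate from the role table reflects the idea that routine updates are pushed to project owners while managers retain oversight.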
However, achieving these requires a long-term vision and knowledge in several related fields.

In one embodiment the baseline MIMI comprises two main components: the Meta Server and the Data Server.

The Meta Server is the common front-end for MIMI's functionality. It is called "Meta Server" due to its role in managing all relevant alphanumeric data: user profiles, project information, scheduling information, data storage address information, access control, etc. It supports a web interface for data downloading after experimental data is acquired, using the client-server paradigm. Administrative functionalities are also supported by the Meta Server, such as validating user-supported information, assigning access privileges, and confirming requested scanning sessions.

In one embodiment, in a manager's role, a user can launch the usage-statistics program to monitor resource usage and generate statements for fees for the core. The Meta Server is also involved in the final step of data flow: after imaging data are acquired, a Java program, or any other similar program, can be launched from the scanner workstation (usually a PC), which receives input about the address of a local folder containing the acquired data and a redundant array of independent disks (RAID) directory path on the Data Server representing the location where the data will be stored. The RAID path consists of metadata automatically generated by the Meta Server to represent the unique, readable, and in one embodiment humanly readable, directory path on the Data Server.

The Data Server is the backend for storage management of acquired data such as, but not limited to, image data and other experimental data. It uses a standard folder hierarchy for storage. To safeguard data from network viruses and prevent unauthorized access, the Data Server operates behind a hardware firewall with communication permitted only with the Meta Server and with the local area network (LAN) PCs attached to scanners. The Meta Server and the Data Server together achieve common functionalities of a data warehouse.

Design and Implementation

MIMI is designed to support a core facility's administrative and scientific workflows in a single system. In one embodiment, the administrative workflow comprises managing profile data on users and research projects, scheduling scanning sessions, billing services, and compiling performance statistics to monitor resource usage. The scientific workflow comprises managing scientific data and disseminating them to the relevant researchers through a common web-interface.

Three data models may be used for the administrative workflow (FIGS. 1, 3, and 4). The description of these data models follows the activity diagram specification of the Unified Modeling Language (UML). A solid dot represents the initial state. Rectangular boxes and round-corner boxes denote activities and objects, respectively. Solid arrows specify transitions between activities. Dashed arrows represent interactions with objects, i.e., dashed arrows entering or leaving an object represent modification/creation or retrieval, respectively.

In one embodiment each data model supports the administrative workflow, each data model is implemented using Plone, and the scientific workflow is addressed through the data-flow model (FIG. 5).

User Profile Model

The user profile model in FIG. 1 specifies the behavior of the user management segment of the administrative workflow. The model is aimed to ease the data entry burden of a core facility by allowing the user to enter data, which will be validated by a manager in a core facility for it to become effective. The first action indicated by the model is searching for an existing user profile for a specific user. If the user profile does not exist, then it must be requested by the user as a pending user profile. Otherwise, if the user profile exists, then it will be displayed. The user profile model then proceeds to define actions for a pending user profile. A pending user profile that does not meet the criteria for approval needs to be modified by the user or a core facility manager. A pending user profile that meets the criteria for approval can be approved by a core facility manager.

The Plone implementation of the profile model uses the profile object, which stores details about core facility users. The profile object resides at the top level of the Plone object hierarchy as shown in FIG. 2. It may capture information using, but not limited to, any of the following string attributes: first name, last name, e-mail address, institution, department, phone, fax, address, city, state, zip code, country, login ID, and status. In one embodiment, the last two attributes store a user's ID for logging into Plone and a value for pending (P) or approved (A) status, respectively. The profile object may also comprise a roles attribute that stores a list of user roles. Plone may access the value of the roles attribute to determine a user's access privileges.

In one embodiment, the four possible user roles may be Principal Investigator (PI), Coinvestigator (CI), Operator, and Manager (a user can assume multiple roles). Users with the PI role are researchers who have active research projects. Users with the CI role are collaborators who work with other researchers. Operators represent users who are qualified to operate equipment. Managers are core facility staff members with "superuser" privileges, i.e., they have access to all of MIMI's functionalities.

In one embodiment, when a user is granted the privilege to create a profile object, a profile request form is presented with input fields to capture information such as, but not limited to, a user's e-mail address and phone number. Once a user submits the profile request form, Plone creates a pending profile object with its status attribute that, in one example, may be set to "P." Core facility staff members with profile objects that contain "manager" as a value for the roles attribute are ultimately responsible for approving all pending profile objects. The main criterion for approval is verifying that a profile object's login ID is associated with the right contact information such as a user's e-mail address and phone number. Approving profile objects through Plone guards against malicious users who attempt to pose as others to gain access to private information.

Project Model

The project model in FIG. 3 specifies the behavior of the project information management segment of the administrative workflow. In one embodiment, the initial state of the model consists of a decision node that returns "Yes" or "No" depending on whether an existing project is selected. If "Yes", then the information about the selected project is displayed. If "No", then a user can request a new project or a pending new project. The project model then specifies actions for both pending and approved projects. A pending project that does not meet the criteria for approval must be modified by the user or a manager, whereas a pending project that meets the criteria for approval can be approved by a manager. In one embodiment, an approved project may be modified by its owner to grant privileges for specified collaborators, among the existing users, to access the associated experimental data.

The project model is implemented in Plone using a project object that captures information about a specific research project. The project object may use, but is not limited to, any of the following attributes to capture the associated information: name, PI, CIs, IACUC number, grant number, account number, and description. In one embodiment, the name attribute stores the title of an active grant or a pilot study. The PI attribute, which may store the ID of a profile object that contains "PI" as a value for its roles attribute, links a project object with a principal investigator. The CIs attribute specifies a project's collaborating users by storing the IDs of existing profile objects that contain "CI" as a value for their roles attribute.

In one embodiment, a project request form is implemented in Plone to allow a user to request a project object which, if approved, will have the user as its owner. This form contains input fields to capture project details as mentioned in the previous paragraph. It also presents a checkbox interface to allow a user to select user profile objects for inclusion as values for CIs. Once a user submits a project request form, Plone creates a pending project object with its status attribute set, but not limited, to "P."

In one embodiment, core facility staff members with profile objects that contain "manager" as a value for the roles attribute are ultimately responsible for approving all pending project objects. The criteria for approval include, but are not limited to, checking that a project object's grant number and account number are valid.

In one embodiment, a user can also use Plone to view approved project objects that the user is associated with, i.e., the user is the project PI or a collaborator. In the case that new collaborators arrive, or old ones depart, a user may modify the list of collaborating users for these approved project objects through a web-based checkbox interface. In one embodiment, for security purposes, a user cannot create new users (i.e., profile objects) and may only select collaborators for a project from existing profile objects. Relegating the management of project collaborators to project owners (i.e., PIs) is an example of decentralized content management, which alleviates the data management burden of a core facility. A collaborator of a project is typically granted the privilege to access experimental data resulting from the project.

Session Model

In one embodiment, the session model, as shown in FIG. 4, specifies the behavior of the scheduling, billing, and usage statistics compilation segments of the administrative workflow. The initial state of the model is a decision node to determine which actions to perform depending on whether completed or scheduled sessions are selected. In one embodiment, if completed sessions are selected, usage-statistics compilation may be performed. If scheduled sessions are selected, a calendar with the scheduling information will be displayed. The session model specifies further actions for scheduled sessions and empty time slots on the schedule. A scheduled session that is invalid (incorrectly scheduled) will be canceled, whereas a scheduled session that is valid (correctly scheduled) will be followed by input for billing details. An empty time-slot on the schedule permits the scheduling of a new session within the selected time interval. In one embodiment, the Plone implementation of the session model uses the session object, which represents a scheduled or completed session for an imaging system. The session object resides at the lowest level of the Plone-object hierarchy. It stores information using, but not limited to, the following attributes: imaging system name, date, time-slot, project, operator, scanned items, time duration, total cost, status, and Data Server Path, which may also be known as the RAID path. The project attribute stores the ID of a project object related to the session object. From the project object, relevant project information such as a PI's name will be automatically retrieved and displayed on the session schedule. The operator attribute stores the ID of a profile object that has "operator" as a value for its roles attribute. The operator attribute may also track the user who operates an imaging system during a session. The scanned items attribute stores the IDs of entities such as, but not limited to, small animals, large animals, cell plates, cells, other cellular material or any other entities that are used during a session, and the status attribute stores a value of scheduled (S) or completed (C).

In one embodiment, MIMI features a web-based scheduling interface for imaging systems. The scheduling interface uses a combination of DHTML and AJAX to approximate the response speed and the look and feel of a desktop application. Users can create a new session object by dragging the mouse cursor over an open time-slot that spans at least one 30-minute interval and then selecting a research project object. The interval size may also be 15 minutes, 1 hour, or any other multiple of 15 minutes. A new session contains values for, but not limited to, the following attributes: imaging system name, date, time-slot, project, and status (S). A user can then use MIMI's scheduling interface to perform cancellations or to access the supplemental billing form to choose values for the remaining attributes.

In one embodiment, MIMI's supplemental billing form contains fields that capture billing details such as time duration and cost information. It also allows a user to select the profile object for inclusion into the operator attribute. A user may also use the supplemental billing form to select scanned items by choosing group objects. A group object represents a collection of entities that have similar characteristics. In one example, female mice with the same vendor and strain may translate into a single group object. A group object uses the following key attributes to store information such as but not limited to: name, species, strain, vendor, and itemIDs. In one embodiment, the itemIDs attribute stores a list of unique IDs for each item of a group object. A user who submits the supplemental billing form initiates the process of automatic cost computation. MIMI then sets a session object's status attribute to completed (C) and updates the values of the remaining attributes.

In one embodiment, the resource usage compilation capability may allow a core facility to regularly track the usage of its equipment and provide important and useful summary statistics on different aspects of the daily operations of a core. MIMI can generate performance assessments of imaging systems with different time intervals. When compiling a performance assessment, Plone locates the relevant completed session objects and sums the values of their time duration attributes. Plone's built-in search interface may be modified to filter completed session objects using criteria such as, but not limited to, principal investigator, project, and date range through text-fields and dropdown lists.

Data-Flow

MIMI addresses a core facility's scientific workflow with the data-flow process. FIG. 5 is a summary of the data-flow process. MIMI implements a data-flow process that seamlessly links data with the associated session metadata.

FIG. 5 is a summary of the data-flow process that shows: 1) User requests to view scheduled session; 2) Meta Server replies with session object; 3) User copies Data Server Path string into Uploader application; 4) Uploader application sends scientific data to Data Server; 5) Data Server stores scientific data; 6) Person requests to view downloading interface; 7) Meta Server replies with downloading interface; 8) User issues download request; 9) Meta Server forwards download request to Data Server; 10) Data Server sends scientific data to Meta Server; 11) Meta Server forwards scien load individual pieces.
When a user encounters a folder larger tific data to Internet PC; and 12) User stores scientific data on than 1 GB, it is also possible to download only a subset of its Internet PC. contents at one time. In one embodiment where the seamlessly linked data is An innovative feature of MIMI's implementation of the imaging data, when MIMI implements a data-flow process data-flow process is the Data Server Path Attribute, which that seamlessly links imaging data with the associated session enables the treatment of imaging data as binary files. This metadata, with the completion of an imaging session, imag unleashes MIMI from the complexity and variety of image ing data is stored in a standard folder hierarchy on the file formats, such as dcm, nifti, analyze, and other known attached local work station PC. The operator then selects ajar image file formats, and avoids conversion to any standard data file on the work station PC. The jar file is a Java executable 10 formats. The necessary metadata, usually stored as header program for the Uploader application, which is responsible information, resides in the portable path names for the folder for transferring the scanned imaging data to appropriate fold hierarchy. It also serves as imaging data's unique IDs. ers on the Data Server. After launching the program, the user In one embodiment, the Meta Server and the Data Server looks up the correct session object from the Meta Server and are deployed with a carefully chosen set of hardware and retrieves the value of its Data Server Path Attribute—a value 15 software components. The Meta Server runs on a Dell Pow automatically generated when a session object is created from erEdge with dual 3-GHz Intel Xeon processors, 4 GB of the scheduling interface. In one embodiment, the Data Server DDR2 RAM, and two 300-GB 10-K RPM Ultra-SCSI hard Path value is a string with six main parts, year, month, day, PI drives. 
It operates using Redhat Linux and runs an Apache name, imaging system name, and time-slot, that uses the front-end for secure sockets layer (SSL) transmission. backslash as a delimiter. Because MIMI automatically In one embodiment, the Data Server may operate under a accounts for scheduling conflicts, the Data Server Path value variety of operating systems such the Windows 2003 operat represents a unique storage location on the Data Server. The ing system and provides a RAID with eight 300-GB hard operator copies the Data Server Path value, pastes it into a drives connected with Dynamic Network Factory's 8-channel textbox of the Uploader application, and selects the local controller handling the RAID-5 functionality. directory path for the folder containing the imaging data. 25 In one embodiment, VISAGE (VISual AGgregator and Once the origin and destination for the imaging data are Explorer) is developed as a query interface for clinical given, the Uploader application initiates a data transfer ses research. A user-centered development approach is followed sion with a single mouse-click. and incorporates visual, ontological, searchable and explor At the receiving end of the data transfer process, the Data ative features in three interrelated components: Query Server runs a Receiver Script that listens continuously for 30 Builder, Query Manager and Query Explorer. The Query requests from active Uploader applications. For all incoming Explorer provides novel on-line data mining capabilities for requests, the Receiver Script first obtains the DataServer Path purposes such as hypothesis generation or cohort identifica string. The Script then fetches an incoming file's path and its tion. In one embodiment, the VISAGE query interface has name and concatenates them to the Data Server Path string to been implemented as a significant component of Physio form an absolute storage path. The Receiver Script parses the 35 MIMI. 
Preliminary evaluation results show that VISAGE is absolute storage path into a valid folder hierarchy and creates more efficient for query construction than the i2b2 web any missing folders to form a unique storage location. The client. Script then creates an empty file object and retrieves its con In one embodiment, VISAGE is a query interface that may tents by streaming binary data in 65,535-byte increments. be used in Physio-MIMI, a device that may be used to, but is The entire cycle repeats until all files transfer successfully to 40 not limited to, improve informatics Support for researchers the Data Server. conducting clinical studies. In one embodiment, the Physio Once data such as, but not limited to, imaging data is MIMI data integration environment has two salient features. moved to the Data Server, it can be immediately downloaded First, it is a federated system linking data across institutions by their owners and collaborators through the Meta Server. without requiring a common data model or uniform data MIMI supports this step with a Retrieval Script that runs on 45 Source systems. This would greatly reduce data warehousing the Data Server and listens continuously for requests by the activities such ETL, often a significant overhead for data Data Request Script that runs on the Meta Server. The com integration. Second, Physio-MIMI is tightly focused on serv munication process begins when the Data Request Script ing the needs of clinical research investigators. VISAGE must accesses a session object, obtains its value of the Data Server therefore provide robust data mining capabilities and must Path Attribute, and sends this value along with a relative 50 support federated queries, while still being user-friendly. In folder path to the Retrieval Script. 
The Retrieval Script joins one embodiment, VISAGE may be directly used by clinical the Data Server Path value and the relative folder path to form researchers, for activities such as data exploration seeking to a query path. The Retrieval Script then opens the query path formulate, clarify, and determine the availability of support on the Data Server and obtains a list of its files and folders, if for potential hypotheses as well as for cohort identification for there are any. The Script then iterates through the list, com 55 clinical trials. putes file and folder sizes, and forwards these details to the Such an interface would enable an evolution of the data Meta Server. The Meta Server dynamically constructs the access paradigm: the current paradigm (left of FIG. 6) is one visual downloading interface and sends it to the user. After the in which clinical investigators communicate a data request to user selects files or folders to download, the Data Request an Analyst or Database Manager (1) who in turn translates the Script builds a list that holds their path strings and sends it to 60 requestinto a database query and interrogates the database (2) the Retrieval Script. The Retrieval Script creates a temporary to obtain requested data, finally returning results (3). The time Zip file and populates it by iterating through folder paths in the span between 1 and 3 in the left of FIG.6 may be weeks if not list and fetching any files. In the end, the Retrieval Script months, and steps 1-3 often need to be repeated as the query sends the zip file to the Data Request Script, and the Data criteria are refined. Request Script forwards it to user's local desktop. In one 65 VISAGE seeks to change this to a paradigm which empow embodiment, when a file is larger than 1 GB, the Retrieval ers clinical investigators with data access and exploration Script virtually partitions the file and allows the user to down tools directly (right of FIG. 6). 
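The Retrieval Script steps described in the Data-Flow section (joining the Data Server Path value with a relative folder path to form a query path, listing the files and folders found there with their sizes, and packing the user's selection into a temporary zip file) can be sketched roughly as follows. This is only an illustrative Python sketch under stated assumptions, not the implementation disclosed in the patent: the function names `build_query_path`, `list_entries`, and `zip_selection`, the `storage_root` parameter, and the choice of Python are all hypothetical. Network transport between the Meta Server and the Data Server, and the partitioning of files larger than 1 GB, are not modeled.

```python
import os
import tempfile
import zipfile
from pathlib import Path


def build_query_path(storage_root, data_server_path, relative_folder=""):
    """Join a backslash-delimited Data Server Path and a relative folder.

    storage_root is a hypothetical base directory on the Data Server.
    """
    parts = [p for p in data_server_path.split("\\") if p]
    parts += [p for p in relative_folder.replace("\\", "/").split("/") if p]
    return Path(storage_root).joinpath(*parts)


def entry_size(path):
    """Size in bytes of a file, or the total size of all files under a folder."""
    if path.is_file():
        return path.stat().st_size
    return sum(f.stat().st_size for f in path.rglob("*") if f.is_file())


def list_entries(query_path):
    """Return (name, kind, size) tuples for everything directly in query_path."""
    if not query_path.is_dir():
        return []
    return [
        (child.name, "folder" if child.is_dir() else "file", entry_size(child))
        for child in sorted(query_path.iterdir())
    ]


def zip_selection(query_path, selected_names):
    """Pack the selected files and folders into a temporary zip; return its path."""
    fd, zip_name = tempfile.mkstemp(suffix=".zip")
    os.close(fd)
    with zipfile.ZipFile(zip_name, "w", zipfile.ZIP_DEFLATED) as archive:
        for name in selected_names:
            target = query_path / name
            if target.is_file():
                files = [target]
            else:
                files = sorted(f for f in target.rglob("*") if f.is_file())
            for f in files:
                # Store paths relative to the query path so the folder
                # hierarchy is preserved inside the archive.
                archive.write(f, f.relative_to(query_path).as_posix())
    return Path(zip_name)
```

In this sketch, a made-up Data Server Path such as `2012\07\13\PI_Name\MRI\0900-0930` passed to `build_query_path` yields the session's folder, `list_entries` supplies the details a downloading interface could display, and `zip_selection` produces the archive that would be handed back toward the Meta Server.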
Under the VISAGE paradigm, clinical investigators (1) and data analysts (2) access data directly, and then perform collaborative data exploration (3) as shown on the right side of FIG. 6.

A user-centered approach, proven essential for successful user interface development for websites, was used for the design, implementation and preliminary evaluation results of VISAGE. This approach requires the engagement of the end user in all steps of the developmental process, such as needs analysis, user and task analysis, functional analysis and requirement analysis. To improve usability, VISAGE incorporates visual, ontological, searchable and explorative features in three main components: (1) Query Builder, with ontology-driven terminology support and visual controls such as slider bar and radio button; (2) Query Manager, which stores and labels queries for reuse and sharing; and (3) Query Explorer, for comparative analysis of one or multiple sets of query results for purposes such as screening, case-control comparison and longitudinal studies. Together, these components help efficient query construction, query sharing and reuse, and data exploration, which are important objectives of the Physio-MIMI project.

In one embodiment, the agile development methodology was adopted to make VISAGE usable directly by clinical researchers. A key requirement of this methodology is the close interaction between the developers and the users. In designing VISAGE, user-centered design principles were followed, which involve use cases, user and task analysis and functional analysis, described in the rest of this section.

In one embodiment, the overarching use case for VISAGE, of which there are several more specific thematic variations, is a clinical researcher exploring available data with the intent of discovering the nature, scope, and provenance of the data as it may apply to the researcher's interests and intended uses. Among the variations are, but not limited to, the following: (1) searching for hitherto unnoticed patterns of association and correlation among the available data that suggest or reinforce nascent research hypotheses; (2) deriving and assembling clinical, demographic, behavioral, and assay data sets for use in statistical analyses that can be used in the justification of funding proposals for research studies; and (3) profiling patient populations to determine the availability of cohorts who could be recruited as subjects in proposed research studies.

Such tasks are commonly referred to as data mining, typical down-stream steps that require in-depth analysis, by statisticians or computer scientists, of queried data sets for the discovery of patterns and associations. In one embodiment, VISAGE's Query Explorer interface serves to incorporate those activities that are typically carried out in such down-stream data mining analysis, in order to support discovery-driven query exploration by clinical investigators directly. VISAGE is not designed to replace the role of data mining: rather, it complements data mining by incorporating steps that may be routinely performed before a more in-depth, off-line analysis.

In order to support hypothesis generation and testing and cohort identification, an interface that greatly accelerates access to relevant data sets is needed: past queries should be quickly recallable; new queries should be easily constructible; existing queries should be readily modifiable.

The sense of exploration would quickly diminish if it takes too much effort or too much time for a set of queries to return meaningful results. To help achieve a speedy response of the system during the highly explorative phase of the user, VISAGE provides the user a choice of three tiered query results: counts only; counts with attribute vectors; attribute vectors with associated files (physiological signal data, genetic data, or other large binary files such as images). Typically, results are limited to counts and aggregate statistics until the user has achieved a sense of which direction to pursue further.

In one embodiment, the Query Explorer and some of the design features are aimed at reducing the user's effort in formulating new queries and revising existing ones. The visual slider bars have the added advantage of error reduction for constraint specification.

A federated model was used due to the complexity of the clinical and physiological data to be available through Physio-MIMI. Rather than forcing each data source to conform to a standard database schema, Physio-MIMI is based on the mapping of individual databases to a common Domain Ontology (DO). The DO consists of a set of concepts (terms) in a selected domain and the relationships between the concepts. The concepts are organized in hierarchical (SubClass, IS-A) relationships, as well as others such as, but not limited to, "partOf", "findingSite", "associatedMorphology", etc. The Query Builder, backed by the domain ontology, provides a searchable list of terms as the starting point. For each term, it provides the user with context-specific navigation to explore its relationships, allowing the user to traverse up or down the parent-child hierarchical relationships as well as along the other axis relevant to the term in order to further refine the query. By employing the DO, a standard set of terminology can be employed while allowing individual data contributors to maintain data according to their desired schema. The ability of VISAGE to query across disparate databases across institutions is therefore dependent on this ontological mapping. The Query Builder provides the user interface to formulate the necessary patterns, allowing the construction of a logical query. The logical query is translated into a local database query based on the mapping between the ontology model and the database specific data model.

The resulting design and implementation of Query Builder (FIG. 7) and Query Explorer (FIG. 8) is shown in FIGS. 7 and 8. The Query Manager saves queries (optionally their results) for reuse, which may be searched by keywords in title, description, or the query itself (e.g., for finding queries about a specific symptom or disorder). The functionalities of Query Manager are similar to those of an email management application.

Query Builder

The query builder interface includes functional areas 1-12 shown in FIG. 7. As shown in FIG. 7, the Database Selector 1 allows a user to select the database(s) in the system against which to run the query. VISAGE allows informaticians to quickly make data sources available for querying by supplying tools for secure database connectivity and online tools for mapping database elements to DO concepts. Once mapped, the database can be available to query and will appear to the user in the Database Selector. Using the Query Builder, researchers can quickly generate a query across multiple databases or compare results of the same criteria against different databases.

The Search Bar 2 allows the user to search the hierarchy of terms, displaying those that match in the Term Selection Area 3 below. When terms are clicked, they are added to the Term Display Area 4. As mentioned above, the user can search for any synonyms of concepts in the ontology and be presented with the appropriate ontological concept. The searchable list of terms is backed by the DO and provides the user the ability to navigate using ontological relations to further refine the query. To use the VISAGE interface, a clinical researcher needs only to understand the clinical model (domain ontology), and the Query Builder provides the interface for formulating the necessary patterns for the construction of a logical query. The logical query is then translated into a database specific query based on the mapping between the ontology model and the database schema.

In one embodiment, the query's logic is in Conjunctive Normal Form, which means records need to only satisfy at least one condition in each group to be included in the query result set. To change to Disjunctive Normal Form, the Flip 6 action is made available. The grouping logic is denoted by the color of the box. In one embodiment, elements in a green box are logically connected by AND, while elements in a light blue box are joined by OR. However, the boxes may be of any desired color. Terms can be selected with the checkboxes and grouped together or separated by clicking Group or Ungroup 5, allowing for different parenthetical groupings of terms for the conjunctive or disjunctive relationships. Additional term manipulation functionality includes Rearrangement 7, which lets a user drag and drop the terms to arrange them how he wishes, and Deletion 8, which allows removal of terms that the user may have mistakenly added to the query. To specify inclusion conditions, each term added to a query comes with term-specific controls.

For categorical data, Checkboxes 9 may display the possible values for categorical variables. The values for categorical variables may also be derived from the DO, and map to specific values in the underlying database schema(s). A user needs to only know the conceptual categories, not the underlying structure, and due to the VISAGE database mapping individual databases need not code categorical variables in the same manner. For continuous variables, Sliders 10 allow easy and expressive creation of intervals, with ranges of inclusion specified by light blue shading as well as numeric display. The Sliders have the additional advantage of allowing for the creation of multiple disjoint intervals, something that is often not possible in interfaces that provide manual specification of continuous ranges.

When the user is finished adding terms and modifying inclusion conditions, the number of records that satisfy the conditions is displayed in the Result Count Area 11. Finally, the user can Describe/Save/Update 12 the query to the Query Manager for future use in the Query Explorer or re-use in the Query Builder.

Query Explorer

The Query Explorer allows the records returned by one or more queries to be further investigated. Not only can the user view distributions of the terms that were used as criteria in the specification of a query, but any other available term may be selected for exploration within that result set. The Query Explorer provides numeric distributional information including frequency and percent for each level of categorical variables, and mean, standard deviation, and range for continuous variables. The Query Explorer also provides graphical displays of distributions including pie charts and histograms for categorical and continuous variables, respectively.

Discovery-driven query exploration may start with one, two or multiple queries in a query group, arranged in a specific order by the end-user, not unlike a workflow. The queries in a query group are "aligned" to allow the user to zero in on selected attributes to gain a sense of value distribution of the selected attribute among the patients represented in the query results.

By exploring the value distribution of a certain variable within a set of query results, a user may discover how some of the baseline query criteria influence the value distribution of specific attributes, as shown by example in, but not limited to, the pie-chart in FIG. 8, without issuing another query with an additional attribute specified.

The following is an example of a use of Query Explorer and Query Builder, but is not, in any way, a limitation. FIG. 8 illustrates an explorative step for the query used in FIG. 7, where no gender criteria is included. The Query Explorer interface allows one to search and select variables that may or may not be present in the original query. The pie-chart in FIG. 8 shows the gender distribution in the result for the selected query in FIG. 7. The histogram of age distribution is displayed on the right in FIG. 8. A user may select two or more queries so a user can explore a variety of patterns and studies such as, but not limited to, potential patterns for a case population and a control population (one query for each), or for Longitudinal Studies (same query with varying time points).

VISAGE is a powerful interface that is intuitive, usable and simple. Agile and user-centered methodologies are used for the query interface development. It entails that a clear separation between design and implementation is neither feasible, nor necessary. Design versions are usually at a conceptual or functional level, and the details are relegated to the prototyping phase, which drives the design revision. Rapid prototyping of VISAGE may be achieved through the use of various Open Source Web development tools and frameworks including Ruby on Rails, Prototype, and Script.aculo.us JavaScript libraries. All of these are web-based (Web 2.0) and work across platforms.

Multi-Modality, Multi-Resource Environment for Physiological and Clinical Research (Physio-MIMI) is an innovative data federation platform. In one embodiment, Physio-MIMI sported an expandable Domain Ontology; fine-grained interface for role-based data-source level access control; plug-and-play adaptor to mediate data access services; and data schema to Domain Ontology (DO) mapper that transforms local databases into integrated resources accessible using the intuitive and powerful federated query interface VISAGE. In another embodiment Physio-MIMI developed an ontology-driven, federated data sharing system to access multiple sources of data covering both clinical and physiological domains using, but not limited to, medicine as the primary exemplar, and developing a suite of tools for curation of physiologic recordings including EEG, ECG, EMG, from vendor specific polysomnography (PSG) data formats to data in (open-source) European Data Format (EDF), making study-level information and signal-level information sharable across laboratories.

Physio-MIMI has broader implications as a unique pilot model for: collaboration among multiple CTSA sites; collaboration among informaticians, domain experts, and project managers; agile development, management, and communication frameworks in an academic setting for producing easy-to-use, production-strength tools integrating end-user testing in each step of the delivery milestones.

Physio-MIMI has many complexities due to the breadth and depth of data sources used, multiplicity of software environments and tools involved, the distribution and dynamics of personnel and multi-site collaboration, and the short delivery timeline. In one embodiment, expanding the application scope of Physio-MIMI's system architecture and accelerating the dissemination of the software to the larger CTSA community is used.

In one embodiment, organization and communication are an important aspect for the success of this project. The complexity of the system to be developed and the usability of the web-based interfaces by clinical investigators mandated an agile approach in which not all details of the design were fully specified before a limited-scope prototype could be tested and progressively extended. The desired integration of domain experts, informaticians and project managers in the same team required effective communication not only across institutions, but also vertically within an institution, specifically for institutions with main developmental responsibilities.

In one embodiment, one goal is for a required close interaction among diverse disciplines; a matrix organizational framework, where members from participating institutions were assigned clear roles and responsibilities, was developed to address this issue. Representative leads from each institution are highlighted in the table below.

In one embodiment, five committees/subcommittees were planned: the Executive Committee, Domain Experts Subcommittee, Informatics Subcommittee, Ontology Subcommittee and Steering Committee. The Executive Committee consisted of the PIs, the Project Manager and the software and tools developers. The Steering Committee consisted of the representatives responsible for major roles from each institution. The Domain Experts and Ontology Subcommittees were combined into a Domain Ontology Subcommittee early on due to their substantial overlapping objectives. This organizational framework allowed for the informaticians and domain experts to work independently within their own areas, and coordination through the Steering Committee.

At least two vehicles were used to facilitate communication: project wiki and Rally.

Project wiki: A dedicated private project wiki site was proposed and implemented. All team members had edit privilege to all content areas. Continuous documentation in such a wiki site was found to be valuable for sharing information, recording design specification, and providing a history for the "thought process" for major design decision and revision. Meeting schedules and minutes were also posted on the wiki.

Rally: A shared community version of Rally was used by project managers for agile project management. Milestones were broken into stories, stories were broken into tasks, which were assigned to developers of the team with clearly defined artifacts, estimated effort and timeline. This greatly facilitated planning and scheduling of releases. Bug-fixes identified in testing were also recorded in Rally as tasks. Finished tasks were checked by an independent observer. Rally also greatly facilitated preparation.

In one embodiment, agile software development methodology is suited for projects where high level goals can be quite clear, but the pathways achieving these can be murky at the beginning. System and functional requirements are often under-specified because of the high-risk and experimental nature of a project. Agile software development, although fitting in a multi-disciplinary environment, is not often fully practiced in an academic setting. The integration of the informaticians and domain experts in the same team and the scope and complexity of the project made the agile software development methodology a useful option for the Physio-MIMI.

Rapid incremental prototyping and iterative refinement are the hallmarks of agile development. In contrast, the traditional Waterfall approach requires a clear and complete separation of the design phase and the coding phase. For Physio-MIMI, because the tools were originally developed for use by clinical sleep researchers, it was not feasible for either the informaticians or the domain experts to develop a design document with a complete set of details for the envisioned system in advance. Instead, the full-specification embodied in the final release emerged as a result of a highly collaborative process involving frequent and close interactions between the informatics team and the sleep researcher team. This process consisted of iterative cycles of design, coding, testing, demoing, evaluation/feedback, with each iteration spiraling closer towards a fully-fledged system.

To inform the agile process, Physio-MIMI adopted four Use Cases of increasing complexity to guide the informatics development: (1) determine availability of potential subjects meeting inclusion-exclusion criteria for designated analyses; (2) identify members of a candidate cohort based on inclusion-exclusion criteria; (3) retrieve data for analysis (PSGs, annotation files, etc.) for specified members of the analytical data set; and (4) cross-link information in research databases with data obtained from PSGs via application of dedicated quantitative processing algorithms.

All Use Case development was led by the Domain Ontology Subcommittee with assistance from the Informatics Subcommittee. With higher-level system components of Query Builder, Query Manager and Query Explorer along with its Database-to-Ontology mapper, the Use Cases helped identify an increasingly rich set of variables to be captured in the SDO for the subsequent iteration. Each iteration typically took 2 to 3 weeks.

Agile development informed by UCs allowed us to produce incremental prototypes with gradually enhanced features for testing and demo. The testing and demo by the informaticians inspired feedback from the rest of the team for further development and refinement, but it also suggested architecture changes from time to time. One change involved the elimination of the Honest Broker Core from the system architecture. Another allowed for direct PSG file downloading.

In one embodiment, agile software development can greatly facilitate the implementation of a complex project. Use cases provide a valuable mechanism to facilitate agile development in defining project iterations and milestones. However, success of agile development is conditioned on a set of basic requirements, which makes it not universally applicable.

Despite the overall excellent level of communication, team members felt that greater involvement of the technical experts in the development of the UCs would have enhanced their understanding of the project and better guided initial software development. Thus, an important lesson is the importance of ongoing communications among the end-users and developers.

Agile development requires that the developer team members have compatible levels of expertise and are not afraid of coding without a completely specified design in writing. This is because not all team members have a predefined set of tasks, and software components to be developed are dynamically generated and assigned from iteration to iteration. The project's ability to continue without interruption may be attributed to the overlap in roles and shared responsibilities and paired-programming (each key software component was assigned to at least two developers at all times). In one embodiment, future similar projects using the agile development paradigm would use research developers, such as those with an advanced degree and good coding experience, who are at ease with self-teaching a new tool and are always on the lookout for new technology and best practices.

Adoption of a Management Tool. Even though agile software development was selected as the developmental methodology, using a Community Edition of Rally proved beneficial as the benefits of Rally in supporting project management, communication and reporting became clear.

In one embodiment, because of the frequent updates of code base during each iteration in agile development, version control becomes an essential part of project management, especially to facilitate the collaborative development among team members. In one embodiment, Physio-MIMI used the Subversion Version Control System (SVN) to maintain two code sets at all times: developing version and production version. The developing version represented a code set that was under active development, while the production version represented a code set that was stable enough for testing and evaluation, but did not have all the latest features. Near the end of the project, we switched to the Fast Version Control System Git to account for code branching, making it possible to develop systems that shared some basic features but had a disjoint set of more specialized features for different purposes. Git provided the desired flexibility for continued improvement of the shared basic features for different branches as well as the merging of specialized features at a future point.

In one embodiment, Physio-MIMI is designed to be focused on breaking new grounds in data integration and data access, rather than building on existing frameworks with incremental enhancements. This ambitious goal was embodied in the novel uses of ontology for directly driving the federated query interface VISAGE and for integrating autonomous data resources through the database to ontology mapper. These uses were beyond the traditional role of ontologies for terminology standardization and data exchange. To provide flexibility in reusing the same framework beyond sleep medicine with ontology as a plug-and-play component, additional aspects of the terms were captured. These include value type, min-max values, and units conversion.

including EDF files, available for query access. The Application Server consisted of a suite of tools for the normalization of signal attributes and the translation of header information contained in vendor-specific PSG files: EDF Editor, EDF Translator and EDF Viewer. Communication between the Meta Server and the Data Server was facilitated through secure messaging using Honest Broker.

Guided by the proposed high-level system architecture early in the project, the development team adopted a set of components from the MIMI system such as user registration, access control, and auditing, and incrementally refined the initial design with the development of additional components outlined in VISAGE.

During development, execution of Use Cases and an analysis of the feedback along with performance and risk analyses revealed potential bottlenecks and reliability issues in two areas: EDF file downloading and routing of service requests through servers at distributed locations.

File Download: After query results were retrieved, Physio-MIMI provided a way for associated study files (in EDF format) to be downloaded for each of the matching study subject records. In the initial design, the files from each of the data sources were first transmitted to the Honest Broker Core and then onto VISAGE where they were compressed into a single zip archive and sent to the client's desktop. The files themselves were quite large, approximately hundreds of
These additional aspects resulted in a Physio megabytes, and therefore file transfers were often slow, espe MIMI-style domain ontology framework, for which the Sleep cially in situations of low network bandwidth. In addition, Domain Ontology developed specifically for this project compressing a collection of large files on-the-fly exerted sig served as the first and primary example. 30 nificant CPU workload on the servers. Repeated query con In one example, in developing the Domain Ontology (DO), taining overlapping records translated to redundant work in Use Cases were created for identifying an initial list of about handling multiple download requests. To overcome this, a 50 domain-related terms covering, but not limited to, labora design change to eliminate the “middleman:” a mechanism tory findings, time intervals, disorders, procedures, medica using a token-based session authentication procedure was tions and Summary measures. A set of ontological modeling 35 developed to allow for direct download of EDF files from the principles were followed in the development of DO: (1) reus data sources to the clients. This process removed the depen ing existing reference ontological terms when available, (2) dency on the HB Core and VISAGE in file downloading by conforming to standard frameworks, and (3) striving forgen providing a direct, and yet secure, path between the file server erality and reusability. Following this set of principles, the and the end-user. standard ontological systems such as SNOMED-CT and 40 Service Workflow Dependency. In one embodiment, the FMA were systematically reviewed for possible reuse of implementation of Physio-MIMI, service requests were made existing terms. Although SNOMEDCT contained over 300, through the VISAGE, the HB Core server and the HB 000 concepts, its coverage of the domainterms was poor. The Adapter(s) attached to data sources at various institutions. 
two unique intended roles of ontology for Physio-MIMI This created centralized service nodes which could make the entailed that a wholesale import of SNOMED-CT and FMA 45 overall system less robust. The service architecture was re terms into Physio-MIMI would not likely be cost-effective. designed, eliminating the HB Core and transferring its ser Additionally, the specific Physio-MIMI-style domain ontol vices to VISAGE. As a result, VISAGE interacted directly ogy framework needed for driving the VISAGE interface with HB Adapter instances. This modified service-request implied minimal value in Such a direct import. Therefore, a routing strategy had the advantage in (a) providing a direct segmentation algorithm was used to extract a set of limited 50 path between VISAGE and the various HB Adapters, and (b) terms from the two reference ontologies. To improve effi allowing for the deployment of multiple instances of VIS ciency and interoperability, we used an open ontology frame AGE servers if so desired. Multiple server instances of VIS work for developing the DO, drawing upon concepts and AGE, with a pre-coordinated configuration, would increase structure within, but not limited to, Basic Formal Ontology the capacity for handling large number of simultaneous query (BFO), Ontology for General Medical Science (OGMS) and 55 requests, allow for load-balancing and intelligent routing for the Computerized Patient Record (CPR) ontology in addition data transfer by taking account of the network proximity to the integration of reference ontologies such as FMA. Term between users and data sources. definitions were provided by domain experts, Supplemented Initially, the MetaServer component consisted of an aggre with information from reference handbooks and other web gate of technologies likely needed, but there were significant SOUCS. 60 lack of details translating the conception and into an imple Physio-MIMI has been conceived as a distributed system mentation. 
VISAGE helped guiding the implementation by with modular components providing different services using the desired user interfaces, with usability and user experience a Service-Oriented Architecture (SOA). The proposed high a priority for interface design. VISAGE, served as a galvaniz level Physio-MIMI architecture consisted of the Meta Server, ing fulcrum to critically examine the relevant features of the the Data Server and the Application Server. The Meta Server 65 system and to also use external feedback to further refine the was instantiated and refined through the VISAGE interface. model. Each and every feature of VISAGE Query Builder, The Data Server referred to the collection of data sources, Query Manager, Query Explorer, Ontology Browser, Data US 8,856,169 B2 19 20 base to Ontology Mapper were developediteratively result The following is an example of an important data type for ing in continuous prototyping, testing and demonstrating to Physio-MIMI that shows its capabilities, but is in no way a end-users. In this sense, VISAGE provided a roadmap for limitation on the types of data that may be used. Polysomno agile implementation, focusing on the front-end and then grams (PSGS) is an example of an important data type for drilling down to the back-end. The end result was a product Physio-MIMI since they are recordings of time series data of that, in spite of back-end architectural changes and refine multiple concurrent physiological signals and thus represent ments, contained few interface overhauls. a model for many other data types in medicine (e.g., electro VISAGE allows the end-user to build queries one ontologi encephalograms, electrocardiograms, actigraphy, ambula cal concept at a time. In one embodiment, a primitive query tory blood pressure, etc.) A federated approach for data inte 10 gration in Physio-MIMI helps to deal with their large size (e.g., “age between 40 and 45') can be generated by selecting (from 1 to 20 GB per recording). 
The value of such data for a term (e.g., “age') from the DO browser and specifying clinical research relates to the ability of the user to access and desired constraints by dragging or clicking on the automati analyze the primary physiological signals and cross link these cally generated widgets (e.g., clicking and dragging on the to files that contain well-defined annotations and clinical slider bar to highlight the interval 40, 45). Primitive queries 15 covariate data. Since existing Electronic Health Records have can be grouped, reordered, or negated. Each primitive query limited capability to accommodate the types of time series results in a count, and the combination of two primitive que data needed to describe data phenotypes such as, but not ries, (e.g., “Age between 40 and 45” AND “BMI between 39 limited to, sleep, developing improved tools for extracting and 42), also results in a count. Under the hood, for this relevant information from clinically available PSG reports for combined query, VISAGE sends the HB Adapter three the purposes of characterizing patient populations for tar abstract queries in order to obtain these three counts—one for geted study recruitment or outcome studies is useful. each primitive query and one for the conjunction. Every time The following is an example of an important data type for a user modifies any part of a query, VISAGE sends the HB Physio-MIMI that shows its capabilities, but is in no way a Adapter abstract queries for all the Subqueries all over again. limitation on the types of data that may be used. Access to the In one embodiment neither VISAGE nor HB Adapter caches 25 PSG files may be facilitated in Physio-MIMI through the query results. In another embodiment, either system could following steps: (1) using VISAGE query to identify indi cache the result of each subquery. 
Therefore, since a VISAGE viduals within given databases that met criteria for designated client is typically interfaced with multiple HB Adapters, analyses and who had PSG records available for download query results may be cached on the side of HB Adapters. ing; (2) using newly developed EDF Application Tools to A Domain Ontology (DO) is created that addresses the 30 de-identify and normalize the associated EDF files down requirements of the present Use Cases. This results in the loaded through the VISAGE file links; and (3) using the development and elaboration of many terms (e.g., for a Sleep Application server to assist with signal analysis of the EDF Domain Ontology over 400 sleep terms, over 140 medication files and provide output for analysis. terms and over 60 measurement units related terms). In one In one embodiment, Physio-MIMI provides a one-stop embodiment, some of the terms definitions may be quite 35 place for (1) institutions to make available de-identified clini complex and push the editing tools to their limits. In one cal data Such as, but not limited to, sleep data and any other embodiment, information regarding a device used for a mea clinical data in a web-based, queriable format; (2) researchers Surement Such as, but not limited to, blood pressure may be from participating institutions to register; conduct feasibility needed. In another embodiment, Supplementing completely searches; apply, secure and register IRB approvals; download specified terms with common names may be needed. In 40 analytical tools; and conduct approved studies with access to another embodiment, terms are processed that are defined de-identified data originally collected for clinical and/or differently across laboratories and had changed over time. research purposes, including, but not limited to, the raw Data dictionaries often contain variables that are derived from physiological polysomnography data. 
Investigators would base line data, resulting in a proliferation of terms. Proper have permission to perform queries and extract the de-iden ontological modeling (i.e., constructing a Sustainable, usable 45 tified data from the aggregate of data sources for the purposes domain ontology) requires more than a direct importing of of feasibility studies, data mining and outcome/quality con existing variables from data dictionaries. In one embodiment, trol studies. Data would be stored in a format to enable online a lesser amount of pre-coordination has the benefit of keeping queries of the structured data. The results of the queries, the ontological system concise, although this does require however, would be returned to the researcher in a de-identi careful refactorization to ensure coverage. In another 50 fied format, (i.e., all data within the boundaries of an institu embodiment, ontological modeling may need to be guided by tion would be identified but would be scrubbed and de-iden usability and the overall user experience. tified prior to sending it outside the institutional boundaries). In one embodiment, Ruby on Rails (RoR) may be chosen In one embodiment, preparing specific de-identified data as the main development environment for VISAGE. How sets available as data sources thus removing the need for ever, Java and .NET may also be selected. RoR’s built-in 55 dynamic de-identification was implemented to help over features of migration of data models, Model-View-Controller come the fact that current hospital-based IRB and medical framework for database-backed web applications, convention record access policies at institutions may not allow an inves over configuration, and seamless integration of relational tigator at one institution to directly query contents of another databases and object-orientation, not all unique in RoR, are institution's clinical database. valuable for the agile development of a project. 
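The scrub-before-release rule described above (data stay identified inside an institution and are de-identified before crossing its boundary) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a record is modeled as a flat dictionary, and the field names (`name`, `mrn`, `date_of_birth`) and function names are hypothetical, since the source does not say which fields Physio-MIMI scrubs or how.

```python
# Hypothetical list of directly identifying fields. The source does not name
# the fields Physio-MIMI scrubs; these are placeholders for illustration.
IDENTIFYING_FIELDS = frozenset({"name", "mrn", "date_of_birth"})


def deidentify(record, identifying_fields=IDENTIFYING_FIELDS):
    """Return a scrubbed copy of one record; the identified original is untouched."""
    return {key: value for key, value in record.items()
            if key not in identifying_fields}


def release_outside_institution(records):
    """Scrub every record before it leaves the institutional boundary."""
    return [deidentify(record) for record in records]


# Inside the institution the data stay identified.
local_records = [
    {"name": "Jane Doe", "mrn": "12345", "date_of_birth": "1970-01-01",
     "age": 42, "bmi": 40.1},
]

# Only the scrubbed copies are returned to the outside researcher.
released = release_outside_institution(local_records)
print(released)
```

Dropping a fixed list of fields is the simplest possible policy and is shown only to make the data flow concrete. The pre-built de-identified data sets mentioned above would apply such a step once, ahead of time, rather than on every query.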
RoR, however, does not have the status of being as mainstream as Java and .NET, and there may be a shortage of RoR developers. In another embodiment, VISAGE may be completely recoded in Java or .NET. However, tens of thousands of applications around the world are running in RoR (http://rubyonrails.org/applications) with no sign of slowing down in its prolific use in web application development any time soon.

The sample configuration shown in Example 3 is for an instance of Physio-MIMI that makes use of secure https along with URL rewriting for a public instance of the Physio-MIMI wiki. The integration allows for the use of the underlying technology already provided by Apache along with additional features provided by Phusion Passenger.

In another embodiment, to minimize dependencies on the most restrictive regulatory processing requirements, the remote data access task may be divided into two parts. An end-user should be able to get onto VISAGE, elaborate the query for their search with the user-friendly software that the project has developed, and then automatically send that query with a "data request" to a human receiver at each of several targeted institutions with databases of interest. The recipients could then, wholly internally, execute that query using VISAGE against their own databases, and return information (i.e., number of qualified subjects, etc.) to the original inquiring investigator. Transmittal of the actual data at some later point would of course require IRB approvals and inter-institutional data transfer agreements, but once established, the Physio-MIMI system would again facilitate the process by extracting the relevant data, identifying studies to be transferred, de-identifying the data, and getting them ready for what should be a simple human verification that the data are successfully de-identified. The data would then be transmitted to the original requesting investigator. The entire process should be acceptable to IRBs.

In one embodiment of Physio-MIMI, tools were developed that would facilitate standardization and de-identification of EDF files, translation of vendor-specific annotations to a common scheme, and visualization of polysomnograms or other related data. There is a desire to batch process multiple files, but in previous iterations the processing of large (100 MB+) files may be painstakingly slow for an interactive system. Second, cross-platform testing on Windows OS may cause user interface inconsistencies with the rendering on the Mac OS system. Third, attempts to distribute the application tools to partner sites may lead to issues with software distribution versioning and licensing. In one embodiment, the entire application suite was ported to Java, essentially redeveloping the applications all over again. Java is a mature and freely available programming language suited for cross-platform applications. Its performance has been proven in many mission-critical applications. Having EDF tools developed in Java has the advantage of portability, expandability, and reusability. APIs for the tools were desirable for batch processing and are feasible options for implementation in Java.

Data sources were brought in incrementally, one by one, when ready to be shared. This complements the agile approach of rapid prototyping and iterative updates. It underscores the principle that data sources that are ready early on can be made available in earlier prototypes, and as a project progresses, more data sources becoming ready can be integrated for the testing and evaluation of a more extensive set of system features. In contrast, the Waterfall approach would be more compatible with a data warehouse framework, where a complete design and implementation of the data warehouse framework must precede the data uploading. In one embodiment, once implemented, the common data model is not meant to be frequently updated.

Physio-MIMI is a system designed to be generally applicable. There are many uses for this generally applicable tool. Two of the many uses are:

In one embodiment, expanding the application scope of Physio-MIMI in two ways: (a) enhancing Physio-MIMI's domain-ontology and mapping interfaces to support systematic, incremental transformations of existing disparate data dictionaries (ranging from Neurology and Urology to Cardiology) into Physio-MIMI-style domain ontologies and facilitating the sharing and dissemination of the domain ontologies through, for example, but not limited to, NCBO; (b) piloting the repurpose of VISAGE by i) using it as the query interface for legacy or in-progress studies and ii) studying the cost effectiveness of a Physio-MIMI-style federation of locally mirrored databases as an alternative institutional data warehouse model.

Accelerating the dissemination of Physio-MIMI to the larger CTSA community through (a) improved technical and user guides; (b) an enhanced public web-site, physiomimi.case.edu, that provides access to a live demo system and blog space for sharing experiences and providing feedback; (c) regular webinars and training sessions; and (d) a face-to-face workshop of an initial user community to share experiences and develop a Physio-MIMI user group community.

In one embodiment, Physio-MIMI emphasizes end-user priority by allowing the end-user to specify a list of needs and requirements for the system. Once these requirements have been enumerated, the end-user prioritizes the list by putting the requirements that are needed most at the top of the list. As the project advances through product iterations, this list is updated and re-prioritized. Features that cover a larger scope of the project are placed into the release backlog. During each release cycle a subset of the release backlog is implemented to meet project milestones and deliverables. Features that cover a smaller scope are placed in the iteration backlog, and usually represent atomic portions of the features within the release backlog.

Using the release and iteration backlog, Physio-MIMI then makes use of the next two principles of agile development. At the beginning of each iteration, an iteration planning meeting is set to assign specific features and tasks to developers. The iteration planning meeting includes both the end-users and the developers, and focuses on making sure that every developer on the team has a balanced work load for that iteration. In order to assure that no developer is overburdened, each developer fills in an estimate of how much time each task will take the developer. If the developer does not have enough time for the tasks assigned to him, the high priority items are selected first, and the rest are pushed into the following iteration. One of the important parts of agile development is that the iteration and releases do not get pushed back, and instead that the features and tasks are designed to be completed within the iteration timeframe, which is generally two to three weeks. In order to eliminate uneven workloads, the iteration planning meetings use the task estimates to make sure that each developer has a similar balance based on the developer's availability during the iteration timeframe. The use of the agile principles has allowed the rapid development of Physio-MIMI.

Both iteration and release planning are an important part of the agile software development cycle. Iteration planning is done at the beginning of each iteration, which can last for a period of time such as, but not limited to, two to three weeks, and release planning is done at the beginning of a release cycle, which can contain, for example, but not limited to, two to six iterations. In one embodiment, for Physio-MIMI, Rally was used as the tool for iteration and release planning. While Rally is not the only tool that allows for agile development management, it most closely follows the workflow of the agile development process used for the Physio-MIMI project. Rally makes strong use of the agile principles and terminology. In one embodiment, examples of, but in no way limiting, the terminology used in Rally are: product owner, delivery team, backlog, features, and tasks. Each of these terms has a corresponding link in the Physio-MIMI project.

In one embodiment, the product owner in Physio-MIMI is often the end-user who uses the services provided by Physio-MIMI. Since Physio-MIMI is divided into VISAGE and the Honest Broker Adapter, VISAGE in turn becomes an end-user of the services provided by the Honest Broker Adapter. Therefore, a lot of the development of the messaging and types of services between VISAGE and the Honest Broker Adapter was driven by needs within the VISAGE interface itself. VISAGE in turn was driven by the needs of the researchers of the system who access the underlying data via the abstract query interface.

In one embodiment, the delivery team is the team responsible for creating the functionality within the system. In Physio-MIMI, there were three primary delivery teams: the Honest Broker Adapter developers, the developers of VISAGE, and the domain experts in charge of developing the sleep domain ontology. These teams work closely together to create a cohesive product that was guided by the product backlog.
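The estimate-driven planning rule described above (each developer estimates their tasks; if time runs short, high-priority items are taken first and the rest are pushed into the following iteration, while the iteration itself is never extended) can be sketched as follows. This is a minimal illustration, not Rally's data model or API: the `Task` class, the `plan_iteration` function, the hour-based units and the task names are all hypothetical. The sketch also makes one interpretive choice: after a task fails to fit, it keeps trying smaller lower-priority tasks. Stopping at the first task that does not fit would be an equally valid reading of the text.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    priority: int           # assumed convention: 1 is the highest priority
    estimate_hours: float   # the developer's own estimate for the task


def plan_iteration(tasks, available_hours):
    """Split one developer's tasks into (this_iteration, next_iteration).

    Tasks are considered in priority order. A task that does not fit in the
    remaining time is pushed to the following iteration; the iteration's
    time box is never stretched to accommodate it.
    """
    selected, deferred = [], []
    remaining = available_hours
    for task in sorted(tasks, key=lambda t: t.priority):
        if task.estimate_hours <= remaining:
            selected.append(task)
            remaining -= task.estimate_hours
        else:
            deferred.append(task)
    return selected, deferred


# Hypothetical tasks for one developer with 20 hours available.
backlog = [
    Task("fix login bug", priority=1, estimate_hours=6),
    Task("query explorer paging", priority=2, estimate_hours=30),
    Task("update wiki page", priority=3, estimate_hours=4),
]
this_iteration, next_iteration = plan_iteration(backlog, available_hours=20)
print([t.name for t in this_iteration])
print([t.name for t in next_iteration])
```

Running the same function for every developer with that developer's availability gives the planning meeting a simple view of who is overloaded and which tasks roll over.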
In one embodiment, the product backlog contains all the requirements and goals of the Physio-MIMI project that are strongly connected to the project milestones and deliverables, and release and iteration features. The product backlog is prioritized by the end-users, and then features and tasks are moved into the iteration and release during the corresponding planning meetings.

In one embodiment, the features are broken into tasks by the delivery team in order to judge the time requirements of completing a feature. A feature whose tasks span multiple iterations is split into smaller features which are then completed within the iteration timelines.

One of the important features of agile software development is the ability to showcase the new features at the end of the iteration. At this point, the end-user can see the features that were implemented, and either accept or reject them. This immediate feedback allows for the creation of a product that is very similar to that which is expected by the end-user. The ability to demonstrate a functioning system at the end of each iteration is a vast difference from the development cycle of the waterfall model, which puts a lot of effort into the initial requirements specification. The Rally system allows for the efficient planning and management of Physio-MIMI, which contains multiple development teams at different institutions, multiple end-users, and a large two-year project scope for deliverables.

In one embodiment, Ruby on Rails provides a solid framework for designing and rapidly prototyping web applications based on the Ruby programming language. The Ruby language itself is a concise, pure object-oriented language that allows for generating complex code with few lines. The Ruby on Rails framework uses Convention over Configuration. Convention over Configuration is used in programming to limit the amount of code that needs to be written to accomplish a certain task. A programmer only needs to write additional code if he is trying to achieve something that does not use the convention provided by the Ruby on Rails framework. For example, the convention for a foreign key relationship in the Ruby on Rails framework expects the foreign key to be in the format tablename_id. An example comparing the simple relationship using the convention versus a programmer not using the convention is shown in Example 2.

Ruby on Rails is based on and expands the Model-View-Controller (MVC) framework. The MVC framework is used to separate the functionality of code within a web-based system architecture. The model is used to access the underlying database items and provides methods, relationships, scopes, and instantiations of the underlying data. The controller is used to route incoming web-browser messages to the appropriate actions. The view is used to generate a result for the requesting web-browser using a template with embedded Ruby. The view may pass back information to the web-server using HTML or XML.

In one embodiment, Physio-MIMI uses the Phusion Passenger gem to tightly integrate the Ruby on Rails server directly with Apache 2. Phusion Passenger is a production-grade web server system that allows Apache 2 to load and serve Ruby on Rails applications. Phusion Passenger adds methods to the Apache virtual server, and it also transparently handles load balancing and server management. An example of the code displayed in Example 3 shows a sample configuration.

The primary benefit of using Phusion Passenger is its ability to start, stop, and restart servers dynamically and transparently without requiring additional ports to be opened, or additional server load balancing to be specified. Phusion Passenger provides a robust platform for deploying a production-ready Ruby on Rails application.

In one embodiment, the code base is stored in a Git code repository to handle the complex requirements and specifications of Physio-MIMI. A Git repository allows for flexible handling of requirements that are produced in an agile software development environment.

Git allows for a number of branching strategies, each of which can be used effectively in differing scenarios. The branching strategy chosen for VISAGE is flexible, easy to maintain, and has a small learning curve for new developers on the project. In the Physio-MIMI code base the master branch is responsible for the production-ready code. Whenever new features for a release are generated, the source code is branched from the master branch for development. This branch is then tagged with the next release number for that branch. Once the features in this branch have been fully tested, the branch is then merged back into the master branch. The development branch is then tagged and is maintained as a development checkpoint. During the time of new feature development, an end-user may find a critical bug in the master branch. In this case, an additional bug fix branch is created from the master branch. Once the fix is in place, the bug branch is merged back into the master branch, and the development branch then pulls these changes from the master branch using a merge operation. The bugfix is merged into the development branch in order to ensure that no regression bugs are reintroduced when the feature branch is merged back with the master branch. FIG. 11 shows this process.

In one embodiment, Physio-MIMI is tested and deployed in three environments: one for development, one for quality assurance, and one for production. The development environment is on the software developer's local machine, and can switch between branches within the Git repository. Most often, the development environment is focused on the latest development branch, or in the case of a bugfix, is focused on the bugfix branch. The quality assurance environment exists on a machine that is accessible to the developers and end-users in charge of quality assurance. The quality assurance environment is always focused on the branch that will be pushed into production with the next release. Finally, the production environment is for end-users working with real underlying data stores. The production environment is always on the master branch, and is never updated without thorough testing of a new feature within the development and quality assurance environments. Bugs found within the production environment are given high priority during iteration and release planning meetings. Therefore, in one embodiment, the Git code repository paired with agile software development creates an environment for rapid prototyping for Physio-MIMI.

In one embodiment, we created a versioned branch solely devoted to updating the code base, developed in Ruby on Rails 2.3, to Ruby on Rails 3.0 using the branching strategy. The primary motivation behind this update is to allow Physio-MIMI to work using the latest technology. In the case of Ruby on Rails 3.0, the update provides better coding practices, easier installation of VISAGE, and a more flexible login system for VISAGE. Many of these updates also inherently reduce the lines of code required to accomplish specific tasks, which creates a more readable code base for new developers.

Better coding practices are created through the emphasis on unobtrusive JavaScript in Ruby on Rails 3.0. Unobtrusive JavaScript uses the same idea presented by CSS when CSS was first created for HTML web pages. CSS presents the notion that web-page style information should be removed from the content or HTML portion of the web-page. This procedure allows for more readable source code of web pages. This idea is then propagated to be used with JavaScript. With the advent of Web 2.0 applications, JavaScript has become more heavily used within web-pages. Unobtrusive JavaScript enforces the idea that JavaScript should not be present in the content or HTML portion of the web-page. With this new coding practice in place, the content for a web-page resides in the HTML, the styling information resides in an associated CSS, and the JavaScript is stored in JS files. An example of how unobtrusive JavaScript can be used to clarify code within the HTML is shown in FIG. 12.

FIG. 12 demonstrates how the use of Unobtrusive JavaScript creates cleaner HTML documents.

In one embodiment, VISAGE is installed using the new gem bundler with the assistance of the update to Ruby on Rails 3.0. Gems are external dependencies that provide additional functionality to a Ruby on Rails project. Examples of gems used within Physio-MIMI, which are in no way a limitation, provide a login system, pagination, MySQL and SQLite 3 adapters, form auto-complete functionality, and calendar date select functionality. While most of the gems are straightforward to install, a few of them, such as the MySQL and SQLite 3 gems, require compilation before being available to the Ruby on Rails project. Ruby on Rails 3.0 provides access to updated versions of these gems that are more straightforward to compile across operating systems.

In one embodiment, Ruby on Rails 3.0 offers a flexible authentication gem called Devise. Devise provides a number of authentication features such as, but not limited to, provid

Usage Analysis

Using MIMI's usage-statistics compilation capability, a usage analysis of MIMI at the Case Center for Imaging Research (CCIR) was performed. Since its initial deployment, MIMI has served approximately 150 principal investigators, collaborating investigators, and research assistants. During this period, a total of approximately 1,600 distinct sessions have been scheduled through MIMI, spanning an 18-month period or 400 working days. This translates to four scheduled sessions per working day. Among all sessions, half are linked to scientific data. This entails that imaging data have been transferred to the Data Server using MIMI's data flow process at the frequency of two times per working day. Users also typically download the acquired data on the same day, so data downloading through the Meta Server occurs about two times per working day. This does not include data downloading activities by collaborators or repeated data downloading afterwards for various reasons. During the same period, MIMI accumulated 1.2 terabytes of fresh imaging data, which translates to a data acquisition rate of 3 gigabytes per working day.

The distribution of accrued content objects during the 18-month period of MIMI's content object and data statistics comprises a content type and size of: registered users, 150; projects, 125; groups, 120; sessions, 1,600; and acquired images, 1.2 TB. With respect to the anticipated capacity, the Meta Server is expected to be able to handle over 1,000 registered users, 500 projects, 1,000 groups, and 10,000 sessions. The Data Server is designed to maintain 20 terabytes of online data.

Cost-Benefit Analysis

In addition to figuring out the intricacies behind the prior (status quo) practice, the corresponding cost estimates were gathered. A difficult part of the cost-benefit analysis involves the accurate and realistic estimation of the time spent on tasks with the status quo. We caution the reader that, although it was attempted to get as precise an estimation as possible, there are inherent reasons for some of the estimated figures to be based on rules of thumb only.
ing open authentication (OAuth2) login Support, logging out 40 In carrying out the cost-benefit analysis, Some existing users after a certain time of inactivity, functionality to reset examples were followed such as Grady's analysis of an inte forgotten passwords, and the ability to remember the user grated telemental health care service for the military, Wang et using a remember token. al.'s cost-benefit analysis of electronic patient medical By providing branching strategies, GIT allowed the records, and Erdogmus approaches of cost-benefit analysis Physio-MIMI project to be transparently upgraded to Ruby 45 of software development. on Rails 3.0 without affecting the ability to fix bugs within the The cost-benefit analysis has focused on directly account production environment. The use of Ruby on Rails 3.0 allows able tasks from the view of the CCIR. This is an underesti VISAGE to be on the front end of development, as the most mate because all the users of the MIMI system receive a active community development occurs with the latest releases fraction of similar benefits on a regular basis as well. of Ruby on Rails. Ruby on Rails 3.0 provides Physio-MIMI 50 Financial Benefits with access to this rich development environment and inno Three main tasks have been used for the cost-benefit analy Vative community. sis: scheduling, data distribution, and performance statistics The present invention provides an apparatus and method compilation. that overcomes the many challenges associated with signifi Session scheduling. The status quo procedure for schedul cant information management challenges in modern scien 55 ing imaging sessions involves three steps: 1. A researcher tific research and more specifically biomedical research. contacts a CCIR staff member using e-mail or phone; 2. The researcher and Staff member work out an amenable time; and EXAMPLES 3. The staff member schedules an imaging session for the researcher and sends out a notification. 
Example 1 60 Each step is further analyzed to estimate the administrative time spent for scheduling an imaging session. The first step is Below is an example of MIMI's usage analysis followed by the responsibility of the researcher and does not occupy a cost-benefit analysis. The usage analysis gives a profile of administrative time. During the second step of the process, MIMI's usage statistics over an 18-month period with respect the CCIR staff member communicates with the researcher to the number of users, imaging sessions, and Scientific data 65 and searches a calendar system for open time slots (the CCIR uploads/downloads. The cost-benefit analysis demonstrates used Microsoft Outlook Calendar for scheduling imaging MIMI's benefits in comparison to a status quo. sessions). We estimate that the second step takes about 2 US 8,856,169 B2 27 28 minutes of a staff member's administrative time. This time is about 30 people annually. The 150 who are currently using obtained as the average of the estimates for phone and e-mail MIMI in the final training cost were also included. It is also communication. The third step involves the entry of pertinent assumed that the annual salary of the training personnel is data into the calendar system by the staff member. We esti approximately $36,000, training personnel work 2,000 hours mate that the third step uses an additional 0.5 minutes on 5 annually, and a training session lasts about 2 hours and trains average because a valid time slot is already determined in step 10 users. With these assumptions, we calculated the cost of a two. In total, we estimate that scheduling an imaging session training session to be S36 per 10 users (S36,000 per year/2. takes approximately 2.5 minutes of a staff member's time. 000 hours per yearx2 hours per training session), which Based on Usage analysis, assuming that that the CCIR aver equals about $3.60 per user. 
It was estimated the cost of ages about four imaging sessions per working day, this trans 10 training MIMI's initial 150 users to be $540 ($3.60 per userx lates to an estimated 10 minutes of administrative time. 150 users). It was also determined that training 30 users per Assuming the lower end of 260 working days per year and year incurs an annual cost of approximately S108 (S3.60 per S18 per hour for a low-level administrative staff, the CCIR’s userx30 users). annual cost for the low-level administrative staff would be S780. However, a low-level administrative staff cannot 15 SUMMARY handle all the responsibilities of scheduling. A high-level scientific staff member with in-depth knowledge of the imag Table 1 shows a summary of MIMI's cost-benefit analysis. ing systems is involved in final decision making to oversee The annual financial benefits and costs totals are S117,780 scheduling management, resolve scheduling conflict and and S1,208, respectively. MIMI also incurs an initial total cost manage the data distribution. This cost is combined in the data of S103,840. A very rough formula for the overall financial distribution cost. gain after a period of n years is: F(n)=S116572n-S103840. Data distribution. Based on the prior practice at CCIR, a high-level scientific staff member spends half of the time to TABLE 1 oversee Scheduling and manage data distribution based on file A Summary of MIMI's Cost-Benefit Analysis sharing. During a typical working day, the high-level Scien 25 tific staff member is inundated with requests for rescheduling. Occurs The member must also set up user accounts for data distribu Benefit (S) Cost (S) Annually tion via file sharing. Users who do not have direct access to Scheduling and 45,780.00 Y the CCIR network are an additional burden for the high-level data distribution staff member because their PCs require time-consuming 30 Performance 72,000.00 Y updates to access the CCIR network. 
The staff members statistics Development and 100,000.00 N salary is around S90,000, and the adjusted annual cost will be implementation around $45,780. Initial hardware 3,300.00 N Compiling performance statistics. Performance assess Hardware updates 1,100.00 Y ment and resource usage analysis is essential for justifying 35 Initial training S4O.OO N the continued investment and funding for a core facility. This Further training 108.00 Y has been a time-consuming task usually involving two steps: (1) locating relevant documents in paper or electronic format, and (2) going through the documents, extracting the pertinent With three specific time points as input samples for the information, and Summarizing the performance statistics. formula, we find that foregoing the status quo methods and With the status quo and based on the practice that compilation 40 using MIMI over time periods of one, two, and three years is performed on a monthly basis, this amounts to a full-time yields progressive financial benefits of S12,732, S129,304. job for two administrative staff members. Assuming that that and S245,876, respectively. About one million dollars can be an administrative staff members salary is approximately saved along this trajectory within 10 years. Again, this saving S36,000, the performance statistics task with the status quo does not account for overhead savings provided by MIMI for incurs an annual cost of S72,000. 45 the users in data transfer and sharing. Using MIMI, performance statistics are compiled auto matically. The administrative time needed amounts to log Example 2 ging into MIMI, issuing a performance Summary query, and saving the results. Assuming that such queries are performed A preliminary evaluation was performed on the efficiency no more than several times a week, this incurs negligible time 50 of VISAGE for query construction. Three common queries for a staff member. 
with increasing levels of logical complexity on patient demo Costs graphics were selected. Two expert users created the queries The costs for using MIMI are of two kinds, nonrecurring in both VISAGE and the i2b2 web client, respectively. The and recurring. Nonrecurring costs include the cost for devel number of clicks and time needed for creating the queries opment and implementation. They also cover hardware and were recorded and tabulated in the next table. Software costs. Recurring costs include hardware upgrades 55 As can be seen from FIG. 9, VISAGE reduced time and and user training. MIMI's development and implementation effort (in terms of the number of clicks) to a half or nearly a cost is approximately $100,000, with $50,000 for a full-time third. However, this evaluation is preliminary and only looks programmer, and S50,000 for a half-time supervisor for one specific aspect of the query interface. design and specification. The software cost for MIMI is SO because MIMI is built completely on open-source software 60 Example 3 that requires neither purchasing fees nor licensing costs. MIMI also incurs a hardware cost of approximately $3.300. An example of a simple relationship using the convention The hardware cost includes a primary server computer and may look as follows: installation fees. Assuming that the server computer is # Database Table (Column) Structures replaced every three years results in an estimated annual cost 65 books.id of S1,100. The estimation of the cost for user training is based chapters.id on the assumption that the CCIR increases its user base by chapters.book id US 8,856,169 B2 29 30 H Model Definition Although the invention has been described with reference class Chapter-ActiveRecord::Base to certain embodiments detailed herein, other embodiments belongs to book can achieve the same or similar results. 
Variations and modi end fications of the invention will be obvious to those skilled in class BookActiveRecord::Base the art and the invention is intended to cover all such modi has many chapters fications and equivalents. end A programmer not using this convention, and instead using What is claimed is: the following table structures, would need to specify the foreign key explicitly. 10 1. A multi-modality, multi-resource, information integra # Database Table (Column) Structures tion environment system comprising: books.id (a) at least one computer readable medium capable of chapters.id securely storing and archiving system data; chapters. BOOKID (b) at least one computer system, or program thereon, H Model Definition 15 designed to permit and facilitate web-based access of the class Chapter-ActiveRecord::Base at least one computer readable medium containing the belongs to book, foreign key=>“BOOKID secured and archived system data; end (c) at least one computer system, or program thereon, class BookActiveRecord::Base designed to permit and facilitate resource scheduling or has many chapters management; end (d) at least one computer system, or program thereon, designed to monitor the overall resource usage of a core Example 4 facility; and (e) at least one computer system, or program thereon, Below is an example of a configuration file that shows how 25 designed to track regulatory and operational qualifica Phusion Passenger can be integrated with an Apache virtual tions, SeVe. wherein the multi-modality, multi-resource, information # Loads the compiled passenger module for Apache integration environment system achieves the following LoadModule passenger module /usr/local/ . . . ?apache2/ results: (i) permitting a user to access to the at least one mod passenger. So 30 multi-modality, multi-resource, information integration PassengerRoot /usr/local/... 
gems/passenger-3.0.0 environment system to determine if a user profile exists PassengerRuby fusr/local/bin/ruby and, if necessary, permitting a user to create a user pro # Number of Simultaneous Servers file if the desired user profile does not exist; (ii) assign PassengerMaxPoolSize 60 ing at least one user role to a user profile; and (iii) # Servers Are Always On 35 permitting continued access to the at least one multi PassengerPoolIdleTime 0 modality, multi-resource, information integration envi Listen 443 ronment system based on the user profile in combination