MTR-MNI-000-012
MITRE TECHNICAL REPORT

Implementation Recommendations for MOSAIC: A Workflow Architecture for Analytic Enrichment

Analysis and recommendations for the implementation of a cohesive method for orchestrating analytics in a distributed model

Ransom Winder
Nathan Giles
Joseph Jubinski

Contract No.: DAAB07-01-C-C201
Project No.: 0710N7AZ-SF
Approved for Public Release. Distribution Unlimited. 12-2472

©2010 The MITRE Corporation. All Rights Reserved.
July, 2010 (updated, February, 2011)

MOSAIC – Implementation Recommendations
The MITRE Corporation, 2010-2011

Contents

Introduction
  Architectural Goal of MOSAIC
  Architectural Options for MOSAIC
Case Study: METEOR
Tightly Integrated Architecture Technology Analysis
  Recommendation
Discrete Process Architecture Technology Analysis
  Discrete Process Architecture Technology Analysis: Interface
  Discrete Process Architecture Technology Analysis: Inbound Gateway
  Discrete Process Architecture Technology Analysis: Executive
    UIMA as Executive
    OpenPipeline as Executive
    Mule as Executive
    LONI or Ptolemy/Kepler (Scientific Workflow Projects) as Executive
    Decision Points in Workflows across Possible Executive Options
    A BPEL Engine Executive?
    Other Options for Executive
    Recommendation
  Discrete Process Architecture Technology Analysis: Data Bus
    Flat File System as Data Bus
    Alfresco as Data Bus
    ObjectStore as Data Bus
    Recommendation
  Discrete Process Architecture Technology Analysis: Analytics
    Specific Analytics and Analytic Workflow
    Analytic Pipeline
  Discrete Process Architecture Technology Analysis: Adapters
    CAS as Common Interchange Format
    GrAF as Common Interchange Format
    Possible Basis Ontologies
Summary of Recommendations
  Takeaways
Glossary
Appendix A:
  A.1: Collection Reader Code and XML
  A.2: Tokenizer Wrapper Code and XML
  A.3: Decoder Wrapper Code and XML
  A.4: Flow Code in UIMA
Appendix B:
  B.1: Tokenizer Wrapper Code
  B.2: Decoder Wrapper Code
  B.3: Output Code
  B.4. JSP Page
Appendix C:
  C.1: XML Code
Appendix D:
  D.1: LONI GUI Example Workflow
  D.2: Kepler GUI Example Workflow
  D.3: Ptolemy GUI Example Workflow
Appendix E:
  E.1: UIMA
  E.2: OpenPipeline
  E.3: Mule
Appendix F:
Introduction

This is a companion document to MOSAIC: A Workflow Architecture for Analytic Enrichment, which describes the current need for integrating document analytics and a general approach to solving this problem. This document addresses the implementation issues of the candidate architecture directly: specific frameworks for the different architectural subcomponents are analyzed and compared, and recommendations are offered.

Architectural Goal of MOSAIC

Figure 1. The MOSAIC Architecture’s role in a larger system that delivers it input from a Content Provider and consumes its output in a Knowledge Base Architecture.

The goal of this effort is to develop a Natural Language Processing architecture to be used by subject matter experts who are researchers and engineers, termed domain expert engineers here. This architecture, titled MOSAIC, is intended to be shared across multiple projects and hosted in the sponsor’s environments. It is also intended to be compatible with, and to facilitate, a streaming document flow, as opposed to execution on a static batch of documents, which would require an entire corpus to be present before processing could commence.

To address the overall goal, the system must meet several high-level requirements to be considered successful. These requirements are specified here:

1. The system shall maintain a consistent overall structure with evolving components. The consistent structure is the relationship of the framework built around the analytics, and the analytics are in turn the chief evolving components; the individual products that make up the framework shall nonetheless be replaceable without degrading interoperability.

2. The system shall at the least be able to accommodate a pipeline of analytics such