QUICK START GUIDE PROTOCOL DEVELOPMENT INTEGRATION COLLECTION 2016

Copyright Notice

©2015 Dassault Systèmes. All rights reserved. 3DEXPERIENCE, the Compass icon and the 3DS logo, CATIA, SOLIDWORKS, ENOVIA, DELMIA, SIMULIA, GEOVIA, EXALEAD, 3D VIA, BIOVIA and NETVIBES are commercial trademarks or registered trademarks of Dassault Systèmes or its subsidiaries in the U.S. and/or other countries. All other trademarks are owned by their respective owners. Use of any Dassault Systèmes or its subsidiaries trademarks is subject to their express written approval.

Acknowledgments and References

To print photographs or files of computational results (figures and/or data) obtained using BIOVIA software, acknowledge the source in an appropriate format. For example: "Computational results obtained using software programs from Dassault Systèmes BIOVIA. The ab initio calculations were performed with the DMol3 program, and graphical displays generated with Pipeline Pilot."

BIOVIA may grant permission to republish or reprint its copyrighted materials. Requests should be submitted to BIOVIA Support, either through electronic mail to [email protected], or in writing to:

BIOVIA Support
5005 Wateridge Vista Drive, San Diego, CA 92121 USA

Contents

Chapter 1: Introduction
  Architectural Overview
  Extending Functionality
  Application Integration Components
  Command-line Integration
  Run Program Components
  FTP
  SSH
  SCP
  Telnet
  Language-based Integration
  Java Component Development
  .NET Component Development
  Perl Component Development
  VBScript Component Development
  Python
  Web Services Integration (SOAP Components)
  Database Integration (with ODBC-compliant and JDBC-compliant Drivers)
  Integration with Visualization Tools
  Additional Information
Chapter 2: Common Principles
  Data Records
  Input Data
  Output Data
  Component Lifetime Management
  States of the Component State Machine
  Global Data
  Using Global Data
  Scope of Global Data
  Component GUI
  Component Parameters
  Component and Parameter Naming
  Parameter Types
  Component and Parameter Help Text
  Client-side Components
  Load Balancing and Reverse Proxy Deployments
  Global Properties
  Job Completion Notification
  Enabling the Notification Protocol
  NotificationProtocol Parameters
Chapter 3: Command Line and Web Services Components
  Run Program
  Run Program on Remote Host
  SOAP
Chapter 4: Language-Based Components
  Java
  .NET
  Windows Script Host Components
  Python
  VBScript
  PilotScript
  About the PilotScript Language
  PilotScript vs. Third-Party Scripting
  Custom Manipulator and Filter Components
  Debugging

Chapter 1: Introduction

This guide provides a high-level overview of the components and languages you can use to develop your own Pipeline Pilot protocols. It also presents information about the packages and tools that are available for integrating Pipeline Pilot with third-party applications and services.

Pipeline Pilot supports a number of approaches to extending its capabilities by adding new components to an existing installation. To integrate functionality from other software products, you can construct new components using original scripts, compiled code, and simple command lines. The functionality you integrate can reside on the local server machine or elsewhere on the network.

Pipeline Pilot offers many generic and domain-specific (pre-packaged) components that you can use to develop your protocols. Details about how to use these features are available elsewhere (Help Center > Users tab). This guide provides an overview of a number of specialty components that are especially useful for customization, integration, scripting, and client-side operations, and that allow you to greatly extend the capabilities of the pre-packaged components.

Tip: When developing protocols, make frequent use of the validation feature in the client. See User > Pipeline Pilot > Protocol Publishing > Validating Protocols in the Help Center.

Architectural Overview

The diagram below represents a high-level overview of the server architecture. It illustrates how you can work with Pipeline Pilot Client and Pipeline Pilot to extend server-side capabilities with new components, or incorporate protocols into other applications using a client-side software development kit (SDK).

Figure: Client integration and server extension architecture

Extending Functionality

You can extend the functionality of Pipeline Pilot by integrating third-party data and computational services on the Pipeline Pilot server. The Integration collection is a set of tools and components designed for this purpose. Integration techniques for extending functionality include:

- Configuring a program execution component to run a program command line on the server
- Integrating a web service application
- Customizing a base language component to encapsulate specific functionality in script or in a compiled language (Java, Perl, VBScript, and Python)
- Integrating an ODBC-compliant database
- Incorporating third-party visualization tools running on the client (e.g., Excel, Spotfire, and BIOVIA Viewer)
- Constructing a new subprotocol component using encapsulated pipelines of other components

For each of these approaches, a number of specific technologies may be available, depending on the goal of the integration project. A collection (described below) may contain components constructed with one or more of these techniques.

A set of one or more related components that you build is known as a "component collection". You can package the collection along with binaries, scripts, data files, documentation, etc. for consistent, managed deployment into an existing installation. When users install components provided as a package, they can subsequently uninstall or upgrade the component collection following simple, standard procedures.

The following sections give guidelines for each approach, with reference to the components that provide the relevant functionality, and with particular emphasis on the techniques and concepts that are common across the different groups of components.

Application Integration Components

Application integration components include:

- Command-line integration (Run Program, FTP, SCP, SSH, and Telnet)
- Language-based scripting integration (Java, .NET, Perl, VBScript)
- Web services integration (SOAP)

Command-line Integration

You can execute third-party applications and file transfer services through their command-line interfaces. The following components are available for this purpose:

- Run Program
- Telnet
- SSH
- FTP
- SCP

Run Program Components

You can extend the functionality of a protocol to include any operation that you can invoke from a command line. The Run Program (on Client) and Run Program (on Server) components are available for this purpose. These components let you execute an operating system command on the server or on a client. For example, you can write data files, invoke the command-line program to work on these files, and then read the results when the program is complete. You can also incorporate the STDIN and STDOUT streams into the data pipeline for a smoother flow.

FTP

File Transfer Protocol (FTP) is used to transfer files over a TCP/IP network (Internet, UNIX, etc.). Components that support FTP include Copy File from Remote Host (FTP) and Copy File to Remote Host (FTP). Use them to copy files between a remote server on your network and the Pipeline Pilot server. Since these components use the FTP protocol, a valid user name and password are required as parameter values. Typically, FTP components are used in conjunction with Telnet or SOAP components to run programs on a remote machine.

SSH

Secure Shell (SSH) is a security protocol for logging on to a remote server. SSH provides an encrypted session for transferring files and executing server programs. It supports a variety of authentication methods and provides a secure client/server connection for applications. Components are available that support SSH.
The Run Command on Remote Host (SSH) component runs a command on a remote machine that is running an SSH daemon, and encrypts communication with the remote host. A password or key file is required as a parameter value for this component. The Manage Host Trust (SSH & SCP) component validates or updates a key signature in a known hosts file, which allows you to verify the identity of the remote host before sending the encrypted password.

SCP

Secure Copy Protocol (SCP) is a secure version of the UNIX remote copy (rcp) command for transmitting files to or from a remote machine. Unlike FTP, SCP encrypts all communication with the remote host. Components that support SCP include Copy File from Remote Host (SCP) and Copy File to Remote Host (SCP). A password or key file is required as a parameter value.

Telnet

Telnet is a terminal emulation method used on the Internet and TCP/IP-based networks. It allows a user to log on to a remote computer and run a program or execute UNIX commands. The Run Command on Remote Host (Telnet) component supports Telnet. Required parameters for the component include a user name and password. Typically, this component is used with the FTP components to transfer input and result files to and from the remote host.

Language-based Integration

You can enhance the capabilities of your protocols by including a scripting component that contains a script written in one of the supported languages (Java, Perl, and VBScript). The script writer has access to the contents of each data record passed to the script component and to the relevant protocol properties. This allows you to write components such as data readers, data writers, data filters, and calculators. The script component lets you work within the framework of the program, while having access to the following:

- The rich code libraries available as Perl modules
- Client- and server-side software packages that support COM automation
- Pre-existing script or GUI utilities that you use in your work and integrate with Pipeline Pilot

Java Component Development

You can write new components in the Java programming language using the Java (on Server) component.
Java is a compiled language, so the component is parameterized with a reference to the class that supports the standard component interface. You write code that handles each data record processed or generated by the component. A full range of component types (readers, writers, filters, calculators, etc.) can be crafted in Java code, allowing you to expose functionality that is available in third-party Java libraries. Generally, a set of Java classes (or jar files) and component XML are bundled into a package that facilitates deployment onto multiple servers.

.NET Component Development

You can write new components in any .NET language, such as C# or VB.NET. Components are available for dynamic compilation and execution as well as for execution of already-compiled code. The components are .NET (on Server), Dynamic .NET (on Server), Dynamic C# (on Server), and Dynamic VB.NET (on Server).

Perl Component Development

You can write new components in Perl using the Perl (on Server) component. Perl scripting is supported on Windows and on non-Windows servers, making it a useful language for developing platform-independent components. Perl has full access to the underlying component API, allowing you to write components of all types. Use Perl to design components that launch applications, access network resources, interact with the operating system, and access files. For example, if you are integrating a program that requires parameter files or complex parameter settings, use Perl to develop a more sophisticated program launcher than is possible with command-line integration. You can save the underlying Perl code as a parameter on the component or as a file for easier reuse and maintenance.

VBScript Component Development

You can write new components in VBScript using the VBScript (on Client) and VBScript (on Server) components. Scripting with VBScript is supported on Windows using the Windows Script Host (WSH). The VBScript components are useful for COM automation, such as performing tasks in Microsoft Office applications. The client version of this component is useful for integrating desktop applications, such as viewers, that users are more likely to install on client machines than on the server.

Python

You can write components in Python using the prototype Python (on Server) component. Scripting with Python is supported on Windows only, using the Windows Script Host (WSH). This component shares many of the advantages of the Perl component, extending the functionality of Pipeline Pilot in the context of an industry-standard programming language.

Web Services Integration (SOAP Components)

Web services provide a generic way to access remote applications. Once an application is deployed as a web service, clients need not be concerned with implementation details such as programming language or platform. Like a web browser, the client needs only to submit requests to an HTTP address, and the data is returned in a standard format.

Simple Object Access Protocol (SOAP) is a format for making requests to web service applications. SOAP services consist of methods, each of which has a distinct signature that defines the expected inputs and outputs. The SOAP components convert data records to SOAP requests, send the requests to a SOAP server, collect the results, and add the results to each data record as new properties.
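As an illustration of the record-to-request conversion described above, the following Python sketch builds a SOAP 1.1 envelope from a dictionary that stands in for a data record, then reads a canned response and adds its values to the record as new properties. It uses only the Python standard library and makes no network call. The service namespace, the method name GetMolecularWeight, and the property names are hypothetical, and this is not the code of the Pipeline Pilot SOAP components.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:chem-service"                # hypothetical service namespace


def build_request(method, record):
    """Wrap the record's properties as child elements of one SOAP method call."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{method}")
    for name, value in record.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def merge_response(response_xml, record):
    """Copy every child of the response element onto the record as a new property."""
    body = ET.fromstring(response_xml).find(f"{{{SOAP_NS}}}Body")
    result = body[0]  # first child of Body: the method response element
    for child in result:
        local_name = child.tag.split("}", 1)[-1]  # strip the namespace prefix
        record[local_name] = child.text
    return record


record = {"Smiles": "CCO"}
request_xml = build_request("GetMolecularWeight", record)

# Canned response standing in for what a real SOAP server would return.
response_xml = (
    f'<soap:Envelope xmlns:soap="{SOAP_NS}" xmlns:s="{SERVICE_NS}">'
    "<soap:Body><s:GetMolecularWeightResponse>"
    "<s:MolecularWeight>46.07</s:MolecularWeight>"
    "</s:GetMolecularWeightResponse></soap:Body></soap:Envelope>"
)
merge_response(response_xml, record)
```

The request carries both the instruction (the method element) and the input data (its children), so no separate file transfer step is involved.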
In contrast to the Telnet and SSH components, there are no companion SOAP components to transfer files to the remote server. The SOAP request contains both the instructions and the input data, and the output data are returned directly to the SOAP component.

Database Integration (with ODBC-compliant and JDBC-compliant Drivers)

Relational databases can be accessed through Open Database Connectivity (ODBC) drivers or Java Database Connectivity (JDBC) drivers. These drivers allow connections to a variety of databases, such as Oracle, SQL Server, and MS Access, that reside anywhere on the network. Connections to these data sources are defined in the Pipeline Pilot Admin Portal. All information for the data source can be specified in the portal (including the necessary login and password), or this information can be provided from Data Source Names (DSNs) that are defined on the server using the ODBC Administrator tool. The data source is used in the program's SQL components along with the necessary login and password for the database. The database components in the program allow you to select, store, delete, and update data in any accessible database source.

Integration with Visualization Tools

Integration components are available that support standard data visualization tools such as Excel, Spotfire, and BIOVIA Viewer. These components are based on the VBScript (on Client) or Run Program (on Client) components, which control the client applications used to view results data generated on the server. The common pattern is to export an appropriate data file on the server and execute an operating system command that starts the visualization software and reads in the data file. With some client applications, a download step is also required. The scripting components provide a more direct mechanism for integrating visualization programs that expose a COM automation interface. Following this model, you can extend the deployed client-side visualization tools, or add new ones.
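The export-then-launch pattern described above can be sketched in plain Python. The sketch writes result records to a CSV file and then runs an operating system command that reads the file. So that the example runs anywhere, the Python interpreter itself stands in for the viewer; in a real integration the command would name the visualization program. The function names and the file name are hypothetical, and this is not Pipeline Pilot API code.

```python
import csv
import subprocess
import sys
import tempfile
from pathlib import Path


def export_records(records, path):
    """Write data records (dicts sharing the same keys) to a CSV file."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(records[0]))
        writer.writeheader()
        writer.writerows(records)


def launch_viewer(command, data_file):
    """Run an OS command with the data file as its last argument; return its stdout."""
    completed = subprocess.run(
        [*command, str(data_file)], capture_output=True, text=True, check=True
    )
    return completed.stdout


# Stand-in "viewer": a one-line Python program that counts the data rows.
STAND_IN_VIEWER = [
    sys.executable,
    "-c",
    "import csv, sys; print(sum(1 for _ in csv.DictReader(open(sys.argv[1]))))",
]

with tempfile.TemporaryDirectory() as run_dir:
    data_file = Path(run_dir) / "results.csv"
    export_records(
        [{"Name": "ethanol", "MW": 46.07}, {"Name": "water", "MW": 18.02}],
        data_file,
    )
    viewer_output = launch_viewer(STAND_IN_VIEWER, data_file)
```

Passing the command as a list rather than a single string avoids shell quoting problems when the data file path contains spaces.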

Additional Information

For more information about the Pipeline Pilot Integration collection and other BIOVIA software products, visit https://community.3dsbiovia.com.

Chapter 2: Common Principles

Regardless of the approach used to create a new component, you customize the component look and feel by doing the following:

- Defining new parameters for the component that allow a protocol builder to modify the component's behavior
- Selecting an icon for the component's graphical appearance
- Activating a set of input and output ports appropriate to the component behavior
- Composing a display name
- Authoring help text for the component and its parameters

The final element depends on the type of customization involved. It may involve writing script or code, or configuring a command line or SOAP service. This step defines the behavior of the component associated with the processing of data records, where that behavior is modified by the component's parameter settings.

In taking these steps, you create a new, unique component. An end user can work with your component in a new protocol in Pipeline Pilot. From the end user's perspective, the component should look like any other, and the implementation language should not be apparent. For example, if you build a file reader component, it should look and feel like other file readers, regardless of its implementation.

The major data items that are of interest to the component developer are:

- Component parameter settings
- Global protocol properties and their values
- The data record that flows through the component

Data Records

It is possible to design a component that resides outside of a data pipeline, representing a single event that does not process data records. However, most of the components that you create are intended to read and write the contents of data records that the framework passes to them.

As with global data, you can treat a data record as a single, flat property collection. In many cases, this is sufficient to access all the data content required by a component. In full, a data record is a container that references a single node that is the root of a node hierarchy. Each node has an associated property collection. The default property collection of a data record is the property collection of the root node. Some components need to access and modify properties deep in the hierarchy to perform their intended task. The Data Record Tree Viewer component provides a useful way to analyze the complete content of a data record.

Input Data

The input data for a component is either a data record that flows in from an upstream component, or a new, empty data record. The component state dictates the type of records that are passed to it. The component state is under the control of the component script or program. It allows the component to determine how it is treated by the protocol runner, and it is adjusted to suit the overall role of the component (reader, calculator, etc.).

Output Data

Typically, a component performs one or more actions on a data record, and then passes it downstream. A component can act as a filter by directing the data to a specific output port. A component may also suppress the data record so that it is not output at all. If a component does output a data record, that record may be identical to the record passed in, a modified version of it, a cached record, or a completely new record.
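The output options just described (route a record to a particular port, suppress it, or output a modified record) can be illustrated with a small Python sketch. Records are plain dictionaries, the port names follow the Pass and Fail port terminology used in this guide, and the function names, property names, and threshold are hypothetical. This is an illustration of the idea, not the Pipeline Pilot component API.

```python
def route(record, minimum_weight):
    """Decide what happens to one record.

    Returns a (port, record) pair, or None to suppress the record entirely.
    """
    weight = record.get("MW")
    if weight is None:
        return None  # suppress: nothing is output for this record
    if weight >= minimum_weight:
        modified = dict(record, Heavy=True)  # output a modified copy
        return ("pass", modified)
    return ("fail", record)  # output the record unchanged


def run_filter(records, minimum_weight):
    """Collect the records that arrive at each output port."""
    ports = {"pass": [], "fail": []}
    for record in records:
        outcome = route(record, minimum_weight)
        if outcome is not None:
            port, out_record = outcome
            ports[port].append(out_record)
    return ports


ports = run_filter(
    [
        {"Name": "ethanol", "MW": 46.07},
        {"Name": "water", "MW": 18.02},
        {"Name": "unknown"},
    ],
    minimum_weight=30.0,
)
```

Here the first record leaves on the pass port with an added property, the second leaves on the fail port unchanged, and the third is suppressed because it has no weight to test.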

Component Lifetime Management

A component works as a state machine; that is, it progresses through a set of deterministic states. Code for component behavior must support the following events that occur in language-based components:

- Initialization: Occurs once before any records are processed. During this event, the component code can access the global data and component parameters. Typically, at initialization, a component reads its parameters and initializes any resources it may require.
- Process: Occurs once for each data record to process. During this event, the component code can access the data record itself, in addition to the global data and component parameters.
- Finalization: Occurs once when there are no more data records to process (or none to process in the first place). During this event, the component code can access the global data and component parameters. A component should free up any resources it might have used.

The protocol framework guarantees that:

- A single Initialization event occurs before the first Process event, if any.
- A single Finalization event occurs when there are no more Process events (or if there is no Process event in the first place).

In either case, Finalization is called even if a protocol error occurs. Therefore, the overall lifetime sequence is:

1. A single Initialization event
2. Zero, one, or more Process events
3. A single Finalization event

This sequence ensures that the component runs correctly, whether the sequence is repeated or not. For example, a Run To Completion subprotocol can run the sequence many times. This type of subprotocol reinitializes its complete environment for each data record that it processes. The implications of this are:

- In the Initialization event, do not assume that the component variables are already in a properly initialized state. Write your code to deal with the case when multiple lifetime sequences take place. At Initialization, the component might be in an uninitialized state or in whatever state the previous Finalization event left it in. Handle both cases robustly, and reinitialize all component variables in the Initialization event.
- In the Finalization event, do not assume that all went well. All you can assume is that the Initialization event was invoked. The Finalization event is called even when the component threw an error, or if the protocol itself terminated prematurely. Always verify the state of everything. In general, avoid doing any real work in the Finalization phase. If you must, first confirm that the protocol is not in an error state.
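The lifetime sequence and its guarantees can be sketched as a toy runner in Python. This is a minimal illustration under stated assumptions: the class CountingComponent, the runner run_lifetime, the method names, and the Prefix parameter are all hypothetical, and the real framework's API differs by language. The sketch shows initialization that resets all state, and finalization that runs even when processing fails.

```python
class CountingComponent:
    """Hypothetical component: counts records and labels each with its index."""

    def initialize(self, parameters):
        # Reset everything here; a previous lifetime may have left stale state.
        self.prefix = parameters.get("Prefix", "row")
        self.count = 0
        self.finalized = False

    def process(self, record):
        self.count += 1
        record["Label"] = f"{self.prefix}-{self.count}"
        return record

    def finalize(self, error):
        # Release resources only; avoid real work, especially after an error.
        self.finalized = True
        self.last_error = error


def run_lifetime(component, parameters, records):
    """Run one lifetime: initialize once, process each record, always finalize."""
    output = []
    error = None
    component.initialize(parameters)
    try:
        for record in records:
            output.append(component.process(record))
    except Exception as exc:
        error = exc
    finally:
        component.finalize(error)
    return output, error


component = CountingComponent()
first, _ = run_lifetime(component, {"Prefix": "a"}, [{}, {}])
# Second lifetime on the same object, as with a Run To Completion subprotocol.
second, _ = run_lifetime(component, {}, [{}])
```

Because initialize resets the counter and the prefix, the second lifetime starts from a clean state even though the same object is reused.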

States of the Component State Machine

With the exception of PilotScript components, language-based components use component state to determine the details of their own lifetime. During the Initialization event, the component determines its initial component state and passes this information back to the framework. The component then returns a State value to the framework from each Process event.

A simple example is a property calculator component. The framework passes all available input data records to the calculator component, which manipulates one or more property values. The component defines its state as ready to receive any data record available on its input port.

Another example is a file reader component. In this case, the component should initially declare its state as expecting to receive new, empty data records whenever control returns to it. To fulfill its function, the reader component adds data to each empty record and passes it to an output port. When the end of the file is reached, the component indicates that it will no longer receive data records by returning the appropriate state. The framework then finalizes the component.

With the following five states, you can define a wide range of component behaviors:

- ReadyForInputData: The component requests that the framework invoke its Process event with any data record that arrives at its input port.
- ReadyForNewData: The component requests that the framework invoke its Process event with a new, empty data record. A new record is by definition empty, whereas an input record typically is not empty.
- ReadyForInputThenNewData: The component requests that the framework invoke its Process event with any data record that arrives at its input port. When there are no more input records to process, the framework instead passes a new data record to the component.
- ReadyForInputOrNewData: Similar to ReadyForInputThenNewData, except that if there is input data, no new data record is passed to the component.
- DoneProcessingData: The component's task is complete. The component requests that the framework not invoke its Process event again.

How the states are used depends on the role of the language-based component. Below are some broad categories of components and how they make use of component state.

| Component Category | Component State Management |
| Calculator or Filter | Component state is typically set to ReadyForInputData. |
| Reader or Generator | Component state is set to ReadyForNewData. When the operation is complete (for example, end of file for a reader), the state is set to DoneProcessingData. |
| Writer | Component state is set to ReadyForInputData. If a maximum output limit is reached, or there are no more data records to process, the state is changed to DoneProcessingData. |
| Integrator | Component state is set to ReadyForInputThenNewData. For each input data record, the component caches the necessary data. When there is no further data on the input port and a new data record is processed, the component starts to output records of aggregated data. |
| Functional grouping | ReadyForInputOrNewData can be used to group similar functionality into a single component when some of that functionality requires input and some does not. |

Other variants are possible, based on how the component logic manages the component state. For example, consider how you would develop a component that outputs one record after every 10 input records, containing the mean values of the properties calculated from those 10 input data records. One solution is for the component to flip its state from ReadyForInputData to ReadyForNewData after every 10 records, and then flip it back.
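The state-flipping solution, together with a simple reader, can be sketched with a toy framework loop in Python. This is a sketch under stated assumptions, not the real protocol runner: the enum member names mirror the five states above, but the run function, the class names, the (state, record) return convention, and the choice to suppress the input records of the averaging component are all hypothetical. Finalization is omitted for brevity.

```python
from enum import Enum, auto


class State(Enum):
    READY_FOR_INPUT_DATA = auto()
    READY_FOR_NEW_DATA = auto()
    READY_FOR_INPUT_THEN_NEW_DATA = auto()
    READY_FOR_INPUT_OR_NEW_DATA = auto()
    DONE_PROCESSING_DATA = auto()


class BatchMean:
    """Output one record holding the mean of `prop` after every `size` inputs."""

    def __init__(self, prop, size):
        self.prop, self.size = prop, size

    def initialize(self):
        self.values = []
        return State.READY_FOR_INPUT_DATA

    def process(self, record):
        """Return (next_state, output_record_or_None)."""
        if len(self.values) < self.size:
            # Input phase: cache the value and suppress the input record.
            self.values.append(record[self.prop])
            if len(self.values) == self.size:
                return State.READY_FOR_NEW_DATA, None
            return State.READY_FOR_INPUT_DATA, None
        # New-record phase: fill the empty record, then flip back to input.
        record["Mean"] = sum(self.values) / len(self.values)
        self.values = []
        return State.READY_FOR_INPUT_DATA, record


class ListReader:
    """Reader: fills new, empty records until the source list runs out."""

    def __init__(self, items):
        self.items = items

    def initialize(self):
        self.position = 0
        return State.READY_FOR_NEW_DATA if self.items else State.DONE_PROCESSING_DATA

    def process(self, record):
        record["Value"] = self.items[self.position]
        self.position += 1
        done = self.position >= len(self.items)
        return (State.DONE_PROCESSING_DATA if done else State.READY_FOR_NEW_DATA), record


def run(component, inputs):
    """Toy protocol runner: feed records according to the component's state."""
    pending = list(inputs)
    had_input = bool(pending)
    outputs = []
    state = component.initialize()
    while state is not State.DONE_PROCESSING_DATA:
        if state is State.READY_FOR_NEW_DATA:
            record = {}
        elif pending:
            record = pending.pop(0)
        elif state is State.READY_FOR_INPUT_THEN_NEW_DATA:
            record = {}
        elif state is State.READY_FOR_INPUT_OR_NEW_DATA and not had_input:
            record = {}
        else:
            break  # no more input, and the component did not ask for new data
        state, out = component.process(record)
        if out is not None:
            outputs.append(out)
    return outputs


means = run(BatchMean("X", 10), [{"X": i} for i in range(25)])
read = run(ListReader(["a", "b"]), [])
```

With 25 input records and a batch size of 10, the averaging component outputs two records; the trailing five inputs never fill a batch, so no third record is produced.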

Global Data

Global data is a set of properties (name-value pairs) available to all components in a protocol. For many purposes, you may consider the global data as a single, flat property collection. A property collection is simply an ordered set of named properties and their values. However, to support scalability and to provide some organization of data, the global data structure is actually a hierarchy of nodes, each with its own property collection.

Tip: To see the global hierarchy in its entirety, add the Global Data Tree Viewer component to an empty protocol and run it.

Figure: Global Data Tree Viewer results

Using Global Data

The default global property collection is the property collection on the root node of this hierarchy. A component writer can access all properties in the hierarchy, for reading and writing, by navigating over the node hierarchy. Note that some properties are read-only, since they are defined by a package or by the protocol runner subsystem and act as configuration constants. These are marked with a padlock in the Global Data Tree Viewer. Examples are the user name and the server folders; it makes no sense to edit such properties in a protocol.

Component developers are mostly interested in global properties on the root node, so you can handle the global data as a simple property list and ignore anything at a lower level in the hierarchy. However, if you declare global properties in your package, those properties appear in the global property hierarchy in a subnode named after your package, to ensure name uniqueness. (For details, see the Application Packaging Guide.) These properties are defined for any protocol running on the server where your package is installed.

All programming environments include APIs and techniques for getting and setting global property values, from either the default property list or from somewhere else in the data hierarchy. Package global properties are an example of "deep properties". They can be accessed with a deep property syntax, in which the name starts with a forward slash and includes the path to the node where the property resides (e.g., "/acme/gizmos/bindir"). Alternatively, explicit navigation of the data nodes is also supported by the richer API sets.

Scope of Global Data

A subprotocol can also define global properties that are scoped only to that subprotocol; such properties are visible only within that subprotocol and its descendant subprotocols.

It is not difficult to see why global property scoping is important. A subprotocol component may be dropped into any other protocol, so a component that pollutes the namespace of the top-level protocol could cause all sorts of problems if a name collision were to occur. Even the simple case of using the same subprotocol component twice in a protocol would be error-prone without global property scoping.

In addition to property scoping, you can also declare any top-level node that is added as a direct child of the global root node as scoped to a subprotocol.

The concept of a hierarchy of global data overlaid by the subprotocol hierarchy of scoped namespaces can be tough to visualize. The main thing to remember is that the global data visible to your component can include properties that are not visible outside the subprotocol where it lives, since they relate only to operations within the subprotocol.
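The deep property syntax described earlier in this section can be illustrated with a minimal Python model of a node hierarchy. The Node class, the get_property and set_property helpers, the Threshold property, and the value stored at "/acme/gizmos/bindir" are all hypothetical; each supported language has its own API for this. Subprotocol scoping is not modeled here.

```python
class Node:
    """One node in a property hierarchy: a property collection plus child nodes."""

    def __init__(self):
        self.properties = {}
        self.children = {}
        self.read_only = set()


def _split(path):
    """Split '/acme/gizmos/bindir' into (['acme', 'gizmos'], 'bindir')."""
    parts = [part for part in path.split("/") if part]
    return parts[:-1], parts[-1]


def get_property(root, name):
    """Read a plain name from the root collection, or a deep name by its path."""
    if not name.startswith("/"):
        return root.properties[name]
    node_names, prop = _split(name)
    node = root
    for node_name in node_names:
        node = node.children[node_name]
    return node.properties[prop]


def set_property(root, name, value, read_only=False):
    """Write a property, creating intermediate nodes for deep names as needed."""
    node_names, prop = _split(name) if name.startswith("/") else ([], name)
    node = root
    for node_name in node_names:
        node = node.children.setdefault(node_name, Node())
    if prop in node.read_only:
        raise PermissionError(f"{name} is read-only")
    node.properties[prop] = value
    if read_only:
        node.read_only.add(prop)


root = Node()
set_property(root, "Threshold", 0.5)  # default (root) collection
set_property(root, "/acme/gizmos/bindir", "/opt/acme/bin", read_only=True)
```

A plain name such as Threshold resolves against the root collection, while the leading slash switches to path navigation, which is why package properties in a subnode cannot collide with root-level names.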

Component GUI

Component Parameters

In the component implementation code, you can access the component's parameters as a read-only property collection. This allows you to discover the settings specified by the protocol builder or by the end user, and to modify the component behavior accordingly.

During component construction in Pipeline Pilot, you can mark some parameters as required, while others are optional. For example, a file reader component is only useful if the user specifies a parameter value for the source file location. Therefore, the source parameter would be a required parameter. The maximum number of records to read might be a useful but optional parameter. If it is not supplied, your code would probably read all the data in the file.

Note: Make sure that your component regression tests include a variety of different parameter settings for a component, so that all possible paths through the code are tested. (For more information, see Regression Testing.)

The parameters that you define on the component represent the ways in which you allow protocol writers to control or modify the behavior of your component, so your implementation code should honor them.

IMPORTANT! Be sure to offer backward compatibility with your component revisions. If you remove or rename a parameter from one product release to the next, ensure that you still support the old parameter, since existing protocols require your new implementation to work with the old component interface definition.

Component and Parameter Naming

Give components and parameters short names that describe their function. Component and parameter names may contain whitespace and are generally more readable if they do. We recommend that you review the existing examples and the component and protocol design guidelines in the Component Development Guide. Familiarizing yourself with the conventions for case, spacing, and capitalization will give your components a more standard look and feel.
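The handling of required, optional, and renamed parameters described under Component Parameters can be sketched as follows. The parameter names Source, Maximum Records, and Max Records, and both helper functions, are hypothetical; the sketch uses a plain dictionary in place of the framework's parameter collection and a read-only mapping view to mimic read-only access.

```python
from types import MappingProxyType

REQUIRED = ("Source",)
# Old parameter name -> current name, kept so that existing protocols still work.
RENAMED = {"Max Records": "Maximum Records"}


def resolve_parameters(supplied):
    """Return a read-only view of the parameters after renames and checks."""
    resolved = {}
    for name, value in supplied.items():
        resolved[RENAMED.get(name, name)] = value
    missing = [name for name in REQUIRED if not resolved.get(name)]
    if missing:
        raise ValueError("missing required parameter(s): " + ", ".join(missing))
    resolved.setdefault("Maximum Records", None)  # optional: None means read all
    return MappingProxyType(resolved)


def read_lines(lines, parameters):
    """Read at most 'Maximum Records' items; read them all if it is not set."""
    limit = parameters["Maximum Records"]
    return list(lines) if limit is None else list(lines)[:limit]


# An older protocol that still uses the previous parameter name.
legacy = resolve_parameters({"Source": "data.txt", "Max Records": 2})
limited = read_lines(["a", "b", "c"], legacy)
```

Mapping the old name to the new one in a single place keeps the rest of the implementation free of compatibility checks.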

Parameter Types

Each component parameter has an assigned type, which is displayed in the Help tab when that parameter is selected. Some consequences of parameter type:

- A parameter's type controls which user interface tools are available when users set the value of the parameter. For example, parameters of URL type can be set using a File Browser dialog.
- Certain parameter types, such as BoolType, predefine the legal values for that parameter and expose a corresponding GUI for setting the value. A list of enumerated legal values can be defined to restrict the value of a parameter and simplify user input.
- Certain parameter types, such as LongType, are subject to validation when the component is initialized. If the current value cannot be converted to that type, the protocol stops running and displays an error.
- Parameters of ExpressionType are evaluated by the PilotScript interpreter when the component is initialized. The value of the parameter is set to the result of the expression.

Parameter types can be set or modified on the Interface tab of the Edit Component dialog. In addition to these intrinsic behaviors, the Edit dialog for a parameter allows you to specify arbitrarily complex validation expressions, as well as enabling and disabling scenarios that involve the value of the parameter and the values of other parameters. The set of legal values for a parameter can also be made dynamic in this way. You can even specify a protocol to run to supply such information, so that the server-side environment can influence the parameter user interface.

Component and Parameter Help Text

Help text is displayed in the Help window (lower-left corner of the program window) when end users select a component or parameter in Pipeline Pilot Client. Make sure that your components and all of their parameters are completely documented so that end users have detailed information on how to use them.
Component help text includes a brief purpose statement and a detailed description. The purpose statement summarizes what users can do with the component. It is also used as fly-over help and is exposed as a tooltip when hovering the cursor over a component icon in the Explorer or workspace. The description text provides detailed information about the component. It is displayed below the purpose statement in the Help window. You can use HTML to structure the information in a way that makes it easy to digest. All ports include a comment field that is displayed in the Help window, below the purpose and description text. Use the Ports tab in the Edit dialog to modify the port information that is displayed. Comments are useful to specify requirements for input data, what type of data is in the output, and to define what determines if a data record exits the Pass port or the Fail port. Parameter help text describes how to use the parameter and how it influences the behavior of the component. You can describe the set of legal values for the parameter along with any dependencies on other parameters. The parameter help text is displayed with the parameter type, so the parameter type does not need to be stated in the help text. Again, you can use HTML tags if you need to present more structured information.
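The type validation described under Parameter Types (a value that cannot be converted to its declared type stops the protocol with an error) can be pictured with the following generic sketch. It does not use any Pipeline Pilot API: the type names LongType and BoolType come from this guide, but the functions, the parameter names, and the assumption that the legal Boolean values are "True" and "False" are illustrative only.

```python
def to_bool(text):
    """Convert a Boolean parameter; only the predefined legal values are accepted."""
    legal = {"true": True, "false": False}  # assumed legal values
    key = str(text).strip().lower()
    if key not in legal:
        raise ValueError(f"not a legal Boolean value: {text!r}")
    return legal[key]


# Converter used for each declared parameter type.
CONVERTERS = {"LongType": int, "BoolType": to_bool}


def validate_parameters(values, declared_types):
    """Convert every parameter at initialization time, failing on the first bad value."""
    converted = {}
    for name, type_name in declared_types.items():
        try:
            converted[name] = CONVERTERS[type_name](values.get(name))
        except (ValueError, TypeError) as exc:
            raise ValueError(
                f"Parameter '{name}' is not a valid {type_name}: {exc}"
            ) from exc
    return converted


# Hypothetical parameter declarations and user-supplied values.
declared = {"Maximum Count": "LongType", "Keep Duplicates": "BoolType"}
settings = validate_parameters(
    {"Maximum Count": "250", "Keep Duplicates": "False"}, declared
)
```

Validating once, before any data record is processed, mirrors the behavior described above: the error surfaces when the component is initialized rather than partway through the data.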

Client-side Components
Components that run on the client machine are an integral part of Pipeline Pilot. Setting up viewers and other interactive functions as client-side components makes it possible to visualize data locally rather than on the remote server. The client-server architecture calls for computation to take place on the server. There may be situations where an application is integrated as a client-side component because the convenience of installing it on the local desktop outweighs the performance advantage of running it on the server.

If client-side components are used, the resulting files need to reside on the client machine, instead of on the server. Pipeline Pilot provides components for manipulating files between the server and client. These components include:
Create Tempfiles
Copy to Client
Copy to Server

There are also two directories that are useful for this purpose:
Temp Directory
Client Run Directory

The run directories are owned by the running job and can store files temporarily. Their locations are generated at run time, so they are designated using global variables rather than the literal path. The Client Run directory is an ideal place to store files for use by viewers and other interactive components.

At run time, the results are generated on the server, written to a file on the server, and copied to a file on the client. After the protocol is finished processing, the data is viewed on the client machine, and any files that reside on the server and client are cleaned up automatically by the server at a later date.

Load Balancing and Reverse Proxy Deployments
Here are some guidelines that will help make your protocols and applications portable to enterprise environments such as load balancing and reverse proxies:

Job directory names no longer contain braces: If you have protocols that depend on job directory names having braces {ppcXXXX}, you can enable Compatibility Mode in the Pipeline Pilot Admin Portal (Jobs > Settings). Removing braces from job directory names allows grid engines to access the job directory without the path being interpreted by the shell.

Avoid manually constructing URLs: New globals are available that allow you to easily refer to locations on your server. Instead of writing "http://@ServerName:@ServerPort/", use the new globals @ServerRoot and @ServerSSLRoot. If the server is behind a load balancer or reverse proxy, the combination of @ServerName:@ServerPort will always point to the wrong place.

Get familiar with the @SharedPublicDir global: If you have public data that you need to share across servers, use the @SharedPublicDir global to scope the data.

Use location.host in your JavaScript: If you are writing JavaScript to access server pages, use location.host instead of location.hostname and location.port. Since using HTTP default ports is now possible, the port specification may not be present.

Get familiar with pkgutil for load balancers: Because the XMLDB is read-only under load balanced configurations, the only mechanism for adding components and protocols is pkgutil. Users cannot save their protocols on load balanced systems. See the Application Packaging Guide for details (Help Center > Developers tab > Development Guides).

Do not assume affinity: In a load balanced environment, each HTTP request may go to a different server. If you have an application that saves session or state on a specific server, that information may not be available for the next request. It is better to move this state to a resource that can be accessed by all nodes behind the load balancer.

Note (on affinity): There is nothing in the server that precludes using affinity. Because load balancers handle affinity in a myriad of ways (e.g., cookies, IP addresses, none), there is no general way to support it. If your application requires affinity, you need to configure it on the specific load balancers behind which your application is deployed.

Global Properties
Several new globals are available for optimizing how you write protocols and applications for an enterprise environment. Here are a few examples with explanations of how to use them.

Global Property: Description

@ServerRoot: The primary way to create a URL from the client's perspective. For a normal configuration, this is the non-secure base URL for the server. Depending on your configuration, it may refer to the secure base URL. To construct a URL that refers to a resource on the server, you can write @ServerRoot/MyResource.

@ServerSSLRoot: Points to the secure base URL for the server from the client's perspective. Use this if you need to require secure access to the resource in question. It might use the non-secure protocol depending on the server configuration.

@'/pilot_settings/ROOT': Use this if you need to make a web request to the server from inside a protocol. Do not assume that @ServerRoot will be accessible from inside a protocol. The Pipeline Pilot server might not have a network route back to the reverse proxy.

@'/pilot_settings/SSLROOT': This is the secure version of @'/pilot_settings/ROOT'. Like @ServerSSLRoot, it may use the non-secure protocol.

@SharedPublicDir: Appropriate for applications that need to share data that may be persisted across more than one protocol.

@LocalJobTempDirectory: Using this for temporary data may improve protocol performance when running from shared file systems. This directory could be removed as soon as the protocol is finished.
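The idea behind @ServerRoot (take the base URL as a single value supplied by the server instead of assembling it from a host name and port) can be shown with a short, generic sketch. The function resource_url and the example URLs are hypothetical; in a real protocol the base would come from the global itself rather than from a literal string.

```python
def resource_url(server_root, resource_path):
    """Join a server-supplied base URL and a resource path with exactly one slash."""
    return server_root.rstrip("/") + "/" + resource_path.lstrip("/")


# Hypothetical base URL as a client would see it through a reverse proxy.
# Rebuilding it from the server's own host name and port would bypass the proxy.
root_seen_by_client = "https://pilot.example.com/pp/"

url = resource_url(root_seen_by_client, "/MyResource")
```

Because the base is treated as opaque, the same code keeps working if the deployment later moves to default ports or adds a path prefix.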

Job Completion Notification
The NotificationProtocol parameter, on the Implementation tab, sends information about the protocol run (such as Job ID and Job Status) to a notification protocol that you have created to notify users when the run ends. The notification protocol is run using either anonymous credentials or credentials that you specify for notification protocols.

Enabling the Notification Protocol
1. Navigate to the Pipeline Pilot Server Home Page (http://localhost:9944 by default).
2. Click Administration Portal and navigate to Security > Authentication.
3. Set Notification protocols to one of the following:
Use Anonymous Credentials: Run the protocol using the Anonymous Access credentials on the Authentication page.
Use Notification Credentials: Use the credentials that appear below this option when selected.
Disabled

NotificationProtocol Parameters
NotificationProtocol specifies the name or component ID (guid) of a protocol stored in the server's protocol database that will be executed when the current job completes. The notification protocol will receive the following parameters that contain information about the job:
Notify_JobID: Job ID of the execution.
Notify_JobStatus: Description of the result of the job execution.
Notify_JobStatusCode: Status code for the result of the job execution.
5: Job was stopped by the client or administrator.
6: Job completed normally with success.
7: Job completed with an error.
8: The process ID associated with the running job crashed or otherwise disappeared.
9: Job failed to start.
Notify_ProtocolName: Name of the protocol.
Notify_ProtocolPath: Path of the protocol in the database. This field can be blank for protocols that were launched without saving to the database.
Notify_ProtocolLogName: Log name of the protocol. This is usually the same as ProtocolName; however, the client can set it to a different name than the protocol.
Notify_RunHost: Name of the node where the protocol executed.
Notify_Username: User that ran the job.
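As an illustration of how the values listed above might be interpreted, the following sketch maps the documented status codes to readable names and builds a one-line summary. The Notify_ parameter names and the numeric codes come from this guide; the enum member names, the summarize function, and the sample values are hypothetical, and the dictionary merely stands in for however the notification protocol exposes its parameters.

```python
from enum import IntEnum


class JobStatusCode(IntEnum):
    """Status codes documented for Notify_JobStatusCode (member names are illustrative)."""
    STOPPED = 5          # stopped by the client or administrator
    SUCCEEDED = 6        # completed normally with success
    FAILED = 7           # completed with an error
    PROCESS_LOST = 8     # job process crashed or otherwise disappeared
    FAILED_TO_START = 9  # job failed to start


def summarize(notification):
    """Build a short message from the parameters passed to a notification protocol."""
    code = JobStatusCode(int(notification["Notify_JobStatusCode"]))
    outcome = "succeeded" if code is JobStatusCode.SUCCEEDED else "needs attention"
    return (
        f"Job {notification['Notify_JobID']} "
        f"({notification['Notify_ProtocolName']}) {outcome}: {code.name}"
    )


# Hypothetical values, for illustration only.
message = summarize(
    {
        "Notify_JobID": "job-123",
        "Notify_ProtocolName": "Example Protocol",
        "Notify_JobStatusCode": "7",
    }
)
```

An unrecognized code raises ValueError when the enum is constructed, which makes an unexpected status visible instead of silently treating it as success.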

Chapter 3: Command Line and Web Services Components

In many cases, the integration challenge is to construct a component that encapsulates the functionality of an existing "legacy" program that can execute on a command line. Often the encapsulation process also needs to export the data in a specific format so the program can use it, and to import the resulting data. Some programs can take input on the STDIN stream, and most can write data or some sort of report data to STDOUT, STDERR, or a named file.

This set of components includes support for running a legacy program on the same machine as the server, or on a remote host. In some cases a remote program may expose a SOAP interface, or one may be specially constructed for Pipeline Pilot access. In other cases, a more basic means of communication is used.

The goal of the components in the command line category is to hide all of this complexity behind a component that looks like any other. The protocol author's concerns should focus on setting appropriate parameter values and not on the underlying mechanisms that make the data processing occur.

Run Program
The Run Program components execute a command directly on the server or client machine. This is ideal for integrating applications that have a command-line interface that exposes all or most of the functionality.

The server version runs on either Windows or Linux platforms. Windows and Linux have different command syntax, so a single instance of this component cannot successfully run on both platforms. A solution is to create two copies of the component and place one on each output port of a filter that checks the server operating system.

The convention of Pipeline Pilot's client-server architecture is to perform all computation steps on the server and to launch data viewers on the client. This convention provides a guide when choosing between the client and server version of Run Program, but there are other considerations:
The use of a third-party application may dictate either the client or server version, depending on where the application is (or can be) installed.
The client runs on Windows, while the server may run on Windows or Linux. A component that integrates with a Windows application is portable to all installations, if written using Run Program (on Client).
Pipeline Pilot Client is not the only way to run protocols. Web Port, SOAP, client SDKs, and some third-party applications provide alternatives. A Run Program (on Client) component is usable only from the Pipeline Pilot Client.

To deploy a component, consider creating client and server versions of the component to cover the maximum possible use cases.

Unlike COM automation using the VBScript components, the path to the executable file must be specified as part of the Command parameter on the Run Program components. To avoid "hard-coding" this path, it can be stored in a global variable or exposed as a component parameter.

The command-line application can receive data from the current data record directly using STDIN. More typically, all the data records are written to a file and the name of that file is passed to the application as a command-line argument. This strategy often employs Pipeline Pilot's temporary file management, particularly when the program runs on the client.
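The file-based strategy just described (write the records to a temporary file, pass the file name as a command-line argument, collect STDOUT) can be sketched outside Pipeline Pilot with standard Python. Nothing here is Run Program configuration: run_on_records is a hypothetical helper, and the "legacy program" is simulated by the Python interpreter running a one-line script so that the example is self-contained.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_on_records(command_prefix, records):
    """Write records to a temporary file and pass its path as the last argument."""
    with tempfile.TemporaryDirectory() as run_dir:
        input_file = Path(run_dir) / "records.txt"
        input_file.write_text("\n".join(records) + "\n", encoding="utf-8")
        completed = subprocess.run(
            [*command_prefix, str(input_file)],
            capture_output=True,  # collect STDOUT and STDERR
            text=True,
            check=True,           # raise if the program exits with an error
        )
    # The temporary directory and its files are removed at this point.
    return completed.stdout


# Stand-in for a legacy executable. The executable path is kept in one variable
# rather than being hard-coded inside the command string.
LEGACY_COMMAND = [
    sys.executable,
    "-c",
    "import sys; print(open(sys.argv[1]).read().upper(), end='')",
]

output = run_on_records(LEGACY_COMMAND, ["alpha", "beta"])
```

Passing the command as a list of arguments avoids shell quoting differences, which is one of the places where Windows and Linux command syntax diverges.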

Run Program on Remote Host
You can create components that run command-line applications on a remote host using either Telnet or SSH. Telnet sends clear data to the remote machine, while SSH sends encrypted data. These components are available on both Windows and Linux servers. Though typically used to access UNIX or Linux machines, the availability of OpenSSH makes it possible to access remote Windows machines with the SSH component.

Both the Telnet and SSH components require supplied credentials. The Telnet component requires a username and password. The SSH component can use a file containing a private encryption key. The SSH component also has features for maintaining a list of trusted hosts and warning if a remote host's identification changes.

As with Run Program, the location of the executable must be specified. The same is true for input and output files that are passed to the program as command-line parameters.

The SSH component can capture textual output from the command and write it to a property. More typically, the results are collected in a file on the remote machine, and the file is copied back to the server using SCP or FTP.

SOAP
Simple Object Access Protocol (SOAP) is a method for accessing services on the web. It employs XML syntax to send text commands across the Internet using HTTP. SOAP components are available on both Windows and Linux servers. These components access SOAP servers and make requests to services that exist on UNIX, Linux, and other remote machines.

Numerous SOAP components are available for sending the necessary data to the SOAP server, collecting the results, and adding the data to the current record in a protocol. The signature of the SOAP service (inputs and outputs) can be read from a Web Service Description Language (WSDL) file or specified using the parameters of the SOAP component. The inputs are taken directly from properties on the data records, from global variables, or derived from an expression. The outputs are stored as properties or global variables.

In the standard component, one SOAP request is made for each data record and the server waits for a response before sending a request for the next record. Two variations on this order of operations are available:
A batched component, which submits multiple records in a single request, can improve performance. However, the SOAP service must be configured to accept batched (array) data.
A queued component submits multiple requests to the SOAP server without waiting for a response. This component is useful when the SOAP service is running on a cluster or is otherwise capable of parallel operation.
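The difference between the standard, batched, and queued orders of operation can be sketched without any SOAP library. In the example below, fake_service is a stand-in for a remote call (it simply doubles each value) and all function names are hypothetical; only the request patterns are meant to correspond to the three variations described above.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_service(values):
    """Stand-in for one remote request: takes a list and returns one result per item."""
    return [value * 2 for value in values]


def standard(records):
    """One request per record; each response arrives before the next request is sent."""
    return [fake_service([record])[0] for record in records]


def batched(records, batch_size):
    """Several records per request; the service must accept array input."""
    results = []
    for start in range(0, len(records), batch_size):
        results.extend(fake_service(records[start:start + batch_size]))
    return results


def queued(records, max_in_flight):
    """Submit requests without waiting for earlier responses; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        responses = pool.map(lambda record: fake_service([record]), records)
        return [response[0] for response in responses]


records = [1, 2, 3, 4, 5]
results_standard = standard(records)
results_batched = batched(records, batch_size=2)
results_queued = queued(records, max_in_flight=3)
```

All three functions return the same results; they differ only in how many requests are made and how many are outstanding at once, which is where the performance difference comes from when the service is remote.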

Chapter 4: Language-Based Components

This section focuses on the components that allow you to define functionality with a script or compiled language. These are referred to as language-based components. This chapter provides an overview of the various components available for this purpose and information related to selecting the best component for a specific task. The language-based components include:
Perl (on Server)
Java (on Server)
.NET (on Server), Dynamic .NET (on Server), Dynamic C# (on Server), Dynamic VB.NET (on Server)
Python (on Server)
VBScript (on Server)
VBScript (on Client)
Custom Manipulator (PilotScript)
Custom Filter (PilotScript)

All language-based components require script or code to define the behavior of the component in response to certain events, such as receiving a data record to process. For each language, there is an application programming interface (API), which provides access to appropriate data structures and exposes utility functions. Each component has sufficient access to the Pipeline Pilot data structures that are required to plug and play in the protocol environment where it resides.
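The event-driven shape shared by the language-based components can be pictured with a small sketch. The three method names (onInitialize, onProcess, onFinalize) are the life cycle names used in this guide for the Perl and Java components; everything else is hypothetical, including the method signatures, the use of plain dictionaries for parameters, records, and global data, and the run_component driver that stands in for the server. It is not the API of any Pipeline Pilot component.

```python
class RecordLabeler:
    """Toy component: labels each record and reports a count when finished."""

    def onInitialize(self, parameters):
        # Read settings once, before any data record arrives.
        self.prefix = parameters.get("Prefix", "")
        self.count = 0

    def onProcess(self, record):
        # Called once per data record.
        self.count += 1
        record["Label"] = f"{self.prefix}{self.count}"
        return record

    def onFinalize(self, global_data):
        # Called once, after the last record has been processed.
        global_data["RecordCount"] = self.count


def run_component(component, parameters, records, global_data):
    """Hypothetical driver that plays the role of the server for this sketch."""
    component.onInitialize(parameters)
    processed = [component.onProcess(dict(record)) for record in records]
    component.onFinalize(global_data)
    return processed


shared_globals = {}
labeled = run_component(
    RecordLabeler(),
    {"Prefix": "rec-"},
    [{"Name": "a"}, {"Name": "b"}],
    shared_globals,
)
```

Keeping per-run state (here, the counter) in the initialization step and publishing summaries in the finalization step is the pattern the three phases are designed to support.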

Perl
You can build components whose function is coded with Perl. For the Perl (on Server) component, write a Perl script to define the Initialization, Process, and Finalization phases of the component life cycle. The Perl script uses an object-oriented API to work with the object model of the data records that are passed to it, and to access global properties and the component parameters.

The Perl (on Server) component is an empty but functional component that provides a template for adding your Perl script to the three subroutines (onInitialize, onProcess, and onFinalize) that define the life cycle of the component. Alternatively, you can move this code into a Perl module file that exists in an application package independent of the new component and reference it with the "use" syntax in the Perl component. This makes it easier to upgrade the function of your component after deploying it in a number of protocols. This approach also hides the implementation of the component from Pipeline Pilot users.

The Perl (on Server) component can access package-defined Perl modules by indicating the package it belongs to in the Use Package parameter. This extends the @INC path to include any Perl paths that are defined in the package configuration file.

Using Perl to code a new component gives you access to the vast number of publicly available Perl libraries, in addition to your own Perl modules.

Note: For detailed information on using Perl to write new components, see the Perl Component Development Guide.

Java
Java is a compiled language, so the process for building new components with Java is more complex, although the principles of component development remain the same. To implement a component, define and compile a Java class that implements a defined Component interface with the three life cycle methods (onInitialize, onProcess, and onFinalize). A full object model API is supported to query and modify the data records and to access component parameter and global settings.

Since the Java binary code is a separate entity from the component definition (XML), Java components are always constructed using the package paradigm. (For details, see Application Packaging.) An application package is a folder hierarchy containing the component XML, example protocol XML, binary files, example data files, documentation, and other configuration information. All these files are installed and uninstalled together on any deployment server, so the Java class (or JAR) files are always maintained consistently in the same package as the associated component definitions.

The Java (on Server) component is parameterized by indicating the package to which it belongs, and the name of the class to load that implements the Component interface. The package defines CLASSPATH values to locate the class within the package file structure.

The Java (on Server) component can incorporate package-defined CLASSPATH settings to locate the class that implements its Component interface (or any other class or JAR file) within the package file structure. This is done by indicating the package to which the component belongs in its Use Package parameter. This extends the Java run time to include any CLASSPATH settings that are defined in the package configuration file.

Note: For detailed information on using Java to write new components, see the Java Component Development Guide.

.NET
.NET languages are compiled into intermediate code that is then executed on the target platform. Pipeline Pilot currently supports .NET on Windows. The .NET components are provided in two flavors, precompiled and dynamic, and include:
.NET (on Server)
Dynamic .NET (on Server)
Dynamic C# (on Server) and Dynamic VB.NET (on Server)

Note: For detailed information on using .NET languages to write new components, see the .NET Component Development Guide.

Windows Script Host Components
On Windows only, there are a few components based on the Windows Script Host (WSH) technology. All components based on WSH expose a GUI that allows you to define scripts for the usual component lifetime phases (initialization, record processing, and finalization). The script can address the following objects:
Data: The data record
Globals: The global properties
Parameters: The component parameter properties
Component: To set the component state and route data

Each of these objects has appropriate methods to get and set properties. So the same sort of functionality is exposed as in the Perl and Java components, but in a different way. In addition to the lack of platform portability, one other difference is that the WSH interface does not provide direct access to the node hierarchy.

Python
Python scripting via the Windows Script Host (WSH) is supported, which restricts the use of such components to the Windows platform. For details on the API, see Windows Scripting Integration. There are also some example protocols that include Python components.

Python scripting is supported via the Python (on Server) component. Its interface includes separate scripts for the Initialization, Process, and Finalization phases of the component life cycle. This component lives in the prototype section of the XML database because of a known, reported, but persistent memory leak in the Python WSH scripting engine distributed by ActiveState.

VBScript
The VBScript (on Server) component allows you to deploy (to Windows servers) a component coded in VBScript. This can provide a useful mechanism for hooking into scriptable Windows applications installed on the server or the many scriptable objects that are deployed with the Windows operating system.

The interface of the VBScript (on Server) component includes separate scripts for the Initialization, Process, and Finalization phases of the component life cycle. For details on the API, see Windows Scripting Integration. There are also some example protocols that include VBScript components.

PilotScript

About the PilotScript Language
PilotScript is an expression language for writing powerful data filters and data manipulator components that work on a streaming data record. The syntax for PilotScript has its origins in a subset of Oracle's PL/SQL language, with overtones of Perl. PilotScript is very fast and has little overhead because it is tied closely to the underlying data structures. It provides a relatively simple way to create components that do direct manipulations and analyses of a data record.

PilotScript provides over 150 functions to work on the property lists, properties, and values in data records and global data.

Many components include parameters with values that are PilotScript expressions evaluated at run time. In particular, the Custom Manipulator and Custom Filter components (included with Pipeline Pilot) employ PilotScript to define new components that work with the data in the specific ways that you need.

PilotScript vs. Third-Party Scripting
PilotScript is designed to provide convenient and efficient access to the data structures used by Pipeline Pilot. However, PilotScript does not have any significant facilities for working with external data.

The scripting components are designed for integration tasks. They have access to read and write data, but their strengths are the intrinsic capabilities of the scripting languages, and their ability to work with external data and computational services. For example (and these are by no means exhaustive descriptions of what is possible with the scripting components):
VBScript is well suited to integration tasks that involve services deployed as COM components, such as Windows-based client visualization applications.
The Perl (on Server) and Java (on Server) components can directly access functionality from the vast number of Perl modules and Java libraries that are available in the public domain.
Python provides an object-oriented environment for working with CORBA; there is much other public domain Python software.

The combination of access to Pipeline Pilot data and the integration features of each third-party scripting environment makes them useful to consider when you are designing a protocol where you need access to an external service. The other major consideration is that Java, Perl, VBScript, and Python are industry-standard languages with which you may already be familiar.

Note: You cannot use PilotScript to hook into functionality in external libraries; in this case, use Java or Perl instead.

Custom Manipulator and Filter Components
PilotScript expressions are used in many components, where it is useful to provide the flexibility of an expression rather than a literal text string. For the purposes of this guide, there are two components that you can use as the basis for creating new behaviors defined by PilotScript. These are the Custom Manipulator (PilotScript) and Custom Filter (PilotScript) components, available in all server installations, independent of the integration collection required for most of the other components referenced in this document.

The interface for editing these components provides tabs for Initial Expression, Expression, and Final Expression. These correspond to the Initialization, Process, and Finalization stages of the component life cycle as described previously.

Both components evaluate and process the PilotScript code defined for the Initialization, Process, and Finalization stages of the component life cycle. The only difference between the components is that the filter component evaluates the final statement of the PilotScript code in the Process stage as a Boolean expression, and based on this, directs the data record to the Pass or Fail port of the component.
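The routing behavior of the filter component (a Boolean result decides whether a record leaves through the Pass port or the Fail port) can be mimicked in a few lines. The sketch below is written in Python rather than PilotScript, and the function route, the record dictionaries, and the "Weight" property are hypothetical; it only mirrors the pass/fail decision, not PilotScript syntax.

```python
def route(records, expression):
    """Send each record to the pass or fail list based on a Boolean expression."""
    passed, failed = [], []
    for record in records:
        # The truth value of the expression plays the role of the final
        # statement in the Process stage of a filter.
        if bool(expression(record)):
            passed.append(record)
        else:
            failed.append(record)
    return passed, failed


records = [{"Name": "a", "Weight": 120}, {"Name": "b", "Weight": 480}]
pass_port, fail_port = route(records, lambda record: record["Weight"] < 300)
```

A manipulator differs only in that it has no routing decision: every record continues downstream after the expression has been applied.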

Debugging
Because scripting components often perform complex tasks, the source of unexpected behavior can be less transparent than it is with standard components. Some generic components are useful for debugging. Use Excel or Notepad to monitor component output. The Data Record Tree Viewer and Display Globals components are useful for troubleshooting, and other specialized tools are also available.

Use the following when running protocols in Debug Mode (press SHIFT when you start to run the protocol):
Messages. Debug messages are displayed in the Debug Messages window (lower-left). Debug messages are generated from within a component by using the debugMessage() method. Message generators are attached to the output ports of any component using the Debug Parameters window (lower-right).
Warnings. Debug warnings are displayed in the Debug Messages window and shown in red for emphasis. You can generate debug warnings within a component by using the debugMessageError() method. You can also attach warning generators to the output ports of any component using the Debug Parameters window.
Breaks. During debugging, you can pause a protocol and view a status report in a dialog box. The protocol is resumed when you close the dialog. Breaks are set within a component by using the debugBreak() method.
Asserts. PilotScript contains an assert() function to halt a protocol if a condition is not met.
