PERL COMPONENT DEVELOPMENT GUIDE
PIPELINE PILOT INTEGRATION COLLECTION 2016

Copyright Notice

©2015 Dassault Systèmes. All rights reserved. 3DEXPERIENCE, the Compass icon and the 3DS logo, CATIA, SOLIDWORKS, ENOVIA, DELMIA, SIMULIA, GEOVIA, EXALEAD, 3D VIA, BIOVIA and NETVIBES are commercial trademarks or registered trademarks of Dassault Systèmes or its subsidiaries in the U.S. and/or other countries. All other trademarks are owned by their respective owners. Use of any Dassault Systèmes or its subsidiaries trademarks is subject to their express written approval.

Acknowledgments and References

To print photographs or files of computational results (figures and/or data) obtained using BIOVIA software, acknowledge the source in an appropriate format. For example: "Computational results obtained using software programs from Dassault Systèmes BIOVIA. The ab initio calculations were performed with the DMol3 program, and graphical displays generated with Pipeline Pilot."

BIOVIA may grant permission to republish or reprint its copyrighted materials. Requests should be submitted to BIOVIA Support, either through electronic mail to [email protected], or in writing to:

BIOVIA Support
5005 Wateridge Vista Drive, San Diego, CA 92121 USA

Contents

Chapter 1: Perl Component Development Overview
    Who Should Read this Guide
    Requirements
    Getting Started with Perl Component Development
    Additional Information
Chapter 2: Perl API
    A Minimal Perl Component
    Where Does the Perl Code Go?
        Script Parameter
        Packages
    Accessing Component Parameters
    Accessing Global Variables
    Accessing Data Record Properties
    Directing Records to the Pass or Fail Port
    Component State Return Values
    Manipulating Hierarchical Data Records
        Adding Properties and Child Nodes to a Data Record
        Manipulating Nested Properties
        Adding Child Nodes to Nested Nodes
        Deleting Child Nodes from the Root Node
        Deleting Nested Child Nodes
Chapter 3: PMetaData Properties in Perl
    Finding Whether Metadata Properties Exist
    Getting Metadata Properties
    MetaData API for Nodes and Properties
Chapter 4: Hash Table Values in Perl
    Creating HashTableValues in Perl
    Adding Key/Item Pairs to a Hash Table Value
    Assigning to a Hash Table Value
    Testing if a Given Key is in Hash Table Value
    Removing Key/Item Pairs from a Hash Table Value
    Iterating over Hash Tables
    Getting the Size of a Hash Table Value
    Emptying a Hash Table Value
    Hash Table Values as Perl Arrays
    Perl HashTableValue API
Chapter 5: Debugging Perl Components
    pilot::Debug Module
        pilot::Debug::captureOutput()
        pilot::Debug::showVar()
        pilot::Debug::gotHere()
        pilot::Debug::whoCalled()
        pilot::Debug::whereCalled()

Chapter 1: Perl Component Development Overview

The Pipeline Pilot Integration collection includes tools for developing components in the Perl language for use with protocols. Perl is a popular and dynamic scripting language for writing Web applications. There are several reasons why developers choose to design components in Perl, including:

- Many scientific users lack formal training in computer science, but have some experience writing Perl scripts. By following the information provided in this guide, you should be able to create custom components in Perl.
- You can develop and interactively debug Perl components, without compilation, linkage, or installation steps.
- No additional development environment is required. You can write Perl components using only Pipeline Pilot Client.
- Perl components can make use of the many third-party libraries available for the Perl language (such as BioPerl).

Note: Despite these advantages, users who require faster execution (and are willing to accept a more complex development process) may want to consider using Java components instead. For details, see Java Component Development.

Who Should Read this Guide

This guide provides information about how to use the Perl component and application programming interface (API) to design your own components. It includes the necessary architectural background and the technical instruction for creating, testing, and deploying Perl components.

Requirements

To develop components in Perl, you need some experience with Perl scripting. You should also have a basic understanding of how to use Pipeline Pilot to design and run protocols.

Note: This document assumes you are familiar with the Protocol Development Quick Start guide, which explains Component Lifetime Management.

Getting Started with Perl Component Development

The tools available for developing components in Perl include:

- Perl (on Server) component
- Perl API and accompanying API documentation

Additional Information

For more information about the Pipeline Pilot Integration collection and other BIOVIA software products, visit https://community.3dsbiovia.com.

Chapter 2: Perl API

The Perl component development tools include an API that provides a convenient way to access and modify the data structures, including the data records that flow through the components, the various parameter settings, and the global properties of the enclosing protocol.

The most commonly used global and data record values are presented to you as Perl hashes, which are tied to the underlying Pipeline Pilot data structures. This allows you to code in a familiar Perl-like fashion, without requiring detailed knowledge of the Pipeline Pilot data objects, or object-oriented programming in Perl.

The full specification for the Perl API is described in a separate Perl Component API document. This chapter explains how to use a subset of the most commonly used functions that should be sufficient for solving the majority of problems when developing in Perl.

Tip: The sample code used throughout this chapter is based on a Perl example protocol Manipulating Hierarchical Data Records, which is included with the Integration collection. You may want to use this protocol as a starting point for learning and experimentation.
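To make the idea of a tied hash concrete, here is a minimal sketch in plain Perl that runs independently of Pipeline Pilot. The RecordStore and TiedProps packages are hypothetical stand-ins invented for this illustration; they are not part of the Perl API, which hands you an already-tied hash reference.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an underlying data object (not Pipeline Pilot API).
package RecordStore;
sub new { my $class = shift; return bless { values => {}, writes => 0 }, $class; }
sub get { my ($self, $name) = @_; return $self->{values}{$name}; }
sub set { my ($self, $name, $value) = @_; $self->{writes}++; $self->{values}{$name} = $value; }

# Hypothetical tie class: forwards hash reads and writes to a RecordStore.
package TiedProps;
sub TIEHASH { my ($class, $store) = @_; return bless { store => $store }, $class; }
sub FETCH   { my ($self, $key) = @_; return $self->{store}->get($key); }
sub STORE   { my ($self, $key, $value) = @_; $self->{store}->set($key, $value); }
sub EXISTS  { my ($self, $key) = @_; return exists $self->{store}{values}{$key}; }

package main;

my $store = RecordStore->new();
tie my %props, 'TiedProps', $store;
my $propsRef = \%props;

# Ordinary hash syntax, but each access goes through the object's methods.
$propsRef->{'elapsedTime'} = 10.8;
my $elapsed = $propsRef->{'elapsedTime'};
print "elapsedTime = $elapsed, writes seen by store = $store->{writes}\n";
```

The point of the design is that the calling code never touches the underlying object directly, which is why the examples in this chapter can treat parameters, globals, and properties as ordinary hashes.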

A Minimal Perl Component

To begin writing a custom Perl component, start with the Perl (on Server) component. The Perl source code is contained in the Script parameter. To examine or modify the code, select the Script value in the Parameters window. A script editor opens for editing the code.

When you first examine the source of a new Perl (on Server) component, it should contain the following:

    use strict;

    sub onInitialize {
        my $context = shift;
        return pilot::READYFORINPUTDATA;
    }

    sub onProcess {
        my $context = shift;
        my $data = shift;
        return pilot::READYFORINPUTDATA;
    }

    sub onFinalize {
        my $context = shift;
    }

These three subroutines make up the minimal Perl code required for a component:

- onInitialize(): Called once, when the protocol starts, and before any data records are received by the component.
- onProcess(): Called for each data record received by the component. The $data parameter contains a reference to the data record.
- onFinalize(): Called once, after the last record is processed.

The $context and $data parameters contain references to the data structures for context and data. The use of these values is described later in this document.

Where Does the Perl Code Go?

Script Parameter

If you choose to keep the Perl code in the component's Script parameter, it is stored within the XML definition of that component in the protocol database. This is the simplest way to create a Perl component and is useful when you want to create a single instance of a component as a part of a protocol.

The main disadvantage of this method is that each instance of the component (even within a single protocol) has its own copy of the code, potentially leading to maintenance problems when bugs are fixed or changes are made. Also, any potentially reusable subroutines in your code are not accessible to other Perl components.

Packages

Alternatively, you can store the Perl code as a text source file on the server's file system, for run-time loading with a use statement. For example, the Script parameter for the FASTA Sequence Fetcher component contains:

    use SciTegic::Bio::SeqAnal::FastaSequenceFetcher;

In this case, you create a package and keep the Perl source code in files, separate from the component XML. This method allows multiple components or component instances to use a common source file, eliminating code duplication and simplifying maintenance. Using packages also makes it easier to maintain and deploy a set of related components. For details on creating your own packages, see Application Packaging.

Accessing Component Parameters

Each component has a set of parameters that users can modify at protocol run time. Your component needs to read these parameter values, as illustrated below.

First, a reference to the parameters hash is obtained using the following idiom:

    my $parametersHashRef = $context->getComponentParameters()->getHashRef();

$context is the first parameter to the onInitialize, onProcess and onFinalize subroutines. If the above syntax looks confusing, or you are not familiar with object-oriented programming in Perl, don't worry. All you need to do is copy this statement verbatim into your code.

Once $parametersHashRef is obtained, it contains key/value pairs for each of the component's parameters and is read just like any other Perl hash. For example:

    my $fileFormat = $parametersHashRef->{"File Format"};

Note: Component parameters are read-only, and cannot be modified by your Perl code.

The Perl API provides several different ways to achieve the same result. An alternative way to read component parameters that you may sometimes see in our examples is:

    my $parameters = $context->getComponentParameters();
    my $fileFormat = $parameters->getByName("File Format")->getValue();

Accessing Global Variables

Your Perl code can both read and write global variables. They are exposed as a Perl hash reference obtained from the $context reference, as shown in the following code example:

    # Obtain a reference to the globals hash:
    my $globalsHashRef = $context->getGlobalProperties()->getHashRef();

    # Reading global values:
    my $chargeOutputFileName = $globalsHashRef->{"ChargeOutputFile"};

    # Creating or modifying global values:
    $globalsHashRef->{"numHits"} = 12;

When scalar values are assigned to a global variable, they are automatically converted to a compatible data type. References to arrays can also be assigned to a global variable, in which case the global is of type SciTegic.value.FlexArrayValue.

    $globalsHashRef->{"NumberList"} = ["one", "two", "three"];

Writing to a non-existent global variable automatically creates the variable, just as assigning to a new key does in an ordinary Perl hash. Unlike the parameters hash, changes to the globals hash are immediately seen by the global Pipeline Pilot environment.

Accessing Data Record Properties

To be useful in a data stream, your component needs to access the properties of the incoming data records. Pipeline Pilot data records are hierarchical data structures, but in many cases you are concerned only with properties at the top level. This section demonstrates that reading, modifying, or creating top-level properties is very simple. Accessing nested properties is described later, in the section Manipulating Hierarchical Data Records.

    # Tie the properties to a Perl hash reference
    my $sequenceNode = $data->getRoot();
    my $propertiesHashRef = $sequenceNode->getProperties()->getHashRef();

    # Getting property values:
    my $displayID = $propertiesHashRef->{'displayID'};

    # Setting property values:
    $propertiesHashRef->{'elapsedTime'} = 10.8;
    ...

Assigning a value to a non-existent property automatically adds that property to the data record.

Directing Records to the Pass or Fail Port

When a component receives data records, it can do any of the following with the records:

- Route them to the pass port
- Route them to the fail port
- Not route them to any port (that is, delete them from the data stream)

The Perl code used to select these options is:

    $data->routeTo(pilot::PASSPORT);
    $data->routeTo(pilot::FAILPORT);
    $data->routeTo(pilot::NOPORT);

In these examples, $data is the second parameter to the onProcess subroutine.
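As an illustration, a filter might route each record by comparing a top-level property with a threshold. The sketch below rests on several assumptions: the property name "score", the 0.5 threshold, and the stub pilot and MockRecord packages are all hypothetical and exist only so that the code runs outside Pipeline Pilot. Inside a real component, only the onProcess body would be needed.

```perl
use strict;
use warnings;

# Stub constants so the sketch runs standalone; the values are arbitrary.
# In a real component the pilot package supplies these, so omit this block.
package pilot;
use constant { PASSPORT => 1, FAILPORT => 2, NOPORT => 0, READYFORINPUTDATA => 10 };

# Hypothetical stand-in for a data record with top-level properties only.
package MockRecord;
sub new           { my ($class, %props) = @_; return bless { props => {%props}, port => undef }, $class; }
sub getRoot       { return $_[0]; }
sub getProperties { return $_[0]; }
sub getHashRef    { return $_[0]{props}; }
sub routeTo       { my ($self, $port) = @_; $self->{port} = $port; }

package main;

sub onProcess {
    my $context = shift;
    my $data    = shift;

    my $props = $data->getRoot()->getProperties()->getHashRef();

    if (!defined $props->{'score'}) {
        # No score at all: drop the record from the data stream.
        $data->routeTo(pilot::NOPORT);
    }
    elsif ($props->{'score'} >= 0.5) {
        $data->routeTo(pilot::PASSPORT);
    }
    else {
        $data->routeTo(pilot::FAILPORT);
    }
    return pilot::READYFORINPUTDATA;
}

# Hypothetical driver: three records with a high, a low, and a missing score.
my @records = (MockRecord->new(score => 0.9), MockRecord->new(score => 0.1), MockRecord->new());
onProcess(undef, $_) for @records;
print join(',', map { $_->{port} } @records), "\n";
```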

Component State Return Values

You can define different categories of component behavior (a reader, calculator, manipulator, etc.) based on the component state returned by the onInitialize and onProcess subroutines. In a Perl component, the onInitialize and onProcess subroutines must end with a statement such as this:

    return pilot::READYFORINPUTDATA;

The list below names each of the available component state return values and explains their meaning:

- pilot::DONEPROCESSINGDATA: The component's task is complete. It requests that the framework does not invoke its onProcess subroutine again.
- pilot::READYFORINPUTDATA: The component requests that the framework invoke its onProcess subroutine with any data record that arrives at the input port.
- pilot::READYFORNEWDATA: The component requests that the framework repeatedly invoke its onProcess subroutine with a new, empty data record, as long as this component state is in force.
- pilot::READYFORINPUTTHENNEWDATA: The component requests that the framework invoke its onProcess subroutine with any data record that arrives at its input port. When there are no more input records to process, the framework repeatedly passes to the component a new, empty data record, as long as this component state is in force.

Below are some broad categories of components and how they make use of component state.

Component Category: Management of Component State

- Calculator: Component state is always set to ReadyForInputData.
- Filter: Component state is always set to ReadyForInputData.
- Reader or Generator: Component state is set to ReadyForNewData. When the operation is complete, the state is set to DoneProcessingData.
- Writer: Component state is set to ReadyForInputData. If a maximum output limit is reached, the state is changed to DoneProcessingData.
- Integrator: Component state is set to ReadyForInputThenNewData. For each input data record, the component caches the necessary data. When there is no further data on the input port and a new data record is processed, the component starts to output records of aggregated data.
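To illustrate how the return value drives the framework, the sketch below mimics a generator that emits three records and then stops. The driver loop, the stub pilot constants, and the plain hash used as a record are assumptions made so that the example runs outside Pipeline Pilot; the real framework's scheduling is likely more involved.

```perl
use strict;
use warnings;

# Stub constants with arbitrary values; a real component gets these from pilot.
package pilot;
use constant { READYFORNEWDATA => 1, DONEPROCESSINGDATA => 2 };

package main;

my $remaining;    # records still to generate
my @emitted;      # stands in for the downstream data stream

sub onInitialize {
    my $context = shift;
    $remaining = 3;
    # Ask to be called with new, empty records rather than input records.
    return pilot::READYFORNEWDATA;
}

sub onProcess {
    my $context = shift;
    my $data    = shift;    # in this sketch, just an empty hash reference

    $data->{'index'} = scalar(@emitted) + 1;
    push @emitted, $data;
    $remaining--;

    # Keep generating until the quota is met, then tell the framework to stop.
    return $remaining > 0 ? pilot::READYFORNEWDATA : pilot::DONEPROCESSINGDATA;
}

# Hypothetical driver that imitates the framework's calling pattern.
my $state = onInitialize(undef);
while ($state == pilot::READYFORNEWDATA) {
    $state = onProcess(undef, {});
}
print "generated ", scalar(@emitted), " records\n";
```

A reader follows the same shape, with the quota replaced by an end-of-file test.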

Manipulating Hierarchical Data Records

Previously, we discussed how to manipulate (read/write/create) top-level properties on the data record. This section explains how to work with hierarchical data records.

A Pipeline Pilot data record is a hierarchical tree structure consisting of nodes (containers for nodes and properties) and properties (named values). In a simple record, there is a single node, called the "root node", and its properties are called "top-level properties". This simple record structure is sufficient for many purposes, but because nodes can contain other nodes, it is possible to create hierarchical (or "nested") data records.

A simple analogy with a computer file system can make this structure clear. Just as in a file system, where folders can contain files and/or folders, which can in turn contain files and/or folders, in a data record, nodes can contain nodes and/or properties. Think of a node as a folder, and a property as a file containing a value. Both nodes and properties have names, just like files and folders, which can be used to manipulate them.

Adding Properties and Child Nodes to a Data Record

The code example below shows how to add properties and child nodes to an empty data record. At the beginning of the onProcess() subroutine, $data contains a reference to the incoming data record. The first step is to use the getRoot() method to get a reference to the root node. Adding child nodes is done in the addAnimal() and addEquipment() subroutines. Comments in the code explain the steps in detail.

    sub onProcess {
        my $context = shift;
        my $data = shift;

        # Given an empty record, get a reference to the root node.
        my $root = $data->getRoot();

        # Label the root node "Farm"
        $root->setName("Farm");

        my $properties = $root->getProperties()->getHashRef();

        # Set some top-level properties
        $properties->{"farm owner"} = "Farmer Brown";
        $properties->{"farm type"} = "Potato";

        # Add some child nodes representing animals
        addAnimal($root, "cow");
        addAnimal($root, "pig");
        addAnimal($root, "chicken");
        addAnimal($root, "chicken");
        addAnimal($root, "chicken");
        addAnimal($root, "horse");

        # Add some child nodes representing equipment
        addEquipment($root, "tractor");
        addEquipment($root, "pitchfork");
        addEquipment($root, "milking stool");

        return pilot::READYFORINPUTDATA;
    }

    sub addAnimal {
        my ($node, $animalType) = @_;

        # $node contains a reference to the node of the data record
        # to which we will attach the new child node.
        #
        # In other words, it is the node that will contain the
        # new Animal node.

        # Create a new node named "Animal", with one property: "type"
        my $animal = pilot::createNode();
        $animal->setName("Animal");
        $animal->getProperties()->getHashRef()->{"type"} = $animalType;

        # Attach the newly created node to the existing node
        $node->appendChild($animal);
    }

    sub addEquipment {
        . . .   # this code is similar to addAnimal()
    }

Note: For clarity and simplicity in this example, addAnimal() and addEquipment() are implemented as separate subroutines, but in actual use it would be better to write a single more general subroutine to add a new node.

This is the data record that results, as shown by the Data Record Tree Viewer:

Data record tree view results

Manipulating Nested Properties

The next code example shows how to read, write, and create properties belonging to a child node (also called "deep" or "nested" properties). Iterating over all child nodes of a specific type is done by using the findChildrenByName() method of the node object, which returns a list of references to all of the node's children that have the specified name.

    sub onProcess {
        my $context = shift;
        my $data = shift;

        # Get a reference to the root node of the data record.
        my $root = $data->getRoot();

        # Iterate over every "Animal" node directly attached to
        # the root node:
        foreach my $animal ($root->findChildrenByName("Animal")) {

            my $propertiesHashRef = $animal->getProperties()->getHashRef();

            # Reading a property value:
            my $typeValue = $propertiesHashRef->{"type"};

            # Writing a new value to a property:
            $propertiesHashRef->{"type"} = "genetically modified " . $typeValue;

            # Notice that $propertiesHashRef works just like any
            # other Perl hash, so you could create a new property
            # here by assigning a value to a new key:
            #
            # $propertiesHashRef->{"newKey"} = "newValue";
        }

        # Upgrade the equipment (by turbo-charging)
        foreach my $equipment ($root->findChildrenByName("Equipment")) {
            my $propertiesHashRef = $equipment->getProperties()->getHashRef();
            $propertiesHashRef->{"type"} = "turbo-" . $propertiesHashRef->{"type"};
        }

        return pilot::READYFORINPUTDATA;
    }

The resulting data record shows that all of the animals have been genetically modified, and all of the equipment has been turbocharged:

Data record tree view results

Adding Child Nodes to Nested Nodes

If you have a reference to an existing node, a new child node can be attached to it using the appendChild() method. In the following example, we iterate over all of the "Animal" nodes belonging to the root node, and if the node's type contains the word "chicken", three new "Egg" nodes are attached to it. This is done in the addEgg() subroutine, which works just like the addAnimal() subroutine discussed in a previous example.

    sub onProcess {
        my $context = shift;
        my $data = shift;

        # The chickens are laying, so give each chicken some "Egg"
        # child nodes.
        my $root = $data->getRoot();
        foreach my $animal ($root->findChildrenByName("Animal")) {
            my $propertiesHashRef = $animal->getProperties()->getHashRef();

            if ($propertiesHashRef->{"type"} =~ /chicken/i) {
                # Give each chicken three eggs
                addEgg($animal, "white");
                addEgg($animal, "brown");
                addEgg($animal, "broken");
            }
        }
        return pilot::READYFORINPUTDATA;
    }

    sub addEgg {
        my ($node, $eggType) = @_;

        my $egg = pilot::createNode();
        $egg->setName("Egg");
        $egg->getProperties()->getHashRef()->{"type"} = $eggType;
        $node->appendChild($egg);
    }

The resulting data record is now three levels deep, as all of the chickens now have eggs:

Data record tree view

Deleting Child Nodes from the Root Node

The next code sample shows how to delete a child node from the root node. Notice that deleting a node also removes its properties and child nodes, so deleting a node is the same as deleting an entire subtree. Also note that since you still have a reference to the deleted node (or subtree), it can be reattached to another node of the original record, or kept in a Perl package-level variable and later attached to a different record. This gives you the ability to take apart and rearrange data record trees in any way you want.

    sub onProcess {
        my $context = shift;
        my $data = shift;

        # Farmer Brown wants bacon, so delete all "Pig" child
        # nodes that are directly attached to the root node.
        my $root = $data->getRoot();

        # Iterate over all "Animal" nodes belonging to the root node:
        foreach my $animal ($root->findChildrenByName("Animal")) {
            my $propertiesHashRef = $animal->getProperties()->getHashRef();

            # Any "Animal" node whose type contains "pig" is a pig.
            if ($propertiesHashRef->{"type"} =~ /pig/i) {
                # $root is the parent node,
                # $animal is the child node to delete
                $root->removeChild($animal);
            }
        }
        return pilot::READYFORINPUTDATA;
    }

The "pig" node was deleted from the data record. If there were more than one "pig", the above code would delete all of them.

Data record tree view

Deleting Nested Child Nodes

The following code example shows how to use nested loops and findChildrenByName() to operate on all of the deeply nested "Egg" nodes in the data record.

    sub onProcess {
        my $context = shift;
        my $data = shift;

        my $root = $data->getRoot();

        # Remove all "Egg" nodes whose type property value is "broken",
        # regardless of what type of "Animal" they belong to.
        foreach my $animal ($root->findChildrenByName("Animal")) {
            foreach my $egg ($animal->findChildrenByName("Egg")) {
                if ($egg->getProperties()->getHashRef()->{"type"} =~ /broken/i) {
                    # $animal is the parent node, $egg is the
                    # child node to delete
                    $animal->removeChild($egg);
                }
            }
        }
        return pilot::READYFORINPUTDATA;
    }

In the resulting data record, all of the "Egg" nodes with type = "broken" were deleted:

Data record tree view
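The traversal pattern used in this chapter also serves read-only tasks, such as tallying child nodes by a property value. The sketch below is a hedged illustration: MockNode is a hypothetical plain-Perl stand-in that mimics only the node methods shown in this chapter so that the code runs outside Pipeline Pilot, and tallyByType() is a helper name invented for this example. In a real component the nodes would come from $data->getRoot() and pilot::createNode().

```perl
use strict;
use warnings;

# Hypothetical stand-in that mimics the few node methods used in this chapter.
package MockNode;
sub new {
    my ($class, $name, %props) = @_;
    return bless { name => $name, props => {%props}, children => [] }, $class;
}
sub getProperties { return $_[0]; }
sub getHashRef    { return $_[0]{props}; }
sub appendChild   { my ($self, $child) = @_; push @{ $self->{children} }, $child; }
sub findChildrenByName {
    my ($self, $name) = @_;
    return grep { $_->{name} eq $name } @{ $self->{children} };
}

package main;

# Count the direct children called $nodeName, grouped by their "type" property.
sub tallyByType {
    my ($parent, $nodeName) = @_;
    my %count;
    foreach my $child ($parent->findChildrenByName($nodeName)) {
        my $type = $child->getProperties()->getHashRef()->{'type'};
        $count{$type}++ if defined $type;
    }
    return \%count;
}

# Build a small farm record by hand and tally its animals.
my $farm = MockNode->new('Farm');
$farm->appendChild(MockNode->new('Animal', type => $_)) for qw(cow chicken chicken horse);
$farm->appendChild(MockNode->new('Equipment', type => 'tractor'));

my $tally = tallyByType($farm, 'Animal');
printf "%s: %d\n", $_, $tally->{$_} for sort keys %$tally;
```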

Chapter 3: PMetaData Properties in Perl

Metadata properties are properties held in a property collection that can be associated with any node or property in a data hierarchy. Not every node or property has such a collection, but any of them can. This saves memory when metadata is not needed on a particular node or property, while still letting you create the collection when it is needed.

Metadata properties are available in Perl as objects that implement a property collection. There are typically two methods for accessing the metadata property collection: a "find" method that returns undef if no metadata collection exists yet, and a "get" method that creates an empty property collection if one does not already exist. Once you have the metadata property collection, you can do anything you would do with any other property collection in Perl: call methods, get a hashref, assign, and so on.
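The difference between the two accessors is a lazy-creation pattern. The plain-Perl sketch below shows that pattern in isolation. The MockNode class is a hypothetical stand-in, not the Pipeline Pilot implementation, and it stores metadata in an ordinary hash rather than a property collection.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a node or property that may carry metadata.
package MockNode;
sub new { my $class = shift; return bless { meta => undef }, $class; }

# "find": report what exists, never allocate.
sub findMetaData { my $self = shift; return $self->{meta}; }

# "get": create an empty collection on first use, then return it.
sub getMetaData {
    my $self = shift;
    $self->{meta} = {} unless defined $self->{meta};
    return $self->{meta};
}

package main;

my $node = MockNode->new();

my $before = $node->findMetaData();    # undef: nothing allocated yet
$node->getMetaData()->{'Units'} = 'ms';
my $after = $node->findMetaData();     # now a hash reference

print defined $before ? "before: found\n" : "before: none\n";
print "after: Units = $after->{'Units'}\n";
```

Use the "find" form when you only want to inspect metadata, and the "get" form when you intend to write to it.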

Finding Whether Metadata Properties Exist

You can find out whether a node or property contains metadata with the findMetaData method. If the metadata exists, the property collection is returned. If not, undef is returned. This is useful for checking the existence of metadata attributes without forcing the creation (and memory use!) of an unneeded property collection.

For example, you could find the metadata property collection on the root node, and if it exists, check for the existence of some property. If that property exists, and it is true, assign it to false (0 in Perl).

    sub onProcess {
        my $context = shift;
        my $data = shift;
        my $root = $data->getRoot();

        my $metadata = $root->findMetaData();
        if (defined $metadata) {
            my $generated = $metadata->findByName("Generated");
            if (defined $generated and $generated) {
                $metadata->define("Generated", 0);
            }
        }
        return pilot::READYFORINPUTDATA;
    }

You can test this Perl snippet by inserting a Custom Manipulator (PilotScript) upstream, and setting the Expression to:

    nodemetadataproperty(dataroot(), "Generated") := true;

Downstream, insert a Custom Manipulator (PilotScript), and set the Expression to:

    GeneratedNewValue := nodemetadataproperty(dataroot(), "Generated");

Using a Data Record Tree Viewer, you should see that the new value of Generated is "0" after the Perl script has been applied.

Similarly, you can find metadata from a property "Measurement", and if "Units" exists on the metadata, set it to "Unknown":

    sub onProcess {
        my $context = shift;
        my $data = shift;
        my $props = $data->getProperties();
        my $measurement = $props->getByName("Measurement");

        my $metadata = $measurement->findMetaData();
        if (defined $metadata) {
            my $units = $metadata->findByName("Units");
            if (defined $units) {
                $metadata->define("Units", "Unknown");
            }
        }
        return pilot::READYFORINPUTDATA;
    }

You can test this Perl snippet by inserting a Custom Manipulator (PilotScript) upstream, and setting the Expression to:

    Measurement := 10;
    metadataproperty(measurement, 'units') := 'ms';

Downstream, insert a Custom Manipulator (PilotScript), and set the Expression to:

    NewUnits := metadataproperty("Measurement", "Units");

Using a Data Record Tree Viewer, you should see that the new value of the metadata property Units is "Unknown" after the Perl script has been applied.

Getting Metadata Properties

These methods are similar to the previously described "find" methods, but you never have to check whether the result is undefined: if the requested metadata property collection does not already exist, it is created.

For example, you could get the metadata property collection on the root node and assign the property "Generated" to false (0 in Perl).

    sub onProcess {
        my $context = shift;
        my $data = shift;
        my $root = $data->getRoot();

        my $metadata = $root->getMetaData();
        $metadata->define("Generated", 0);

        return pilot::READYFORINPUTDATA;
    }

You can test whether this Perl snippet worked by inserting a Custom Manipulator (PilotScript) downstream, and setting the Expression to:

    GeneratedValue := nodemetadataproperty(dataroot(), "Generated");

Using a Data Record Tree Viewer, you should see that the value of Generated is "0" after the Perl script has been applied.

Similarly, you can get metadata for a property "Measurement", and set metadata property "Units" to "Unknown":

    sub onProcess {
        my $context = shift;
        my $data = shift;
        my $props = $data->getProperties();
        my $measurement = $props->getByName("Measurement");

        # getMetaData (not findMetaData) so that the collection is
        # created if it does not exist yet.
        my $metadata = $measurement->getMetaData();
        $metadata->define("Units", "Unknown");

        return pilot::READYFORINPUTDATA;
    }

You can test this Perl snippet by inserting a Custom Manipulator (PilotScript) upstream, and setting the Expression to:

    Measurement := 10;

Downstream, insert a Custom Manipulator (PilotScript), and set the Expression to:

    If (metadataproperty("Measurement", "Units") is defined) then
        Units := metadataproperty("Measurement", "Units");
    End If;

Using a Data Record Tree Viewer, you should see that the metadata property Units is defined with the value "Unknown" after the Perl script has been applied.

MetaData API for Nodes and Properties

The following methods may be called for either nodes or properties.

findMetaData

Returns a property collection containing the metadata if it exists, else undef.

Return value: {Property Collection} A property collection containing the metadata if it exists, else undef.

Example: Return the metadata from the root node, and see if it contains the flag "Generated" with the value true. If so, set it to false.

    sub onProcess {
        my $context = shift;
        my $data = shift;
        my $root = $data->getRoot();

        my $metadata = $root->findMetaData();
        if (defined $metadata) {
            my $generated = $metadata->findByName("Generated");
            if (defined $generated and $generated) {
                $metadata->define("Generated", 0);
            }
        }
        return pilot::READYFORINPUTDATA;
    }

Example 2: Return the metadata from a property "Measurement", and see if it contains the metadata property "Units". If so, set it to "Unknown".

    sub onProcess {
        my $context = shift;
        my $data = shift;
        my $props = $data->getProperties();
        my $measurement = $props->getByName("Measurement");

        my $metadata = $measurement->findMetaData();
        if (defined $metadata) {
            my $units = $metadata->findByName("Units");
            if (defined $units) {
                $metadata->define("Units", "Unknown");
            }
        }
        return pilot::READYFORINPUTDATA;
    }

getMetaData

Returns a property collection containing the metadata. If it does not already exist, it is created.

Return value: {Property Collection} The metadata property collection.

Example: Return the metadata from the root node, and set the flag "Generated" to false.

    sub onProcess {
        my $context = shift;
        my $data = shift;
        my $root = $data->getRoot();

        my $metadata = $root->getMetaData();
        $metadata->define("Generated", 0);

        return pilot::READYFORINPUTDATA;
    }

Example 2: Return the metadata from a property "Measurement", and set the metadata property "Units" to "Unknown".

    sub onProcess {
        my $context = shift;
        my $data = shift;
        my $props = $data->getProperties();
        my $measurement = $props->getByName("Measurement");

        my $metadata = $measurement->getMetaData();
        $metadata->define("Units", "Unknown");

        return pilot::READYFORINPUTDATA;
    }

Chapter 4: Hash Table Values in Perl

Properties can contain many different types of values. One value type is the HashTableValue, an implementation of a general hash table that can be added to any property collection, cloned, and cached. This chapter discusses how to create, access, and manipulate these values through the Perl interface.

HashTableValues were introduced in Pipeline Pilot 9.0. Previously, hash tables were available only in PilotScript; they were identified by a numeric value and kept in a static table, and could not be accessed from any other scripting language. The new hash table value is a native value type that is accessible from the scripting languages as well as from PilotScript. Hash table values can be serialized and copied, and they are cloned when a node is cloned. They can be stored on global properties and shared across components. This makes them more robust than the previous static implementation. All the current PilotScript methods work with the new hash table values; the only difference is that they are created using HashValueCreate() rather than HashCreate().

In Perl, for a property containing a value of type HashTableValue, the hashref can be obtained and used to access or manipulate the table. Locally created Perl hashes can be stored onto a property collection, with the new property having a value of type HashTableValue. Thus you can easily manipulate hash tables in Perl, then save the result onto a property collection. Downstream Perl components can access and manipulate these hash table values through hash references, which look and behave just like native Perl hashes.

Creating HashTableValues in Perl

We'll demonstrate the creation of hash table values by adding to the onProcess method of the Perl (on Server) component. You can test the component by using Generate Empty Data, followed by Perl (on Server), followed by the Data Record Tree Viewer.

Create the protocol as described above, and replace the onProcess method with the following. This creates a local hash in Perl, then uses "define" to add it as a property to the property collection.

    sub onProcess {
        my $context = shift;
        my $data = shift;
        my $props = $data->getProperties();

        my $hashref = { 'A' => 'apple', 'B' => 'banana' };
        my $hashprop = $props->define("H", $hashref);

        return pilot::READYFORINPUTDATA;
    }

The output should look like:

The Data Record Tree Viewer displays hash table values as an array of entries, with the key and item of each entry separated by an equal sign.

Adding Key/Item Pairs to a Hash Table Value

Once created, you can manipulate a property containing a hash table value by getting a Perl hash reference and performing native Perl hash operations. To access a hash table value as a native Perl hash reference, call the getHashRef method on the property containing the hash table value. The following example adds a key "A" with a value "apple" to an initially empty table.

    sub onProcess {
        my $context  = shift;
        my $data     = shift;
        my $props    = $data->getProperties();
        my $hashprop = $props->define("H", {});

        # Add an entry through the hash reference.
        my $hashref = $hashprop->getHashRef();
        $hashref->{"A"} = "apple";

        return pilot::READYFORINPUTDATA;
    }

The output should show property H with the single entry A=apple.

Assigning to a Hash Table Value

Assigning a list to the dereferenced hashref sets the state of the hash table, erasing any current content.

    sub onProcess {
        my $context  = shift;
        my $data     = shift;
        my $props    = $data->getProperties();
        my $hashprop = $props->define("H", {'A', 'apple', 'B', 'banana'});

        # Replace the whole table; entries A and B are discarded.
        my $hashref = $hashprop->getHashRef();
        %$hashref = ('C', 'carrot', 'D', 'dill');

        return pilot::READYFORINPUTDATA;
    }

The output should show property H with only the entries C=carrot and D=dill.
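The distinction between assigning through the reference and assigning to the reference variable can be tried outside Pipeline Pilot. The following is a minimal sketch in plain Perl; the hash %table is only a stand-in for the table behind a property (in a component the reference would come from getHashRef()), and all variable names are illustrative.

```perl
use strict;
use warnings;

# Stand-in for the table behind a property (assumption: plain Perl hash).
my %table   = (A => 'apple', B => 'banana');
my $hashref = \%table;

# Assigning through the reference replaces the contents of the shared table.
%$hashref = (C => 'carrot', D => 'dill');
my $after_assign = join ',', sort keys %table;

# Assigning to the scalar only points the local variable at a new hash;
# the original table is left untouched.
$hashref = { E => 'eggplant' };
my $after_rebind = join ',', sort keys %table;

print "after assignment through the reference: $after_assign\n";
print "after rebinding the variable:           $after_rebind\n";
```

Both lines report the keys C,D: only the dereferenced form (%$hashref = ...) changes the table that other holders of the reference see.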

Testing if a Given Key is in Hash Table Value

You can test whether a given key is in a hash table either through the hashref or by calling a native method. Using the hashref:

    sub onProcess {
        my $context  = shift;
        my $data     = shift;
        my $props    = $data->getProperties();
        my $hashprop = $props->define("H", {'A', 'apple', 'B', 'banana'});
        my $hashref  = $hashprop->getHashRef();

        # Key "A" was defined above, so this branch is not taken.
        if (!exists $hashref->{"A"}) {
            die "Strange - did not find key A!";
        }

        return pilot::READYFORINPUTDATA;
    }

This snippet should run without executing the "die" statement.
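When testing for keys, it helps to keep Perl's three related checks apart: exists (the key is present), defined (the item is not undef), and plain truth (the item is not false). The sketch below uses an ordinary Perl hash and a hypothetical helper, key_status(); whether a hash table value can actually hold undefined items is not covered by this guide, so treat that case as illustrative only.

```perl
use strict;
use warnings;

# Ordinary Perl hash used as a stand-in for a hash table value.
my $hashref = { A => 'apple', B => undef, C => 0 };

# Hypothetical helper: classify a key by presence, definedness, and truth.
sub key_status {
    my ($h, $key) = @_;
    return 'missing'   unless exists $h->{$key};
    return 'undefined' unless defined $h->{$key};
    return $h->{$key} ? 'true' : 'false';
}

print "$_: ", key_status($hashref, $_), "\n" for qw(A B C D);
```

Using exists, as in the example above, is the check that answers "is this key in the table?" regardless of the item stored under it.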

Removing Key/Item Pairs from a Hash Table Value

You can remove a single key/item pair from a hash table using the hashref.

    sub onProcess {
        my $context  = shift;
        my $data     = shift;
        my $props    = $data->getProperties();
        my $hashprop = $props->define("H", {'A', 'apple', 'B', 'banana'});
        my $hashref  = $hashprop->getHashRef();

        # Remove the entry with key "A".
        delete $hashref->{"A"};

        return pilot::READYFORINPUTDATA;
    }

This leaves a hash table containing only one entry, with key "B".

Iterating over Hash Tables

You can iterate over a hash table value reference just as you would over a native Perl hash reference. In this example, we take all the entries in the hash table value and make them properties on the data root.

    sub onProcess {
        my $context  = shift;
        my $data     = shift;
        my $props    = $data->getProperties();
        my $hashprop = $props->define("H", {'A', 'apple', 'B', 'banana'});
        my $hashref  = $hashprop->getHashRef();

        # Copy every key/item pair to a property on the data root.
        while ((my $key, my $value) = each %$hashref) {
            $props->define($key, $value);
        }
        return pilot::READYFORINPUTDATA;
    }

An equivalent scheme uses a foreach loop:

    sub onProcess {
        my $context  = shift;
        my $data     = shift;
        my $props    = $data->getProperties();
        my $hashprop = $props->define("H", {'A', 'apple', 'B', 'banana'});
        my $hashref  = $hashprop->getHashRef();

        # Same result, iterating over the list of keys.
        foreach my $key (keys %$hashref) {
            $props->define($key, $hashref->{$key});
        }
        return pilot::READYFORINPUTDATA;
    }

Note that the order in which the iteration proceeds is arbitrary; do not rely on any particular order, as it is implementation-dependent. In either case, the output should show the properties A (apple) and B (banana) on the data root, alongside H.
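Because the iteration order is arbitrary, code that needs reproducible output usually sorts the keys first. The following sketch shows this with an ordinary Perl hash standing in for a hash table value; the variable names and the "key -> item" output format are illustrative assumptions.

```perl
use strict;
use warnings;

# Ordinary Perl hash used as a stand-in for a hash table value.
my $hashref = { B => 'banana', A => 'apple', C => 'carrot' };

# Sorting the keys gives the same order on every run.
my @lines;
foreach my $key (sort keys %$hashref) {
    push @lines, "$key -> $hashref->{$key}";
}

print "$_\n" for @lines;
```

The default sort is a string comparison; pass a comparison block to sort (for example { $a <=> $b }) if the keys are numeric.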

Getting the Size of a Hash Table Value

You can get the size (that is, the number of entries) of a hash table using the hashref:

    sub onProcess {
        my $context  = shift;
        my $data     = shift;
        my $props    = $data->getProperties();
        my $hashprop = $props->define("H", {'A', 'apple', 'B', 'banana'});
        my $hashref  = $hashprop->getHashRef();

        # "scalar keys" yields the number of entries in the table.
        $props->define("HSize", scalar keys %$hashref);

        return pilot::READYFORINPUTDATA;
    }

The output should show a property HSize with the value 2.

Emptying a Hash Table Value

You can clear a hash table by assigning an empty list through the hashref:

    sub onProcess {
        my $context  = shift;
        my $data     = shift;
        my $props    = $data->getProperties();
        my $hashprop = $props->define("H", {'A', 'apple', 'B', 'banana'});
        my $hashref  = $hashprop->getHashRef();

        # Assign an empty list "()", not an empty hashref "{}".
        %$hashref = ();

        return pilot::READYFORINPUTDATA;
    }

At the end, the table should have no contents when you view it.
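In plain Perl, a hash is emptied by assigning the empty list () to it. Assigning an anonymous hash reference {} instead stores that reference, converted to a string, as a single key. The sketch below demonstrates both on ordinary Perl hashes; it does not use the Pipeline Pilot API, and the variable names are illustrative.

```perl
use strict;
use warnings;

# Ordinary Perl hash with two holders of the same reference.
my %table   = (A => 'apple', B => 'banana');
my $hashref = \%table;
my $alias   = $hashref;

# Correct: the empty list clears the table for every holder.
%$hashref = ();
my $size_after_clear = scalar keys %$alias;

# Mistake: {} is a one-element list, so it becomes one key with no item.
my %mistake;
{
    no warnings 'misc';    # silence "Reference found where even-sized list expected"
    %mistake = {};
}
my $size_after_mistake = scalar keys %mistake;

print "cleared: $size_after_clear, mistaken: $size_after_mistake\n";
```

The first size is 0 and the second is 1, which is why the empty-list form is the one to use when clearing a table.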

Hash Table Values as Perl Arrays

While not a standard Perl capability, you can also access a hash table value as a native Perl array of strings. In this case, the ith entry is composed of the key, followed by an equal sign, followed by the value. This is most useful for display purposes.

    sub onProcess {
        my $context  = shift;
        my $data     = shift;
        my $props    = $data->getProperties();
        my $hashprop = $props->define("H", {'A', 'apple', 'B', 'banana'});

        # Each array element is a "key=value" string.
        my @arr = $hashprop->getValue();

        $props->define("SecondEntry", $arr[1]);

        return pilot::READYFORINPUTDATA;
    }

The result is a property SecondEntry holding the second element of the array, which for this table is either A=apple or B=banana.

The order of the array is not necessarily the order you'll get if you perform a standard iteration on a hash table. Hash tables are not by nature ordered, so the results will depend on the implementation.
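If you need the "key=value" strings in a predictable order, one option is to build them yourself from the hashref. The sketch below does this for an ordinary Perl hash; as_entries() is a hypothetical helper that only mimics the entry format described above, and it sorts the keys, which the array form of a hash table value does not promise.

```perl
use strict;
use warnings;

# Hypothetical helper: format a hash as sorted "key=value" strings.
sub as_entries {
    my ($h) = @_;
    return map { "$_=$h->{$_}" } sort keys %$h;
}

my @arr = as_entries({ B => 'banana', A => 'apple' });

print "$_\n" for @arr;
```

With sorted keys, $arr[1] is always B=banana for this table, so downstream code can index into the array safely.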

Perl HashTableValue API

All of these methods are called on the property containing the hash table value. (You can also access and manipulate these values by getting a hashref and using native Perl syntax for working with hash tables; we assume the reader has some familiarity with Perl and do not describe that syntax further.)

getHashRef

Returns a hashref allowing access and manipulation of the property hash table value.

Return value: {Hashref} A hashref allowing access and manipulation of the property hash table value.

Example: Return a hashref and iterate over the hash table, copying all key/item pairs to the root.

    sub onProcess {
        my $context  = shift;
        my $data     = shift;
        my $props    = $data->getProperties();
        my $hashprop = $props->define("H", {'A', 'apple', 'B', 'banana'});
        my $hashref  = $hashprop->getHashRef();

        # Define one root property per key/item pair.
        while ((my $key, my $value) = each %$hashref) {
            $props->define($key, $value);
        }
        return pilot::READYFORINPUTDATA;
    }

Chapter 5: Debugging Perl Components

The Pipeline Pilot Client provides a window that displays messages about your Perl component. The Debug Messages window is exposed when you run the protocol in debug mode, and it appears in the lower-left window. (You can also view it by selecting View > Debug Windows > Debug Messages.) The following example shows the Debug Messages window for the Perl Debugging example protocol:

[Figure: Debug messages for the Perl Debugging example]

Tip: To run protocols in debug mode, press SHIFT+F5 when you run the job. The Debug Messages window opens if you configured your protocol to use debugging features.

The Debug Messages window displays output messages from all selected components. If you are only interested in the output from a single component, select it by clicking it in the protocol window. To see the output from all of the components, click the background of the protocol window.

You can send messages to the Debug Messages window from your Perl code with:

    pilot::debugMessage("Your message");

or with:

    pilot::debugMessageError("Your error message");

Both of these commands display your string in the window. The only difference is that debugMessage displays in black, and debugMessageError displays in red. Normal Perl string interpolation lets you display the values of variables in your code:

    pilot::debugMessage("My value = $myVariable");

pilot::Debug Module

Several useful debugging subroutines are provided in the pilot::Debug module. You can include this module in your program by putting the following statement near the beginning of your program:

    use pilot::Debug;

pilot::Debug::captureOutput()

This function redirects output from Perl print and warn statements. If the single Boolean argument is true, output capturing is enabled; otherwise, output capturing is disabled. Enabling output capturing applies to all code executed by this component and remains in effect until it is disabled or the protocol terminates. The following code excerpt shows how it is used.

    # Start capturing the Perl STDOUT and STDERR streams.
    pilot::Debug::captureOutput(1);
    print "This message will be printed to STDOUT.\n";
    print STDERR "This message will be printed to STDERR.\n";
    warn "This warning message will be printed to STDERR.\n";

    # Stop capturing the STDOUT and STDERR streams.
    pilot::Debug::captureOutput(0);
    print "This message will not be displayed\n";

pilot::Debug::showVar()

This subroutine prints a human-readable representation of Perl data structures to the Debug Messages window. pilot::Debug::showVar() has two parameters: a reference to the variable to display, and an optional descriptive message. It is a wrapper around the Data::Dumper module from the standard Perl library.

Example:

    # simple example
    my $i = 123;
    pilot::Debug::showVar(\$i, "This is just a reference to a scalar.");

    # showVar() can display complex nested data structures.
    pilot::Debug::showVar(
        [
            'one',
            ['two - A', 'two - B', 'two - C'],
            'three',
            { firstname => 'John', lastname => 'Smith' },
        ]
    );

pilot::Debug::gotHere()

This subroutine prints the following to the console:

• the name of the subroutine that contains the gotHere() call
• the line number in the file containing the gotHere() call
• the name of the subroutine that called the subroutine containing the gotHere() call

pilot::Debug::whoCalled()

Returns a string containing the name of the subroutine that called the current subroutine.

pilot::Debug::whereCalled()

Returns a string containing the filename and line number identifying where the current subroutine was called.
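Call-site information of the kind these subroutines report is available in plain Perl through the built-in caller() function. The sketch below is an assumption about how similar helpers could be written, not the pilot::Debug implementation; who_called(), where_called(), inner(), and outer() are hypothetical names, and the format of the returned strings is illustrative.

```perl
use strict;
use warnings;

# Hypothetical helper: name of the sub that called the sub calling us.
# Frame 1 is the call to our caller; frame 2 is the call to its caller.
sub who_called {
    my @frame = caller(2);
    return @frame ? $frame[3] : 'main program';
}

# Hypothetical helper: file and line where the sub calling us was called.
sub where_called {
    my (undef, $file, $line) = caller(1);
    return "$file line $line";
}

sub inner { return (who_called(), where_called()) }
sub outer { return inner() }

my ($who, $where) = outer();
print "inner() was called by $who at $where\n";
```

Here inner() reports that it was called by main::outer, together with the file and line of that call; caller() returns fully qualified names, so the package prefix is included.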
