Observe

Observe Inc.

Sep 27, 2021

GETTING STARTED

1 Overview
2 Ingesting and Exploring Data with Observe
3 Introduction to Alerts and Monitors
4 Introduction to Metrics
5 Data ingestion
6 Worksheets
7 OPAL — Observe Processing and Analysis Language
8 List of OPAL verbs
9 List of OPAL functions
10 Observe Glossary
11 Helpful Hints
12 Observe Datasets and Time
13 Observe Basic Data Processing Model
14 FAQ
Index


For immediate support, please ask to join our Slack: Sign up for a new account | Slack


CHAPTER ONE

OVERVIEW

Observe shapes and relates data, making it easy to ask any question about your application, infrastructure, service, or system. It all starts with your data. In Observe, any event data that provides insight into the state of a system is called an Observation. We ingest traces, logs, metrics, and pretty much anything else with a timestamp. Our data collectors are very permissive and support all of the popular open source collectors.

To make this data easier to work with, we transform it into datasets. Datasets are structured representations of your data and can be linked to each other. We provide datasets out of the box for popular technologies such as Kubernetes and AWS. If we don't have your use case covered, don't sweat it: in Observe you can build these yourself with about the same amount of work it takes to create a dashboard.

There are two interfaces for working with data in Observe: landing pages and worksheets.

Landing Pages: Landing pages are automatically generated dashboards. They use the structure of datasets to generate visualizations, context, and navigation. When you click on a dataset card in Observe, this is the default view.

Worksheets: Worksheets are like an infinite spreadsheet for your data. They support direct manipulation of data, including correlation, field extraction, aggregation, visualization, and dataset creation. Each worksheet contains one or more stages, which are tables that can depend on the results of another stage. Linked stages are useful for capturing all of the steps of an investigation so that it can be reused or shared with others.

If you need to extend Observe, we have you covered. Everything you do in a worksheet generates OPAL (the Observe Processing and Analysis Language), so power users can hammer away. Anyone can create a new dataset from a worksheet by selecting the publish option in the UI.


CHAPTER TWO

INGESTING AND EXPLORING DATA WITH OBSERVE

You've logged into Observe and had a look around. Maybe someone on your team started collecting data. Now what? This page describes the basics of ingesting data from a script and exploring it in Observe. It walks you through generating test data, viewing it in the Firehose, and shaping it in a worksheet. To follow this tutorial, you will need:
• Your customer ID
• An ingest token (How to create an ingest token)
• One or more MacOS, Linux, or Windows 10 systems
• Python 3.x for MacOS and Linux, or PowerShell for Windows

2.1 A basic data generating script: ps-top-cpu.py

You can send nearly any type of data to Observe, including from shell commands and scripts. The ps-top-cpu script gets the process using the most CPU with ps and sends it to the HTTP collection endpoint as a JSON object.

GitHub links:
MacOS and Linux: ps-top-cpu.py
Windows PowerShell: top-cpu.ps1

To use it, save the appropriate file to your local system and update the following values:

# path and host are used to construct the collection URL
# Example:
# https://collect.observeinc.com/v1/http/my_path?host=my-laptop
path="my-ps-top-cpu"
host="my-laptop"

# customer_id and ingest_token are sent in an Authorization header
customer_id="12345"
ingest_token="my-token"

# The command to run: get the process using the most cpu
# Uncomment the appropriate one for your system
# MacOS:
cmd="ps -Ao pid,pcpu,comm -r -c | head -n 2 | sed 1d"
# Linux:
# cmd = "ps -eo pid,pcpu,comm --sort=-pcpu | head -n 2 | sed 1d"
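For reference, here is a minimal Python sketch of the collection loop such a script might implement. It is not the published script: the requests library, the payload fields, and the Authorization header format are assumptions, so treat the ps-top-cpu.py file on GitHub as the authoritative version.

import json
import subprocess
import time

import requests

path = "my-ps-top-cpu"
host = "my-laptop"
customer_id = "12345"
ingest_token = "my-token"
sleep_time = 10  # seconds between observations

# MacOS variant of the command; see the Linux variant in the comments above
cmd = "ps -Ao pid,pcpu,comm -r -c | head -n 2 | sed 1d"

url = "https://collect.observeinc.com/v1/http/" + path + "?host=" + host

while True:
    # One line of ps output, e.g. "  501  12.3 SomeProcess"
    line = subprocess.check_output(cmd, shell=True, text=True).strip()
    pid, pcpu, command = line.split(None, 2)
    requests.post(
        url,
        data=json.dumps({"pid": pid, "pcpu": pcpu, "command": command}),
        headers={
            # Assumed header scheme; the downloaded script shows the exact
            # Authorization format Observe expects.
            "Authorization": "Bearer " + customer_id + " " + ingest_token,
            "Content-Type": "application/json",
        },
    )
    time.sleep(sleep_time)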


Note: the PowerShell script does not require a value for cmd.
In the script, path is appended to the collection URL and host is added as a URL parameter. As observations from this source are ingested, these become additional values in the EXTRA column. You can later use them to query events from this source. (You can add additional path segments and URL parameters if you like. Separate path segments with a single slash /.)
If desired, change sleep_time to send observations more or less often. The default is every 10 seconds.
Make sure the file has execute permissions so you can run it. Since it contains your ingest token, you may want to restrict access to the script if you are on a shared system.
Run the script to send data to Observe. If you are sending from more than one machine, remember to update host for each local copy. This allows you to see which system a particular observation came from. Leave the script running while you explore the data. When you are finished collecting, type Ctrl-C to stop.

2.2 About the Firehose, or the Observation table

When a new data source is ingested, before any shaping or filtering, it is visible in the Firehose. Also called the "Observation table," this dataset shows everything you have coming into Observe. If there isn't much data yet, you can do some simple searching from here, but it could also be quite a lot. A better way is to create a worksheet.

2.3 Refine your results in a Worksheet

A worksheet is where you shape your data into a cohesive view. You can manipulate and transform data, create visualizations, link additional datasets, and save and share the results. If you are still looking at the Firehose, you can open a new worksheet from there by clicking the Open Worksheet button. Alternatively, go to Worksheets from the left sidebar and click the New Worksheet button. A dialog displays different types of datasets you could choose for your new Worksheet. To get the same data you were looking at in the Firehose, search for "Observation" and select the Observation event stream.

Now you have a basic worksheet with data from the Observation table. (The tab name has an asterisk and is in a different font to indicate you have unsaved changes.)


To narrow the results to just your ps-top-cpu data, start by filtering on its path:
• In the EXTRA column header, select Filter JSON from the menu. This opens a dialog with a list of fields in the data.
• Select Value from the dropdown menu, since the path you want is a value rather than a field.
• Search for your path, then check the box and click Apply to show only those rows.

In the FIELDS column, you should only see the data of interest. But it’s still JSON. Use Extract From JSON to create new columns.


With these new columns, maybe you don't need FIELDS anymore. You can temporarily hide it, or delete it if you won't use it again in this worksheet.

To show a hidden column again, open the Table Controls dialog and toggle its visibility. Also, none of this changes the underlying data. If you delete a column in this worksheet, it is still available for other worksheets.


As you explore this data, you might have noticed the console at the bottom of the page. As you update your worksheet, the console displays the equivalent OPAL statements. You can combine UI actions with OPAL, the Observe Processing and Analysis Language, to build more complex queries than the UI alone. For more, see OPAL — Observe Processing and Analysis Language

2.4 Create a visualization

Now that you have some useful columns, try creating a visualization. From the More menu, select Add Visualization:

This creates a new visualization card, ready to configure in the right rail.


Example: Maximum of CPU grouped by Command, as a Stacked Area chart:


If you like this worksheet, click the Save button to save it. You can find it later under "Your Worksheets" and pick up where you left off, or share it with others. You can also change its name to something more meaningful by clicking on "Observation" at the top of the page. In addition to referring back to this particular data, you might want to link the results of your shaping elsewhere in Observe. To do this, create a new dataset by publishing it.

2.5 Publishing an event stream

You have already seen an event stream, in the form of the Firehose. Event streams, along with Resource Sets, are types of datasets. And like any dataset, they can be linked to other stages in other worksheets as part of data shaping. To create an event stream from this worksheet, click Publish New Event Stream in the right rail. Your current worksheet updates to reference this new dataset, so if its definition changes later, it gets those changes automatically. (And so will any other worksheets that reference it.)


CHAPTER THREE

INTRODUCTION TO ALERTS AND MONITORS

Observe Monitors are a flexible way to alert on patterns in your incoming data. Define who should receive alerts with channels and channel actions, then create monitors to watch for your desired conditions. When one occurs, Observe sends alerts to everyone (or every service) in its channel. You can send alerts to any combination of email addresses and webhook-enabled services. Monitors complement resource notifications by adding alerts. The diagram below describes how they work with notifications and channels:


3.1 What does a Monitor do?

A monitor watches a dataset for a particular condition, such as a count of events or a specific text value. When you create a monitor, Observe makes a new dataset based on the contents of the page and your conditions. This allows multiple monitors from the same page to be independent of each other. The Notifications tab lists active Important notifications, for alerts currently matching or exceeding their triggering conditions. Click on an active notification to view its history. (For more about Important/Informational notifications, see Monitor Notification Options.)

3.2 What are Channels and Channel Actions?

A channel is a set of alert recipients, and each type of recipient is defined in a Channel Action. A channel action specifies the type of alert (email or webhook), where it is sent, and the template for its message or payload. A channel may have multiple channel actions, and may itself be subscribed to multiple notifications. When a monitor triggers an alert, emails or webhook requests go to all recipients in its configured channels. You don't need to select individual recipients each time you create a new monitor.

3.3 How do I configure alerts?

Set up channels and channel actions, and then create monitors from any worksheet or landing page. Each channel action can have a custom alert message: send your HTML to an email recipient or JSON in a webhook request. For an example, see this page: Alerting Example: Channels, Channel Actions, and Monitors. For more about trigger conditions, see Monitor Notification Options

3.3.1 Alerting Example: Channels, Channel Actions, and Monitors

Here is an example that illustrates how channels and channel actions work with monitors to create alerts.
The goal: You want to alert on too many errors in the TestWebApp application. Alerts should go to the owner of the application, [email protected], and a Slack channel, #test-web-app.
How? Use a monitor to watch the number of errors and alert when it exceeds a threshold. You expect similar alertable conditions in the future, so set up a channel and actions in advance.
Monitor notifications may send alerts to multiple channels, and a channel may contain multiple actions. Channels allow you to configure alerts for a group of recipients without selecting them individually for each new monitor. (You can also create channels on the fly. See below for more.)


Create Channel Actions

A channel action sends a single type of alert. For multiple types, or for different alert messages for different recipients, create a channel action for each.
To send an email to [email protected]: (For more about the message body, see Customizing Alert Messages.)
Click Continue to go to the Channel section.
Click Continue to go to the Name section.

Now create a second channel action for the Slack message.
Click Continue to go to the Channel section.
Note: You can send alerts to any service that accepts incoming webhooks. For Slack's docs, see Incoming webhooks for Slack.

Create a Channel

Now that you have some actions, create a channel so you can send both types of alerts at the same time:


Create a Monitor

With that setup done, later when you create a monitor you only need to select the appropriate channel.
For this example, presume you have a worksheet that shows errors from TestWebApp. (You can create monitors from landing pages too.) You want to trigger an alert when there are more than 50 errors in the previous 10 minutes, so create a monitor with that condition:
• In the worksheet, click the More button and select Create a Monitor.
• You want to compare the number of events to a static value, so choose a Count monitor.
• Click the green options to set your desired condition: greater than, 50, 10, and minutes. For more about condition options, see Monitor Notification Options.
Click Continue to go to the Notification section. Click Continue to go to the Name section. Your new monitor displays in the Monitors tab.


Create a New Channel While Creating a Monitor

You may also create a new channel at the same time you are creating a monitor. In the Notification section of the Create a Monitor page, select Create New Channel at the bottom of the list of channels. Then complete your monitor configuration as above. Your new channel won’t have any actions, so return to the Monitors tab and create a new channel action. Then add it to your channel.

3.3.2 Monitor Notification Options

There are several options to control how many notifications a monitor generates. These are "Group" or "Don't Group", and "Single" or "Separate" for how many grouped notifications to send. Additionally, a notification may be either Informational or Important.

Grouping notifications

A “Don’t Group” alert generates a message for each triggering condition.


A "Group" notification combines all events of a similar type when determining whether to send a notification. For example, you may have multiple devices, each with a DeviceID. When some of them trigger an alert, you can group alerts by DeviceID to only send one message per device.

Single or Separate

For grouped notifications, also choose either a single notification for all the currently triggering conditions, or a separate one for each. To illustrate how separate/single works, consider a door sensor:
Triggering condition: Alert if any door is open for longer than 10 minutes.
The door sensors report the following activity:
A single notification sends one notification when any door is open for longer than 10 minutes. In this case, it means doors were open from 10:00am-10:30am.
A separate Door notification sends one notification for each door that is open for longer than 10 minutes. Two notification emails are sent, one for the back door and another for the side door.


Importance

Notifications are either Informational or Important. By default, both types of alerts (email and webhook) include importance in the body or payload. Important alerts are shown in the Notifications tab. For resource landing page alerts, display all alerts by checking the Informational box under Namespace - Importance in the notifications tab.

For alerts sent to a service, you may be able to route or act on informational and important alerts differently. Please consult the service’s documentation for its available options.
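As an illustration of the receiving side, a webhook consumer could branch on that importance value. The following is a plain Python sketch, not part of Observe, and it assumes your channel action's template emits an "Importance" field, as in the example webhook payload later in this chapter:

def route_alert(payload):
    # "Importance" is whatever key your template emits; adjust to match.
    importance = payload.get("Importance", "Informational")
    if importance == "Important":
        return "page the on-call rotation"
    return "post to a low-priority channel"

# Example payload fragment such a template might produce
print(route_alert({"Importance": "Important", "Monitor": "Test monitor"}))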

3.3.3 Customizing alert messages

About alert templates

In addition to recipients, channel actions also define the contents of alert messages through a Mustache template. You may replace the default message with a custom HTML email or JSON payload template. Some variables are nested inside sections containing multiple values. See Template syntax for more about accessing items inside a section. Available variables are described in the tables below. Note that some types of information may be accessible in multiple ways. You may find it helpful to navigate this page with the Table of Contents in the right sidebar.


Channels and Channel Actions

Channel and Channel Action variables contain names and links to view or edit the Channel or Action:

Channel element | Variable name | Example
Name | channel.name | "On Call"
URL | channel.url | Link to the Edit Channel page for this channel

Channel Action element | Variable name | Example
Name | channelAction.name | "#ops-p2-alerts"
URL | channelAction.url | Link to the Edit Channel Action page for this channel action

Monitors

Common monitor variables

Monitor variables contain details of the monitor, such as the triggering condition and notification threshold. Every monitor includes the following basic elements:

Monitor element | Variable name | Example
ID | monitor.id | "1234567890"
Name | monitor.name | "Test monitor"
Description | monitor.notifyWhen | "Send a Separate Informational Notification when Any of the data triggers"
Condition to trigger on | monitor.triggerWhen | "Check all of the data where count of CONTAINER_ID is greater than 100 in the last 10m0s"
URL | monitor.url | Link to the Edit Monitor page for this monitor

The monitor.notifyWhenDetails variables contain information about how the notification was configured, such as whether it is a single or separate notification.

Monitor element | Variable name | Example
Condition details | monitor.notifyWhenDetails | Section containing details of why the monitor triggered notifications to be sent
Any or all groups | monitor.notifyWhenDetails.condition | "Any"
Importance | monitor.notifyWhenDetails.importance | "Informational"
Single or separate | monitor.notifyWhenDetails.merged | "Separate"

In addition to the text description in monitor.triggerWhen, the monitor.triggerWhenDetails variables contain more information about the triggering condition.


Monitor element | Variable name | Example
Condition details | monitor.triggerWhenDetails | Section containing the details of the triggering condition
Duration | monitor.triggerWhenDetails.duration | How long the condition has been occurring: "10m0s"
Field to check | monitor.triggerWhenDetails.field | "CONTAINER_ID"
Grouping type | monitor.triggerWhenDetails.grouping | "None"
Group by fields | monitor.triggerWhenDetails.groupingFields | The fields to group notifications by: "deviceId"
Kind | monitor.triggerWhenDetails.kind | Type of monitor: Count, Promote, Text Value (Facet), Threshold (Metric)

Variables by type of monitor

Most kinds of monitors have additional values for details specific to that type. For Count monitors:

Monitor element | Variable name | Example
Threshold value | monitor.triggerWhenDetails.threshold | "is greater than 100"

For Text Value (Facet) monitors:

Monitor element | Variable name | Example
Continuity | monitor.triggerWhenDetails.continuity | "all the time"
Triggering text value | monitor.triggerWhenDetails.fieldValueEquality | The triggering value to check for in the specified field: "active"

Threshold (Metric) monitors:

Monitor element | Variable name | Example
Threshold value | monitor.triggerWhenDetails.threshold | "is greater than 1000"

Notifications

Common notification variables

Notification variables contain details of the notification itself, such as when the alert triggered and the resources or values that triggered it. Every monitor includes the following basic notification elements:


Notification element | Variable name | Example
Description | notification.description | Value from description field for Promotion monitors, text describing trigger for others: "count of CONTAINER_ID is greater than 100 in the last 10m0s"
Recipient email | notification.email | Email address, for email notifications: "[email protected]"
Kind | notification.kind | For Promotion monitors, value from the field configured as the Notification Kind. Same as notification.description for other types.
Triggered at | notification.startTime | "2021-01-29T01:33:46Z"
URL | notification.url | Link to this notification in Observe.
Has resources? T/F | notification.hasResources | If this notification includes triggering resources: true
Has values? T/F | notification.hasValues | If this notification includes triggering values: false

Variables for triggering resources

If the notification includes resources, their details are available in multiple formats. Choose the one most appropriate for how you want to display resources in your alert. (See Template syntax for more about working with sections.) Both resourcesByLinkType and resourcesWithLinkType are ordered by the link name, also called its label. The label represents the foreign key relationship created by linking datasets. For example, a logGroup field linked to a Cloudwatch Log Group resource creates a linked column, in this example called Log Group in the event table. The values in that Log Group column are the individual name items, for example “/aws/lambda/ObserveCollection”.

The resourcesByLinkType section contains resources grouped by the link name, also called its label. For each label, there is a list of all the items of that type. This is useful when you are looking for a specific item you expect in the data, as you need to know the label.

Notification element | Variable name | Example
Resources by link type | notification.resourcesByLinkType | Section containing resources by type
Resource name | name | "Chris R. User"
Resource URL | url | Resource URL

The resourcesWithLinkType section also groups items by the label, each containing a list of instances of that type. This is more convenient for presenting resources arranged into sections.


Notification element | Variable name | Example
Resource link type | notification.resourcesWithLinkType | Section containing link types with their instances
Resource link type | linkType | "User", "Container"
Instances of this type | instances | List of resources, with name and url

The notification.resources section contains an unordered list of resources. This is useful for showing a list of items in a webhook payload.

Notification element | Variable name | Example
All triggering resources | notification.resources | Section containing all triggering resources
Resource name | name | Resource name
Resource URL | url | Resource URL

Examples

Using resourcesByLinkType:
Access values for a link type in this section by appending its label to #notification.resourcesByLinkType. Then specify the desired item; available values are name and url. This example displays the name for each User resource.
Template:
{{#notification.hasResources}}
{{#notification.resourcesByLinkType.User}}
{{name}}
{{/notification.resourcesByLinkType.User}}
{{/notification.hasResources}}

Sample data: "resourcesByLinkType": { "Cluster": [ { "name": "k8s.example.com", "url": "https://..." } ], "User": [ { "name": "Chris R. User", "url": "https://..." } ] }

Using resourcesWithLinkType:
This example displays a list of url and name values for each link type.
Template:


{{#notification.hasResources}}
{{#notification.resourcesWithLinkType}}
{{linkType}}
{{#instances}}
{{name}}
{{/instances}}
{{/notification.resourcesWithLinkType}}
{{/notification.hasResources}}

Sample data: "resourcesWithLinkType": [ { "linkType": "Cluster", "instances": [ { "name": "k8s.example.com", "url": "https://..." } ] }, { "linkType": "User", "instances": [ { "name": "Chris R. User", "url": "https://..." } ] } ]

Using notification.resources:
This example iterates over the list of items in the resources section and displays the value of name for each.
Template:
"Resources": "{{#notification.resources}}{{name}} {{/notification.resources}}"

Sample data: "resources": [ { "name": "k8s.example.com", "url": "https://..." }, { (continues on next page)

24 Chapter 3. Introduction to Alerts and Monitors Observe

(continued from previous page) "name": "Chris R. User", "url": "https://..." } ]

Variables for triggering values

Like resources, triggering values are also available in multiple ways. If notification.hasValues is true, these variables contain details of the values that triggered the notification. The valuesByFieldName section contains a list of name/value pairs for each value. This is useful when you are looking for a specific item you expect in the data, as you need to know the field name.

Notification element | Variable name | Example
Values by field name | notification.valuesByFieldName | Section containing name/value pairs

The valuesWithFieldName section contains a list of values grouped by the field name. This is helpful for presenting values grouped by field.

Notification element | Variable name | Example
Values with field name | notification.valuesWithFieldName | Section containing field names and values as individual items
Field name | fieldName | "city"
List of values for this field | values | ["San Mateo", "San Francisco"]

The notification.values section contains two lists, one with field names and another with values for those fields. The two lists are in the same order: the first field in the fields list has a corresponding value in the first item in the values list. This is useful for constructing a two-dimensional table of the triggering values.

Notification element | Variable name | Example
All triggering values | notification.values | Section containing all triggering values
List of value fields | fields | List of fields
List of value rows | rows | List of rows, in the same order as the fields list

Examples

Using valuesByFieldName:
Similar to resourcesByLinkType, access values in a notification.valuesByFieldName section by appending the desired label. Since it's a list, also append {{.}} to get its members. This example displays a list of userId values.
Template:
{{#notification.hasValues}}
{{#notification.valuesByFieldName.userId}}{{.}}
{{/notification.valuesByFieldName.userId}}
{{/notification.hasValues}}

Sample data:


"valuesByFieldName": { "MonitorName": ["An error occurred"], "userId": ["123"] }

Using valuesWithFieldName:
This HTML example displays a list of values for each field name as rows in a table.
Template:
{{#notification.hasValues}}
{{#notification.valuesWithFieldName}}
{{fieldName}}
{{#values}}
{{.}}
{{/values}}
{{/notification.valuesWithFieldName}}
{{/notification.hasValues}}

Sample data: "valuesWithFieldName": [ { "fieldName": "userId", "values": ["123"] }, { "fieldName": "MonitorName", "values": ["An error occurred"] } ]

Using notification.values:
This example displays the contents of the rows list.
Template:
"Values": "{{#notification.values.rows}}{{.}} {{/notification.values.rows}}",

Sample data: "values": { "fields": [ "userId", "MonitorName" ], "rows": [ (continues on next page)

26 Chapter 3. Introduction to Alerts and Monitors Observe

(continued from previous page) [ "123", "An error occurred" ] ] }
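Outside the template itself, the parallel fields and rows lists recombine into per-row records by position. A plain Python illustration (not Observe code) using the sample data above:

fields = ["userId", "MonitorName"]
rows = [["123", "An error occurred"]]

# Pair each row with the field names, preserving the shared order
records = [dict(zip(fields, row)) for row in rows]
print(records)  # [{'userId': '123', 'MonitorName': 'An error occurred'}]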

Template syntax

Most variables are string values; these may be placed anywhere in your template. When an alert is triggered, Observe replaces the template variables with values from the current alert. Add the desired variables inside double braces, like this:

{{monitor.name}} at {{notification.startTime}}

To add a comment, put the text inside a {{! }} element.
Example webhook payload template:

{
  {{! Display some basic info }}
  "Notification": "{{notification.kind}}",
  "Monitor": "{{monitor.name}}",
  "Trigger at": "{{notification.startTime}}",
  "Importance": "{{monitor.notifyWhenDetails.importance}}",
  "Trigger Condition": "{{monitor.triggerWhen}}",
  "Notification": "{{monitor.notifyWhen}}",
  "Description": "{{notification.description}}",
  "Resources": "{{#notification.resources}}{{name}} {{/notification.resources}}",
  "Values": "{{#notification.values.rows}}{{.}} {{/notification.values.rows}}",
  "URL": "{{{notification.url}}}"
}

notification.resources and notification.values.rows contain lists of the resources or threshold values that triggered the notification.
In the HTML example below, {{#notification.hasResources}} begins a template section. If notification.hasResources is True, the contents of this section are evaluated, displaying the list of resources with their name and url. If False, the section is skipped. {{/notification.hasResources}} indicates the end of the section. Variables inside a {{#sectionName}} {{/sectionName}} conditional section are accessed directly by name, without a section prefix.
Example email body template:

{{#notification.hasResources}}
{{#notification.resourcesWithLinkType}}
{{linkType}}
{{#instances}}
{{name}}
{{/instances}}
{{/notification.resourcesWithLinkType}}
{{/notification.hasResources}}

For more about template syntax, see the Mustache documentation.
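If you want to preview a template before attaching it to a channel action, you can render it locally with any Mustache implementation. Below is a sketch using the third-party Python package chevron and a hand-built sample context; both are assumptions, and any Mustache library with realistic sample data will do.

import chevron

template = "{{monitor.name}} at {{notification.startTime}}"
context = {
    "monitor": {"name": "Test monitor"},
    "notification": {"startTime": "2021-01-29T01:33:46Z"},
}

# Prints: Test monitor at 2021-01-29T01:33:46Z
print(chevron.render(template, context))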

Examples

Slack

Slack alerts use Block Kit and the Incoming Webhook URL for your Slack app. Slack Incoming Webhook guidelines:
• A message may have up to 50 blocks.
• A text item inside a section text field may contain up to 3000 characters.
• Avoid fields blocks inside a section, as they may only contain 10 fields.
  – text items in field blocks are limited to 2000 characters.
For more information, see the Slack layout block reference.

Basic Slack alert body example

{ "blocks": [ { {{! header is a large bold font }} "type": "header", "text": { "type": "plain_text", "text": "Test Slack Alert at {{notification.startTime}}" } }, { {{! section is plain text, can use markdown }} "type": "section", "text": { "type": "mrkdwn", "text": "*Monitor:*\n{{monitor.name}}" } }, { {{! this is a comment }} "type": "section", "text": { "type": "mrkdwn", "text": "*Kind:*\n{{notification.kind}}" (continues on next page)

28 Chapter 3. Introduction to Alerts and Monitors Observe

(continued from previous page) } }, { {{! divider is a narrow horizontal line }} "type": "divider", }, { {{! actions is for interactive elements, like a button }} "type": "actions", "elements": [ { "type": "button", "text": { "type": "plain_text", "emoji": true, "text": "View Notification" }, "style": "primary", "url": "{{notification.url}}" } ] } ] }

More complex Slack alert body example

{ "blocks": [ { "type": "header", "text": { "type": "plain_text", "text": "{{notification.kind}} at {{notification.startTime}}" } }, {{#notification.hasResources}} {{#notification.resourcesWithLinkType}} { "type": "section", "text": { "type": "mrkdwn", "text": "*{{linkType}}:*\n{{#instances}}<{{url}}|{{name}}>\n{{/instances}}" } }, {{/notification.resourcesWithLinkType}} {{/notification.hasResources}} {{#notification.hasValues}} {{#notification.valuesWithFieldName}} { (continues on next page)

3.3. How do I configure alerts? 29 Observe

(continued from previous page) "type": "section", "text": { "type": "mrkdwn", "text": "*{{fieldName}}:*\n{{#values}}{{.}}\n{{/values}}" } }, {{/notification.valuesWithFieldName}} {{/notification.hasValues}} { "type": "section", "text": { "type": "mrkdwn", "text": "*Monitor:*\n{{monitor.name}}" } }, { "type": "section", "text": { "type": "mrkdwn", "text": "*Description:*\n{{notification.description}}" } }, { "type": "actions", "elements": [ { "type": "button", "text": { "type": "plain_text", "emoji": true, "text": "View Notification" }, "style": "primary", "url": "{{notification.url}}" } ] } ] }

Microsoft Teams

Teams alerts use an Incoming Webhook. For more details, see the Microsoft Teams documentation: Create and send messages


Microsoft Teams alert body example

{ "@type": "MessageCard", "@context": "http://schema.org/extensions", "themeColor": "0076D7", "summary": "{{monitor.name}} triggered at {{notification.startTime}}", "sections": [{ "activityTitle": "{{monitor.name}} triggered at {{notification.startTime}}", "activitySubtitle": "{{notification.description}}", "activityImage": "https://teamsnodesample.azurewebsites.net/static/img/image5.png", "facts": [{ "name": "Assigned to", "value": "Unassigned" }, { "name": "Started at", "value": "{{notification.startTime}}" }, { "name": "Importance", "value": "{{monitor.notifyWhenDetails.importance}}" }], "markdown": true }], "potentialAction": [{ "@type": "OpenUri", "name": "View Monitor", "targets": [{ "os": "default", "uri": "{{notification.url}}" }] }] }


CHAPTER FOUR

INTRODUCTION TO METRICS

A metric is any sort of value you can measure over time. It could be blocks used on a filesystem, the number of nodes in a cluster, or a temperature reading. They are reported in the form of a time series: a set of values in time order. Each point in a time series represents a measurement from a single resource, with its name, value, and tags. Observe links metrics to Resource Sets, so you can view relevant metrics on a Resource Landing Page.

This page describes the process of shaping raw metrics data for Resources. There are several considerations and decisions to make in the modeling process. Please contact us if you have questions about modeling your specific data.

Note: Metrics use OPAL in a worksheet to transform the raw data, add metadata, and create relationships between datasets. If you are not familiar with OPAL, please see OPAL — Observe Processing and Analysis Language


4.1 What Is a Metric Dataset?

An Observe metric dataset contains both metrics data and metadata that provides additional context. Observe uses two different forms for metrics, called narrow and wide.

4.1.1 Narrow Metrics

Narrow metrics contain one metric per row: a single data point containing a timestamp, name, value, and zero or more tags. For example, the following table contains values for two metrics in narrow form:

valid_from | metric_name | metric_value | metric_tags
00:00:00 | disk_used_bytes | 20000000 | {"device":"sda1"}
00:00:00 | disk_total_bytes | 50000000 | {"device":"sda1"}
00:01:00 | disk_used_bytes | 10000000 | {"device":"sda1"}
00:01:00 | disk_total_bytes | 50000000 | {"device":"sda1"}
00:02:00 | disk_used_bytes | 40000000 | {"device":"sda1"}
00:02:00 | disk_total_bytes | 50000000 | {"device":"sda1"}

Some systems generate this by default, or you can shape other data into the correct form with OPAL.

Note: Metric values must be float64. If you need to convert from another type, see the float64 function.

Narrow metrics are easier to manage at ingest time and as events. With one metric per row, it is clear which value and tags belong to what metric. The interface verb registers a dataset as a metric dataset, and addmetric specifies the details of the individual metrics it contains.
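As a purely illustrative sketch (plain Python, outside Observe), producing the narrow form from a single reading amounts to emitting one timestamp/name/value/tags row per measurement:

def to_narrow(timestamp, tags, measurements):
    """Emit one narrow metric row per measurement in a reading."""
    return [
        {
            "valid_from": timestamp,
            "metric_name": name,
            "metric_value": float(value),  # metric values must be float64
            "metric_tags": tags,
        }
        for name, value in measurements.items()
    ]

# Two rows, matching the first timestamp in the table above
rows = to_narrow("00:00:00", {"device": "sda1"},
                 {"disk_used_bytes": 20000000, "disk_total_bytes": 50000000})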

4.1.2 Wide Metrics

Wide metrics contain several, often related, metrics. This form is easier for calculations, such as percent usage, because the needed values can be available in the same row. Wide metrics are created by the rollup and aggregate verbs. rollup defines how each narrow metric should be aggregated over time, and aggregate determines how wide metrics from different sources are aggregated by tags. The table below is a wide format rollup based on the example above. It includes valid_from and valid_to timestamps that indicate the time period over which the average is calculated.

valid_from | valid_to | disk_used_bytes_avg | disk_total_bytes_avg | metric_tags
00:00:00 | 00:01:00 | 15000000 | 50000000 | {"device":"sda1"}
00:01:00 | 00:02:00 | 25000000 | 50000000 | {"device":"sda1"}
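The relationship between the narrow and wide tables is time bucketing plus the per-metric rollup. Here is a rough Python illustration of the arithmetic behind the averages above; it is not how Observe implements rollup or aggregate, and the windowing semantics are simplified:

times = ["00:00:00", "00:01:00", "00:02:00"]
disk_used = [20000000, 10000000, 40000000]
disk_total = [50000000, 50000000, 50000000]

# Average the samples at each window's start and end, reproducing the
# 15000000 and 25000000 values in the wide table above.
wide_rows = []
for i in range(len(times) - 1):
    wide_rows.append({
        "valid_from": times[i],
        "valid_to": times[i + 1],
        "disk_used_bytes_avg": (disk_used[i] + disk_used[i + 1]) / 2,
        "disk_total_bytes_avg": (disk_total[i] + disk_total[i + 1]) / 2,
        "metric_tags": {"device": "sda1"},
    })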


4.2 Create an Interface

interface maps fields to a metric interface so subsequent operations know which fields contain the metric names and values. This metadata-only operation prepares a dataset for use as metrics.
Example:

interface "metric", metric:metricNameColumn, value:metricValueColumn

Registering, or "implementing the metric interface," establishes the following conditions:
• This dataset contains narrow metrics
• Each row represents one point in a time series
• The metricNameColumn column contains the metric names
• The metricValueColumn column contains the metric values

4.3 Define Individual Metrics

Once the dataset is set up for metrics, use addmetric to define the metadata for each metric. If you have many metrics, you can register some and add others later by updating the Event Stream definition.
Example:

addmetric options(label:"Ingress Bytes", type:"cumulativeCounter", unit:"bytes", description:"Ingress reported from somewhere", rollup:"rate", aggregate:"sum"), "ingress_bytes"

This statement registers the metric ingress_bytes as a cumulativeCounter, which is aggregated over time as a rate, and across multiple tags as a sum. For more about allowed values for the rollup and aggregate options, please see the OPAL verb documentation for addmetric and the example walkthrough below.

Note: addmetric units use standard SI unit names from the math.js library, with the exceptions noted below. They may be combined for compound units like rates and ratios. Other units may not scale appropriately in charts; please contact us if you have trouble with an unusual or custom unit. You may use either the unit names or abbreviations, and most names can be either singular (hour) or plural (hours). Please see the math.js docs for details. We recommend using full names for clarity. Note that both names and abbreviations are case-sensitive. For a unitless measurement, either omit unit: or use unit:"". Examples of data units:

Name | Abbreviation
bits | b
bytes | B
kilobytes | kB
gigabytes | GB
terabytes | TB
bytes/second | B/s
megabits/second | Mb/s

Exceptions:


• m is minutes; use meter for length
• C is degrees Celsius; use coulomb for electric charge
• F is degrees Fahrenheit; use farad for capacitance

4.4 Link Metrics to Resources

To show metrics on a Resource Landing Page, link the metric dataset to the Resource Set’s primary key. From the metric dataset worksheet, select Link To Resource Set from the column heading menu for the same key. Save the updated Event Stream definition to link the two datasets. Reload the Resource Landing Page to see the new metrics.

4.5 Walkthrough: Putting It All Together

To show how this works, here is an example of creating metrics from process data. We have a shell script that sends data from ps to Observe every five seconds. Before it's converted to JSON, the original output looks like this:

PID RSS TIME %CPU COMMAND
1 12752 1 2.0 systemd
2 0 0 0.0 kthreadd
3 0 0 0.0 rcu_gp

ps reports several pieces of information for each process, so the first step is to shape the data into narrow form with OPAL.
1. Open a new worksheet based on the Firehose, also called the Observation Event Stream. Then filter to the desired observations and extract needed fields:

// The script used a unique path for HTTP ingestion
// Filter on it to get the desired data
filter OBSERVATION_KIND="http" and string(EXTRA.path)="/metricquickstart"

// Flattenleaves creates a new row for each set of process data,
// corresponding to one row in the original output
// Creates _c_FIELDS_stdout_value containing each string
// and _c_FIELDS_stdout_path for its position in the JSON object (which we don't need)
flattenleaves FIELDS.stdout

// Select the field that contains the data we want
colpick BUNDLE_TIMESTAMP, ps:string(_c_FIELDS_stdout_value)

// Extract fields from the ps string output with a regex
colregex ps, /^\s+(?P<pid>\d+)\s+(?P<rss>\d+)\s+(?P<cputimes>\d+)\s+(?P<pcpu>\d+.\d+)\s+(?P<command>\S+)\s*$/

The reformatted data now looks like this:


BUNDLE_TIMESTAMP | ps | command | pcpu | cputimes | rss | pid
02/24/21 16:14:03.151 | 1 12752 1 2.0 systemd | systemd | 2.0 | 1 | 12752 | 1
02/24/21 16:14:03.151 | 2 0 0 0.0 kthreadd | kthreadd | 0.0 | 0 | 0 | 2
02/24/21 16:14:03.151 | 3 0 0 0.0 rcu_gp | rcu_gp | 0.0 | 0 | 0 | 3

Note: If your desired data is already part of an existing Resource Set, start from there instead of beginning with Observation. See Performance for more.

2. Shape into narrow metrics:

// Create a new object containing the desired values,
// along with more verbose metric names
colmake metrics:makeobject("resident_set_size":rss, "cumulative_cpu_time":cputimes, "cpu_utilization":pcpu)

// Flatten that metrics object to create one row for each value
flattenleaves metrics

// Select the desired fields, renaming in the process
// Also convert value to float64, as currently required for metric values
colpick valid_from:BUNDLE_TIMESTAMP, pid, command, metric_name:string(_c_metrics_path), metric_value:float64(_c_metrics_value)

After shaping, it looks like this:

valid_from | pid | command | metric_name | metric_value
02/24/21 16:14:03.151 | 1 | systemd | cpu_utilization | 2.0
02/24/21 16:14:03.151 | 1 | systemd | resident_set_size | 12752
02/24/21 16:14:03.151 | 1 | systemd | cumulative_cpu_time | 1
02/24/21 16:14:03.151 | 2 | kthreadd | cpu_utilization | 0.0
02/24/21 16:14:03.151 | 2 | kthreadd | resident_set_size | 0
02/24/21 16:14:03.151 | 2 | kthreadd | cumulative_cpu_time | 0
02/24/21 16:14:03.151 | 3 | rcu_gp | cpu_utilization | 0.0
02/24/21 16:14:03.151 | 3 | rcu_gp | resident_set_size | 0
02/24/21 16:14:03.151 | 3 | rcu_gp | cumulative_cpu_time | 0

3. Register an interface to identify this dataset as containing metrics data.

// This interface statement specifies that the names of our metrics are in
// metric_name, and their values in metric_value
interface "metric", metric:metric_name, value:metric_value

The interface verb adds metadata, so there's no visible effect on the data yet. The metric keyword indicates that we want a metrics interface. This operation defines several important pieces of information about this dataset. Some are directly specified, and some are inferred from the dataset's definition, or schema.
• This is a narrow metric dataset, where each row represents one metric point
• The values in metric_name are the metric names


• The values in metric_value are the metric values
• The values in valid_from are the time of the observation
• The other fields (pid and command) are tags, used later for linking to a Resource Set

4. Define individual metrics
Now we have a metrics-ready dataset. It contains raw metrics data, and we have told Observe which fields contain the names and values. To use it, we need additional metadata about the individual values. Create this for each metric using addmetric.

// RSS is a gauge, a measurement at a point in time
// rollup type "avg" means when a metric's value is tracked over time, we want the average
// aggregate "sum" means when these values are tracked across multiple processes, we want the total sum
// The name of this metric is resident_set_size, linking it to identically named values in the metric_name field
addmetric options(label:"Memory Usage: RSS",
  unit:"kilobytes",
  description:"Resident set size of the process",
  type:"gauge",
  rollup:"avg",
  aggregate:"sum"), "resident_set_size"

// Cumulative CPU Time is a cumulativeCounter, a monotonically increasing total
// rollup "rate" gives the rate at which a particular metric's value increases over time
addmetric options(label:"Cumulative CPU Time",
  unit:"s",
  description:"The cumulative CPU time spent by the process",
  type:"cumulativeCounter",
  rollup:"rate",
  aggregate:"sum"), "cumulative_cpu_time"

// CPU Utilization is also a gauge measurement, with rollup "avg" and aggregate "sum"
// This measurement is unitless, so unit: is omitted
addmetric options(label:"CPU Utilization",
  description:"CPU utilization of the process, expressed as a percentage",
  type:"gauge",
  rollup:"avg",
  aggregate:"sum"), "cpu_utilization"

This defines what we want to track and how to treat it in subsequent rollup and aggregation operations. There is also another metric type, delta, the change from the previous measurement. Create a new dataset by publishing this worksheet as a new Event Stream.


5. Link the metrics dataset to a related Resource Set
To view metrics on a Resource Landing Page, first we need a Resource Set. Start from the Event Stream we just created, and open it as a worksheet. The pid and command fields contain additional tags for the metric data in the metric_name and metric_value fields we created earlier.
Select these two fields (cmd-click or ctrl-click on the column headers), right-click to open the menu, and choose Create New Resource Set. Check the pid and command fields, and then specify pid as the primary key. This allows us to link the new Resource Set to the metric dataset. Click Create to save.


Now you have a second stage in your worksheet, for the pid Resource Set. Click Publish New Resource Set in the right rail to make it available as a Resource Set. pid isn't a very descriptive name, so call it Process and click Publish to save.
Now we need to tell the metrics dataset about the Resource Set's primary key. Open a new tab and edit the walkthrough-metrics-quickstart Event Stream definition. Select the pid field, choose Link To Resource Set from the menu, and then choose Process in the sub-menu.

Click Apply, and then Save to save changes to the Event Stream definition.
6. View metrics on the Resource Landing Page
Open the Process Landing Page in a new tab to see the metrics in new cards.



CHAPTER FIVE

DATA INGESTION

This section describes how to ingest data into Observe. Our goal is to accept data in any format: if your source or forwarder isn’t documented yet, let us know! This documentation is divided into four categories:

5.1 Integrations

Integrations streamline the process of collecting data from multiple sources. Where possible, we recommend starting with an integration and adding additional sources as needed. Example: Kubernetes

5.2 Sources

Sources may send data to Observe directly via an outgoing webhook, a forwarder, or another type of agent. The documentation for each source describes the recommended method and any additional installations required. Examples: AWS CloudWatch logs, Jenkins build logs

5.3 Forwarders

Forwarders collect data from a source and send it to Observe. They often offer additional features, such as the ability to aggregate data from multiple sources or perform lightweight transformations. Forwarders are useful when the original source does not have a way to send data elsewhere, such as a process that only generates a local log file. Examples: FluentBit, Prometheus Server, Google Cloud Pub/Sub

5.4 Endpoints

Endpoints support various wire protocols by which data can be ingested. All of our source and forwarder instructions ultimately send data to an endpoint. If you have a custom or highly customized source, you may configure it to use the appropriate endpoint directly. Example: JSON via HTTP POST


5.4.1 Integrations

An integration streamlines the task of collecting multiple sources for a given target environment. Integrations prioritize ease-of-use and time-to-value over configurability. If you are ingesting data from several related sources, consider installing one of our integrations rather than configuring all of them manually. The defaults are suitable for most use cases, and you can add additional sources or forwarders as needed.

AWS

The Observe AWS Integration streamlines the process of collecting data from AWS. Install it once and ingest logs and metrics from several common AWS services. Configure ingest for additional services by sending that data to forwarders that are already set up for you.
The AWS Integration works with the datasets in your workspace. Contact us for assistance creating datasets and modeling the relationships between them. We can automate many common data modeling tasks for you, ensuring an accurate picture of your infrastructure. If you are already ingesting AWS data, we are happy to discuss if the AWS Integration could enhance your existing data collection strategy.

What data does it ingest?

Standard ingest sources

The AWS Integration automatically ingests the following types of data from a single region:
• CloudTrail logs of AWS API calls
• CloudWatch Metrics streams with metrics for services you use
• EventBridge state change events from applications and services
To do this, it creates several forwarding paths:
• An S3 bucket
• A Kinesis Firehose delivery stream
• The Observe Lambda forwarder
These forwarders work in a single region, as many AWS services are specific to a particular region. For information about multi-region collection, see How do I collect data from multiple regions? in the FAQ.

Additional ingest sources

With these already configured and working, add additional services by configuring them to write to the bucket or send logs to one of the forwarders. Details for common services may be found in our documentation:
• API Gateway execution and access logs from your REST API
• AppSync request logs from your GraphQL API
• CloudWatch logs from EC2, Route53, and other services
• GuardDuty security findings for threat detection
• S3 access logs for requests to S3 buckets


Using AWS Integration data

After shaping, the incoming data populates datasets like these:
• CloudWatch Log Group - Application errors, log event detail
• IAM
  – IAM Group - Which groups are accessing resources
  – IAM Policy - Policies in use, their descriptions and contents
  – IAM Role - Compare role permissions over time
  – IAM User - Most active users
• EC2
  – EC2 EBS Volume - Volumes in use, size, usage and performance metrics
  – EC2 Instance - What instances are in which VPCs, instance type, IP address
  – EC2 Network Interface - Associated instance, type, DNS name
  – EC2 Subnet - CIDR block, number of addresses available
  – EC2 VPC - Account and region, if default
• Account - View resources by account
• Lambda Function - Active functions, associated Log Group, invocation metrics
• S3 Bucket - Buckets by account and region

Setup

Installation

AWS Console

Use our CloudFormation template to automate installing the AWS integration. To install via the AWS Console:
1. Navigate to the CloudFormation console and view existing stacks.
2. Click Create stack. If prompted, select With new resources.
3. Provide the template details:
   1. Under Specify template, select Amazon S3 URL.
   2. In the Amazon S3 URL field, enter https://observeinc.s3-us-west-2.amazonaws.com/cloudformation/collection.yaml.
   3. Click Next to continue. (You may be prompted to view the function in Designer. Click Next again to skip.)
4. Specify the stack details:
   1. In Stack name, provide a name for this stack. It must be unique within a region, and is used to name created resources.
   2. Under Required Parameters, provide your Customer ID in ObserveCustomer and ingest token in ObserveToken.
   3. Click Next.


5. Under Configure stack options, there are no required options to configure. Click Next to continue.
6. Review your stack options:
   1. Under Capabilities, check the box to acknowledge that this stack may create IAM resources.
   2. Click Create stack.

Video instructions

Alternatively, you can deploy the CloudFormation template using the awscli utility:

Caution: If you have multiple AWS profiles, make sure you configure the appropriate AWS_REGION and AWS_PROFILE environment variables in addition to OBSERVE_CUSTOMER and OBSERVE_TOKEN.

$ curl -Lo collection.yaml https://observeinc.s3-us-west-2.amazonaws.com/cloudformation/collection.yaml
$ aws cloudformation deploy --template-file ./collection.yaml \
    --stack-name ObserveLambda \
    --capabilities CAPABILITY_NAMED_IAM \
    --parameter-overrides ObserveCustomer="${OBSERVE_CUSTOMER?}" ObserveToken="${OBSERVE_TOKEN?}"

Terraform

You may also use our Terraform module to install the AWS integration and create the needed Kinesis Firehose delivery stream. The following is an example instantiation of this module:

module "observe_collection" {
  source           = "github.com/observeinc/terraform-aws-collection"
  observe_customer = "${OBSERVE_CUSTOMER}"
  observe_token    = "${OBSERVE_TOKEN}"
}

We recommend that you pin the module version to the latest tagged version.

FAQ

Where are the integration’s forwarders located?

All resources are created in the region where you installed the AWS Integration, such as us-east-1. They are named based on the CloudFormation stack name or Terraform module name you provided. For example, a CloudFormation stack called Observe-AWS-Integration would result in names like:
• Lambda function Observe-AWS-Integration
• S3 bucket observe-aws-integration-bucket-1a2b3c4d5e
• Kinesis Firehose delivery stream Observe-AWS-Integration-Delivery-Stream-1a2b3c4d5e

Note: To ensure the generated resources comply with AWS naming rules, your stack or module name should contain only:


• Letters (A-Z and a-z)
• Numbers (0-9)
• Hyphens (-)
• Maximum of 30 characters

How do I collect data from multiple regions?

The Observe AWS integration operates on a per-region basis because some sources, such as CloudWatch metrics, are specific to a single region. For multiple regions, we recommend installing the integration in each region. You may do this with a CloudFormation StackSet, or by tying the Terraform module into your existing manifests.

What permissions are required?

The integration periodically queries the AWS API for information about certain services. To do this, its Lambda function has permissions to execute the following actions:
• dynamodb:List*
• dynamodb:Describe*
• ec2:Describe*
• ecs:List*
• ecs:Describe*
• elasticache:Describe*
• elasticloadbalancing:Describe*
• firehose:List*
• firehose:Describe*
• iam:Get*
• iam:List*
• lambda:List*
• logs:Describe*
• rds:Describe*
• redshift:Describe*
• route53:List*
• s3:List*
You may change these permissions if needed. If the Lambda function does not have permission for a particular service, it will not collect that information. The integration S3 bucket is subscribed to the Observe Lambda, with permissions that allow other AWS services to write to it. For example, ELB access logs or VPC flow logs.


Kubernetes

Installation

Getting started

To proceed with this step, you will need a customer ID and token. Observe provides a manifest which installs all the necessary components for collecting telemetry data from Kubernetes. This manifest can be retrieved directly from https://api.observeinc.com/v1/kubernetes/manifest. At its simplest, the install process can be reduced to two steps:

$ kubectl apply -f https://api.observeinc.com/v1/kubernetes/manifest && \
  kubectl -n observe create secret generic credentials \
    --from-literal=customer=${OBSERVE_CUSTOMER?} \
    --from-literal=token=${OBSERVE_TOKEN?}

This example is for illustrative purposes only. In production environments, we recommend downloading the manifest separately and tracking changes over time using your configuration management tool of choice. If you have a preferred installation process which you would like us to support, please let us know on Slack.
By default, our manifest creates an observe namespace which contains all our collection infrastructure. Only then can we create a secret containing the appropriate credentials for sending data to Observe.
If you are monitoring multiple clusters, it is useful to provide a human-readable name for each one. You can attach an identifier by providing an observeinc.com/cluster-name annotation:

$ kubectl annotate namespace observe observeinc.com/cluster-name="My Cluster"

Validating installation

Once your manifest is applied, you can wait for all pods within the namespace to be ready:

$ kubectl wait pods -n observe --for=condition=Ready --all

To verify data is streaming out correctly, you can check the egress logs going through the proxy:

$ kubectl logs -n observe -l name=observe-proxy
172.20.59.66 - - [01/Jul/2020:17:33:44 +0000] "POST /v1/http/kubernetes/logs HTTP/1.1" 202 11
172.20.51.151 - - [01/Jul/2020:17:33:44 +0000] "POST /v1/http/kubernetes/events HTTP/1.1" 202 11
172.20.51.151 - - [01/Jul/2020:17:33:44 +0000] "POST /v1/http/kubernetes/logs HTTP/1.1" 202 11
172.20.46.135 - - [01/Jul/2020:17:33:45 +0000] "POST /v1/http/kubernetes/logs HTTP/1.1" 202 11
172.20.59.52 - - [01/Jul/2020:17:33:45 +0000] "POST /v1/http/kubernetes/logs HTTP/1.1" 202 11

You should see requests logged in Apache access log format. All requests are forwarded towards /v1/http/kubernetes/*. If your credentials are incorrect, you may see 401 status codes.
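If you do see 401s, recreating the credentials secret and rolling the proxy deployment is usually enough. A minimal sketch, assuming the secret and deployment names created by the default manifest:

# Replace the credentials secret, then roll the proxy so it picks up the new values
$ kubectl -n observe delete secret credentials
$ kubectl -n observe create secret generic credentials \
    --from-literal=customer=${OBSERVE_CUSTOMER?} \
    --from-literal=token=${OBSERVE_TOKEN?}
$ kubectl -n observe rollout restart deployment/observe-proxy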


Overriding defaults

Our manifest is derived from a template which is populated at query time. You can override certain settings through the use of query parameters. For example, to instead install our agent to the kube-system namespace, you would override the namespace parameter:

$ kubectl apply -f https://api.observeinc.com/v1/kubernetes/manifest?namespace=kube-system

To override multiple parameters:

$ kubectl apply -f 'https://api.observeinc.com/v1/kubernetes/manifest?namespace=kube-system&multiline=true'

The following table documents accepted query parameters:

• version (default: 0.0.2): Manifest version. You may want to specify this value explicitly to avoid breaking changes when applying updates.
• namespace (default: observe): Namespace used for install. The manifest will create a new namespace only if its name starts with observe. All other values will be assumed to belong to an existing namespace.
• collector (default: collect.observeinc.com): API endpoint to send data to.
• coordination (default: true): Enable use of lease locks. This must be disabled for Kubernetes versions older than 1.14.
• prometheus (no default): Port number for prometheus proxying. This feature is experimental and disabled by default. Please contact support for more information.
• zipkin (no default): Port number for zipkin proxying. This feature is experimental and disabled by default. Please contact support for more information.
• startupProbe (default: false): Enable the use of a startupProbe that does end-to-end validation that data can be submitted to Observe before marking a proxy as ready. This is only supported in Kubernetes 1.16 onward.
• proxyReplicas (default: 1): Number of replicas in the proxy deployment. If you hit resource limits, we recommend increasing the number of replicas.
• multiline (default: false): Enable multiline log parsing. When set to true, log records beginning with a whitespace are coalesced with the previous message. This option is disabled by default as the heuristic used is susceptible to false positives.
• otel (default: false): Include an OpenTelemetry agent container as part of the observe-agent daemonset. This option is disabled by default.
• otelVersion (default: latest): Specify a version of the OpenTelemetry agent to use. The default is the most recent version.


Uninstalling

The cleanest way to remove our integration is to delete all resources included in the original installation manifest. You must include any query parameters you provided on install in order to delete the correct set of resources:

$ kubectl delete -f https://api.observeinc.com/v1/kubernetes/manifest

Manifest contents

Collection architecture

By default, the manifest creates an observe namespace, within which there are two main components:
• the observe-proxy deployment and corresponding proxy service, through which all data egresses the cluster
• the observe-agent daemonset, which is responsible for collecting data

All data collected by Observe goes through a proxy service, which maps to pods in the observe-proxy deployment. The proxy modifies all requests towards Observe with:
• authentication data provided in the credentials secret
• a clusterId tag to all observation data
This avoids the need for configuring credentials for every process wishing to post data to Observe, and ensures that rotating credentials can be achieved through a single deployment rollout. Any request sent to http://proxy.observe.svc.cluster.local will be forwarded towards https://collect.observeinc.com with the appropriate Authorization header and cluster ID. A final advantage of decoupling where data is collected from where data egresses the cluster is that more fine-grained network policies can be applied. For example, we can restrict external network access within a cluster to a subset of nodes without affecting data collection. The observe-agent daemonset ensures our collectors run on every node. It is responsible for collecting container logs, Kubernetes state changes, and kubelet metrics.
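To illustrate, any pod inside the cluster can post observations through the proxy without knowing the credentials. A minimal sketch, assuming the default observe namespace and a hypothetical /v1/http/test path:

# Run a throwaway pod and POST a JSON observation via the in-cluster proxy;
# the proxy adds the Authorization header and cluster ID before forwarding.
$ kubectl run curl-test --rm -it --image=curlimages/curl --restart=Never -- \
    curl -s -X POST http://proxy.observe.svc.cluster.local/v1/http/test \
    -H 'Content-Type: application/json' \
    -d '{"message": "hello from inside the cluster"}'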


Container log collection

Container logs are written to /var/logs/containers on the host node's filesystem. The observe-agent daemonset runs a fluentbit container, which is responsible for reading all files within this directory, and parsing out metadata from the log filename (podName, containerName, containerId, etc.). This stream of data is batched and shipped to the /v1/http/kubernetes/logs endpoint. As a result, container logs will show up in Observe with kind http and path /kubernetes/logs. The fluentbit container is additionally configured to track the current state to a SQLite3 database mounted on the host node. This allows log processing to continue seamlessly across pod restarts, allowing any update to the observe-agent daemonset to be rolled out safely.


Kubernetes State Changes

The Kubernetes API allows watching for changes to any resource type. Our agent runs a kubelog container which subscribes to all resource changes, and emits them in JSON format. Rather than submit this data directly to the proxy, we instead write the data to the fluentbit container in the same pod. This allows us to reuse the same batching and retry logic we have in place for shipping container logs. If every node were to run kubelog concurrently, we would get multiple copies of the same set of events. Instead, we ensure that only one kubelog is running at any given time through the use of a Kubernetes LeaseLock. This is more convenient than managing a separate deployment, since it reduces the number of moving parts in our manifest and maintains the abstraction of running a single agent per node. The Lease type in the coordination.k8s.io API group was only promoted to v1 in Kubernetes 1.14. For legacy Kubernetes versions, please contact support. Kubernetes API updates are streamed from fluentbit to Observe over the /v1/http/kubernetes/events HTTP endpoint. As a result, this data will appear in the Observation table with kind http and path /kubernetes/events.

Kubelet metrics

This collection method is experimental and disabled by default. The kubelet agent runs on every node, and exposes a set of metrics over an HTTPS endpoint. If metrics collection is enabled, the observe-agent pod will have an additional telegraf container. This container will periodically poll kubelet for metrics, and submit the data directly to the proxy under /v1/http/kubernetes/telegraf.


FAQ

What Kubernetes versions do you support?

Kubernetes maintains a concept of “supported versions”, which are the three most recent minor releases. For example, if the most recent release is 1.18, the supported Kubernetes versions are 1.18, 1.17 and 1.16.

Our support policy is as follows:
• our default manifest targets supported Kubernetes releases
• future releases should work, but may not be production ready
• older releases work, but may require configuration overrides
• the oldest release we support is the oldest version supported by EKS (https://docs.aws.amazon.com/eks/latest/userguide/kubernetes-versions.html)
As of July 2020, the oldest release we officially support is 1.13. In order to do so, you will need to disable the use of lease locks (?coordination=false) when generating a manifest.

What container runtimes do you support?

The container runtime only affects log collection. Our current fluentbit configuration has been validated on both docker and containerd. Other runtimes or older versions may require minor configuration adjustments - please reach out to support for assistance.

Can collection be restricted to a specific namespace?

We do not currently support this. Collecting kubernetes state in particular requires accessing resources that are not namespaced, such as node and persistentvolume. We may be able to restrict log collection to specific namespaces - if you are interested in this feature please contact support.


How can I disable scheduling the agent on a specific node?

In order to provide full coverage of your cluster, the observe-agent daemonset is by design scheduled onto all nodes. If you wish to remove it from a subset of nodes, you can add a taint:

$ kubectl taint nodes ${NODENAME?} observeinc.com/unschedulable

This taint is only verified during scheduling. If an observe-agent pod is already running on the node, you will have to delete it manually:

$ kubectl delete pods -n observe -l name=observe-agent --field-selector=spec.nodeName=${NODENAME?}

Retry on failure

Fluent Bit retries on 5XX and 429 Too Many Requests errors. It will stop reading new log data when its buffer fills and resume when possible. kubelog memory usage will increase, however. In the event of extended failures, you may experience kubelog out of memory errors. Fluent Bit does not retry on other 4XX errors.

5.4.2 Sources

A source generates data to be ingested into Observe. This section documents collection strategies for many potential sources of telemetry. This list is not exhaustive - any source can be supported if it is capable of pushing data directly to a supported endpoint, or through an existing forwarder.

Amazon API Gateway logs

Observe supports ingesting Amazon API Gateway logs via CloudWatch and the Observe Lambda forwarder.

For access logs: create a CloudWatch log group

API Gateway access logs use a CloudWatch log group. To create one:
1. Follow the directions at Create a log group in CloudWatch Logs in the AWS documentation.
2. Note the ARN for this log group, as you will need it in a later step.
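If you prefer the CLI, the same log group can be created and its ARN retrieved with the AWS CLI. A sketch, using a hypothetical log group name:

# Create the log group for API Gateway access logs and print its ARN
$ aws logs create-log-group --log-group-name my-api-access-logs
$ aws logs describe-log-groups --log-group-name-prefix my-api-access-logs \
    --query 'logGroups[0].arn' --output text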

Configure API Gateway for logging

Following Setting up CloudWatch logging for a REST API in API Gateway in the AWS documentation, configure logging and grant API Gateway permission to send logs to CloudWatch. Summary of steps: To enable API Gateway logging:


1. In the IAM console, create an API Gateway role with the AmazonAPIGatewayPushToCloudWatchLogs policy.
2. In the API Gateway console settings, configure the CloudWatch log role ARN with the ARN of this role.
To configure logging for your API:

1. In the API Gateway console, navigate to the desired stage of your API and click the Logs/Tracing tab.
2. In CloudWatch Settings, enable the desired CloudWatch logs (execution logs).
3. In Custom Access Logging, enable access logging by providing the log group ARN and the desired log format.
4. Click Save Changes.
See the AWS documentation for further details.


Install the Observe Lambda forwarder

If needed, install the Observe Lambda forwarder. If you are already using the Lambda forwarder for another source, you do not need to install it again. Following the instructions at AWS CloudWatch Logs, create a Lambda subscription filter. As your API handles requests, API Gateway sends execution and access logs to CloudWatch, and then the Observe Lambda forwarder sends them to Observe.

Amazon S3 Access Logs

Ingest S3 bucket access logs using the Observe Lambda forwarder.

Enable S3 access logging

S3 bucket access logging is disabled by default. If needed, first enable logging for the desired bucket:
1. Navigate to S3 in the AWS Console.
2. Select the bucket you’d like to get access logs for.
3. Click on “Properties”.
4. Under “Server access logging”, click “Edit”.
5. Select “Enable” and provide the log destination bucket in “Target bucket”.
6. Click “Save changes”.
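The same setting can be applied from the AWS CLI. A sketch with hypothetical bucket names; the target bucket must already exist and permit S3 log delivery to write to it:

# Enable server access logging on my-source-bucket, delivering to my-log-bucket
$ aws s3api put-bucket-logging --bucket my-source-bucket \
    --bucket-logging-status '{"LoggingEnabled": {"TargetBucket": "my-log-bucket", "TargetPrefix": "access-logs/"}}'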


See the AWS access logging documentation for full details.

Forward logs using Lambda

If needed, install the Observe AWS Integration or the standalone Observe Lambda forwarder following the instructions in the documentation. If you are already using the Lambda forwarder, you do not need to install it again. If you are installing it for the first time, consider the AWS Integration to easily ingest additional AWS data. For each log bucket (“Target bucket”), add a trigger so the forwarder can send access logs as they are generated.
1. Navigate to Lambda in the AWS Console.
2. Select the Observe Lambda function (created by the forwarder or integration installation process).
3. Select “Add Trigger”, then search for “S3”.

4. Configure the trigger with the following settings:
• Bucket: the log bucket
• Event type: the desired events to send, such as “All object create events”
• Prefix or Suffix if desired (optional)
5. Click “Add” to save.

Note: S3 access logs may take some time to be created in the target bucket. For details, see the AWS documentation about best-effort delivery.


AWS AppSync

AWS AppSync simplifies creating GraphQL APIs by managing connections to multiple databases, microservices, and APIs. Send its logs to Observe using CloudWatch and the Observe Lambda forwarder.

Enable logging

Follow the AWS AppSync Setup and Configuration documentation to enable logging to CloudWatch.

Install the Lambda forwarder

If needed, install the Observe AWS Integration or the standalone Observe Lambda forwarder following the instructions in the documentation. If you are already using the Lambda forwarder, you do not need to install it again. If you are installing it for the first time, consider the AWS Integration to easily ingest additional AWS data.

Add a Lambda trigger for the CloudWatch Log Group

Follow the Observe CloudWatch Logs configuration documentation to set up a subscription filter for your Log Group.

AWS CloudTrail

AWS CloudTrail monitors AWS account activity, publishing its logs to an S3 bucket you specify. Send its logs to Observe using the Observe Lambda forwarder.

Note: CloudTrail ingest is a component of the Observe AWS integration. If you have installed this integration, you do not need to configure CloudTrail ingest separately.

Create a new Trail

1. Navigate to the CloudTrail console.
2. Click Create Trail.
3. Configure the Trail attributes with the settings below:

• Trail name: A name for the new Trail, following the AWS CloudTrail Trail Naming Requirements
• Storage location: The bucket to log to, either new or existing. If you choose an existing bucket, its policy must grant CloudTrail permission to write to it.
• Log file SSE-KMS Encryption: Uncheck to disable
• Log file validation: Enabled
• SNS notification delivery: Disabled
• CloudWatch Logs: Optional
• Tags: Optional


4. Click Next to choose the type of events you would like to send to Observe.

Warning: For Data events, ensure you only select Write events. Selecting Read events causes the Lambda forwarder to trigger on its own Read events, resulting in an endless read/write loop.

5. Click Next to review your configuration, then Create Trail to save. For more about configuring CloudTrail, see Creating a trail in the AWS CloudTrail documentation. If you would like to use SNS, please see the CloudTrail documentation for more information.
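If you prefer the CLI, a trail with equivalent settings can be created and started with the AWS CLI. A sketch with hypothetical names; the bucket policy must already allow CloudTrail to write to it:

# Create a trail that writes to an existing bucket, enable log file validation,
# then start logging
$ aws cloudtrail create-trail \
    --name observe-trail \
    --s3-bucket-name my-cloudtrail-bucket \
    --enable-log-file-validation
$ aws cloudtrail start-logging --name observe-trail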

Install the Lambda forwarder

If needed, install the Observe Lambda forwarder following the instructions in its documentation. If you are already using the Lambda forwarder for another source, you do not need to install it again.

Add a Lambda trigger for the bucket

In the AWS Lambda Console:
1. Navigate to Functions.
2. Select the observe-lambda function.
3. Select Add Trigger.
4. Select S3 from the list.
5. Configure the trigger with the settings below:

• Bucket: The name of the CloudTrail bucket
• Event Type: All object create events
• Prefix: Optional
• Suffix: Optional

6. Click Add to create the trigger. For additional details about Lambda and S3, see Using AWS Lambda with Amazon S3 in the AWS documentation. Note that an S3 bucket may only have one trigger of each type. If this is an issue, you may wish to use a new bucket. For more about which AWS services send events to CloudTrail, see CloudTrail Supported Services and Integrations.

Amazon EventBridge

Amazon EventBridge is a serverless event bus service, previously called CloudWatch Events. Many AWS services send status events to EventBridge, which may then be consumed by custom applications or streamed directly to a third-party. EventBridge data is useful for reconstructing the state of AWS resources over time.

Note: EventBridge ingest is a component of the Observe AWS integration. If you have installed this integration, you do not need to configure EventBridge ingest separately.

To configure EventBridge ingest without the Observe AWS integration, follow the steps below.


Collection methods

Observe supports several methods of collecting AWS EventBridge events:
1. Send to an Amazon Kinesis Firehose delivery stream. This is the recommended method for most use cases.
2. Configure Observe as an API destination. This is useful for low event volumes.
3. Use the Observe Lambda forwarder. Helpful if you already use the Lambda forwarder for other sources.

Kinesis Firehose

If you are already ingesting via a Kinesis Firehose delivery stream, you may send additional traffic over the same stream to improve batch delivery latency. This also uses the same retransmission policies configured for the existing delivery stream.

AWS Console

1. Navigate to the EventBridge Console and view existing Event Rules.
2. Click Create Rule.
3. Under Name and description, provide a Name.
4. Under Define pattern, select Event Pattern and then Pre-defined pattern by service.
5. For Service provider, select All Events from the dropdown.
6. Under Select targets, search for target Firehose delivery stream.
7. In the Stream dropdown, select your delivery stream that is configured to send to Observe.
8. Click Create.

Video instructions

Terraform

The Observe Kinesis Firehose Terraform module provides a helper submodule for configuring a delivery stream as a target for multiple rules. Instantiate a firehose by setting the appropriate values for observe_customer and observe_token and providing a list of rules to forward data for:

module "observe_kinesis_firehose" {
  source = "github.com/observeinc/terraform-aws-kinesis-firehose?ref=main"

  name             = "observe-kinesis-firehose"
  observe_customer = var.observe_customer
  observe_token    = var.observe_token
}

module "observe_firehose_eventbridge" {
  source           = "github.com/observeinc/terraform-aws-kinesis-firehose//eventbridge"
  kinesis_firehose = module.observe_kinesis_firehose
  iam_name_prefix  = var.name
  rules = [
    aws_cloudwatch_event_rule.example,
  ]
}

API destination

EventBridge supports sending events directly to an HTTP endpoint. This avoids the cost of triggering a lambda for every event. If you are not already using the Observe Lambda for another source, you may prefer to configure an API destination.

AWS Console

1. Navigate to the EventBridge Console and view existing Event Rules.
2. Click Create Rule.
3. Under Name and description, provide a Name.
4. Under Define pattern, configure the desired pattern:
   1. Select Event Pattern.
   2. Select Pre-defined pattern by service.
   3. Select All Events from the Service provider dropdown.
5. Under Select targets, configure the desired target:
   1. Search for target API Destination in the Target dropdown.
   2. Select Create New API Destination and provide a name.
   3. In API destination endpoint, enter https://collect.observeinc.com/v1/http/aws/eventbridge
   4. For HTTP method, select POST.
   5. Set the Invocation rate limit per second to 300.
   6. Select Create a new connection and provide a Connection name.
   7. For Authorization type, select Basic.
   8. Provide your customer ID for Username and your ingest token for Password.
6. Click Create.
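The connection and API destination can also be created with the AWS CLI. A sketch under the same settings as above; the connection name, destination name, and credential placeholders are examples, and a rule and target (with an IAM role allowing events:InvokeApiDestination) must still be attached afterwards:

# Store the Observe credentials as an EventBridge connection using Basic auth
$ aws events create-connection --name observe-connection \
    --authorization-type BASIC \
    --auth-parameters '{"BasicAuthParameters":{"Username":"<customer id>","Password":"<ingest token>"}}'

# Register the collector URL as an API destination that uses that connection
$ aws events create-api-destination --name observe-api-destination \
    --connection-arn <connection arn returned by the previous command> \
    --invocation-endpoint https://collect.observeinc.com/v1/http/aws/eventbridge \
    --http-method POST \
    --invocation-rate-limit-per-second 300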

Video instructions

Terraform

We do not provide a Terraform module for configuring an API destination. If you are using Terraform, we recommend using Kinesis Firehose instead.


Amazon GuardDuty

Amazon GuardDuty monitors AWS accounts for unusual activity and reports security findings. Send these findings to Observe using Amazon EventBridge and Kinesis Data Firehose.

Enable GuardDuty

Enable GuardDuty in your desired AWS regions, following the Amazon GuardDuty documentation. There is no need to enable exporting to an S3 bucket.

Forward from EventBridge to Kinesis Firehose

The recommended method to ingest findings is from GuardDuty to EventBridge, then from EventBridge to a Kinesis Data Firehose delivery stream, and then to Observe.

Create a delivery stream

If needed, follow the instructions at Amazon Kinesis Firehose to create a delivery stream that sends to Observe. If you have installed the Observe AWS Integration, you can send to its delivery stream instead of creating a new one.

Create an EventBridge rule

Following the instructions at Creating Amazon EventBridge rules that react to events, create an EventBridge rule to send findings from EventBridge to Kinesis Firehose. Configure the rule as appropriate for your environment, with the pattern to match and target as described below: Under Define pattern, configure the following options:


• Select Event pattern to build a pattern to match events.
• Under Event matching pattern, select Pre-defined pattern by service.
• For the Service provider, select AWS.
• For Service name, select GuardDuty.
• For Event type, select GuardDuty Finding.
Under Select targets:


• For Target, select Firehose delivery stream from the menu.
• For Stream, select your desired stream.
As findings are generated, GuardDuty exports them to this delivery stream, which forwards them to Observe.

CloudWatch metrics streams

Observe supports ingesting metrics data from CloudWatch Metrics Streams using an Amazon Kinesis Firehose forwarder. Follow the steps below to send CloudWatch metrics for the services you use to Observe.

Note: CloudWatch metrics ingest is a component of the Observe AWS integration. If you have installed this integration, you do not need to configure CloudWatch metrics ingest separately.

Create a Kinesis Firehose delivery stream

First, see the Amazon Kinesis Firehose forwarder documentation for the steps to create a new delivery stream. This sends data from Kinesis Firehose to the Observe Kinesis HTTP endpoint as a “Direct PUT” source.

Create a CloudWatch metric stream

Next, create a metric stream that sends data to the Firehose delivery stream you just made. In the AWS CloudWatch console, under Metrics, select Streams to go to the metric streams page. Click Create stream to start configuring a new metric stream.
1. Under Metrics to be streamed, choose one of the two available ways to create a list of namespaces:
• Select All metrics to send metrics from all namespaces, optionally excluding some, or
• Select Selected namespaces to choose namespaces individually.
2. Under Configuration:


• Choose Select an existing Firehose owned by your account • Select the delivery stream from the list

• Select JSON output format.
3. Under Custom metric stream name, optionally choose a custom name for this metric stream.
4. Click Create metric stream to create the stream.
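The same metric stream can be created from the CLI. A sketch; the stream name, Firehose ARN, and IAM role ARN (which must allow CloudWatch to write to the delivery stream) are example values:

# Create a metric stream that writes JSON to the existing delivery stream
$ aws cloudwatch put-metric-stream \
    --name observe-metric-stream \
    --firehose-arn arn:aws:firehose:us-east-1:123456789012:deliverystream/observe-delivery-stream \
    --role-arn arn:aws:iam::123456789012:role/observe-metric-stream-role \
    --output-format json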

AWS CloudWatch Logs

Use Amazon CloudWatch Logs to monitor, store, and access your log files from Amazon Elastic Compute Cloud (EC2) instances, AWS CloudTrail, Route 53, and other sources. Send those logs to Observe with one of the options described below.

Collection methods

Observe supports three methods of collecting AWS CloudWatch Logs.


To choose the right method, consider the needs of your environment:
1. For quick setup, use the Lambda forwarder, available as part of the Observe AWS Integration.
2. For production traffic, evaluate AWS Kinesis Firehose. A pre-configured Firehose delivery stream is included with the AWS Integration.
3. For higher throughput or sending data to multiple upstreams, use AWS Kinesis Data Streams.

Observe Lambda

Configuring Observe’s lambda function is the simplest way of getting started, incurs the lowest end-to-end latency, and is competitively priced for lower data volumes. Our lambda can handle multiple sources, and submits data to Observe using the HTTP endpoint. A downside of using our lambda is that it will not retry on failure. This avoids prolonging execution time, which would impact cost. In the event of network failures or an outage on our end, the data will not be resubmitted.

AWS Console

CloudWatch Logs allows you to set up Subscription filters which forward logs within a Log Group to a destination.
1. Install the Observe Lambda, either as a standalone forwarder or as part of our AWS integration.
2. Navigate to CloudWatch and view your Log Groups.
3. Select the Log Group you would like to send to Observe.
4. Click Actions and select Create Lambda subscription filter.
5. Under Choose destination, select your Observe Lambda function.
6. Under Configure log format and filters, select Log format Other. This forwards all logs.
7. In Subscription filter name, provide a name for this filter. The name is used to identify the subscription within the context of the log group.
8. Click Start Streaming.
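The subscription filter can also be created with the AWS CLI. A sketch; the function name matches the observe-lambda function referenced elsewhere in this document, while the log group name, region, and account ID are example values:

# Allow CloudWatch Logs to invoke the forwarder, then subscribe a log group to it
$ aws lambda add-permission \
    --function-name observe-lambda \
    --statement-id cloudwatch-logs-subscription \
    --principal logs.amazonaws.com \
    --action lambda:InvokeFunction \
    --source-arn arn:aws:logs:us-east-1:123456789012:log-group:your-log-group-name:*
$ aws logs put-subscription-filter \
    --log-group-name your-log-group-name \
    --filter-name observe \
    --filter-pattern "" \
    --destination-arn arn:aws:lambda:us-east-1:123456789012:function:observe-lambda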


Video instructions

Terraform

The observe_lambda Terraform module provides a helper submodule for subscribing log groups to the Observe Lambda forwarder. Instantiate the Lambda forwarder by providing values for observe_customer and observe_token and the list of log groups you would like to forward:

module "observe_lambda" {
  source           = "github.com/observeinc/terraform-aws-lambda"
  observe_customer = var.observe_customer
  observe_token    = var.observe_token
}

module "observe_lambda_cloudwatch_logs_subscription" {
  source = "github.com/observeinc/terraform-aws-lambda//cloudwatch_logs_subscription"
  lambda = module.observe_lambda.lambda_function
  log_group_names = [
    "your-log-group-name"
  ]
}

Kinesis Firehose

AWS Kinesis Firehose backs up data in S3 in the case of failed delivery. It is also more cost effective than Lambda functions for higher event volumes. Since data is batched, there can be a significant buffering delay sending low volumes of data. For this reason, we recommend starting with the Lambda forwarder. Then evaluate your traffic profile and requirements to determine if AWS Kinesis Firehose is appropriate for your environment.

AWS Console

CloudWatch Logs uses Subscription filters to send logs to a Kinesis Firehose delivery stream, which then forwards to Observe.
1. If needed, create a Kinesis Firehose delivery stream, either as a standalone forwarder or as part of the Observe AWS integration.

Note: If you created a delivery stream using one of our CloudFormation templates, you do not need to create an additional stream. The name of the role includes the stack that created it, for example ObserveAWSIntegration-CloudWatchLogsRole-1A2B3C4D5E.

2. Configure the log group to send to this delivery stream:
   1. Navigate to CloudWatch Log Groups.
   2. Select the Log Group you would like to export to Observe.
   3. From the Actions dropdown, select Subscription filters and Create Kinesis Firehose subscription filter.
   4. Under Choose destination, search for your desired Kinesis Firehose delivery stream. Leave the Destination account at Current account.


   5. Select a Kinesis Firehose delivery stream to send to.
3. Set the necessary permissions:
   1. Under Grant permission, select an IAM role that permits CloudWatch Logs to write to your delivery stream.
   2. Under Configure log format and filters, provide a name for this filter. The name is used to identify the subscription within the context of the log group.
4. Click Start Streaming.

Video instructions

Terraform

The observe_kinesis_firehose Terraform module provides a helper submodule for subscribing log groups to a Kinesis Firehose delivery stream. Instantiate a delivery stream by providing values for observe_customer and observe_token and the list of log groups you would like to forward:

module "observe_kinesis_firehose" {
  source           = "github.com/observeinc/terraform-aws-kinesis-firehose"
  observe_customer = var.observe_customer
  observe_token    = var.observe_token
}

module "observe_kinesis_firehose_cloudwatch_logs_subscription" {
  source           = "github.com/observeinc/terraform-aws-kinesis-firehose//cloudwatch_logs_subscription"
  kinesis_firehose = module.observe_kinesis_firehose
  log_group_names = [
    "your-log-group-name"
  ]
}

GitHub

GitHub allows for the creation of webhook triggers at the organization, repository, or app level. Further details regarding what GitHub makes available are in their Webhook Documentation. To create a GitHub webhook to send events to Observe you will need:
• A GitHub account and organization or repository
• An Observe ingest token


Setting up the webhook

1. For a repo or organization, go to Settings.
2. Click Webhooks and then Add.
3. The Payload URL points to our collector using basic auth. Replace CUSTOMER_ID and TOKEN with your Customer ID and Ingest Token respectively; the /github at the end allows Observe to identify this data as coming from GitHub:
   https://CUSTOMER_ID:[email protected]/v1/http/github
4. Set the Content Type to application/json.
5. Leave Secret blank; this value is not used by Observe.
6. Enable SSL verification.
7. Decide what events you want. It is okay to pick everything; these events can be modeled in Observe post ingestion.
8. Make sure Active is checked.
9. Click Add Webhook.
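Before saving, you can confirm the payload URL and credentials by posting a small test body to the same collector path. A sketch; the JSON body is arbitrary test data:

# POST a test JSON body to the same path the webhook will use;
# a 202 response confirms the customer ID and token are accepted.
$ curl -s -o /dev/null -w "%{http_code}\n" \
    -X POST "https://CUSTOMER_ID:[email protected]/v1/http/github" \
    -H 'Content-Type: application/json' \
    -d '{"test": "github webhook connectivity"}'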

Verify GitHub data is being ingested

There are two places to validate: first, check that GitHub has successfully sent the data; second, check that data is being received in Observe. To check that GitHub is sending the data:
1. For a repo or organization, go to Settings.
2. Click Webhooks.
3. Click the entry for the webhook created above.
4. Scroll to Recent Deliveries and check that the recent calls received a 202 response code.
To check in Observe:
1. Log into Observe and open the Observation event stream in a worksheet.
2. Open the OPAL console and apply the following filters:

filter OBSERVATION_KIND="http"
filter (string(EXTRA.path)="/github")

3. Verify GitHub data exists.

Google Workspace Audit Logs

Ingest Google Workspace Audit logs into Observe. To do this, you will need:
• Administrator access to your Google Workspace instance
• A GCP organization
• A GCP user with logging.sinks.create permissions for your GCP organization
• GCP command line tools installed
• An Observe ingest token


The following are the main steps to get Google Workspace Audit logs into Observe:
1. Set up Google Workspace to send audit logs to GCP
2. Create a PubSub Topic and Subscription to send logs to Observe
3. Create a Sink to send Google Workspace Audit Logs from GCP Logging to PubSub

Detailed Steps

1. Set up Google Workspace to send audit logs to GCP.
   1. Google Workspace can be configured to send the following logs to GCP. The specifics around which logs and events get sent to GCP depend on your Google Workspace subscription, but in all cases the steps to achieve this are the same.
   2. Follow the steps in Share data with services.
2. Create a PubSub Topic and Subscription to send logs to Observe.
   1. Log in to GCP.
   2. Select a project, or create one.
   3. Go to PubSub > Topics.
   4. Create a new Topic, for example GWorkspaceTopic. Unselect the Add a default Subscription box.

   5. Create a subscription following the steps in Google Cloud Pub/Sub.
3. Create a Sink to send Google Workspace Audit logs from GCP Logging to PubSub.
   1. This operation has to be performed using the gcloud command. Google Workspace Audit logs are stored at the organization level and not at a project level, so they can not be configured through the GCP console.
   2. As a user that has logging.sinks.create permissions, execute this command, replacing:
   • organization_id with your GCP organization ID
   • topic_project_id with the project ID which contains the PubSub topic
   • topic_name with the name of the topic created earlier


gcloud logging sinks create observe-audit-sink \
    pubsub.googleapis.com/projects/topic_project_id/topics/topic_name \
    --include-children --organization=organization_id \
    --log-filter='logName:"organizations/organization_id/logs/cloudaudit.googleapis.com"'

For example, if:
• organization_id is 123456789
• topic_project_id is lunar-magic-24780
• topic_name is GWorkspaceTopic
The command would look as follows:

gcloud logging sinks create observe-audit-sink \
    pubsub.googleapis.com/projects/lunar-magic-24780/topics/GWorkspaceTopic \
    --include-children --organization=123456789 \
    --log-filter='logName:"organizations/123456789/logs/cloudaudit.googleapis.com"'

Verify Google Workspace data is being ingested

1. In the GCP console, go to Logging > Logs Exporter.
2. Filter Log Name in the dropdown to cloudaudit.googleapis.com, and select all files that have this prefix, regardless of suffix.
3. Check that new log lines have arrived in GCP.
4. Log into Observe and open the Observation event stream in a worksheet.
5. Open the OPAL console and apply the following filters. If GWorkspaceSubscription in the final line isn't the name of the subscription above, change that to match.

filter OBSERVATION_KIND="http"
filter (string(EXTRA.path)="/pubsub")
colmake subscription:string(string(FIELDS.subscription))
filter contains(subscription,"GWorkspaceSubscription")

6. Verify audit data exists

Jenkins Build Logs

Ingest Jenkins build log data with the Jenkins Logstash plugin.


Requirements

To do this you will need the following information: • Jenkins login • Your Observe Customer ID • Your Observe ingest token • Ability to restart Jenkins

Install the Logstash plugin

The first step is to install the Logstash plugin. 1. Login into Jenkins and go to Manage Jenkins > Manage Plugins, then click the Available tab. 2. Search for Logstash, then click the Enabled check box to the left. 3. Click Download now and Install after restart

Configure the Logstash plugin

Now that the Logstash plugin is installed, configure it to point to your Observe instance.
1. Log back into Jenkins and go to Manage Jenkins > Configure System.
2. Search for Logstash.
3. In the Logstash configuration, choose the Elastic Search collector.
4. Input the following fields:
• URI: https://collect.observeinc.com/v1/http/jenkins
• Username: Your customer ID
• Password: Your ingest token
• MimeType: application/json
• Enable Globally: enabled
• Use millisecond timestamps: enabled


Verify Jenkins build data is being ingested

View the Jenkins build logs in Observe.
1. Trigger a Jenkins build.
2. Log into Observe and open the Observation event stream in a worksheet.
3. Open the OPAL console and apply the following filters:

filter OBSERVATION_KIND="http"
filter contains(string(EXTRA.path),"jenkins")

4. Verify you are seeing Jenkins build information.

Jira Tickets

Ingest Jira ticket information using the Jira REST API and an outgoing webhook. To do this, you will need: • Your Jira login credentials (NOTE: you will need administrative access) • Your Observe Customer ID • An Observe ingest token


Configure an outgoing webhook automation rule

View the Jira Automation Rule documentation.
1. Log in to Jira and go to Settings > Systems > Automation Rules.
2. On the Automation Rules page, click Create rule to display a list of triggers.
3. Click Issue created and then Save.
4. From the add component page, select New action from the list. Scroll to the notifications section and choose Send web request.
5. Fill in the HTTP target details:
• Webhook URL: https://collect.observeinc.com/v1/http/jira
• Header name: Authorization
• Header value: Bearer {Observe Customer ID} {Observe Ingest Token}
• HTTP method: POST
• Webhook body: issue data

6. Validate the webhook


• Click to expand the Validate your webhook configuration section.
• Enter an open ticket or project id to populate the test message.
• Click Validate.
• Confirm that the response is HTTP/1.1 202 Accepted.
7. Click Save to save the automation rule and webhook configuration.
• Give your automation rule a name such as “Observe-webhook”.
• Click Turn it on to enable.

Verify ticket data is being ingested

1. Wait for (or initiate) a ticket status change.
2. Log into Observe and open the Observation event stream in a worksheet.
3. Open the OPAL console and apply the following filters:

filter OBSERVATION_KIND="http"
filter contains(string(EXTRA.path),"jira")

4. Verify ticket data exists


Windows Servers

Observe supports ingesting log data from Windows servers; both event logs and application logs on the filesystem are supported. Fluentd is the recommended forwarder.

Install Fluentd on the Windows system

Install the fluentd forwarder. Update the fluentd configuration file (C:/opt/td-agent/etc/td-agent/td-agent.conf) to add the following sources:

####
## Source descriptions:
##

##
## Filesystem logs
##
<source>
  @type tail
  @id input_tail
  tag fslog.#{Socket.gethostname}
  path C:/logs/observe.log
  pos_file /var/log/td-agent/tmp/observe.log.pos
  path_key tailed_path
  <parse>
    @type regexp
    # the capture group name (message) is an example; adjust to your needs
    expression /^(?<message>.*)$/
  </parse>
</source>

##
## Windows event logs
##
<source>
  @type windows_eventlog
  @id windows_eventlog
  tag winevent.#{Socket.gethostname}
  channels application,system,security
  <storage>
    @type local
    persistent true
    path /var/log/td-agent/tmp/winevt.pos
  </storage>
</source>

Zendesk Tickets

Ingest Zendesk ticket information by configuring a webhook and trigger in the Zendesk UI. To do this, you will need: • Your Zendesk login credentials • Your Observe Customer ID • An Observe ingest token

Configure an Outgoing Webhook

Zendesk documentation
1. Log in to Zendesk and go to Admin > Settings > Extensions.
2. On the Extensions page, click “add target” to display a list of target types. Click “HTTP target”.
3. Fill in the HTTP target details:
• Title: ObserveWebhook
• Url: https://collect.observeinc.com/v1/http/zendesk
• Method: POST
• Content type: JSON
• Basic Authentication:
  – Check “Enabled”
  – Username: your Observe Customer ID
  – Password: your Observe Ingest Token


4. Test the webhook:
• Select “Test target” from the dropdown menu.
• Click Submit.
• In the JSON body text field, provide a valid JSON body (like {"message":"Hello Zendesk!"}).
• Click Submit.
• Confirm that the response is HTTP/1.1 202 Accepted.


If the JSON body text is highlighted in red, this means you did not get a success response. Correct your configuration and try again.
5. Save the webhook configuration:
• Select “Create target” from the dropdown menu.
• Click Submit.

Configure the trigger in Zendesk

Zendesk documentation
1. Go to Admin > Business Rules > Triggers.
2. In the Triggers page, click “Add trigger”.
3. Fill in the trigger details:
• Trigger name: Observe Trigger
• Description: Send webhook to Observe
• Category: Notifications
• Conditions:
  – Under “Meet ANY of the following conditions”, select the desired condition. Example: “Status Changed”
• Actions:
  – Select “Notify target” in the left dropdown menu.
  – Select the name of your Observe webhook in the right dropdown menu. Example: “ObserveWebhook”
  – In the JSON body text field, provide the desired payload. Example:

{
  "title": "{{ticket.title}}",
  "description": "{{ticket.description}}",
  "url": "{{ticket.url}}",
  "id": "{{ticket.id}}",
  "external_id": "{{ticket.external_id}}",
  "via": "{{ticket.via}}",
  "status": "{{ticket.status}}",
  "priority": "{{ticket.priority}}",
  "requester": "{{ticket.requester.details}}"
}

  (Click “View available placeholders” to see a list of available fields.)
  – Click Create.


Verify ticket data is being ingested

1. Wait for (or initiate) a ticket status change.
2. Log into Observe and open the Observation event stream in a worksheet.
3. Open the OPAL console and apply the following filters:

filter OBSERVATION_KIND="http"
filter contains(string(EXTRA.path),"zendesk")

4. Verify ticket data exists

5.4.3 Forwarders

A forwarder is a process that collects data from a source and forwards it to a given destination. Forwarders may perform additional functions, such as: • aggregation of data from multiple inputs • filtering and processing of data in flight • routing to multiple destinations • buffering and retransmission handling This section documents how to configure different forwarders to stream data to Observe.


Amazon Kinesis Firehose

Amazon Kinesis Data Firehose allows you to reliably deliver streaming data from multiple sources within AWS. Observe supports ingesting data through our Kinesis HTTP endpoint.

Note: If you would like to ingest a Kinesis Data Stream, see Kinesis Data Stream to Observe for information about configuring a Data Stream source using Terraform.

Setup

Installation

AWS Console

Use our CloudFormation template to automate creating a Kinesis Firehose delivery stream to send data to Observe. To install via the AWS Console:
1. Navigate to the CloudFormation console and view existing stacks.
2. Click Create stack. If prompted, select With new resources.
3. Provide the template details:
   1. Under Specify template, select Amazon S3 URL.
   2. In the Amazon S3 URL field, enter https://observeinc.s3-us-west-2.amazonaws.com/cloudformation/firehose.yaml.
   3. Click Next to continue. (You may be prompted to view the function in Designer. Click Next again to skip.)
4. Specify the stack details:
   1. In Stack name, provide a name for this stack. It must be unique within a region, and is used to name created resources.
   2. Under Required Parameters, provide your Customer ID in ObserveCustomer and ingest token in ObserveToken.
   3. Click Next.
5. Under Configure stack options, there are no required options to configure. Click Next to continue.
6. Review your stack options:
   1. Under Capabilities, check the box to acknowledge that this stack may create IAM resources.
   2. Click Create stack.


Video instructions

Alternatively, you can deploy the CloudFormation template using the awscli utility:

Caution: If you have multiple AWS profiles, make sure you configure the appropriate AWS_REGION and AWS_PROFILE environment variables in addition to OBSERVE_CUSTOMER and OBSERVE_TOKEN.

$ curl -Lo firehose.yaml https://observeinc.s3-us-west-2.amazonaws.com/cloudformation/firehose.yaml
$ aws cloudformation deploy --template-file ./firehose.yaml \
    --stack-name ObserveLambda \
    --capabilities CAPABILITY_NAMED_IAM \
    --parameter-overrides ObserveCustomer="${OBSERVE_CUSTOMER?}" ObserveToken="${OBSERVE_TOKEN?}"

Terraform

You may also use our observe_kinesis_firehose Terraform module to create a Kinesis Firehose delivery stream. The following is an example instantiation of this module:

module "observe_kinesis_firehose" {
  source = "github.com/observeinc/terraform-aws-kinesis-firehose"

  name             = "observe-kinesis-firehose"
  observe_customer = "${OBSERVE_CUSTOMER}"
  observe_token    = "${OBSERVE_TOKEN}"
}

We recommend that you pin the module version to the latest tagged version.

Getting Started

You can now use your Kinesis Firehose delivery stream to collect a variety of sources: • CloudWatch Logs • CloudWatch Metrics Stream • EventBridge

FAQ

Retry on failure

Amazon Kinesis Firehose supports retries with the Retry duration time period. If a request fails repeatedly, the contents are stored in a pre-configured S3 bucket. See the Amazon Kinesis Firehose data delivery documentation for more information.


Elastic Beats

Elastic has a collection of data forwarders, known as Beats, built on a common underlying library. Data from Beats can be ingested into Observe using the Elastic endpoint.

Installation

Before you start, ensure you are running the Apache2 OSS licensed version of the appropriate Beat. The corresponding binaries can be downloaded from the following links:
• Filebeat
• Packetbeat
• Winlogbeat
• Metricbeat
• Heartbeat
• Auditbeat
• Journalbeat

Note: Functionbeat does not have an Apache2 licensed version. Use our lambda function to collect AWS data instead.

Getting Started

To send data to Observe, configure an elasticsearch output in your configuration file:

setup.dashboards.enabled: false
setup.template.enabled: false
setup.ilm.enabled: false

output.elasticsearch:
  hosts: ["https://collect.observeinc.com/v1/elastic"]
  username: ${OBSERVE_CUSTOMER:?OBSERVE_CUSTOMER not set}
  password: ${OBSERVE_TOKEN:?OBSERVE_TOKEN not set}
  compression_level: 4
  slow_start: true

Important: Observe accepts data over the Elastic endpoint, but does not run Elasticsearch software under the hood. As such, you must disable any configuration unrelated to raw data ingestion. This includes Kibana dashboards, template loading, pipelines, index lifecycle management, as well as any modules which use the aforementioned features.

The above snippet expects OBSERVE_CUSTOMER and OBSERVE_TOKEN values to be provided as environment variables. compression_level is not required, but we recommend setting it to reduce egress traffic. We also recommend setting slow_start in order to reduce the number of events in a batch if a request fails due to the payload exceeding the maximum body size limit for our API. This section contains examples of working configurations for different Beats. These are intended as starting points, and should be modified as needed.


Note: The examples below assume that OBSERVE_CUSTOMER and OBSERVE_TOKEN values are available as environment variables.

Filebeat

The following configuration reads data from a local file and sends it to Observe:

name: docs-example

# Disable several unneeded features
setup.ilm.enabled: false
setup.dashboards.enabled: false
setup.template.enabled: false

# Send logs from file example.log
filebeat.inputs:
- type: log
  enabled: true
  max_bytes: 131072
  paths:
    - example.log

# Where to send the inputs defined above
output.elasticsearch:
  hosts: ["https://collect.observeinc.com:443/v1/elastic"]
  username: ${OBSERVE_CUSTOMER:?OBSERVE_CUSTOMER not set}
  password: ${OBSERVE_TOKEN:?OBSERVE_TOKEN not set}
  compression_level: 4

# Add additional metadata
# host recommended for everyone, cloud and/or docker if using
processors:
  - add_host_metadata: ~
  # - add_cloud_metadata: ~
  # - add_docker_metadata: ~

To use this example, save the above snippet as example.yaml and run:

$ filebeat -e -c example.yaml 2>&1 | tee -a example.log
...
INFO log/harvester.go:302 Harvester started for file: example.log
INFO [publisher_pipeline_output] pipeline/output.go:143 Connecting to backoff(elasticsearch(https://collect.observeinc.com:443/v1/elastic))
INFO [esclientleg] eslegclient/connection.go:314 Attempting to connect to Elasticsearch version 7.0.0
INFO [publisher_pipeline_output] pipeline/output.go:151 Connection to backoff(elasticsearch(https://collect.observeinc.com:443/v1/elastic)) established

The -e flag directs Filebeat to log to stderr, which is then piped to a file. This file, containing Filebeat's own logs, is then monitored by Filebeat. We recommend using the max_bytes option to cap the maximum size of a log line sent to Observe. While Observe accepts a maximum log line size of 1MB, we suggest a more conservative limit of 128KB for most use cases.

Metricbeat

The following configuration sends CPU data to Observe:

name: docs-example

setup.dashboards.enabled: false
setup.template.enabled: false
setup.ilm.enabled: false

metricbeat.modules:
- module: system
  metricsets: [cpu]
  cpu.metrics: [percentages, normalized_percentages, ticks]

output.elasticsearch:
  hosts: ["https://collect.observeinc.com:443/v1/elastic"]
  username: ${OBSERVE_CUSTOMER:?OBSERVE_CUSTOMER not set}
  password: ${OBSERVE_TOKEN:?OBSERVE_TOKEN not set}
  compression_level: 4
  slow_start: true

To use this example, save the above snippet as example.yaml and run:

$ metricbeat -e -c example.yaml
...
INFO [publisher] pipeline/module.go:113 Beat name: docs-example
INFO instance/beat.go:468 metricbeat start running.
INFO [monitoring] log/log.go:117 Starting metrics logging every 30s
INFO [publisher_pipeline_output] pipeline/output.go:143 Connecting to backoff(elasticsearch(https://collect.observeinc.com/v1/elastic))
INFO [publisher] pipeline/retry.go:219 retryer: send unwait signal to consumer
INFO [publisher] pipeline/retry.go:223 done
INFO [esclientleg] eslegclient/connection.go:314 Attempting to connect to Elasticsearch version 7.0.0
INFO [publisher_pipeline_output] pipeline/output.go:151 Connection to backoff(elasticsearch(https://collect.observeinc.com/v1/elastic)) established

FAQ

How are failures handled?

Each Beat has its own retry mechanism. For example, Filebeat uses a max_retries setting. See the Beats documen- tation for more information.


Fluent Bit

Fluent Bit is a lightweight log processor and forwarder.

Installation

Fluent Bit provides detailed installation instructions on their website. For convenience, we provide pointers for the most frequently requested platforms:

Linux

Fluent Bit distributes td-agent-bit for officially supported distributions: • Amazon Linux • Redhat / CentOS • Debian • Ubuntu • Raspbian / Raspberry PI Alternatively, you can build from source.

Windows

Fluent Bit for Windows requires a manually provided root certificate to be able to send data to Observe.
1. Install the appropriate Fluent Bit td-agent-bit package, available at https://docs.fluentbit.io/manual/installation/windows.
2. Download the self-signed ISRG Root X1 pem certificate from the Let's Encrypt Certificates page and copy it to the directory of your choice on the Windows host.
3. Add the path to isrgrootx1.pem to the [OUTPUT] section of your fluent-bit.conf file:

tls.ca_file C:\td-agent-bit\isrgrootx1.pem

Docker

Fluent Bit maintains and regularly releases container images:

$ docker run -ti fluent/fluent-bit:1.7
Fluent Bit v1.7.2
* Copyright (C) 2019-2021 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[ info] [engine] started (pid=1)
[ info] [storage] version=1.1.1, initializing...
[ info] [storage] in-memory
[ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[ info] [sp] stream processor started


MacOS

Fluent Bit is available through homebrew: $ brew install fluent-bit

Configuration

The following snippet contains a minimal configuration to send CPU metrics to Observe:

[SERVICE]
    flush        5
    daemon       Off
    log_level    info

[INPUT]
    name         cpu
    interval_sec 1

[OUTPUT]
    name         http
    match        *
    host         collect.observeinc.com
    port         443
    tls          on
    # For Windows: provide path to root cert
    #tls.ca_file C:\td-agent-bit\isrgrootx1.pem
    http_user    ${OBSERVE_CUSTOMER}
    http_passwd  ${OBSERVE_TOKEN}
    uri          /v1/http/fluentbit
    format       msgpack
    header       X-Observe-Decoder fluent
    compress     gzip

We rely on Fluent Bit’s http output to forward data towards Observe’s HTTP endpoint. We can export data in Fluent Bit’s native msgpack format directly.
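To try the configuration above, save it to a file and export the credentials it references. A sketch, assuming the config was saved as observe.conf; the binary name varies by package (td-agent-bit for the Linux packages, fluent-bit via Homebrew):

# Fluent Bit expands ${OBSERVE_CUSTOMER} and ${OBSERVE_TOKEN} from the environment
$ export OBSERVE_CUSTOMER="12345"
$ export OBSERVE_TOKEN="my_ingest_token"
$ fluent-bit -c observe.conf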

FAQ

Retry on failure

Fluent Bit retries on 5XX and 429 Too Many Requests errors. It will stop reading new log data when its buffer fills and resume when possible. Fluent Bit does not retry on other 4XX errors. See the Fluent Bit Scheduling and Retries documentation for more about retry configuration.


Fluentd

Fluentd is a log processor and forwarder with an extensive plugin ecosystem. It is written in Ruby, and is therefore less efficient than Fluent Bit. We generally recommend using Fluent Bit for most use cases, unless you need a plugin only available for Fluentd.

Before Installation

Before installing Fluentd, prepare the system by following the preinstall instructions.

Installation

Fluentd provides detailed installation instructions on their website. For convenience, we provide pointers for the most frequently requested platforms:

Linux

Fluentd distributes td-agent for officially supported distributions: • Amazon Linux • Red Hat / CentOS • Debian/Ubuntu Alternatively, you can install from ruby gem.

Windows

Fluentd is distributed as td-agent on Windows. Available as an MSI.

Docker

For Kubernetes deployments, it’s recommended to use Observe’s Kubernetes daemonset. Fluentd maintains and regularly releases container images.

MacOS

Fluentd is available as a dmg package.


Configuration

The default locations of the configuration file are:
• Linux/MacOS - /etc/td-agent/td-agent.conf
• Windows - C:/opt/td-agent/etc/td-agent/td-agent.conf
The following snippet contains a minimal configuration to send a log file observe.log to Observe:

<source>
  @type tail
  tag logs
  path /var/log/observe.log
  path_key filename
  pos_file /var/log/td-agent/observe.log.pos
  read_from_head true
  limit_recently_modified 24h
  <parse>
    @type none
  </parse>
</source>

<filter logs>
  @type record_transformer
  <record>
    hostname "${hostname}"
  </record>
</filter>

<match logs>
  @type http
  endpoint https://collect.observeinc.com/v1/http/fluentd
  <auth>
    method basic
    username "#{ENV['OBSERVE_CUSTOMER']}"
    password "#{ENV['OBSERVE_TOKEN']}"
  </auth>
  <buffer>
    chunk_limit_size 2MB
    flush_interval 5s
    num_threads 3
  </buffer>
</match>

We rely on Fluentd's http output to forward data to Observe's HTTP endpoint. Fluentd does not support compression for the http output. For the tail input plugin, you may wish to modify the following attributes:
• pos_file is used by Fluentd to track logs processed so far. This allows Fluentd to resume forwarding across restarts without submitting duplicate log entries. The file must be writable by Fluentd.


• read_from_head should be enabled if you wish to begin ingesting a file from the head rather than tail. Thiscan be useful when bulk uploading files on a first run. • limit_recently_modified restricts the files which are tailed by Fluentd to those modified recently. This protects against opening too many files concurrently when using a wildcard on a directory with many archived logs. Note that this sample config gets values for username and password from OBSERVE_CUSTOMER and OBSERVE_TOKEN environment variables. Set these to your username and password.
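For a quick test of the configuration above, you can export the credentials and run the agent in the foreground. A sketch; when running td-agent as a system service, set these variables in the service environment instead:

$ export OBSERVE_CUSTOMER="12345"
$ export OBSERVE_TOKEN="my_ingest_token"
$ td-agent -c /etc/td-agent/td-agent.conf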

FAQ

Retry on failure

Fluentd supports exponential backoff for retries. See the Fluentd Buffer Plugins documentation for more about buffer configuration.

Google Cloud Pub/Sub

Google Cloud Pub/Sub allows you to reliably deliver streaming data from multiple sources within GCP. Observe supports ingesting data through a variant of our HTTP endpoint.

Authorization

Google Cloud Pub/Sub supports pushing data in JSON format directly over HTTP, but does not support basic authentication nor configuring authorization headers for outbound requests. Your credentials must therefore be included in the URI as a base64-encoded string.

Configuration

Navigate to the GCP Cloud Pub/Sub console to create a Pub/Sub push subscription. Note that some options are required and others are optional. 1. Select subscriptions

• Go to the Pub/Sub page in the GCP console.
• Select Subscriptions in the left pane, and then click CREATE SUBSCRIPTION.
2. Configure required subscription details
At minimum, you must configure these required options:


• Subscription ID and Pub/Sub topic: Enter a Subscription ID and a topic. You may choose an existing topic from the menu, or enter a new one. • Delivery type

The Delivery type Endpoint URL requires a Basic Authentication token as part of the URL. Construct this token using your Observe Customer ID and ingest token.
– Select Push
– Create a Basic Authorization token by Base64 encoding a string containing the customer ID and ingest token separated by a colon :.
Bash on MacOS and Linux:

$ echo -n "12345:my_ingest_key" | base64
MTIzNDU6bXlfaW5nZXN0X2tleQ==

PowerShell on Windows:

> [System.Convert]::ToBase64String([System.Text.Encoding]::UTF8.GetBytes("12345:my_ingest_key"))
MTIzNDU6bXlfaW5nZXN0X2tleQ==

– Enter the Endpoint URL https://collect.observeinc.com/v1/pubsub, appending the Base64 token you created in the previous step. https://collect.observeinc.com/v1/pubsub/MTIzNDU6bXlfaW5nZXN0X2tleQ==

– Leave the Enable authentication box unchecked. This option is for authentication between GCP services, not external ones. • Message retention duration: Select your desired retention period. To reduce storage cost, we recommend no more than 24 hours. This also determines how long an outage your subscription can recover from. • Acknowledgement deadline: Choose a deadline time of at least 60 seconds. • Retry policy: Select Retry after exponential backoff delay, with the default minimum of 10 seconds and maximum of 600 seconds. 3. Configure optional subscription details You may also configure these optional settings: • Expiration period


• Subscription filter
• Dead lettering
Message ordering is not required.
4. When you are finished, click Create subscription.

FAQ

Retry on failure

Google Cloud Pub/Sub supports exponential backoff for retries. See the Pub/Sub Handling message failures documentation for more information.

Log4j

Send application logs to Observe with Log4j. For more about configuring and using Log4j, see the Log4j Documentation.

Requirements

To do this you will need the following information:
• Your Observe Customer ID
• Your Observe ingest token
• Ability to restart your application

Configuring Log4j

1. Open the log4j2.xml file for your application.
2. Add an Http appender to your log4j2.xml file, inserting your Customer ID and ingest token into the value attribute of the Authorization node. (See the example configurations below.)
3. Add an AppenderRef inside the AsyncRoot node for the Observe Http appender from the previous step.
4. Restart the application to pick up the changes made to log4j2.xml.


Example Log4j Configurations

Below are some examples using the Http appender to send data to Observe.

Observe Only

Send log data to one destination, Observe, using the Http appender.
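A minimal log4j2.xml along these lines illustrates the idea (the URL path, the JsonLayout, and the Authorization value format are assumptions based on the HTTP endpoint and Authentication sections, not a verbatim copy of the original example):

<?xml version="1.0" encoding="UTF-8"?>
<Configuration status="warn">
  <Appenders>
    <!-- Http appender posting JSON-formatted events to the Observe HTTP endpoint -->
    <Http name="Observe" url="https://collect.observeinc.com/v1/http/log4j">
      <!-- assumption: credentials passed as a bearer token header -->
      <Property name="Authorization" value="Bearer YOUR_CUSTOMER_ID YOUR_INGEST_TOKEN"/>
      <JsonLayout compact="true" eventEol="true"/>
    </Http>
  </Appenders>
  <Loggers>
    <AsyncRoot level="info">
      <AppenderRef ref="Observe"/>
    </AsyncRoot>
  </Loggers>
</Configuration>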

Multiple Log Appenders

Use both the file appender and the Http appender to send logs to two destinations: a local file and Observe.



Logstash

Logstash is a lightweight agent for forwarding logs from a variety of different sources. Its sources are configured via input plugins, and where it forwards data is configured with output plugins. For more information on input plugins, see the Logstash Input Plugins documentation.

Installation

Observe works with the OSS version of Logstash, which may be downloaded at Logstash Downloads. The Logstash setup and run guides can be found here: Logstash Setup and Run Guide.

Important: Logstash versions 7.13+ can no longer forward to Elasticsearch-compatible 3rd party APIs. To use Logstash with Observe, please ensure you are using an Apache2 licensed Logstash version 7.12 or earlier.

Requirements

To use Logstash you will need the following information:
1. Your Observe Customer ID
2. Your Observe ingest token

Configuration

1. Configure an output plugin to forward data to Observe:
Add the following output configuration to your logstash.conf file, providing your Observe Customer ID and Ingest Token where indicated.
output {
  elasticsearch {
    hosts => [ "https://collect.observeinc.com:443/v1/elastic" ]
    user => ""
    password => ""
    ssl => true
  }
}

FAQ

Retry on failure

Logstash supports retries. See the Logstash documentation for more information.

Observe Lambda

The Observe Lambda forwarder is a general-purpose Lambda function that forwards data to Observe. It can handle multiple types of events, including those generated by S3, DynamoDB and CloudWatch Logs, as well as read objects in an S3 bucket.

Setup

Installation

To install the Observe Lambda, you must have a valid Observe customer ID and ingest token.

AWS Console

Use our CloudFormation template to automate creating the Lambda function and its permissions. To install via the AWS Console:
1. Navigate to the CloudFormation console and view existing stacks.
2. Click Create stack. If prompted, select With new resources.
3. Provide the template details:
   1. Under Specify template, select Amazon S3 URL.
   2. In the Amazon S3 URL field, enter https://observeinc.s3-us-west-2.amazonaws.com/cloudformation/lambda.yaml.
   3. Click Next to continue. (You may be prompted to view the function in Designer. Click Next again to skip.)
4. Specify the stack details:
   1. In Stack name, provide a name for this stack. It must be unique within a region, and is used to name created resources.
   2. Under Required Parameters, provide your Customer ID in ObserveCustomer and ingest token in ObserveToken.
   3. Click Next.
5. Under Configure stack options, there are no required options to configure. Click Next to continue.
6. Review your stack options:
   1. Under Capabilities, check the box to acknowledge that this stack may create IAM resources.


2. Click Create stack


Alternatively, you may deploy the template with the awscli tool:

Caution: If you have multiple AWS profiles, make sure you configure the appropriate AWS_REGION and AWS_PROFILE environment variables in addition to OBSERVE_CUSTOMER and OBSERVE_TOKEN.

$ curl -Lo lambda.yaml https://observeinc.s3-us-west-2.amazonaws.com/cloudformation/lambda.yaml
$ aws cloudformation deploy --template-file ./lambda.yaml \
    --stack-name ObserveLambda \
    --capabilities CAPABILITY_NAMED_IAM \
    --parameter-overrides ObserveCustomer="${OBSERVE_CUSTOMER?}" ObserveToken="${OBSERVE_TOKEN?}"

Terraform

You may also use our observe_lambda Terraform module hosted on GitHub to configure the Lambda function.
module "observe_lambda" {
  source           = "github.com/observeinc/terraform-aws-lambda"
  name             = "observe-lambda"
  observe_customer = "${OBSERVE_CUSTOMER}"
  observe_token    = "${OBSERVE_TOKEN}"
}

We recommend you pin the module version to the latest tagged version.
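For example, a ref query string on the module source pins it to a specific tag (the tag shown here is hypothetical):

module "observe_lambda" {
  # pin to a tagged release of the module
  source           = "github.com/observeinc/terraform-aws-lambda?ref=v1.2.3"
  name             = "observe-lambda"
  observe_customer = "${OBSERVE_CUSTOMER}"
  observe_token    = "${OBSERVE_TOKEN}"
}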

Getting Started

Each time the Lambda function is triggered, it invokes a handler for the type of event ingested. This section describes how to configure triggers for different data sources.

S3 buckets

An S3 trigger sends an event to the Lambda function when an object is created in a bucket. The function then reads the file and uploads it to Observe. Files are parsed according to their type. For example, .json files should contain a single JSON object or array, and .jsonl files should contain one or more newline delimited JSON objects.
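For example, a .jsonl object containing the two newline delimited records below would produce two separate observations (the content is purely illustrative):

{"id":1}
{"id":2}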

Note: The S3 bucket must be in the same region as the Lambda function, and a bucket can only send data to one function. If you need to send to multiple functions, see Amazon Simple Notification Service.


AWS Console

1. Navigate to the Lambda console and view your functions.
2. Select your Observe Lambda function.
3. Click Add trigger.
4. Under Trigger configuration, search for S3.
5. Select the bucket you wish to subscribe to your Lambda function.
6. Optionally add a prefix or suffix filter.
7. Under Recursive invocation, check the box to acknowledge the function does not both read and write to the same bucket.
8. Click Add.


Terraform

We provide a submodule which subscribes S3 buckets to a Lambda configured via terraform-aws-lambda. The following is an example instantiation, assuming aws_s3_bucket.bucket references a bucket resource managed by Terraform:
module "observe_lambda" {
  source           = "github.com/observeinc/terraform-aws-lambda"
  observe_customer = var.observe_customer
  observe_token    = var.observe_token
  observe_domain   = var.observe_domain
  name             = var.name
}

module "observe_lambda_s3_subscription" {
  source = "github.com/observeinc/terraform-aws-lambda//s3_bucket_subscription"
  lambda = module.observe_lambda.lambda_function
  bucket = aws_s3_bucket.bucket
}

For more information, please visit the submodule documentation.

CloudWatch Logs

AWS Console

1. Navigate to the Lambda console and view your functions.
2. Select your Observe Lambda function.
3. Click Add trigger.
4. Under Trigger configuration, search for CloudWatch Logs.
5. Select the desired source Log group from the dropdown.
6. In Filter name, provide a name for this filter.


7. Click Add.


Terraform

We provide a submodule which subscribes CloudWatch Log Groups to a Lambda configured via terraform-aws-lambda. The following is an example instantiation, assuming aws_cloudwatch_log_group.group references a log group resource managed by Terraform:
module "observe_lambda" {
  source           = "github.com/observeinc/terraform-aws-lambda"
  observe_customer = var.observe_customer
  observe_token    = var.observe_token
  observe_domain   = var.observe_domain
  name             = var.name
}

module "observe_lambda_cloudwatch_logs_subscription" {
  source          = "github.com/observeinc/terraform-aws-lambda//cloudwatch_logs_subscription"
  lambda          = module.observe_lambda.lambda_function
  log_group_names = [aws_cloudwatch_log_group.group.name]
}

For more information, see the submodule documentation.

EventBridge

You may ingest EventBridge events with the Observe Lambda forwarder, although this method is no longer recom- mended. See the EventBridge ingest documentation for alternate methods.

FAQ

How are failures handled?

The Lambda forwarder does not retry on error. This reduces the risk of unexpected AWS charges from a long-running Lambda function.

What permissions does the Lambda forwarder need?

The Lambda forwarder requires an IAM Role with permission to be invoked. The CloudFormation template allows the Lambda forwarder to be invoked from S3, SNS, or SQS, as well as read from S3. These permissions are not scoped to individual resources. For more fine grained control over permissions, we recommend the Terraform modules, which grant more strictly limited permissions.


What external entities does the Lambda forwarder interact with?

The Lambda forwarder only posts data over HTTPS to the Observe API. There is no mechanism for sending data from Observe to the Lambda forwarder.

Troubleshooting

• CREATE_FAILED while creating the CloudFormation stack: Check that you have the correct S3 URL, customer ID, and ingest token. The template verifies the connection to Observe as part of the install process, and will fail if it is not able to authenticate.

OpenTelemetry

Send OpenTelemetry data directly from your application, or with the OpenTelemetry Collector. OpenTelemetry is a framework for creating and managing telemetry data such as traces, metrics and logs. It has several components, including SDKs for instrumenting your application and a vendor agnostic Collector. The Collector sends data to Observe using the OpenTelemetry endpoint.

Note: If you use Kubernetes, we recommend the Observe Kubernetes proxy and agent rather than manually configuring OpenTelemetry. The proxy offers several additional configuration options tailored to Kubernetes environments, including additional metadata for ingested observations.

Collector Installation

Please see the OpenTelemetry documentation for installation instructions.

Collector Configuration

The Collector supports various configuration options. The example below describes a configuration for the Collector to receive data in OTLP format and export it to Observe.
receivers:
  otlp:
    protocols:
      grpc:
      http:
processors:
  batch:
exporters:
  logging:
    logLevel: debug
  otlphttp:
    endpoint: "https://collect.observeinc.com/v1/otel"
    headers:
      'Authorization': 'Bearer my_customer_id my_ingest_token'
  prometheusremotewrite:
    endpoint: "https://collect.observeinc.com/v1/prometheus"
    headers:
      'Authorization': 'Bearer my_customer_id my_ingest_token'
extensions:
  health_check:
  pprof:
  zpages:
service:
  extensions: [health_check, pprof, zpages]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheusremotewrite]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [logging]

The Authorization header requires your customer ID and ingest token as a single space-delimited string: 'Authorization': 'Bearer 1234567890 my_ingest_token'

Instrument using an SDK

OpenTelemetry provides SDKs for several languages. An instrumented application can send data directly to Observe, or use the Collector. Note that it is generally recommended to use the Collector in production environments for performance and scalability. The example below describes a configuration for a Go application to export data to Observe:
ctx := context.Background()
driver := otlphttp.NewDriver(
    otlphttp.WithEndpoint("collect.observeinc.com/v1/otel"),
    otlphttp.WithTracesURLPath("v1/traces"),
    otlphttp.WithHeaders(map[string]string{
        "Authorization": "Bearer my_customer_id my_ingest_token",
    }),
)
exp, err := otlp.NewExporter(ctx, driver)
handleErr(err, "failed to create exporter")


Prometheus

Prometheus is an open source metric collection agent. It has a lot of different components, but at its heart is the pull-based Prometheus server. Ingest Prometheus data with either a POST request to an API endpoint or with a Kubernetes agent. Observe implements the Prometheus remote write API for both of these methods.

Installation

Please see the Prometheus documentation for installation instructions.

Configuration

The examples below describe several types of configurations for sending Prometheus data to Observe. Add the appropriate remote_write block to your prometheus.yml configuration file.

API Endpoint
remote_write:
  - url: https://collect.observeinc.com/v1/prometheus
    bearer_token: "my_customer_id my_ingest_token"
    remote_timeout: "30s"
    queue_config:
      min_backoff: "1s"
      max_backoff: "30s"
      max_shards: 20
      max_samples_per_send: 5000
      capacity: 15000
bearer_token is your customer ID and ingest token as a single space-delimited string:
bearer_token: "1234567890 my_ingest_token"

Note: To add additional labels to every observation, append one or more query parameters to the ingest URL:
- url: https://collect.observeinc.com/v1/prometheus?alt_host=this_host&alt_namespace=that_namespace

Kubernetes Agent
remote_write:
  - url: http://proxy.observe.svc.cluster.local:2001/
    remote_timeout: "30s"
    queue_config:
      min_backoff: "1s"
      max_backoff: "30s"
      max_shards: 20
      max_samples_per_send: 5000
      capacity: 15000

This method does not require bearer_token, as the daemonset already has it.

Hot/Hot Cluster

If you run two instances as a hot/hot cluster (using Prometheus Operator), you may wish to ingest only one set of metrics. Use writeRelabelConfigs to drop data from one of the instances. For example, the configuration below looks for instance names in the prometheus_replica field and drops metrics from prometheus-k8s-1:
remote_write:
  - url: 'http://proxy.observe.svc.cluster.local:2001/'
    remote_timeout: 30s
    queue_config:
      min_backoff: 1s
      max_backoff: 30s
      max_shards: 20
      max_samples_per_send: 5000
      capacity: 15000
    writeRelabelConfigs:
      - action: drop
        regex: prometheus-k8s-1
        sourceLabels:
          - prometheus_replica

Observation Format

Observations are of kind prometheus. Fields contains a timestamp in milliseconds, and a float64 metric value. Any additional tags are added to the JSON object in the EXTRA field.

Column: OBSERVATION_KIND — Value: prometheus
Column: FIELDS — Value: {"timestamp": ..., "value": 0.1}
Column: EXTRA — Value: {"_name_": "my_metric", "other_tags": "foo"}


FAQ

Retry on failure

Prometheus supports exponential backoff for retries. See the Prometheus Remote Write Tuning documentation for more information.

Telegraf

Telegraf is a server agent for forwarding metrics, logs, and events. It accepts data from a variety of sources via input plugins and forwards them to other destinations with output plugins. Send data to Observe using the Telegraf HTTP output plugin and JSON data.

Note: For logs and events, we recommend Filebeat, Fluent Bit, or fluentd. But you may use whichever option is appropriate for your environment.

Installation

If you are not already using Telegraf, see Get started with Telegraf in the Telegraf documentation for installation details.

Configuration

1. Configure the HTTP output plugin
Add the following output configuration to your telegraf.conf file, where OBSERVE_CUSTOMER and OBSERVE_TOKEN are environment variables containing your customer ID and ingest token.
[[outputs.http]]
  url = "https://collect.observeinc.com/v1/http/telegraf"
  username = "${OBSERVE_CUSTOMER}"
  password = "${OBSERVE_TOKEN}"
  data_format = "json"
  content_encoding = "gzip"
  [outputs.http.headers]
    Content-Type = "application/json"
    X-Observe-Decoder = "nested"
You may also provide the customer ID and token as strings instead:
  username = "12345"
  password = "my_ingest_token"

2. Configure input plugins If needed, configure the appropriate Telegraf input plugin so Telegraf can forward data from the original sources.


3. Troubleshooting Telegraf configurations To debug Telegraf inputs or outputs, see Troubleshoot Telegraf in the Telegraf documentation.

5.4.4 Endpoints

Unlike most legacy systems, Observe does not have a preferred format over which data must be exchanged. Instead, Observe strives to accept any data format by natively supporting existing wire protocols. Observe maintains multiple collection endpoints, each implementing a concrete protocol or API. The currently supported endpoints are:

HTTP

The http API is unique among our endpoints in that it does not implement an existing specification. It is a generic endpoint and a convenient method of ingesting data over HTTP.

Endpoint: http
Canonical URL: collect.observeinc.com/v1/http
Legacy Subdomain: http.collect.observeinc.com

How it works

Just send a POST request our way! We will try our best to make sense of it based on the following principles:
• The request body is parsed according to the content type header.
• The path component of the URL is encoded as a tag.
• Query parameters are encoded as tags.
As an example, the following POST:
$ curl -X POST https://collect.observeinc.com/v1/http/first/example?key=value \
  --user ${OBSERVE_CUSTOMER?}:${OBSERVE_TOKEN?} \
  -H 'Content-Type: application/json' \
  -d '{"message":"Hello World!"}'

Results in an observation with the following values:

Column: OBSERVATION_KIND — Value: http
Column: FIELDS — Value: {"message": "Hello World!"}
Column: EXTRA — Value: {"key": "value", "path": "/first/example"}

The observation fields are based on the request body, while path and query parameters are encoded in EXTRA. All observations in a payload are given the same EXTRA metadata. HTTP headers determine how the content is parsed.


Supported content types

The HTTP endpoint supports the following content type values:

application/json — Parses a single JSON object or array of objects. Each object is a unique observation.
application/x-ndjson — Parses a stream of newline delimited JSON objects. Each object is a unique observation.
application/xml — Internally converts the XML object to JSON, and processes it according to application/json handling.
application/msgpack — Parses an array of objects.
text/csv — Generates one observation per CSV record, using the fields defined in the header row. Empty values are omitted.
text/plain — Generates one observation per line.

Note: There are several additional headers that control data parsing. Until these are fully documented, if you need additional options for ingesting your data, please contact support for assistance.
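For example, a CSV payload can be sent with the same curl pattern used in the JSON examples below; each record becomes one observation with fields taken from the header row (the file contents here are purely illustrative):

$ cat payload
host,status
web-1,ok
web-2,error

$ curl -X POST https://collect.observeinc.com/v1/http/example \
  --user ${OBSERVE_CUSTOMER?}:${OBSERVE_TOKEN?} --data-binary @payload \
  -H 'Content-Type: text/csv'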

JSON examples

Object

You can submit an object by setting the Content-Type header to application/json. The object will result in a single observation.

Payload

{ "message": "Hello World!" }

Curl

$ curl -X POST https://collect.observeinc.com/v1/http/example \ --user ${OBSERVE_CUSTOMER?}:${OBSERVE_TOKEN?} --data-binary @payload \ -H 'Content-Type: application/json'


Array of objects

You can submit an array of objects by setting the Content-Type header to application/json. Each object will result in a separate observation.

Payload

[ { "id":1 }, { "id":2 } ]

Curl

$ curl -X POST https://collect.observeinc.com/v1/http/example \ --user ${OBSERVE_CUSTOMER?}:${OBSERVE_TOKEN?} --data-binary @payload \ -H 'Content-Type: application/json'

Newline delimited objects

If you have a stream of newline delimited objects, you can submit your data as content type application/x-ndjson. Each object will result in a separate observation.

Payload

{ "id":1 } { "id":2 } { "id":3 }


Curl

$ curl -X POST https://collect.observeinc.com/v1/http/example \ --user ${OBSERVE_CUSTOMER?}:${OBSERVE_TOKEN?} --data-binary @payload \ -H 'Content-Type: application/x-ndjson'

DDTrace

Endpoint: ddtrace
Canonical URL: collect.observeinc.com/v1/ddtrace
Legacy Subdomain: ddtrace.collect.observeinc.com

Datadog provides its own open source tracing libraries for different runtimes. Each library offloads traces to a running instance of the Datadog agent, which in turn batches and uploads the data to Datadog. Observe provides limited support for ingesting traces directly from a ddtrace compatible tracing client. In particular, Observe does not plan on providing support for dynamic sampling, which was added in v0.4 of the datadog-agent trace API.

Elastic

Endpoint: elastic
Canonical URL: collect.observeinc.com/v1/elastic
Legacy Subdomain: elastic.collect.observeinc.com

Elasticsearch is a popular search and analytics engine that exposes a large number of REST APIs. To enable ingesting data, Observe implements a subset of the Bulk API, with the following caveats:
• The /{target}/_bulk endpoint is not supported.
• update and delete actions are ignored.
For backward compatibility with existing Elasticsearch clients, the following endpoints were also implemented:
• HEAD requests for /_template/{name} always return 200 OK. This avoids clients attempting to create templates. Templates are not necessary in Observe's architecture.
• Requests to /_xpack endpoints return a 400 error. These requests are usually due to client misconfiguration, and this allows the request to fail in a more visible way.

Kinesis

Endpoint: kinesis
Canonical URL: collect.observeinc.com/v1/kinesis
Legacy Subdomain: kinesis.collect.observeinc.com

This endpoint implements the HTTP Endpoint Delivery Request and Response Specification for Kinesis Firehose.
The protocol defines that credentials are encoded in the X-Amz-Firehose-Access-Key header. This header is only accepted when directly sent to kinesis.collect.observeinc.com. Our load balancer will rewrite this header to Authorization: Bearer ${X-Amz-Firehose-Access-Key} before forwarding the request towards the canonical URL. The contents of X-Amz-Firehose-Access-Key must therefore match the contents of your bearer token.

LogPlex

Endpoint: logplex
Canonical URL: collect.observeinc.com/v1/logplex
Legacy Subdomain: logplex.collect.observeinc.com

LogPlex distributes log entries generated by applications on Heroku. Logs can be forwarded to HTTP drain endpoints for further processing. To route messages from LogPlex to Observe, configure your HTTP drain endpoint to be https://collect.observeinc.com/v1/logplex.
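For example, with the Heroku CLI a drain could be added along these lines (the app name is a placeholder, and the credentials are embedded in the URL for Basic authentication as described in the notes below):

$ heroku drains:add "https://12345:my_ingest_key@collect.observeinc.com/v1/logplex" -a my-heroku-app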

Additional notes

LogPlex only supports HTTP Basic Authentication. Logplex does not currently validate the server certificate during TLS handshake. We strongly suggest you review their security considerations to determine if this is appropriate for your environment.

OpenTelemetry

Endpoint: otel
Canonical URL: collect.observeinc.com/v1/otel
Legacy Subdomain: otel.collect.observeinc.com

OpenTelemetry is an observability framework for software. Observe implements the write-only endpoint for OpenTelemetry traces at the URL path v1/traces. The request body should be encoded in the Protobuf format and sent with a content-type of application/x-protobuf.

Prometheus

Endpoint: prometheus
Canonical URL: collect.observeinc.com/v1/prometheus
Legacy Subdomain: prometheus.collect.observeinc.com

Prometheus is an open source metric collection agent. Prometheus has a lot of different components, but at its heart is the pull-based Prometheus server. Observe allows users to push Prometheus data through either an API endpoint or a Kubernetes Agent. Observe implements the Prometheus remote write API for both of these methods. For more about Prometheus ingest, see the Prometheus forwarder documentation.


Additional notes

Prometheus timestamps support millisecond granularity.

Zipkin

Endpoint: zipkin
Canonical URL: collect.observeinc.com/v1/zipkin
Legacy Subdomain: zipkin.collect.observeinc.com

Zipkin is a distributed tracing system. Observe implements the write-only endpoints for both Zipkin V1 and V2. For more information, please refer to the zipkin-api repository on GitHub.

Endpoint behavior

While each endpoint implements a different API, all endpoints behave in a consistent manner.
• Data must be exchanged over TLS. Data emitted using protocols that do not support TLS (such as collectd and statsd) must be proxied through a forwarder.
• Each endpoint has a unique path under the base domain https://collect.observeinc.com/v1. This path identifies the type of data being ingested, and is the preferred method.
• For legacy clients, endpoints also have a subdomain under collect.observeinc.com. Many clients only support configuring the domain name of the endpoint data is sent to. For this reason, each endpoint is available through a subdomain of collect.observeinc.com. An endpoint $foo will be available on https://$foo.collect.observeinc.com. Internally, requests to https://$foo.collect.observeinc.com/ are rewritten to https://collect.observeinc.com/v1/$foo/. While subdomains are practical for legacy clients, paths can be simpler for new integrations or SDKs. Both methods implement the same underlying API, but may differ in how they handle authentication.
• The endpoint used dictates the observation kind. Data ingested through endpoint $foo creates observations of kind $foo.
• Query parameters are encoded as tags. Query parameters, except for those that are part of the endpoint protocol, are encoded as tags on the ingested data. This provides a simple, generic mechanism for tagging data without the need to modify sources.
• A single observation cannot exceed 1MB. Independently of what endpoint data comes in through, a single, uncompressed observation cannot exceed 1MB. This is a hard limit of our stream processing pipeline.


Authentication

All requests to Observe’s collection API must be authenticated. Authentication requires the following attributes:

${OBSERVE_CUSTOMER} — A 12 digit number which identifies your account, present in the URL you use to log into Observe (https://${OBSERVE_CUSTOMER}.observeinc.com)
${OBSERVE_TOKEN} — A token with write-access to the collector service in your customer account. Please contact support if you are missing this information.

Requests must provide these values in an HTTP header. If the Authorization header is missing, any request to endpoints under the https://collect.observeinc.com base domain will return 401 Unauthorized.
$ curl -f https://collect.observeinc.com
curl: (22) The requested URL returned error: 401

Observe supports three authentication methods: Basic auth, Bearer token, and Endpoint-specific auth.

Basic auth

For Basic authentication, provide your customer ID as the username, and your token as the password. In curl, this can be done through the --user argument. You can verify your credentials by issuing a GET against https://collect.observeinc.com.
$ curl https://collect.observeinc.com --user ${OBSERVE_CUSTOMER?}:${OBSERVE_TOKEN?}

Bearer token

For Bearer authentication, append your customer ID and token to the authorization header as a bearer token. The following curl command verifies the provided credentials are valid:
$ curl https://collect.observeinc.com -H Authorization:"Bearer ${OBSERVE_CUSTOMER?} ${OBSERVE_TOKEN?}"

Endpoint-specific

Requests to the base domain https://collect.observeinc.com require either Basic auth or a Bearer token. Endpoint subdomains, on the other hand, are designed to mimic third party protocols. They may need to conform to a concrete specification.
For example, the AWS Kinesis Firehose protocol encodes credentials in the X-Amz-Firehose-Access-Key HTTP header. This header is recognized by https://kinesis.collect.observeinc.com, but not by the more specific https://collect.observeinc.com/v1/kinesis path. The subdomain version respects external requirements defined by the protocol specification, while the path version behaves as part of Observe's API, and therefore expects credentials to be encoded in the Authorization header. The documentation for each type of endpoint describes which authentication methods it accepts.


CHAPTER SIX

WORKSHEETS

Explore the details of your data in a Worksheet: filter to rows of interest, extract new fields, create a visualization, and more. Since the underlying data isn’t changed, you can try multiple scenarios as you shape and transform.

Note: A future version of this tutorial will include a sample dataset.


6.1 Introduction to data modeling

This example uses data from a set of IoT devices. They report measurements like temperature and humidity, when doors are opened or closed, and power usage of appliances. Like other data sources, the data is visible in the Firehose, also known as the Observation Event Stream. The Firehose has basic search capabilities, but a spreadsheet-like Worksheet gives you many more options, from searching and filtering to creating additional Event Streams and Resource Sets.

Open a Worksheet for the Observation Event Stream using Quick Search, located in the left rail.

This basic Worksheet contains a single Stage, a table showing the results of the transformations applied to it. In this case, there are none yet. It is still an undifferentiated collection of raw observations. Since there is so much data here, the first step is to filter to what you want to look at: the IoT sensor data.


6.1.1 Filter

When this HTTP data source was configured, it included a unique path: smartthings-bridge. This became a value for path in the EXTRA field.

To filter on this value, select Filter JSON from the EXTRA column menu to open the Filter JSON dialog.

The Filter JSON dialog allows you to select which data you want to see, either by field name or individual value. Select Value from the dropdown and search for smartthings-bridge to show only the matching rows.


6.1.2 Add a second Stage

With the data filtered, OBSERVATION_KIND and EXTRA now contain redundant information. You could delete them right now, but perhaps they might be useful for a later investigation. Click Link New Stage to create a second Stage based on the first one. Stages progressively build on one another, but they are not themselves independent datasets. Every Stage in a Worksheet maintains a history of the actions applied to it, inheriting the state of the parent. All the data remains in the base dataset. Worksheet Stages are transient views of that data.

Now you can use Delete Column from the column heading menu to remove OBSERVATION_KIND and EXTRA, without affecting the previous Stage.

6.1.3 Create a dataset

The second Stage now shows only the data of interest: the ingest timestamp and the JSON payload. This is a good point to think about how you might want to use this data. The underlying source for this Worksheet is the Observation Event Stream, which contains everything from all your data sources. Even with only a few sources, this is an enormous amount of data. Operating directly on all of it, every time, won’t give the best performance. A better option is to create a new Event Stream dataset, so subsequent operations are only applied to the relevant data. With the second Stage selected (the right rail should say “Linked from Observation”) click Publish New Event Stream. Give this dataset a unique name, such as “Worksheet example IoT raw events”, and click Publish. The two Stages in the Worksheet are consolidated to a single Stage, with data from the newly created Event Stream. The history is not lost, but is maintained in the Event Stream Definition.


6.1.4 Extract fields

Next, explore the contents of FIELDS:

Select Extract from JSON in the FIELDS column menu to create new fields from the JSON payload. For this data, deviceEvent.attribute, deviceEvent.deviceId, and deviceEvent.value are a good place to start. This gives you the ID of the sensor, what type of information it reports, and the value of that reading.


But some of these Observations aren’t actually valid sensor data. To show just the good readings, remove the null value rows by selecting Remove Empty from the deviceId column menu.
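The same shaping can also be expressed in OPAL (a sketch; the exact statements the UI generates may differ):

// extract the fields of interest from the JSON payload
colmake deviceId:string(FIELDS.deviceEvent.deviceId),
  sensor:string(FIELDS.deviceEvent.attribute),
  value:string(FIELDS.deviceEvent.value)
// keep only rows that have a deviceId
filter not isnull(deviceId)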

6.1.5 Create a new Resource Set

These extracted fields show a basic picture of the data, but the raw deviceId isn't particularly meaningful. There is a deviceInfo.label in the payload; you could extract a field for that as well. But if you later want to shape this same data in different ways, each new Worksheet would need to extract label every time. A more convenient option is to create a Device Resource that each of them can link to.
To model each sensor as a Resource, start with a new Worksheet for the "Worksheet example IoT raw events" dataset you created earlier. But this time, extract fields related to the sensor itself: deviceInfo.deviceId and deviceInfo.label. Use Remove Empty on deviceId to show only valid values.
At the bottom of the right rail, select Create Resource in the Actions menu to open the Create Resource dialog. (If you don't see it, make sure you don't have a column selected.) Select the label and deviceId fields, and also deviceId as the Primary Key so there is a unique key for other datasets to link to. Click Create. This creates a second Stage for deviceId.


Next, indicate which field contains the Resource names. This allows other Worksheets or Landing Pages to show a meaningful device name instead of deviceId. From the label column menu, select Set as Resource Set Label.

If the list of label values looks good, click Publish New Resource Set to create the Resource Set dataset. Give it a unique name, like "Worksheet example IoT Device", and publish.


6.1.6 Link to another dataset

Back in the original Worksheet, you can now show a more readable label instead of deviceId.

From the deviceId column menu, select Link to Resource Set and choose the Resource Set you just created. The correct Linked Resource Key should already be set to deviceId. Update the Link Name to give the linked field a more meaningful name, such as “Device”. Click Apply to finish. The deviceId column is now the Device column, with device names instead of ID values.


6.1.7 Save the updated Worksheet (optional)

You now have two Worksheets, an Event Stream, and a Resource Set. The Worksheets use data from the Event Stream and Resource Set, but those datasets no longer depend on the Worksheets they were created from. You may save them for later use, or create a new Worksheet next time you need one.

6.2 Explore more complex data

You may have Event Streams and Resource Sets created by other members of your team. In this case, shaping and linking existing datasets may be a more common way to use a Worksheet than working with raw data from a new source.


Here is an example of the same IoT data, modeled as a collection of linked Event Streams, Resource Sets, and Resources. More detailed shaping and linking makes it easier to understand the relationship between different types. From here, you can perform many different types of investigations with the data.

6.2.1 Filter and visualize

Start with a question: When did someone last make coffee? Coffee at Flamingo House starts by turning on the electric kettle to boil water. Starting with a Worksheet for Device Events, filter Device to just the Water Kettle events.


The kettle has two sensors: power and switch. The power sensor reports how much power the kettle is using, so when this value spikes that means someone has turned it on. That is a good choice for the next filter. With only the power readings, the timechart now shows a spike in usage a few hours ago. To see this in more detail, create a visualization. Select Add Visualization from the More menu in the upper right and select value for the Y-Axis and Average for Function. Since there’s only one kettle, there is no need to select a GROUP BY option.


The large spike confirms this is when someone turned on the kettle for coffee, at approximately 8AM.

6.2.2 Explore linked datasets

Next question: How does the temperature vary throughout the house? Open a new Worksheet for Device Events, change the time range to Past 24 hours, and filter sensor to just the temperature sensors. Then add a visualization to show temperature in each room.

The temperature readings show a clear pattern throughout the day: slowly cooling through evening and early morning, and then quickly rising around 6:30 in the morning. The living room appears to have the widest range. How do you view just those readings in the visualization? Go back to its parent Stage and filter Room to only show Living Room. The visualization also shows only the living room. Each value in Room is a link, right click on one and select the Living Room Resource to view other details about the living room. The Living Room Resource Landing Page shows that the living room belongs to, or is a type of, the Resource Set Room. If you want to know more about other rooms, click Room to open the Room Resource Set Landing Page. Back on the Living Room Resource Landing Page, the Room Overview board in the Fields tab shows details about the humidity, temperature, and other metrics. The Events tab looks much like the Device Events Event Stream in the Worksheet, although everything here only applies to the Living Room. While you work with data in a Worksheet, you can jump to other related datasets to investigate a detail, and then return to modeling in the Worksheet.


6.3 The OPAL console

So far, all this modeling has only used the UI. But every UI action also generates an OPAL statement, visible at the bottom of the page in the console. OPAL, the Observe Processing and Analysis Language, is a full featured query and modeling language. You can perform the same UI actions with OPAL statements, or add additional statements to your OPAL script for operations not yet available in the UI. Choose whichever you are most comfortable with.

For example, the Filter JSON menu action to filter to smartthings-bridge is equivalent to this OPAL filter statement: filter (string(EXTRA.path) = "/smartthings-bridge")

For more about OPAL, see Observe Processing and Analysis Language. For additional OPAL examples, see Helpful hints: OPAL. Please contact us if you have questions about data modeling or OPAL, and look for more resources in the near future.


CHAPTER SEVEN

OPAL — OBSERVE PROCESSING AND ANALYSIS LANGUAGE

7.1 Data types and operators

OPAL is a strongly typed language, with basic and advanced data types. The sections below describe these types, available operators for constructing expressions, and additional syntax details such as comments and valid field names. The verb and function references describe the types accepted by each verb or function. Passing an argument of an unexpected type generates an error, so take care to convert values when necessary.


7.1.1 Basic data types

bool
    Examples: true, false, null
    bool() creates a boolean value from strings and integers, such as bool("true"). (Also: 1, "1", "True", "T", "t", or "on".) Any expression that returns a boolean can be used to create one, even if it is not a string or integer. Example: myNonzeroFloat != 0 evaluates to true.
duration
    Examples: 5m, 300s, 300000ms
    Verbs and functions accept string durations using ns, ms, s, m, h, or d. For other uses, the duration() family of functions creates durations from nanosecond, millisecond, second, minute, or hour integer values. Duration values may be added or subtracted, and the result of adding or subtracting two timestamp values is a duration.
float64
    Examples: 123.45, 1.234e-5
    float64() creates or converts to a 64-bit float value.
int64
    Examples: 12345
    int64() creates or converts to a 64-bit integer value.
string
    Examples: "abcde"
    string() creates or converts to a string value. Allowed special characters are \n, \r, \t, \', \", and \\. Example: 'This string contains a tab (\t) character'. You may escape ' and " with a backslash or by doubling them. Examples: "doublequote (\") character", 'That''s awesome!', "I've been informed this is called ""coffee."""
timestamp
    Examples: 1609459200000000000
    The timestamp() family of functions creates timestamps from integer values of nanoseconds, milliseconds, or seconds since 1970-01-01T00:00:00Z (Unix epoch). Timestamps are stored internally in nanoseconds, so conversions to other types are based on nanosecond time. Timestamps are displayed in the UI as MM/DD/YY HH:MM:SS.FF3 in your local timezone. Example: 12/31/20 16:00:00.000 (for GMT-8).


7.1.2 Advanced data types

Any
    A special-purpose type to aid type safety. Used primarily with verbs and functions operating on JSON data. Corresponds to the Snowflake VARIANT type, for semi-structured data that may itself contain data of several types.
    Important: Some verbs and functions describe inputs or outputs as "any" when they are capable of handling several possible data types. This may not include the any type. Also, any() is a window or aggregate function that returns type any.
Array
    Example: array(parsejson("[1, 2, 3]"))
    array() converts a value of type any to an array. Frequently used to convert JSON values for operations that require type array.
Object
    Example: makeobject("resident_set_size":rss, "cpu_utilization":pcpu)
    makeobject() creates an object from key-value pairs, often as an intermediate step to select a set of fields for further operations.
Options
    Example: options(empty_bins:true)
    options() creates an object of type options, used to provide settings or metadata for certain verbs. options is similar to object, but the two are not interchangeable. Verbs and functions always require a value of the correct type.
Regex literal
    Example: /^DEBUG/
    For verbs and functions that accept a regular expression, a pattern to match delimited by / slashes. For more about syntax, see POSIX extended regular expressions.

Most types have a corresponding type_null() function that creates a null value. To pass null as an argument, use the appropriate function to create a value with the correct type. Example: colmake foo:string_null().

7.1.3 Operators

OPAL supports many common operators, as well as several additional ones for searching and accessing data within fields. Some have equivalent functions or alternate forms, which may be used interchangeably.

Arithmetic

+      addition
-      subtraction
*      multiplication
/      division
(, )   group, for precedence

Note: The output of dividing by zero is null. The OPAL parser does not return an error or NaN, but instead returns null for the expression. This is so an unexpected divide by zero doesn’t cause the entire pipeline to fail.


Comparison

=        eq()    equals
!=, <>   ne()    not equal to
<        lt()    less than
>        gt()    greater than
<=       lte()   less than or equal to
>=       gte()   greater than or equal to

Logical

and    logical AND
or     logical OR
not    logical NOT

Other

.       nested field access for JSON
[]      subscript for element in an array or JSON
:       name a field or value: colmake intVal: round(floatVal)
~       search within the specified field, or inside JSON
<...>   search for the specified literal text

See Examples for more on using these operators.

7.1.4 Additional syntax details

Comments

OPAL allows single line comments beginning with // anywhere whitespace is permitted, except inside a string literal. For multi-line comments, you may also use /* and */ start and end delimiters.
// only need deviceId and label
colmake deviceId:string(FIELDS.deviceInfo.deviceId), label:string(FIELDS.deviceInfo.label)

/*
 * TODO: better handle null deviceID values
 */
filter not isnull(deviceId) // ignore anything without a deviceId


Multi-line statements

Indent to continue a statement on the next line:
// select only the needed fields
colpick time,
  deviceId:string(fields.deviceEvent.deviceId),
  sensor:string(fields.deviceEvent.attribute),
  value:float64(fields.deviceEvent.value)

Also, regular expressions may be broken into smaller units on multiple lines. Note that each component of a larger regex must itself be a valid regex, and the components are whitespace delimited:
// these two statements are equivalent
colregex data, /(?P<key>[^|]*)\|count:(?P<count>[^|]*)\|env:(?P<env>[^|]*)/
colregex data, /(?P<key>[^|]*)\|/ /count:(?P<count>[^|]*)\|/ /env:(?P<env>[^|]*)/

Field names

In most cases, field (column) names may contain any character except double quote ", period ., or colon :. Underscores are displayed as spaces in the UI.
colmake "ΔT":float64(field3)
colmake "":float64(field4)
colmake "0_3µm":float64(um03)
colmake "":bool(done)

To reference a field with non-alphanumeric characters in an OPAL statement, use double quotes and prepend @.
colmake temp_difference:@."ΔT"

Regex extracted columns from colregex are limited to alphanumeric characters (A-Z, a-z, 0-9).

7.2 Performance

Working directly in OPAL allows a wider range of options when modeling data. Below are some recommendations for better performance from your OPAL pipelines.


7.2.1 Limit query time window size

By default, worksheets read 4 hours of data. Depending on the input dataset, that can be a lot of data. Consider reducing the query time window to 1 hour or less while actively modeling.

7.2.2 Create intermediate datasets

Where possible, create an intermediate event dataset by publishing partially shaped data as a new event dataset. Queries and further derived datasets will typically have to read much less data than if they were created directly on top of the original input dataset. This technique is especially effective if the intermediate dataset applies a selective filter to the input dataset, picks only a subset of input columns, or extracts JSON paths from an input column and then drops the original column. Avoid defining datasets directly on the Observation dataset, as it contains all ingested data in the workspace.
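For example, an intermediate Event Stream definition along these lines applies a selective filter and keeps only a few extracted columns (a sketch reusing the IoT example fields from the Worksheets chapter; actual column names depend on your input dataset):

// keep only the relevant observations
filter (string(EXTRA.path) = "/smartthings-bridge")
// keep only the columns later queries need
colpick time,
  deviceId:string(FIELDS.deviceEvent.deviceId),
  sensor:string(FIELDS.deviceEvent.attribute),
  value:float64(FIELDS.deviceEvent.value)

Publishing this as its own Event Stream lets later queries and derived datasets read the smaller, pre-shaped stream instead of the full Observation dataset.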

7.2.3 Limit makeresource time range

By default, the makeresource verb reads a large time range of input events: 24 hours. The reason for this behavior is that makeresource must compute the state of each resource at the beginning of the query time range, and, by default, it looks for events up to 24 hours in the past. Thus, a query with makeresource that has a query time range of 4 hours actually reads at least 28 hours of input data. 24+ hours can be a lot of data, especially if the input dataset is the Observation dataset. So especially avoid defining resource datasets directly on the Observation dataset. Most resource types receive events much more frequently than every 24 hours. We recommend adding options(expiry:duration_hr(...)) to your makeresource command to reduce its lookback where appropriate. For example, if it is known that the live instances of some resource dataset receive events at least every 15 minutes, it would be appropriate to set the resource expiration to 1 hour, thereby greatly reducing the amount of data read by makeresource: makeresource options(expiry:duration_hr(1)), col1:col1, primarykey(pk1, pk2)

7.3 Examples

7.3.1 Filtering

One of the most common OPAL operations is searching for data matching (or not matching) a condition. The filter verb accepts a predicate expression (filter condition) and returns all matching events in the query time window. Addi- tional verbs provide specialized matching conditions such as uniqueness, existence or non-existence, and top values.


Filter expressions

The simplest filter expressions use common arithmetic and logical operators, such as + and not. You may also use the equivalent function for those operators that have them. Construct more complex conditions with POSIX extended regular expressions, full text search, and OPAL functions such as isnull().
• Query every searchable text field in the event with the <...> text search operator:
filter <error>
filter <fatal error>
filter <"fatal error">

<...> searches for the given value as literal text. Multiple space-delimited words are individual search terms, with and implied. To search for a phrase, enclose it in quotes. Note that or is special in the search syntax: it means "the thing on the left, or the thing on the right." If you want to search for the word or, enclose it in quotes:
filter <"or">

Note: See below for how to search for text inside JSON with ~.

• Filter on a specific field with ~
The ~ operator allows searching within the specified column, which may also be done with the OPAL search function. In other words, these two statements are equivalent:
filter log ~ <foo bar baz>
filter search(log, "foo", "bar", "baz")

To specify multiple columns to search:
filter message + error ~ <foo>
filter (message ~ <foo>) or (error ~ <foo>)

The ~ operator also allows you to search for text inside JSON blobs, which are not standard searchable text:
// look for "fatal" and "error"
filter json_payload ~ <fatal error>

• Comparisons and logical expressions
filter temperature > 60 and temperature < 80
filter temperature < 30 or temperature > 100
filter hostname="www" or (hostname="api" and user="root")
filter not severity="DEBUG"
filter not log ~ /^DEBUG/
filter not <debug>

• Unicode characters
There are several ways to use non-ASCII text with filter:
– Text containing Unicode characters may be typed or pasted into the OPAL console like any other text. Examples:
filter <😀>
filter @."0_3µm" < 5

// These are equivalent
filter <"😀">
filter <\x{1F600}>
filter <"\x{1F600}">

– Unicode or special characters in a regular expression may be either the character or its hex value, but you must also specify the columns to search with ~. Examples:
// These are equivalent
filter message ~ /😀/
filter message ~ /\x{1F600}/

filter message ~ /\x{000d}\x{000a}/
filter message + name ~ /\x{000d}\x{000a}/
filter (message ~ /\x{000d}\x{000a}/) or (name ~ /\x{000a}/)

Handling null values

In OPAL, null values always have a type. But they are not handled in the same way as a regular value. This is particularly important in comparisons. This statement returns events with a severity not equal to DEBUG, but only for events that have a severity value: filter not severity="DEBUG"

An event that does not have a severity (in other words: the value is null), will never match. Use isnull or ifnull to explicitly include them: // exclude "DEBUG" but include null filter not severity="DEBUG" or isnull(severity)

// replace null with empty string, then check filter ifnull(severity, '') != "DEBUG"

For some comparisons, you may also compare with a null value of the appropriate type. colmake positive_or_null:case(value > 0, value, true, int64_null())

Specialized filter verbs

In addition to filter, there are several additional verbs for different types of filter operations. See the OPAL verb documentation for details. (Note that only dedup is streamable.) • always • dedup • ever • never


• topk

7.3.2 Fields

Change a field’s type

To change the type of an existing field, create a new field with the desired type. Use a new name to keep both, or replace the existing one by giving it the same name. This is useful when creating metrics, which require numeric fields to be float64. Example:
colmake temperature:float64(temperature)

Extract from JSON

Reference properties in a JSON payload with either the dot or bracket operators:
colmake data:string(FIELDS.data), kind:string(FIELDS["name"])

Quote the string if the property name has special characters:
colmake userName:someField["user name"]
colmake userCity:someField."user city"
colmake requestStatus:someField.'request.status'

You may also combine methods:
// Sample data: {"fields": {"deviceStatus": {"timestamp": "2019-11-15T00:00:06.984Z"}}}
colmake timestamp1:fields.deviceStatus.timestamp
colmake timestamp2:fields["deviceStatus"]["timestamp"]
colmake timestamp3:fields.deviceStatus.["timestamp"]
colmake timestamp4:parsejson(string(fields.deviceStatus)).timestamp

Extract and modify values using regex_replace():
colmake state:regex_replace(string(FIELDS.device.date), /^.*([0-9]{4,4})-([0-9]{1,2})-([0-9]{1,2}).*$/, '\\3/\\2/\\1', 1)
colmake state:regex_replace(string(FIELDS.device.state), //, "error", 0)
colmake state:regex_replace(string(FIELDS.device.manufacturer), /\x{2122}/, "TM", 0)

Extract with a regex

Use colregex to extract fields from a string. colregex data, /(?P[^|]*)\|count:(?P[^|]*)\|env:(?P[^|]*)/

Note: colregex allows named capture groups, unlike filter expressions.


7.3.3 Metrics

Registering with addmetric

• addmetric registers a single metric. It accepts an options object containing details of its type, unit, how it should be aggregated, and other options.
addmetric options(label:"Temperature", type:"gauge", unit:"C", rollup:"avg", aggregate:"avg", interval:5m), "temperature"
addmetric options(label:"Power", description:"Power in watts", type:"gauge", rollup:"avg", aggregate:"avg"), "power"

– The type of a metric determines how its values are interpreted.

cumulativeCounter — A monotonically increasing total over the life of the metric. A cumulativeCounter value is never negative.
delta — The difference between the current metric value and its previous value.
gauge — A measurement at a single point in time.

– A metric’s rollup method determines how multiple data points for the same metric are summarized over time. A single value is created for multiple values in each rollup time window.

avg — The average (arithmetic mean) of all values in the window.
count — The number of non-null values in the window.
max — The largest value.
min — The smallest value.
rate — The rate of change across the window, which may be negative for delta and gauge types. A negative rate for a cumulativeCounter is treated as a reset.
sum — The sum of all values in the window.

– The aggregate type determines how values are aggregated across multiple metrics of the same type. For example, temperature metrics from multiple devices. Aggregate types correspond to the aggregate function of the same name.


any — An arbitrary value from the window, nondeterministically selected. Useful if you need a representative value; may be (but is not guaranteed to be) faster to calculate than other methods.
any_not_null — Like any, but guaranteed to be not null.
avg — The average (arithmetic mean).
count — The number of non-null values.
countdistinct — An estimate of the number of unique values in the window. Faster than countdistinctexact.
countdistinctexact — The number of unique values in the window; slower but more accurate than countdistinct.
max — The largest value in the window.
median — An approximation of the median value, faster than medianexact.
medianexact — The median value across the window.
min — The smallest value in the window.
stddev — The standard deviation across all values in the window.
sum — The sum of all values in the window.

Note: For more about units, see Introduction to Metrics.

The Observe temporal relational model considers time, and tracking system data over time, an integral part of data modeling. Yet traditional attempts to model the time-varying nature of data on top of relational databases have ended up with non-standard SQL extensions. These mechanisms are often fragile and hard to use. The Observe platform solves this problem by providing a language for expressing the kinds of operations you want to do as a user of the system, taking care of the time-dependent factors.
And since every UI action generates an OPAL equivalent, writing code by hand and using the UI are not mutually exclusive. You may choose to perform some operations in the UI, some in code, and some by starting with the UI and expanding in code.
This guide is designed to get you started with OPAL. It is divided into several sections:
• Anatomy of an OPAL pipeline: Introduction to pipelines, verbs, and functions
• Data types and operators: Working with values and constructing expressions
• Performance: Best practices for queries and data modeling
• Examples: Common OPAL operations
• List of OPAL verbs: Verb reference
• List of OPAL functions: Function reference
In addition, you may find the following pages helpful:
• Introduction to Worksheets
• Observe glossary
• Observe basic data processing model
• Documentation search, available at the top left of every page
We recommend starting with Anatomy of an OPAL pipeline and Data types and operators to understand the basics. Then begin exploring your own data in a Worksheet.
We continue to improve OPAL and appreciate your feedback. Let us know how we can help!


7.4 Anatomy of an OPAL pipeline

An OPAL pipeline is a sequence of statements where the output of one is the input for the next. This could be a single line with one statement, or many lines of complex shaping and filtering. A pipeline contains four types of elements:
• Inputs, defining which datasets to look at
• Verbs, defining what processing to do with those datasets
• Functions, defining how to transform individual values in the data
• Outputs, passing a dataset on to the next verb or the final result
A complete pipeline, also called an OPAL script, consists of a series of inputs, verbs, functions, and outputs that define the desired result. For example, a first verb statement might pass the results of its lookup operation to a second verb, which then uses a function to remove null values.
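As a minimal sketch (the column names log and container are hypothetical), a two-statement pipeline might filter events using scalar functions and then summarize the result with an aggregate verb:

filter contains(string(log), "error") // hypothetical column log; keep only events mentioning "error"
statsby errorCount:count(1), groupby(container) // hypothetical column container; count the matching events per container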

7.5 Inputs

Pipeline inputs are datasets, such as an Event Stream. Pipelines may use as many datasets as required, although individual verbs vary in how many they accept. For example, lookup accepts a main input dataset containing the field of interest, and a lookup table dataset that maps those values to more useful ones. Keep in mind that each pipeline is a single sequence of steps from input to output. If a verb accepts multiple datasets as input, those datasets may not be individually processed as part of the statement. You may, however, create an intermediate dataset in a different pipeline and use that as an input.
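Additional input datasets are referenced by name with an @ prefix. For example, the union verb combines the default input with other named datasets; the statement below is taken from the union example in the verb reference, and the dataset names @second and @third are placeholders:

union @second, @third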

7.6 Verbs

Verbs are the main actors in a pipeline. Each takes a primary input, either the initial dataset or the output of the verb before it in the pipeline. Some verbs, such as join operations, accept multiple dataset inputs. A verb produces exactly one output dataset.

Tip: See the List of OPAL verbs for details on individual verbs.

The most important verb is filter, which takes the default input and returns data matching the condition defined in the filter expression. This is analogous to the WHERE clause in a SQL query.
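For example, the following statement (repeated from the filter reference later in this guide) keeps only events whose status_code, converted to a string, starts with '5':

filter string(status_code) ~ /^5.*/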


7.6.1 Streamable vs unstreamable

An important consideration is whether the verb you are using is streamable. Most Observe datasets are really data streams: new data is always being added, but any particular operation is only interested in some of it. Most OPAL verbs are therefore streaming operators: they transform one (or more) input data streams to an output data stream, and only then identify which results are within the desired query time window.

The simplest example is filter. When filter is applied to an input data stream, the filtering condition check is applied to each event. All events that pass the check form an output data stream. filter then queries the results, essentially selecting the desired set of events from the data stream. The data stream itself isn't changed. This works because filter is streamable: its behavior is the same for any size query time window. Streamable verbs create streamable output datasets, which can be accelerated for better performance.

A few verbs are unstreamable, meaning their output is different for different size query time windows. The resulting unstreamable dataset can't be accelerated, so the original filter must be applied each time the dataset is queried. Unstreamable verbs perform many useful functions, particularly for ad hoc analysis in a Worksheet. But you can't create a new dataset from those Worksheet results, as it can't be accelerated. To create a new dataset from a Worksheet, ensure that all its OPAL is streamable before you publish a new Event Stream.
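For example, filter is always streamable, while ever is never streamable because its result depends on the query time window. Both statements below are taken from the verb reference in the next chapter:

filter string(status_code) ~ /^5.*/ // streamable: behaves the same for any query time window
ever string(status_code) ~ /^5.*/ // unstreamable: selects resources that matched at any point in the window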

7.6.2 Types of verbs

Verbs are organized into several categories, based on the action they perform. Some verbs have more than one category.
• Aggregate: Aggregate verbs work with aggregate functions to summarize data.
• Filter: Filter verbs select events matching an expression or condition, similar to SQL SELECT WHERE. A filter statement might match a pattern (literal or regular expression) or return the top values for a group of values.
• Join: Join verbs combine data from multiple datasets to generate an output value. For example, a union operation adds new merged and appended fields from other event datasets to the primary input dataset. The flatten family of verbs is also included in the Join category, as a special case of joining a dataset with itself to create new output events.
• Metadata: Metadata verbs add information about the dataset itself, rather than act on the data it contains. These verbs add additional context about the dataset's contents, or define relationships between datasets. Common metadata operations are configuring foreign keys, registering types of metrics, and creating resources from event streams.
• Metrics: Metrics verbs specify how metrics are defined and aggregated, such as specifying the units of reported values.
• Projection: Projection verbs create or remove fields based on existing fields or values. For example, colpick selects only the desired fields, dropping all others.
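For instance, a projection verb and an aggregate verb from the reference below look like this (column names come from those reference examples):

coldrop debug_info, status_code // projection: drop two columns
statsby Count:count(1), groupby(server_name) // aggregate: count rows per server name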


7.7 Functions

Functions act on individual values rather than datasets. Where verbs are set operations, acting upon input sets and returning output sets, a function is a scalar operation: it returns a single value.

Tip: See the List of OPAL functions for details on individual functions.

7.7.1 Types of functions

There are three types of functions:

• Plain, or scalar functions: Act on values from an input event field, such as converting a timestamp or comparing two values. Scalar functions always output a single value per input event. Example: regex_replace()

colmake foo:"foo4-bar2" // input text
colmake bar:regex_replace(foo, /^([A-Za-z]{3})([0-9]{1})-([A-Za-z]{3})([0-9]{1})$/, '\\3\\2-\\1\\4', 0) // result: new column bar containing "bar4-foo2"

• Summarizing, or aggregate functions: Within an aggregating verb statement (such as statsby), calculate a summary of multiple values across multiple input events. For example, avg() calculates the average of a field's values across all input events that match the statsby groupby field. (This is similar to GROUP BY in a SQL query.) Aggregate functions typically output fewer events than are in the input. Example: count() with verb statsby

statsby "reportsPerSensor":count(sensor), groupby(sensor)

• Window functions: Within a window() statement, a window function looks at the input events in the window and calculates an output value for each input event. For example, avg(), when applied to a window, calculates a moving average for a fixed window size over time. window() and window functions are used with colmake and similar verbs, where the window() statement is an argument defining the contents of the output column. Example: first()

// get name of the earliest sensor to report in the current window
colmake FirstToReportData:window(first(sensor))

Generally, functions take expressions as arguments, and can themselves be part of an expression. max(num_hosts+3) is just as valid as max(num_hosts)+3. Scalar functions may be used anywhere an expression can be used. Aggregate and window functions are used with aggregating verbs to perform more complex operations. Some functions may be either aggregating or windowing, depending on the verb they are used with.
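As a small sketch of that flexibility, the two alternative statements below (not a pipeline) use the hypothetical columns num_hosts and datacenter:

statsby peak:max(num_hosts + 3), groupby(datacenter) // an expression as the function argument
statsby peak:max(num_hosts) + 3, groupby(datacenter) // the function as part of a larger expression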


7.8 Output

The results of a pipeline may be presented in a variety of ways: statistics like top K values, histograms, or small line charts (sparklines) for each column in the output dataset. When you are querying or modeling in the UI, many of these details are handled for you. With OPAL pipelines, you control how to display your output.


CHAPTER EIGHT

LIST OF OPAL VERBS

8.1 addfk (Add Foreign Key)

Type of operation: Metadata Description Add a foreign key to the output. The foreign key identifies the target dataset and columns used to find a target resource. Is this verb streamable? Always. Usage addfk [ label ], keyfield, ...

Examples addfk "Related Resource", src1:@target.dst1, src2:@target.dst2

Adds a foreign key that links to @target, matching dst1 and dst2 in the target to src1 and src2 in the current dataset. The key label is set to “Related Resource”. addfk src1:@foo.dst1

Adds a foreign key that links to @foo, matching dst1 in the target to src1 in the current dataset. The key label is set from the existing label of the target @foo. Arguments

Argument  Type      Required  Multiple
label     string    False     False
keyfield  fieldref  True      True

8.2 addkey (Add Candidate Key)

Type of operation: Metadata Description Add a candidate key to the output. The candidate key describes a combination of columns that together identify a resource instance, and can be the target of a foreign key. Is this verb streamable? Always.


Usage addkey keyfield, ...

Examples addkey cluster_uid, resource_uid

Adds a candidate key that says that cluster_uid plus resource_uid together uniquely identify the resource instance. Arguments

Argument Type Required Multiple keyfield fieldref True True

8.3 addmetric (Addmetric)

Type of operation: Metrics, Metadata Description Register a metric, with its metadata defined in an options object. name should be an expected value in the metric name field. Values for options metadata:
• label, unit, and description are strings
• type is one of: 'cumulativeCounter', 'delta', 'gauge'
• interval is a duration representing the reporting interval of the metric, such as 1m or 15s
• rollup is one of: 'count', 'max', 'min', 'rate', 'sum', 'avg'
• aggregate is one of: 'any', 'any_not_null', 'avg', 'count', 'countdistinct', 'countdistinctexact', 'max', 'median', 'medianexact', 'min', 'stddev', 'sum'
• unit is optional.
See the Metrics documentation for more information. Is this verb streamable? Always. Usage addmetric options, name

Examples addmetric options(label:"Ingress Bytes", type:"cumulativeCounter", unit:"bytes", description:"Ingress reported from somewhere", rollup:"rate", aggregate:"sum", interval: 15s), "ingress_bytes"

Register the metric 'ingress_bytes' within this dataset. The dataset must already implement the "metric" interface. addmetric options(label:"Temperature", type:"gauge", unit:"C", description:"Storage room B temperature", rollup:"avg", aggregate:"avg", interval: 5m), "temp"

Register the metric ‘temp’ within this dataset. The dataset must already implement the “metric” interface, for example with ‘interface “metric”, metric:sensor_type, value:value’


Arguments

Argument  Type     Required  Multiple
options   options  True      False
name      string   True      False

8.4 aggregate (Aggregate)

Type of operation: Aggregate, Metrics Aliases: reaggregate(deprecated) Description Aggregates metrics across tag dimensions. Is this verb streamable? Sometimes. Usage aggregate [ groupby, ... ], groupOrAggregateFunction, ...

Examples aggregate tx_bytes:sum(tx_bytes), groupby(podName, namespace, clusterUid)

Group the tx_bytes metric by ‘podName’, ‘namespace’ and ‘clusterUid’ on each time bin, calculating the sum of the values in each bin. Arguments

Argument                  Type        Required  Multiple
groupby                   fieldref    False     True
groupOrAggregateFunction  expression  True      True

8.5 always (Filter where always)

Type of operation: Filter Description Select data for resources that matched the predicate at all times Is this verb streamable? Never. Usage always predicate

Examples always string(status_code) ~ /^2.*/


Select only resources where the ‘status_code’ column, converted to string, always started with ‘2’, at all points of the time window. Arguments

Argument Type Required Multiple predicate bool True False

8.6 changelog (Turn Resource into Events)

Type of operation: Metadata Description Given a resource, changelog will demote it to a series of update events that would create that resource. Is this verb streamable? Always. Usage changelog

Examples changelog

Given an input resource dataset, will un-mark the “valid to” timestamp field, and make the output dataset be an event dataset.

8.7 coldrop (Drop Column)

Type of operation: Projection Description Exclude one or more columns from the input dataset to the output dataset. Is this verb streamable? Always. Usage coldrop columnname, ...

Examples coldrop debug_info, status_code

Exclude the columns ‘debug_info’ and ‘status_code’ from the data passed downstream. Arguments

Argument Type Required Multiple columnname fieldref True True


8.8 colenum (Column Is Enum)

Type of operation: Metadata Description Mark columns as enumerations, or not, by name. Arguments are colname:bool where the bool value must be known at compile time. Columns that are enumerations are treated differently in GUI and visualization, such as using top-k summaries instead of histograms or sparklines. Is this verb streamable? Always. Usage colenum col, ...

Examples colenum cluster_uid:false, cluster_index:false, cluster_name: true

Marks the columns cluster_uid and cluster_index as scalar values, and marks the column cluster_name as an enumer- ation value. Arguments

Argument Type Required Multiple col expression True True

8.9 colimmutable (Column Is Immutable)

Type of operation: Metadata Description Mark resource columns as time immutable a.k.a. time-invariant, or not. Arguments are colname:bool where the bool value must be known at compile time. A time immutable column is a column that does not change for a given resource instance (as identified by the resource primary key). All key columns are implicitly immutable. Columns that are immutable can be stored and processed more efficiently. Beware: manually marking mutable columns as immutable can lead to wrong query results. Is this verb streamable? Always. Usage colimmutable col, ...

Examples colimmutable hostname:true, IP:false

Marks the column hostname as immutable, and the column IP as mutable. Arguments

Argument Type Required Multiple col expression True True


8.10 colmake (New Columns)

Type of operation: Projection, Aggregate Description Add one or more new columns from the input dataset to the output dataset. See also: ‘colregex’. Is this verb streamable? Sometimes. Usage colmake columnbinding, ...

Examples colmake message:string(data.payload.message), ok:string(data.payload.ok)

Create the columns ‘message’ and ‘ok’ by coercing various data column object fields to strings. Arguments

Argument Type Required Multiple columnbinding expression True True

8.11 colpick (Pick Columns)

Type of operation: Projection Description Exclude all columns except the specified columns from the input dataset to the output dataset. Be careful to include necessary primary key fields and time fields needed for downstream analysis. Is this verb streamable? Always. Usage colpick columnbinding, ...

Examples colpick event_time:input_time, uid:data.request.sourceHost, status_code:int64(data.request.httpStatus), message:message, ok:int64(data.request.httpStatus) < 400

Re-shape the data to contain exactly the five columns ‘event_time’, ‘uid’, ‘status_code’, ‘message’, and ‘ok’. Arguments

Argument Type Required Multiple columnbinding expression True True


8.12 colregex (RegEx New Columns)

Type of operation: Projection Description Add one or more columns by matching capture names in a regular expression against a given source expression. Regex extractions create string columns. Named capture groups are an extension to POSIX extended regular expressions. If the column already exists, and the regular expression finds nothing, the previous value is preserved. See also: ‘colmake’. Is this verb streamable? Always. Usage colregex path, regex

Examples colregex message, /status=(?P<statuscode>\d+)/

Create the column 'statuscode' by matching for status=numbers in the field 'message'. colregex inputcol, /(?P<sensor>[^|]*)\|count:(?P<counts>[^|]*)\|env:(?P<env>[^|]*)/

Given an input column value like: "studio-aqi|count:654 201 28 0 0 0|env:3 4 4a", generate three output columns: "sensor" with the value "studio-aqi", "counts" with the value "654 201 28 0 0 0", and "env" with the value "3 4 4a". Arguments

Argument  Type        Required  Multiple
path      expression  True      False
regex     regex       True      False

8.13 colrename (Rename Columns)

Type of operation: Projection Description Include all columns while renaming the specified columns from the input dataset to the output dataset. Argument structure is newname:oldname. Includes necessary primary key fields and time fields needed for downstream analysis. Is this verb streamable? Always. Usage colrename columnbinding, ...

Examples colrename event_time:input_time, uid:sourceHost, status_code:httpStatus

Renames the input columns to ‘event_time’, ‘uid’, ‘status_code’ while still retaining the rest of the columns in the table Arguments


Argument Type Required Multiple columnbinding expression True True

8.14 colshow (Show/Hide Columns)

Type of operation: Metadata Description Show or hide columns by name. Arguments are colname:bool where the bool value must be a literal true or false. Is this verb streamable? Always. Usage colshow col, ...

Examples colshow cluster_uid:false, cluster_index:false, cluster_name: true

Hides the columns cluster_uid and cluster_index, and shows the column cluster_name. Arguments

Argument Type Required Multiple col expression True True

8.15 dedup (dedup)

Type of operation: Aggregate Aliases: distinct Description dedup collapses all rows in an event dataset with identical values in specified columns to a single row. For the remaining columns, an arbitrary value from the collapsed rows is picked while preferring non-null values. When no column names are given, dedup collapses rows with identical values in all the columns to a single row. Is this verb streamable? Always. Usage dedup [ columnname, ... ]

Examples dedup vf, message

Collapse the rows with identical values in vf and message columns to a single row. dedup


Remove duplicate rows in the input dataset Arguments

Argument Type Required Multiple columnname expression False True

8.16 droptime (Drop Time)

Type of operation: Metadata Description Clear the ‘valid from’ (and the ‘valid to’), turning the output rows from the current query window into non-temporal rows. The output of this verb is not streamable. Is this verb streamable? Never. Usage droptime

Examples droptime

Drops the ‘valid from’ and ‘valid to’ designations of any such columns in the input dataset.

8.17 ever (Filter where ever)

Type of operation: Filter Description Select data for resources that at some point matched the predicate Is this verb streamable? Never. Usage ever predicate

Examples ever string(status_code) ~ /^5.*/

Select only resources where the ‘status_code’ column, converted to string, starts with ‘5’, at any point of the time window. Arguments

Argument Type Required Multiple predicate bool True False


8.18 exists (Exists)

Type of operation: Join Description Return the rows from the default dataset that have a match anywhere in the query time window. (Untemporal semijoin) Is this verb streamable? Never. Usage exists predicate, ...

Examples exists [email protected]_id

Semijoin the default dataset with the ‘right’ dataset, returning rows from ‘default’ where there exists a key match at any point in time within the query window. Arguments

Argument Type Required Multiple predicate bool True True

8.19 filter (Filter)

Type of operation: Filter Description Exclude rows from the input dataset that do not match the given predicate expression. Is this verb streamable? Sometimes. Usage filter predicate

Examples filter string(status_code) ~ /^5.*/

Keep only rows where the ‘status_code’ column, converted to string, starts with ‘5’. Arguments

Argument Type Required Multiple predicate bool True False


8.20 fkdrop (Drop a Foreign Key)

Type of operation: Metadata Description Drop a foreign key by specifying the foreign key's label. If the specified label does not exist, fkdrop does not return an error. If multiple foreign keys exist with the same label, each of them will be dropped. Is this verb streamable? Always. Usage fkdrop label, ...

Examples fkdrop foreign_key_label

Drops a foreign key that matches the specified label. Arguments

Argument Type Required Multiple label string True True

8.21 flatten (Flatten)

Type of operation: Misc, Join Description Given an input of object or array type, recursively flatten all child elements into '_c_NAME_path' and '_c_NAME_value' columns, generating null values for intermediate object/array values. The default is to not suggest column types ('suggesttypes' = 'false'.) See also flattensingle. Is this verb streamable? Always. Usage flatten pathexpression, [ suggesttypes ]

Examples flatten foo

Produce new columns that contain every possible path and its corresponding value, with null values for intermediate key paths so the full tree is returned. Column ‘foo’ will be removed. flatten foo, true

Produce new columns that contain every possible path and its corresponding value. It will also attempt to determine the value’s type, creating a third column, ‘_c_foo_type’, containing the name of the identified type. Column ‘foo’ will be removed. Arguments


Argument        Type      Required  Multiple
pathexpression  fieldref  True      False
suggesttypes    bool      False     False

8.22 flattenall (Flatten All)

Type of operation: Misc, Join Description Given an input of object or array type, recursively flatten all child elements into ‘_c_NAME_path’ and ‘_c_NAME_value’ columns, including intermediate object/array values. (This is expensive – consider flattenleaves instead.) The default is to not suggest column types (‘suggesttypes’ = ‘false’.) Is this verb streamable? Always. Usage flattenall pathexpression, [ suggesttypes ]

Examples flattenall foo

Produce new columns that contain every possible path and its corresponding value. Column ‘foo’ will be removed. flattenall foo, true

Produce new columns that contain every possible path and its corresponding value. It will also attempt to determine the value’s type, creating a third column, ‘_c_foo_type’, containing the name of the identified type. Column ‘foo’ will be removed. Arguments

Argument        Type      Required  Multiple
pathexpression  fieldref  True      False
suggesttypes    bool      False     False

8.23 flattenleaves (Flatten Leaves)

Type of operation: Misc, Join Description Given an input of object or array type, recursively flatten all child elements into ‘_c_NAME_path’ and ‘_c_NAME_value’ columns, returning only leaf values. The default is to not suggest column types (‘suggesttypes’ = ‘false’.) See also flattensingle. Is this verb streamable? Always. Usage flattenleaves pathexpression, [ suggesttypes ]


Examples flattenleaves foo

Produce new columns that contain every leaf path and its corresponding value. Column ‘foo’ will be removed. flattenleaves foo, true

Produce new columns that contain every leaf path and its corresponding value. It will also attempt to determine the value’s type, creating a third column, ‘_c_foo_type’, containing the name of the identified type. Column ‘foo’ will be removed. Arguments

Argument        Type      Required  Multiple
pathexpression  fieldref  True      False
suggesttypes    bool      False     False

8.24 flattensingle (Flatten Single)

Type of operation: Misc, Join Description Given an input of object or array type, flatten the first level of child elements into ‘_c_NAME_path’ and ‘_c_NAME_value’ columns. The default is to not suggest column types (‘suggesttypes’ = ‘false’.) Is this verb streamable? Always. Usage flattensingle pathexpression, [ suggesttypes ]

Examples flattensingle foo

Produce new columns that contain the path and values of the top level of keys in foo. Column ‘foo’ will be removed. flattensingle foo, true

Produce new columns that contain the path and values of the top level of keys in foo. It will also attempt to determine the value’s type, creating a third column, ‘_c_foo_type’, containing the name of the identified type. Column ‘foo’ will be removed. Arguments

Argument        Type      Required  Multiple
pathexpression  fieldref  True      False
suggesttypes    bool      False     False


8.25 follow (Follow)

Type of operation: Join Description Return the rows from the additional joined dataset that have a match anywhere in the query time window. (Untemporal semijoin) Is this verb streamable? Never. Usage follow predicate, ...

Examples follow [email protected]_id

Semijoin the default dataset with the ‘right’ dataset, returning rows from ‘right’ where there exists a key match at any point in time within the query window. Arguments

Argument Type Required Multiple predicate bool True True

8.26 fulljoin (Outer Join)

Type of operation: Join Description Temporal full join, adding new columns in the output dataset. Is this verb streamable? Sometimes. Usage fulljoin predicate, ..., [ columnbinding, ... ]

Examples fulljoin [email protected], hostname:@host.name

Temporal full join with dataset ‘host’, and extract the ‘name’ column from that ‘host’ table, calling the new column ‘hostname’ in the output. Arguments

Argument       Type        Required  Multiple
predicate      bool        True      True
columnbinding  expression  False     True


8.27 interface (Interface)

Type of operation: Metadata Description Map fields of this dataset to a pre-defined interface. Is this verb streamable? Always. Usage interface interfaceName, fieldBinding, ...

Examples interface "notification", kind:myKindStr, description:logText, importance:sevInt

Make this dataset implement the 'notification' interface, binding the existing column 'myKindStr' to the 'kind' interfaceName, the existing column 'logText' to the 'description' interfaceName, and the existing column 'sevInt' to the 'importance' interfaceName. interface "metric", metric:metricNameColumn, value:metricValueColumn

Make this dataset implement the 'metric' interface. Bind the existing column containing metric names ('metricNameColumn') to the 'metric' interfaceName, and the existing column containing 'float64' metric values ('metricValueColumn') to the 'value' interfaceName. Arguments

Argument       Type        Required  Multiple
interfaceName  string      True      False
fieldBinding   expression  True      True

8.28 join (Inner Join)

Type of operation: Join Description Temporal inner join, adding new columns in the output dataset. Is this verb streamable? Sometimes. Usage join predicate, ..., [ columnbinding, ... ]

Examples join [email protected], hostname:@host.name

Temporal inner join with dataset ‘host’, and extract the ‘name’ column from that ‘host’ table, calling the new column ‘hostname’ in the output. Arguments


Argument       Type        Required  Multiple
predicate      bool        True      True
columnbinding  expression  False     True

8.29 leftjoin (Left Join)

Type of operation: Join Description Temporal left join, adding new columns in the output dataset. Is this verb streamable? Sometimes. Usage leftjoin predicate, ..., [ columnbinding, ... ]

Examples leftjoin [email protected], hostname:@host.name

Temporal left join with dataset ‘host’, and extract the ‘name’ column from that ‘host’ table, calling the new column ‘hostname’ in the output. Arguments

Argument       Type        Required  Multiple
predicate      bool        True      True
columnbinding  expression  False     True

8.30 lookaround_join (Lookaround Join)

Type of operation: Join Description Lookaround join is a type of inner join. Where this differs from "join" is that a row from the input dataset is matched only with rows from the other dataset that are within the specified timeframe around the input row. This join can only be used to join two event datasets. The first argument "frame" specifies the timeframe. Usage of the "predicate" and "columnbinding" arguments is identical to that in the "join" verb. Is this verb streamable? Sometimes. Usage lookaround_join frame, predicate, ..., [ columnbinding, ... ]

Examples lookaround_join frame_exact(back: 2s, ahead: 2s), [email protected], location:@host.location


For every row in the input dataset, fetch the rows from "host" that are within 2s of the input row, and if there's a match on IP addresses, extract the location field from the "host" dataset. Arguments

Argument       Type        Required  Multiple
frame          frame       True      False
predicate      bool        True      True
columnbinding  expression  False     True

8.31 lookup (Look-up)

Type of operation: Join Description Find matching rows in a resource, making new columns in the output dataset. Is this verb streamable? Always. Usage lookup foreignkeyequalitypredicate, ..., columnbinding, ...

Examples lookup [email protected], hostname:@host.name

Look up the ‘host_uid’ value as the ‘uid’ column in the input table named ‘host’, and extract the ‘name’ column from that ‘host’ table, calling the new column ‘hostname’ in the output. Arguments

Argument                     Type        Required  Multiple
foreignkeyequalitypredicate  bool        True      True
columnbinding                expression  True      True

8.32 makeresource (Make Resource)

Type of operation: Metadata Description Convert an event table to a resource with the specified primary key. Collapse adjacent events that contain the same primary key value and use the first time such an event is observed as 'valid_from'. The 'valid_to' of the row is determined by the minimum of the following three things: 1) the timestamp of the next distinct event; 2) 'valid_from' + the expression optionally specified in validfor(); 3) 'valid_from' + the 'expiry' option value Is this verb streamable? Sometimes. Usage makeresource [ options ], columnbinding, ..., primarykey, [ validfor ]

Examples


makeresource options(expiry:duration_hr(1)), col1:col1, primarykey(pk1, pk2)

Produces a resource with the primary key (pk1, pk2) and column col1 with an expiry period of 1 hour. makeresource col1:col1, primarykey(pk1, pk2), validfor(duration(col2))

Produces a resource with the primary key (pk1, pk2) and column col1 with an expiry period determined by column col2. Arguments

Argument       Type        Required  Multiple
options        options     False     False
columnbinding  expression  True      True
primarykey     primarykey  True      False
validfor       validfor    False     False

8.33 makesession (Make Session)

Type of operation: Metadata, Aggregate Description Group events or intervals that are close to each other into sessions, and calculate aggregations over each session. Two events or intervals are assigned to the same session if the time period between them is below the session expiry time (which defaults to one day). Two overlapping intervals will always be mapped to the same session. The output's 'valid_from' and 'valid_to' fields mark the start and end time of the session. Note that applying makesession over existing sessions may produce unexpected results. For instance, calculating an average over an already averaged column will not produce the correct overall average. Is this verb streamable? Sometimes. Usage makesession [ options ], [ groupby, ... ], groupOrAggregateFunction, ...

Examples makesession cnt:count(1), groupby(server_name)

Group input events or intervals into per server name sessions, and count the number of events or intervals in each session. Return a dataset with 4 columns ‘valid_from’, ‘valid_to’, ‘server_name’, and ‘cnt’. makesession options(expiry:10m), cnt:count(1), groupby(server_name)

Similar to the above example, but expire each session after 10 minute’s inactivity (no new event falls into the session). Arguments

Argument                  Type        Required  Multiple
options                   options     False     False
groupby                   fieldref    False     True
groupOrAggregateFunction  expression  True      True


8.34 mergeevent (Merge Event)

Type of operation: Join Description Merge an event (or point) table with the current resource. Is this verb streamable? Always. Usage mergeevent [ options ], pkequalitypredicate, ..., columnbinding, ...

Examples mergeevent options(expiry:duration_hr(1)), [email protected], cpu:@cpuload.load

Look up the ‘host_uid’ value as the ‘host’ column in the event table named ‘cpuload’, and extract the ‘load’ column from that ‘cpuload’ table, calling the new column ‘cpu’ in the output resource. Arguments

Argument             Type        Required  Multiple
options              options     False     False
pkequalitypredicate  bool        True      True
columnbinding        expression  True      True

8.35 never (Filter where never)

Type of operation: Filter Description Select data for resources that at no point matched the predicate Is this verb streamable? Never. Usage never predicate

Examples never string(status_code) ~ /^5.*/

Select only resources where the ‘status_code’ column, converted to string, never started with ‘5’, at any point of the time window. Arguments

Argument Type Required Multiple predicate bool True False


8.36 rollup (rollup)

Type of operation: Aggregate, Metrics Description Rollup raw metrics into aligned metrics Is this verb streamable? Sometimes. Usage rollup [ options ], metric, ...

Examples rollup options(resolution:300s), requests:metric("requests_total")

Generates a column named "requests" holding the "requests_total" metric, aligned to 300s time bins. rollup options(buckets:2000), failed_requests:metric("requests_total")

Generates a column named "failed_requests" holding the "requests_total" metric, aligned to 2000 uniform time bins in the query window. rollup options(resolution:300s), failed_requests:metric("requests_total", filter:status_code >= 400 and status_code <= 599)

Generates a column named "failed_requests" holding the "requests_total" metric where status_code is in [400, 599], aligned to 300s time bins. rollup options(resolution:300s), failed_requests:metric("requests_total", type:cumulativeCounter, rollup:avg, aggregate:sum)

Generates a column named "failed_requests" holding the "requests_total" metric, aligned to 300s time bins with the provided method. Arguments

Argument  Type        Required  Multiple
options   options     False     False
metric    expression  True      True

8.37 setlabel (Set Label)

Type of operation: Metadata Description Declare the ‘label’ of the output to be the designated column. The column must contain strings. Is this verb streamable? Always. Usage setlabel name


Examples setlabel device_name

Sets ‘label’ of the output dataset as the ‘device_name’ column. Arguments

Argument Type Required Multiple name fieldref True False

8.38 setpk (Set Primary Key)

Type of operation: Metadata Description Declare the primary key of the output as consisting of one or more named columns. All rows with the same value in this column (or these columns) will be considered part of the same resource. This is a low-level function that will generate confusing results if not used as part of a larger context. It is recommended to instead use ‘makeresource’ or ‘mergeevent’ or ‘timechart’ to go from event to resource, and ‘changelog’ to go from resource to event shape. Is this verb streamable? Always. Usage setpk columnname, ...

Examples setpk device_uid

Sets the primary key designation of the output dataset as the ‘device_uid’ field. Arguments

Argument Type Required Multiple columnname fieldref True True

8.39 setvf (Set ‘Valid From’)

Type of operation: Metadata Description Declare the ‘valid from’ of the output to be the named column. Beware changing time to a field that is too far off from the current timestamp field, because it may end up falling outside of the processing time window. Is this verb streamable? Always. Usage setvf [ options ], columnname

Examples


setvf ts_col

Sets the ‘valid from’ designation of the output dataset as the ‘ts_col’ field. setvf options(max_time_diff:duration_hr(1)), ts_col

Sets the ‘valid from’ designation of the output dataset as the ‘ts_col’ field, and the maximum time difference between the original ‘valid from’ field and ‘ts_col’ is less than one hour. Arguments

Argument    Type      Required  Multiple
options     options   False     False
columnname  fieldref  True      False

8.40 setvt (Set ‘Valid To’)

Type of operation: Metadata Description Declare the 'valid to' of the output to be the named column. Omitting the column name will clear the 'valid to', changing an interval input to a point-time output. This is a low-level function that will generate confusing results if not used as part of a larger context. It is recommended to instead use 'makeresource' or 'mergeevent' or 'timechart' to go from event to resource, and 'changelog' to go from resource to event shape. If you absolutely need this: Beware changing time to a value that is too far off from the current timestamp field, because it may end up falling outside of the processing time window. Also, setting a "valid to" that's before the "valid from" time will cause the datum to be filtered out by subsequent packing. Is this verb streamable? Always. Usage setvt [ options ], [ columnname ]

Examples setvt ts_col

Sets the ‘valid to’ designation of the output dataset as the ‘ts_col’ field. setvt options(max_time_diff:duration_hr(1)), ts_col

Sets the ‘valid to’ designation of the output dataset as the ‘ts_col’ field, and the maximum time difference between the original ‘valid to’ and ‘ts_col’ is less than one hour. setvt

Removes the ‘valid to’ designation from the output dataset Arguments

Argument    Type      Required  Multiple
options     options   False     False
columnname  fieldref  False     False


8.41 statsby (Stats By)

Type of operation: Aggregate Description Calculate statistics of columns with aggregate functions, based on (optional) grouping columns. Is streamable if the input is an event dataset and the time stamp column is part of the grouping set. Is this verb streamable? Sometimes. Usage statsby [ groupby, ... ], groupOrAggregateFunction, ...

Examples statsby Count:count(1), groupby(server_name)

Group input data by server name, calculating a count of rows per server name, returning a dataset with the two columns ‘server_name’ and ‘Count’. Arguments

Argument                  Type        Required  Multiple
groupby                   fieldref    False     True
groupOrAggregateFunction  expression  True      True

8.42 surrounding (Surrounding)

Type of operation: Join Description Rows from the “right” dataset that fall within the specified frame of at least one row in the default dataset are unioned with the input dataset. The shape of output would be as if the right dataset and left dataset were combined using the union verb. The column bindings are applied only to the rows of the input dataset. Rows from the right dataset will have these new columns set to null in the output. Is this verb streamable? Sometimes. Usage surrounding frame, source, [ column bindings, ... ]

Examples filter | surrounding frame(back: 2s, ahead: 2s), @logs, panic:true

After filtering to the rows matching "panic", this pulls in all the rows from "logs" whose timestamp is within 2s of the filtered rows. In the output, rows from the input dataset will have the "panic" field populated with true. Rows from the "logs" dataset will have this field set to null. Arguments


Argument         Type        Required  Multiple
frame            frame       True      False
source           datasetref  True      False
column bindings  expression  False     True

8.43 timechart (Time Chart)

Type of operation: Aggregate Description Bin (in time) and aggregate point or interval table columns through time, based on (optional) grouping columns. An optional window frame can be specified to compute hopping window aggregation. Is this verb streamable? Sometimes. Usage timechart [ options ], bin_duration, [ frame ], [ groupby, ... ], groupOrAggregateFunction, ...

Examples timechart 1h, Count:count(1), groupby(server_name)

Group input point table by server name, calculating a count of rows through time per server name per hour, returning a dataset with the 5 columns ‘valid_from’, ‘valid_to’, ‘bucket’, ‘server_name’, and ‘Count’. timechart 1h, frame(back:24h), Count:count(1), groupby(server_name)

Group input point table by server name, calculating a moving count of rows through time per server name per hour, with each count covering the 24 hour window ending at the hour. timechart options(empty_bins:true), 1h, Count:count(1), groupby(server_name)

Similar to the first example, but generates a row with a NULL value for each time bin in the query window with no matching input rows. The query may run slowly if the input data points are sparse. Arguments

Argument                  Type        Required  Multiple
options                   options     False     False
bin_duration              duration    True      False
frame                     frame       False     False
groupby                   fieldref    False     True
groupOrAggregateFunction  expression  True      True


8.44 timestats (Time Stats)

Type of operation: Aggregate Description Aggregate resource columns at every point in time, based on (optional) grouping columns Is this verb streamable? Sometimes. Usage timestats [ groupby, ... ], groupOrAggregateFunction, ...

Examples timestats Count:count(1), groupby(server_name)

Group input resource by server name, calculating a count of rows for each slice of time per server name, returning a dataset with the 4 columns 'valid_from', 'valid_to', 'server_name', and 'Count'. Unlike timechart, which calculates aggregates per fixed bucket, timestats calculates values that change at any point in time. Arguments

Argument                  Type        Required  Multiple
groupby                   fieldref    False     True
groupOrAggregateFunction  expression  True      True

8.45 topk (Topk)

Type of operation: Filter Description Selects data for top k ranked groups. If no rank method is provided, a default one will be used. If no grouping is specified, the set of primary key columns will be used as the grouping. Is this verb streamable? Never. Usage topk k, [ rank ], [ groupby ]

Examples topk 100

Select the top 100 groups using the default rank method: the hash of the group identifiers (the set of primary key columns). topk 100, groupby(clusterUid, namespace)

Similar to the first example, but explicitly specifying the grouping. topk 100, max(restartCount)

Similar to the first example, but using a custom rank method to find the groups with the most restarts.


topk 1, groupby()

This topk operates on an empty grouping, where all rows belong to the same group, and hence all rows will be selected. Arguments

Argument  Type        Required  Multiple
k         int64       True      False
rank      expression  False     False
groupby   fieldref    False     False

8.46 union (Union Event Datasets)

Type of operation: Join Description Create a new event dataset, consisting of events from two or more datasets, where the datasets are mapped onto the shape of the main input through column name matching, and filling in with NULL for mismatched column names. The event time column does not need to be explicitly mapped. It is an error to map columns of different types to the same name. Is this verb streamable? Always. Usage union dataset, ...

Examples union @second, @third

Create a new dataset that is the union of the main input dataset, and the @second and @third datasets, where names that are not shared are given NULL values in the opposite dataset. Arguments

Argument Type Required Multiple dataset datasetref True True

CHAPTER NINE

LIST OF OPAL FUNCTIONS

9.1 abs (Abs)

Description Returns the absolute value of ‘val’. Return type numeric Domain This is a scalar function (acts on values from each row individually.) Usage abs( val )

Examples colmake all_positive:abs(@.mixed)

Create a column ‘all_positive’ with the absolute values of column ‘mixed’ Arguments

Argument Type Required Multiple val numeric True False

9.2 any (Any)

Description Return any value of one column across a group Return type any Domain This is a window function (calculates over a group without aggregating rows in window.) This is also an aggregate function (aggregates rows over a group in statsby.) Usage


any( expression )

Arguments

Argument Type Required Multiple expression any True False
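A minimal sketch, assuming hypothetical status_code and server_name columns:

statsby sample_status:any(status_code), groupby(server_name) // hypothetical columns; picks an arbitrary status per server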

9.3 any_not_null (Any Not Null)

Description Return any non-null value of one column across a group. Can still return null if all values in the group are null Return type any Domain This is a window function (calculates over a group without aggregating rows in window.) This is also an aggregate function (aggregates rows over a group in statsby.) Usage any_not_null( expression )

Arguments

Argument Type Required Multiple expression any True False

9.4 any_null (Any Null)

Description Returns a null value of type any. This is important, because some functions, like case(), return more convenient outputs if all their arguments are of the same type. Return type any Domain This is a scalar function (acts on values from each row individually.) Usage any_null()

Examples colmake positive_or_null:case(x > 0, x, true, any_null())

Create a column ‘positive_or_null’ which is either a positive any, or the null any.


9.5 array (To Array)

Description Convert a datum into an array or NULL if conversion is impossible Return type array Domain This is a scalar function (acts on values from each row individually.) Usage array( value )

Examples colmake arr:array(variant)

Make a new column ‘arr’ from casting variant to array. Arguments

Argument Type Required Multiple value any True False

9.6 array_agg (Array Aggregation)

Description Returns an array of concatenated input values. The expression must be a valid expression (not a regex or other type). If no ordering is specified, the default ordering is by ‘valid_from’, ascending. Return type array Domain This is an aggregate function (aggregates rows over a group in statsby.) Usage array_agg( expr, [ orderby ] )

Examples statsby nicknames:array_agg(nickname, orderby(email)), groupby(uid, fullname)

An array containing all nicknames for each individual in the organization; for users with more than one nickname, the nicknames are sorted by email address. Arguments


Argument  Type      Required  Multiple
expr      any       True      False
orderby   ordering  False     False

9.7 array_length (Array Length)

Description Returns the number of elements in an array, or null if the input is not an array. Return type int64 Domain This is a scalar function (acts on values from each row individually.) Usage array_length( array )

Arguments

Argument Type Required Multiple array array True False
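A minimal sketch, assuming a hypothetical column tags that can be cast to an array:

colmake tag_count:array_length(array(tags)) // hypothetical column tags; null if it is not an array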

9.8 array_null (Array Null)

Description Returns a null value of type array. This is important, because some functions, like case(), return more convenient outputs if all their arguments are of the same type. Return type array Domain This is a scalar function (acts on values from each row individually.) Usage array_null()

Examples colmake positive_or_null:case(x > 0, x, true, array_null())

Create a column ‘positive_or_null’ which is either a positive array, or the null array.


9.9 array_pivot (Array Pivot)

Description Converts an array of “key”-“value” pairs into an object with key-value attributes. Note: int64 numbers will be converted to float64 and may lose precision. Return type object Domain This is a scalar function (acts on values from each row individually.) Usage array_pivot( array, keyFieldName, valueFieldName )

Examples array_pivot(array(FIELD.foo), "key", "value")

Converts '[{"key": "k1", "value": "v1"}, {"key": "k2", "value": "v2"}, ...]' in column values to '{"k1": "v1", "k2": "v2", ...}'

Argument        Type    Required  Multiple
array           array   True      False
keyFieldName    string  True      False
valueFieldName  string  True      False

9.10 array_unpivot (Array Unpivot)

Description Convert an object into an array of “key”-“value” pairs. Note: int64 numbers will be converted to float64 and may lose precision. Return type array Domain This is a scalar function (acts on values from each row individually.) Usage array_unpivot( object, keyFieldName, valueFieldName )

Examples array_unpivot(object(FIELD.foo), "key", "value")


Converts ‘{“k1”: “v1”, “k2”: “v2”, ...}’ to ‘[{“key”: “k1”, “value”: “v1”}, {“key”: “k2”, “value”: “v2”}, ...]’ Arguments

Argument        Type    Required  Multiple
object          object  True      False
keyFieldName    string  True      False
valueFieldName  string  True      False

9.11 avg (Average)

Description Calculate the arithmetic average of the input expression across the group. Return type numeric Domain This is a window function (calculates over a group without aggregating rows in window.) This is also an aggregate function (aggregates rows over a group in statsby.) Usage avg( value, ... )

Arguments

Argument Type Required Multiple value numeric True True
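A minimal sketch, assuming hypothetical temperature and sensor columns:

statsby avg_temp:avg(temperature), groupby(sensor) // hypothetical columns; average reading per sensor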

9.12 bin_end_time (Bin End Time)

Description Returns the end time of the current bin (exclusive). Can only be used with timechart. Return type timestamp Domain This is a scalar function (acts on values from each row individually.) Usage bin_end_time()

Examples timechart 10m, last_to_bin_end:min(bin_end_time() - timestamp), groupby(key)

Create a column named "last_to_bin_end" that contains the difference between the last data point in the bin and the bin's end time.


9.13 bin_start_time (Bin Start Time)

Description Returns the start time of the current bin (inclusive). Can only be used with timechart. Return type timestamp Domain This is a scalar function (acts on values from each row individually.) Usage bin_start_time()

Examples timechart 10m, first_to_bin_start:min(timestamp - bin_start_time()), groupby(key)

Create a column named "first_to_bin_start" that contains the difference between the first data point in the bin and the bin's start time.

9.14 bool (Make Boolean)

Description Generate a boolean value of the argument value. Return type bool Domain This is a scalar function (acts on values from each row individually.) Usage bool( value )

Arguments

Argument Type Required Multiple value any True False
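A minimal sketch, assuming a hypothetical column ok_flag:

colmake is_ok:bool(ok_flag) // hypothetical column; coerce its value to a boolean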


9.15 bool_null (Bool Null)

Description Returns a null value of type bool. This is important, because some functions, like case(), return more convenient outputs if all their arguments are of the same type. Return type bool Domain This is a scalar function (acts on values from each row individually.) Usage bool_null()

Examples colmake positive_or_null:case(x > 0, x, true, bool_null())

Create a column ‘positive_or_null’ which is either a positive bool, or the null bool.

9.16 case (Case)

Description Return result# if condition# is true. If no condition matches, return NULL. Conditions and results are evaluated in pairs in order of argument. Return type any Domain This is a scalar function (acts on values from each row individually.) Usage case( condition#, ..., result#, ... )

Examples filter b=case(a=true, 'foo', a=false, 'bar')

Filter to return rows where, if a is ‘true’, b equals ‘foo’, else if a is false, b equals ‘bar’. Arguments

Argument    Type  Required  Multiple
condition#  bool  True      True
result#     any   True      True


9.17 ceil (Ceil)

Description Returns ‘val’ rounded up to the given ‘precision’. Precision defaults to 0, meaning the value will be rounded to the nearest integer. Return type numeric Domain This is a scalar function (acts on values from each row individually.) Usage ceil( val, [ precision ] )

Examples colmake rounded:ceil(@.temperature, 2)

Return the rounded up value of column temperature with 2 decimals Arguments

Argument   Type     Required  Multiple
val        numeric  True      False
precision  int64    False     False

9.18 coalesce (Coalesce to first Non-Null)

Description Return the first non-null argument or null if all are null. Arguments must have the same type. Return type any Domain This is a scalar function (acts on values from each row individually.) Usage coalesce( arg1, arg2, ... )

Examples colmake foo:coalesce(bar, baz, 0)

Replace the value of column ‘foo’ with ‘bar’ if it isn’t null or ‘baz’ if it isn’t null or 0. Arguments


Argument  Type  Required  Multiple
arg1      any   True      False
arg2      any   True      True

9.19 contains (Contains)

Description Returns true if string contains expr. Return type bool Domain This is a scalar function (acts on values from each row individually.) Usage contains( str, expr )

Examples filter contains(@.bundle_kind, "kube")

Pass through all bundle kinds that contain the string ‘kube’. Arguments

Argument  Type    Required  Multiple
str       string  True      False
expr      string  True      False

9.20 count (Count Values)

Description Count the number of non-null items in the group. Return type int64 Domain This is a window function (calculates over a group without aggregating rows in window.) This is also an aggregate function (aggregates rows over a group in statsby.) Usage count( item )

Arguments


Argument Type Required Multiple item any True False
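For example, as shown earlier in the functions overview:

statsby "reportsPerSensor":count(sensor), groupby(sensor)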

9.21 countdistinct (Count Distinct Fast)

Description Estimate the approximate number of distinct values in the input using hyper-log-log. Return type int64 Domain This is a window function (calculates over a group without aggregating rows in window.) This is also an aggregate function (aggregates rows over a group in statsby.) Usage countdistinct( item )

Arguments

Argument Type Required Multiple item any True False
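A minimal sketch, assuming hypothetical hostname and cluster_name columns:

statsby unique_hosts:countdistinct(hostname), groupby(cluster_name) // hypothetical columns; approximate distinct host count per cluster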

9.22 countdistinctexact (Count Distinct Exact)

Description Count the exact number of distinct values in the input using complete enumeration (slower than countdistinct). Return type int64 Domain This is a window function (calculates over a group without aggregating rows in window.) This is also an aggregate function (aggregates rows over a group in statsby.) Usage countdistinctexact( item )

Arguments

Argument Type Required Multiple item any True False


9.23 decode_uri (Decode URI)

Description Replace %-encoded escape sequences in a string with unencoded plain text. However, encoded characters in the set #$&+,/:;=?@ are kept encoded. NULL if input contains an invalid encoded sequence. Return type string Domain This is a scalar function (acts on values from each row individually.) Usage decode_uri( str )

Examples colmake d:decode_uri('%32%5E%33%20%3D%20%38%3B')

Result is ‘2^3 %3D 8%3B’ Arguments

Argument Type Required Multiple str string True False

9.24 decode_uri_component (Decode URI Component)

Description Replace all %-encoded escape sequences in a string with unencoded plain text. NULL if input contains an invalid encoded sequence. Return type string Domain This is a scalar function (acts on values from each row individually.) Usage decode_uri_component( str )

Examples colmake dc:decode_uri_component('%32%5E%33%20%3D%20%38%3B')

Result is ‘2^3 = 8;’ Arguments

Argument Type Required Multiple str string True False


9.25 decodebase64 (Decode Base64)

Description DecodeBase64 decodes a base64 encoded input Return type string Domain This is a scalar function (acts on values from each row individually.) Usage decodebase64( str, [ urlSafe ] )

Examples colmake decoded:decodebase64(data)

Decodes the value of the base64 encoded field data colmake decoded:decodebase64(log, true)

Decodes the value of the URL-safe base64 encoded field log Arguments

Argument  Type    Required  Multiple
str       string  True      False
urlSafe   bool    False     False

9.26 denserank (Dense Rank)

Description Returns the dense rank within an ordered group of values. Default ordering is in ascending time, so the first value has the lowest rank. Return type int64 Domain This is a window function (calculates over a group without aggregating rows in window.) Usage denserank()

Examples colmake index:window(denserank(), groupby(category_id), orderby(item_cost))


Assigns a dense rank to items, when ordered by cost, grouped by category. The cheapest item in each category will be given the rank 1, items with the same cost within the same category will receive the same rank. Dense means there are no gaps assigned between ranks.

9.27 drop_fields (Drop Fields)

Description Drop one or more fields from an object. Return type object Domain This is a scalar function (acts on values from each row individually.) Usage drop_fields( column, key, ... )

Examples colmake smaller:drop_fields(obj, 'key1', 'key2')

Create a column ‘smaller’ based on column ‘obj’ with ‘key1’ and ‘key2’ dropped. Arguments

Argument Type Required Multiple column object True False key string True True

9.28 duration (Make Duration (ns))

Description Convert a number or timestamp, or a time interval, to a duration. Numbers are assumed to be nanoseconds Return type duration Domain This is a scalar function (acts on values from each row individually.) Usage duration( numberOrTimestamp ) duration( start_timestamp, end_timestamp )

Arguments

Argument Type Required Multiple numberOrTimestamp timestamp True False

Arguments

Argument Type Required Multiple start_timestamp timestamp True False end_timestamp timestamp True False

9.29 duration_hr (Make Duration (hr))

Description Convert a number of hours to a duration. Return type duration Domain This is a scalar function (acts on values from each row individually.) Usage duration_hr( hr )

Arguments

Argument Type Required Multiple hr int64 True False

9.30 duration_min (Make Duration (min))

Description Convert a number of minutes to a duration. Return type duration Domain This is a scalar function (acts on values from each row individually.) Usage duration_min( min )

Arguments

Argument Type Required Multiple min int64 True False

9.31 duration_ms (Make Duration (ms))

Description Convert a number of milliseconds to a duration. Return type duration Domain This is a scalar function (acts on values from each row individually.) Usage duration_ms( ms )

Arguments

Argument Type Required Multiple ms int64 True False

9.32 duration_null (Duration Null)

Description Returns a null value of type duration. This is important, because some functions, like case(), return more convenient outputs if all their arguments are of the same type. Return type duration Domain This is a scalar function (acts on values from each row individually.) Usage duration_null()

Examples colmake positive_or_null:case(x > 0, x, true, duration_null())

Create a column ‘positive_or_null’ which is either a positive duration, or the null duration.

9.33 duration_sec (Make Duration (s))

Description Convert a number of seconds to a duration. Return type duration Domain This is a scalar function (acts on values from each row individually.) Usage duration_sec( sec )

Arguments

Argument Type Required Multiple sec int64 True False
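
A possible usage sketch (the column name elapsed_sec is hypothetical):

colmake elapsed:duration_sec(elapsed_sec)

Convert the numeric 'elapsed_sec' column into a duration value.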

9.34 encode_uri (Encode URI)

Description Replace certain characters in a string with %-encoded escape sequences. Letters A-Z a-z, digits 0-9, and !#$&’()*+,- ./:;=?@ remain unchanged. All others are encoded. NULL if input contains an invalid UTF-8 surrogate sequence. Return type string Domain This is a scalar function (acts on values from each row individually.) Usage encode_uri( str )

Examples colmake e:encode_uri('2^3 = 8;')

Result is ‘2%5E3%20=%208;’ Arguments

Argument Type Required Multiple str string True False

9.35 encode_uri_component (Encode URI Component)

Description Replace certain characters in a string with %-encoded escape sequences. Letters A-Z a-z, digits 0-9, and !’()*-. remain unchanged. All others are encoded. NULL if input contains an invalid UTF-8 surrogate sequence. Return type string Domain This is a scalar function (acts on values from each row individually.) Usage encode_uri_component( str )

Examples colmake ec:encode_uri_component('2^3 = 8;')

Result is ‘2%5E3%20%3D%208%3B’ Arguments

Argument Type Required Multiple str string True False

9.36 encodebase64 (Encode Base64)

Description EncodeBase64 encodes the input in the base64 format Return type string Domain This is a scalar function (acts on values from each row individually.) Usage encodebase64( str, [ urlSafe ] )

Examples colmake encoded:encodebase64(log)

Encodes the value of the field log in the base64 encoding colmake encoded:encodebase64(log, true)

Encodes the value of the field log in the url safe base64 encoding Arguments

Argument Type Required Multiple str string True False urlSafe bool False False

9.37 endswith (Ends With)

Description Returns true if string ends with expr. Return type bool Domain This is a scalar function (acts on values from each row individually.) Usage endswith( str, expr )

Examples filter endswith(@.bundle_kind, "kube")

Pass through all bundle kinds that end with the string ‘kube’. Arguments

Argument Type Required Multiple str string True False expr string True False

9.38 eq (=)

Description Return true if A is equal to B. Return type bool Domain This is a scalar function (acts on values from each row individually.) Usage eq( a, b )

Arguments

Argument Type Required Multiple a any True False b any True False
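
For illustration (the column name status and the value 'error' are hypothetical):

filter eq(status, 'error')

Pass through only the rows whose 'status' column equals 'error'.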

9.39 exp (Exp)

Description Returns Euler’s number e raised to the given number. Return type float64 Domain This is a scalar function (acts on values from each row individually.) Usage exp( value )

Examples colmake exp_temperature:exp(@.temperature)

Returns a new column exp_temperature with Euler’s number e being raised to the values of the temperature column. Arguments

Argument Type Required Multiple value numeric True False

9.40 first (First)

Description Return the first value of one column across an ordered group. Default ordering is ascending time, so the first value is the earliest. Return type any Domain This is a window function (calculates over a group without aggregating rows in window.) Usage first( expression )

Arguments

Argument Type Required Multiple expression any True False
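
A sketch modeled on the last() example above (the column names customer_id and category are hypothetical):

colmake first_customer:window(first(customer_id), groupby(category))

Find the customer that appears first within each category.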

9.41 first_not_null (First Not Null)

Description Return the first non-null value of one column across an ordered group. Default ordering is ascending time, so the first value is the earliest. Return type any Domain This is a window function (calculates over a group without aggregating rows in window.) Usage first_not_null( expression )

Arguments

Argument Type Required Multiple expression any True False

9.42 float64 (Make Float)

Description Generate a float representation of the argument value. Return type float64 Domain This is a scalar function (acts on values from each row individually.) Usage float64( value )

Arguments

Argument Type Required Multiple value any True False
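
For illustration (the column name cpu_str is hypothetical):

colmake cpu:float64(cpu_str)

Create a column 'cpu' containing the float representation of the string column 'cpu_str'.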

9.43 float64_null (Float64 Null)

Description Returns a null value of type float64. This is important, because some functions, like case(), return more convenient outputs if all their arguments are of the same type. Return type float64 Domain This is a scalar function (acts on values from each row individually.) Usage float64_null()

Examples colmake positive_or_null:case(x > 0, x, true, float64_null())

Create a column ‘positive_or_null’ which is either a positive float64, or the null float64.

9.44 floor (Floor)

Description Returns ‘val’ rounded down to the given ‘precision’. Precision defaults to 0, meaning the value will be rounded to the nearest integer. Return type numeric Domain This is a scalar function (acts on values from each row individually.) Usage floor( val, [ precision ] )

Examples colmake rounded:floor(@.temperature, 2)

Return the rounded down value of column temperature with 2 decimals Arguments

Argument Type Required Multiple val numeric True False precision int64 False False

9.45 format_time (Format Timestamp)

Description Format the timestamp value in UTC according to the specified format. Format specifiers are detailed in Snowflake’s time format specifiers documentation Return type string Domain This is a scalar function (acts on values from each row individually.) Usage format_time( time, format )

Examples colmake year:format_time(@."Valid From", "YYYY")

Extracts the year from the “Valid From” column colmake formatted:format_time(@."Valid From", 'YYYY-MM-DD"T"HH24:MI:SSTZH:TZM')

Format the “Valid From” column according to ISO 8601 colmake formatted:format_time(@."Valid From", 'DY MON DD HH24:MI:SS YYYY')

Format the “Valid From” column similar to ctime’s format colmake formatted:format_time(@."Valid From", 'MM/DD/YYYY HH12:MI:SS')

Format the “Valid From” column using US’s date format and a 12-hour clock. Arguments

Argument Type Required Multiple time timestamp True False format string True False

9.46 frame (Frame)

Description Specify the relative time frame for a window context. The frame will start from the current row’s “Valid From” time minus “back”, and end at “Valid From” plus “ahead” (both ends are inclusive). For better performance, the window frame boundaries may not be exact and can deviate by at most 1/120th of the total frame size (or 10 seconds, whichever is larger). To make the window boundaries exact, at the cost of slower performance, use the frame_exact() function. Return type frame Domain

This is a scalar function (acts on values from each row individually.) Usage frame( back, ahead )

Examples colmake avg:window(avg(load), groupby(host), orderby(time), frame(back:10m))

Compute the moving average of system load within the past 10 minutes of each event Arguments

Argument Type Required Multiple back expression True False ahead expression True False

9.47 frame_exact (Frame Exact)

Description Specify the relative time frame for a window context. This is the exact version of frame(), where the window frame start and end times are exactly “Valid From” minus “back” and “Valid To” plus “ahead”. Evaluation of exact window frames can be slow when the data volume is large. Return type frame Domain This is a scalar function (acts on values from each row individually.) Usage frame_exact( back, ahead )

Examples colmake avg:window(avg(load), groupby(host), orderby(time), frame_exact(back:10m))

Compute the moving average of system load within the past 10 minutes of each event Arguments

Argument Type Required Multiple back expression True False ahead expression True False

9.48 groupby (Group By)

Description Grouping/partitioning in which to process data Return type grouping Domain This is a scalar function (acts on values from each row individually.) Usage groupby( columnname, ... )

Arguments

Argument Type Required Multiple columnname expression True True

9.49 gt (>)

Description Return true if A is strictly greater than B. Return type bool Domain This is a scalar function (acts on values from each row individually.) Usage gt( a, b )

Arguments

Argument Type Required Multiple a any True False b any True False

9.50 gte (>=)

Description Return true if A is greater than or equal to B. Return type bool Domain This is a scalar function (acts on values from each row individually.) Usage gte( a, b )

Arguments

Argument Type Required Multiple a any True False b any True False

9.51 hash (Hash)

Description Accepts a variable number of arguments of arbitrary types and returns a signed 64-bit hash. Never returns null. Not a cryptographic hash function and should not be used as such. Return type numeric Domain This is a scalar function (acts on values from each row individually.) Usage hash( args, ... )

Examples colmake h:hash(@.x, @.y)

Create a column ‘h’ with the result of hashing the columns ‘x’ and ‘y’ Arguments

Argument Type Required Multiple args any True True

9.52 if (If)

Description Return the second argument if the condition is true, otherwise return the third argument. Arguments must have the same general type. Return type any Domain This is a scalar function (acts on values from each row individually.) Usage if( condition, ontrue, onfalse )

Examples filter b=if(a=true, 'foo', 'bar')

Filter input to rows where b is equal to ‘foo’ if ‘a’ is true and ‘bar’ otherwise Arguments

Argument Type Required Multiple condition bool True False ontrue any True False onfalse any True False

9.53 ifnull (Replace Null)

Description Return the second argument if the first argument has the null value. Arguments must have the same type. Return type any Domain This is a scalar function (acts on values from each row individually.) Usage ifnull( arg, replacement )

Examples colmake foo:ifnull(foo, 0)

Replace the value of column ‘foo’ with the value 0 if the value is null. Arguments

Argument Type Required Multiple arg any True False replacement any True False

9.54 int64 (Make Integer)

Description Generate an int64 representation of the argument value. Float values are rounded to the nearest integer, and timestamps are converted to nanosecond epoch values. Return type int64 Domain This is a scalar function (acts on values from each row individually.) Usage int64( value )

Arguments

Argument Type Required Multiple value any True False

9.55 int64_null (Int64 Null)

Description Returns a null value of type int64. This is important, because some functions, like case(), return more convenient outputs if all their arguments are of the same type. Return type int64 Domain This is a scalar function (acts on values from each row individually.) Usage int64_null()

Examples colmake positive_or_null:case(x > 0, x, true, int64_null())

Create a column ‘positive_or_null’ which is either a positive int64, or the null int64.

9.56 isnull (Test Null)

Description Return true if the argument has the null value. Return type bool Domain This is a scalar function (acts on values from each row individually.) Usage isnull( arg )

Arguments

Argument Type Required Multiple arg any True False
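
For illustration (the column name container_id is hypothetical):

filter isnull(container_id)

Pass through only the rows where 'container_id' has the null value.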

9.57 lag (Lag)

Description Return the lag of one column across an ordered group. Default ordering is ascending time, so the lag value is the most recent prior value Return type any Domain This is a window function (calculates over a group without aggregating rows in window.) Usage lag( expression, lagby )

Arguments

Argument Type Required Multiple expression any True False lagby int64 True False
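
A sketch (the column names value and host are hypothetical):

colmake prev_value:window(lag(value, 1), groupby(host), orderby(time))

Create a column 'prev_value' containing, for each row, the previous value of 'value' within the same host.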

9.58 last (Last)

Description Return the last value of one column across an ordered group. Default ordering is ascending time, so the last value is the latest Return type any Domain This is a window function (calculates over a group without aggregating rows in window.) Usage last( expression )

Examples colmake last_customer:window(last(customer_id), groupby(category))

Find out the customer that appears last within each category. Arguments

Argument Type Required Multiple expression any True False

9.59 last_not_null (Last Not Null)

Description Return the last non-null value of one column across an ordered group. Default ordering is ascending time, so the last value is the latest Return type any Domain This is a window function (calculates over a group without aggregating rows in window.) Usage last_not_null( expression )

Examples colmake last_value:window(last_not_null(value), groupby(kind))

Find out the last non-null within each kind. Arguments

Argument Type Required Multiple expression any True False

9.60 lead (Lead)

Description Return the lead of one column across an ordered group. Default ordering is ascending time, so the lead value is the next value Return type any Domain This is a window function (calculates over a group without aggregating rows in window.) Usage lead( expression, leadby )

Arguments

Argument Type Required Multiple expression any True False leadby int64 True False

9.61 left (Left)

Description Returns a leftmost substring of its input. Return type string Domain This is a scalar function (acts on values from each row individually.) Usage left( value, length )

Examples colmake leftstring:left(somestring, 4)

Will make a ‘leftstring’ column with the left 4 characters of the text in the ‘somestring’ column. Arguments

Argument Type Required Multiple value string True False length int64 True False

9.62 like (Like)

Description Returns true if subject matches pattern (case-sensitive). Within pattern, escape can be included to denote that the character following it be interpreted literally. The arguments pattern and escape must be string literals. The argument pattern may include wildcards _ (matches exactly one character) and % (matches zero or more characters). If included, escape may only be a single character. If using backslash as the escape character, it must be escaped in the escape clause (see example). Return type bool Domain This is a scalar function (acts on values from each row individually.) Usage like( subject, pattern, [ escape ] )

Examples filter like(@.bundle_kind, "%kube%")

Pass through all bundle kinds that contain the string ‘kube’. filter like(@.bundle_kind, "k^%__", "^")

Pass through all bundle kinds that are 4 characters long and begin with ‘k%’. filter like(@.bundle_kind, "k\\%__", "\\")

Pass through all bundle kinds that are 4 characters long and begin with ‘k%’. Arguments

Argument Type Required Multiple subject string True False pattern string True False escape string False False

9.63 ln (Ln)

Description Returns natural logarithm of a numeric expression. The value should be greater than 0. Return type float64 Domain This is a scalar function (acts on values from each row individually.) Usage

ln( value )

Examples colmake ln_val:ln(@.temperature)

Returns the natural logarithm of column temperature Arguments

Argument Type Required Multiple value numeric True False

9.64 log (Log)

Description Returns logarithm of a numeric expression (second argument) with the provided base (first argument). The base should be greater than 0 and not exactly 1.0 and the value should be greater than 0. Return type float64 Domain This is a scalar function (acts on values from each row individually.) Usage log( base, value )

Examples colmake log_val:log(10, @.temperature)

Returns the logarithm of column temperature at base 10 Arguments

Argument Type Required Multiple base numeric True False value numeric True False

9.65 lower (Lowercase)

Description Return the input string in lowercase. Return type string Domain

This is a scalar function (acts on values from each row individually.) Usage lower( value )

Arguments

Argument Type Required Multiple value string True False
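
For illustration (the column name hostname is hypothetical):

colmake hostname_lower:lower(hostname)

Create a column 'hostname_lower' containing the lowercase form of 'hostname'.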

9.66 lpad (Left Pad)

Description Left pads a string with characters from another string, default pad string is whitespace Return type string Domain This is a scalar function (acts on values from each row individually.) Usage lpad( str, length, [ pad ] )

Examples colmake padded_kind:lpad(bundle_kind, 100, '%')

Create a new column 'padded_kind' that is 100 characters long, padded on the left with '%' characters as needed. Arguments

Argument Type Required Multiple str string True False length int64 True False pad string False False

9.67 lt (<)

Description Return true if A is strictly less than B. Return type bool Domain This is a scalar function (acts on values from each row individually.)

Usage lt( a, b )

Arguments

Argument Type Required Multiple a any True False b any True False

9.68 lte (<=)

Description Return true if A is less than or equal to B. Return type bool Domain This is a scalar function (acts on values from each row individually.) Usage lte( a, b )

Arguments

Argument Type Required Multiple a any True False b any True False

9.69 ltrim (Left Trim)

Description LTrim removes leading characters from a string Return type string Domain This is a scalar function (acts on values from each row individually.) Usage ltrim( str, [ chars ] )

Examples colmake trimmed_kind:ltrim(bundle_kind, '')

Removes leading spaces from bundle kinds Arguments

Argument Type Required Multiple str string True False chars string False False

9.70 make_fields (Make Fields)

Description Extend an existing object with new fields. Return type object Domain This is a scalar function (acts on values from each row individually.) Usage make_fields( column, entries, ... )

Examples colmake larger:make_fields(obj, key1:'value1', key2:'value2')

Create a column ‘larger’ based on column ‘obj’ with two key value pairs added (existing fields with the same name will be replaced). Arguments

Argument Type Required Multiple column object True False entries expression True True

9.71 makeobject (Make Object)

Description Turn a sequence of name:value elements into an object. Return type object Domain This is a scalar function (acts on values from each row individually.) Usage makeobject( [ key, ... ] )

Examples colmake obj:makeobject(label:"speed", value:distance/(endtime-starttime), attime:endtime)

Make a new column ‘obj’ consisting of an object with keys ‘label’, ‘value’, and ‘attime’. colmake obj:makeobject()

Make a new column ‘obj’ consisting of an empty object that is not ‘null’. Arguments

Argument Type Required Multiple key expression False True

9.72 max (Maximum)

Description Compute the maximum of one column across a group (with one argument) or the scalar greatest value of its arguments (with more than one argument.) Return type any Domain This is a window function (calculates over a group without aggregating rows in window.) This is also an aggregate function (aggregates rows over a group in statsby.) Usage max( expression, ... )

Arguments

Argument Type Required Multiple expression numeric True True
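
For illustration (the column names latency and endpoint are hypothetical):

statsby max_latency:max(latency), groupby(endpoint)

Compute the maximum latency observed for each endpoint.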

9.73 median (Median)

Description Return the fast approximate median value of one column. Return type float64 Domain This is a window function (calculates over a group without aggregating rows in window.) This is also an aggregate function (aggregates rows over a group in statsby.) Usage

median( expression )

Arguments

Argument Type Required Multiple expression numeric True False
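
For illustration (the column names latency and endpoint are hypothetical):

statsby median_latency:median(latency), groupby(endpoint)

Compute the approximate median latency for each endpoint.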

9.74 medianexact (Median Exact)

Description Return the exact median value of one column. Return type any Domain This is a window function (calculates over a group without aggregating rows in window.) This is also an aggregate function (aggregates rows over a group in statsby.) Usage medianexact( expression )

Arguments

Argument Type Required Multiple expression numeric True False

9.75 metric (Metric)

Description Select the metrics in the rollup verb. Return type options Domain This is a scalar function (acts on values from each row individually.) Usage metric( name, [ filter ], [ label ], [ type ], [ unit ], [ description ], [ rollup ], [ aggregate ] )

Examples metric("requests_total", return_code >= 400 and return_code <= 599)

Select the metric ‘requests_total’ within this dataset where return_code is between 400 and 599. The dataset must already implement the “metric” interface.

metric("requests_total", label:"Request Rate", type:"cumulativeCounter", unit:"1/s", description:"Number of requests processed per second.", rollup:"rate", aggregate:"sum")

Select the metric ‘requests_total’ for rollup, and overwrite the new metric’s definition with the specified label, type, unit, description, rollup method and aggregate method. The dataset must already implement the “metric” interface. Arguments

Argument Type Required Multiple
name string True False
filter bool False False
label expression False False
type expression False False
unit expression False False
description expression False False
rollup expression False False
aggregate expression False False

9.76 milliseconds (Milliseconds)

Description Given a numeric value representing milliseconds since epoch, return a timestamp of that point in time. Return type timestamp Domain This is a scalar function (acts on values from each row individually.) Usage milliseconds( number )

Examples colmake out_t_ms:milliseconds(in_fld)

Treat in_fld as milliseconds since epoch, and set out_t_ms to a timestamp representing that time. Arguments

Argument Type Required Multiple number numeric True False

9.77 min (Minimum)

Description Compute the minimum of one column across a group (with one argument) or the scalar least value of its arguments (with more than one argument.) Return type any Domain This is a window function (calculates over a group without aggregating rows in window.) This is also an aggregate function (aggregates rows over a group in statsby.) Usage min( expression, ... )

Arguments

Argument Type Required Multiple expression numeric True True
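
For illustration (the column names latency and endpoint are hypothetical):

statsby min_latency:min(latency), groupby(endpoint)

Compute the minimum latency observed for each endpoint.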

9.78 mod (Modulo)

Description Returns the remainder when the dividend is divided by the divisor. Return type int64 Domain This is a scalar function (acts on values from each row individually.) Usage mod( dividend, divisor )

Examples colmake remaining:mod(@.billed_days, 7)

Create a column ‘remaining’ with the number of billed days that are not counted in weeks Arguments

Argument Type Required Multiple dividend int64 True False divisor int64 True False

9.79 nanoseconds (Nanoseconds)

Description Given a numeric value representing nanoseconds since epoch, return a timestamp of that point in time. Return type timestamp Domain This is a scalar function (acts on values from each row individually.) Usage nanoseconds( number )

Examples colmake out_t_ns:nanoseconds(in_fld)

Treat in_fld as nanoseconds since epoch, and set out_t_ns to a timestamp representing that time. Arguments

Argument Type Required Multiple number numeric True False

9.80 ne (<>)

Description Return true if A is not equal to B. Return type bool Domain This is a scalar function (acts on values from each row individually.) Usage ne( a, b )

Arguments

Argument Type Required Multiple a any True False b any True False

9.81 numeric_null (Numeric Null)

Description Returns a null value of type numeric. This is important, because some functions, like case(), return more convenient outputs if all their arguments are of the same type. Return type numeric Domain This is a scalar function (acts on values from each row individually.) Usage numeric_null()

Examples colmake positive_or_null:case(x > 0, x, true, numeric_null())

Create a column ‘positive_or_null’ which is either a positive numeric, or the null numeric.

9.82 object (To Object)

Description Convert a datum into an object or NULL if conversion is impossible Return type object Domain This is a scalar function (acts on values from each row individually.) Usage object( value )

Examples colmake obj:object(parsejson(json))

Make a new column ‘obj’ consisting of the JSON objects parsed from the ‘json’ column. Arguments

Argument Type Required Multiple value any True False

9.83 object_agg (Object Aggregation)

Description Returns one OBJECT per group. For each (key, value) input pair, the resulting object contains a key:value field. The key column needs to be string. Duplicate keys within a group result in an error, and input tuples with NULL key and/or value are ignored. Return type object Domain This is a window function (calculates over a group without aggregating rows in window.) This is also an aggregate function (aggregates rows over a group in statsby.) Usage object_agg( key, value )

Examples statsby clusterid, oa:object_agg(jobid, status), groupby(clusterid)

For each clusterid, return a JSON object containing the status of each jobid with the attached clusterid. Arguments

Argument Type Required Multiple key string True False value any True False

9.84 object_keys (Object Keys)

Description Get array of object keys (field names from object). Return type array Domain This is a scalar function (acts on values from each row individually.) Usage object_keys( value )

Arguments

Argument Type Required Multiple value object True False

9.85 object_null (Object Null)

Description Returns a null value of type object. This is important, because some functions, like case(), return more convenient outputs if all their arguments are of the same type. Return type object Domain This is a scalar function (acts on values from each row individually.) Usage object_null()

Examples colmake positive_or_null:case(x > 0, x, true, object_null())

Create a column ‘positive_or_null’ which is either a positive object, or the null object.

9.86 options (Options)

Description Specify options to change the verb’s behavior Return type options Domain This is a scalar function (acts on values from each row individually.) Usage options( [ keyvalue, ... ] )

Arguments

Argument Type Required Multiple keyvalue expression False True

9.87 orderby (Order By)

Description Order in which to process data Return type any Domain This is a scalar function (acts on values from each row individually.) Usage orderby( [ columnname, ... ], [ descending, ... ] )

Arguments

Argument Type Required Multiple columnname expression False True descending bool False True

9.88 parsehex (Parse Hex)

Description Parses a string encoded hex number and returns an int64. Values should be strings, not start with 0x, and only include valid hex characters. NULL will be returned in case of an error Return type int64 Domain This is a scalar function (acts on values from each row individually.) Usage parsehex( hexstr )

Examples colmake hexid:parsehex(hexid)

Will change the hexid column from a hex string to an int64 value. Arguments

Argument Type Required Multiple hexstr string True False

9.89 parseip (parseIp)

Description When the input is an IPv4 or IPv6 address, returns a JSON object containing the following attributes: family (either "4" or "6"), host (the passed host IP), ip_fields (an array of 4 32-bit integers, each representing 32 bits of the given IP), ip_type (always "inet"), and netmask_prefix_length (always null). When the address is an IPv4 address, an attribute ipv4 (integer representation of the address) is also added; when it is an IPv6 address, an attribute hex_ipv6 (hexadecimal integer representation of the address) is also added. When the input is an IPv4 subnet mask, the following are also returned along with the above attributes: ipv4_range_start (integer representation of the lowest IP address in the range), ipv4_range_end (integer representation of the highest IP address in the range), and netmask_prefix_length (length of the subnet mask). When the input is an IPv6 subnet mask, these attributes are prefixed with "hex_ipv6" instead of "ipv4" and the corresponding values are in hexadecimal. Return type object Domain This is a scalar function (acts on values from each row individually.) Usage parseip( arg )

Examples colmake ip:parseip(@.x)

Creates a column named ‘ip’ containing the returned JSON object Arguments

Argument Type Required Multiple arg string True False

9.90 parseisotime (Parse ISO8601/RFC3339 Timestamp)

Description Parse a YYYY-MM-DDTHH:MM:SSZ-formatted string as a timestamp. Return type timestamp Domain This is a scalar function (acts on values from each row individually.) Usage parseisotime( value )

Arguments

Argument Type Required Multiple value string True False

9.91 parsejson (Parse JSON)

Description Parse the argument value as a JSON string. Return type any Domain This is a scalar function (acts on values from each row individually.) Usage parsejson( value )

Arguments

Argument Type Required Multiple value string True False

9.92 parsekvs (Parse key=value Pairs)

Description Returns an object of key=value pairs extracted from an input string. Return type object Domain This is a scalar function (acts on values from each row individually.) Usage parsekvs( value )

Examples colmake keyvals:parsekvs(log)

Make a new object column ‘keyvals’ that contains key=value pairs extracted from string column ‘log’ Arguments

Argument Type Required Multiple value string True False

9.93 parseurl (ParseUrl)

Description Returns a JSON object consisting of all the components (fragment, host, path, port, query, scheme). Return type object Domain This is a scalar function (acts on values from each row individually.) Usage parseurl( arg )

Examples colmake url:parseurl(@.x)

Creates a column named ‘url’ returning a JSON object that holds the components (fragment, host, path, port, query, scheme) of the input URL Arguments

Argument Type Required Multiple arg string True False

9.94 path_exists (Path Exists)

Description Given a column and path, return whether the JSON path exists in that column. Must have a valid column. Return type bool Domain This is a scalar function (acts on values from each row individually.) Usage path_exists( column, path )

Examples path_exists(A, 'C')

Returns whether JSON path ‘C’ exists in column A. path_exists(A, 'B.C')

Returns whether JSON path ‘B.C’ exists in column A. A row containing the path may be { B: { C: 1} }.

path_exists(@aaa.B, 'C')

Returns whether JSON path ‘C’ exists in column A of input ‘aaa’ Arguments

Argument Type Required Multiple column object True False path string True False

9.95 percentile (Percentile)

Description Returns an approximated value for the specified percentile of the input expression across the group. Percentile needs to be specified in the range of 0 to 1.0. Return type numeric Domain This is a window function (calculates over a group without aggregating rows in window.) This is also an aggregate function (aggregates rows over a group in statsby.) Usage percentile( expression, percentile )

Arguments

Argument Type Required Multiple expression numeric True False percentile numeric True False
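
For illustration (the column names latency and endpoint are hypothetical):

statsby p95:percentile(latency, 0.95), groupby(endpoint)

Compute the approximate 95th-percentile latency for each endpoint.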

9.96 percentilecont (Percentile Cont)

Description Assuming a continuous distribution, it returns the value for the specified percentile of the input expression across the group. Percentile needs to be specified in the range of 0 to 1.0. Return type numeric Domain This is a window function (calculates over a group without aggregating rows in window.) This is also an aggregate function (aggregates rows over a group in statsby.) Usage percentilecont( expression, percentile )

Arguments

Argument Type Required Multiple expression numeric True False percentile numeric True False

9.97 percentiledisc (Percentile Disc)

Description Assuming a discrete distribution, it returns the value for the specified percentile of the input expression across the group. Percentile needs to be specified in the range of 0 to 1.0. Return type numeric Domain This is a window function (calculates over a group without aggregating rows in window.) This is also an aggregate function (aggregates rows over a group in statsby.) Usage percentiledisc( expression, percentile )

Arguments

Argument Type Required Multiple expression numeric True False percentile numeric True False

9.98 pick_fields (Pick Fields)

Description Pick one or more fields from an object. Return type object Domain This is a scalar function (acts on values from each row individually.) Usage pick_fields( column, key, ... )

Examples colmake smaller:pick_fields(obj, 'key1', 'key2')

Create a column ‘smaller’ based on column ‘obj’ only containing ‘key1’ and ‘key2’ if existing. Arguments

Argument Type Required Multiple column object True False key string True True

9.99 position (Position)

Description Searches for the first occurrence of the second argument (needle) in the first argument (haystack) and, if successful, returns the needle’s position (0-based). Returns -1 if the needle is not found. Return type int64 Domain This is a scalar function (acts on values from each row individually.) Usage position( haystack, needle, [ start ] )

Examples colmake p:position('fuzzy wuzzy', 'uzzy', 5)

Looks for the first occurrence of the needle ‘uzzy’ starting at index 5 (0-based) in the haystack ‘fuzzy wuzzy’. Returns 7. Arguments

Argument Type Required Multiple haystack string True False needle string True False start int64 False False

9.100 pow (Pow)

Description Returns a number ‘base’ raised to the specified power ‘exponent’. Return type numeric Domain This is a scalar function (acts on values from each row individually.) Usage

pow( base, exponent )

Examples colmake power:pow(@.x, @.y)

Create a column 'power' with the result of raising column 'x' to the power of column 'y' Arguments

Argument Type Required Multiple base numeric True False exponent numeric True False

9.101 primarykey (Primary Key)

Description Specify the primary key for some verbs Return type primarykey Domain This is a scalar function (acts on values from each row individually.) Usage primarykey( columnname, ... )

Arguments

Argument Type Required Multiple columnname expression True True

9.102 queryendtime (Query End Time)

Description Returns the latest time of the query time window. Return type timestamp Domain This is a scalar function (acts on values from each row individually.) Usage queryendtime()

Examples

colmake is_following:(queryendtime() < some_time_col)

Create a column is_following, which is true if the some_time_col column contains a time later than the query end time.

9.103 querystarttime (Query Start Time)

Description Returns the earliest time of the query time window. Return type timestamp Domain This is a scalar function (acts on values from each row individually.) Usage querystarttime()

Examples colmake is_previous:(querystarttime() > some_time_col)

Create a column is_previous, which is true if the some_time_col column contains a time earlier than the query start time.

9.104 rank (Rank)

Description Returns the rank within an ordered group of values. Default ordering is in ascending time, so the first value has the lowest rank. Return type int64 Domain This is a window function (calculates over a group without aggregating rows in window.) Usage rank()

Examples colmake index:window(rank(), groupby(category_id), orderby(item_cost))

Assigns a rank to items, when ordered by cost, grouped by category. The cheapest item in each category will be given the rank 1, items with the same cost within the same category will receive the same rank.

9.105 regex_match (RegEx Match)

Description Return true if the argument string matches the argument regular expression. Use ‘i’ option for case insensitive match. For more about syntax, see POSIX extended regular expressions. Return type bool Domain This is a scalar function (acts on values from each row individually.) Usage regex_match( candidate, regex, [ flags ] )

Arguments

Argument Type Required Multiple candidate string True False regex regex True False flags string False False
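
For illustration (the column name log and the pattern are hypothetical):

filter regex_match(log, /timeout|refused/, 'i')

Pass through the rows whose 'log' column matches 'timeout' or 'refused', ignoring case.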

9.106 regex_replace (RegexReplace)

Description Replaces all instances of a matched regex pattern in the input string with a provided value. The first parameter specifies the input string, the second parameter specifies the regex pattern, and the third parameter specifies the replacement. If the replacement is empty, all matched patterns are removed. The fourth parameter specifies which occurrence is to be replaced; if 0 is specified, all occurrences are replaced. The fifth parameter specifies optional regex flags:
• c - Enables case-sensitive matching.
• i - Enables case-insensitive matching.
• m - Enables multi-line mode (i.e. meta-characters ^ and $ mark the beginning and end of any line of the subject). By default, multi-line mode is disabled (i.e. ^ and $ mark the beginning and end of the entire subject).
• s - Enables the POSIX wildcard character . to match \n. By default, wildcard character matching is disabled.
For more about syntax, see POSIX extended regular expressions. Return type string Domain This is a scalar function (acts on values from each row individually.) Usage regex_replace( str, pattern, replacement, [ occurrence ], [ flags ] )

Examples

colmake date:"2001-31-12"
colmake new_date:regex_replace(date, /^.*([0-9]{4,4})-([0-9]{1,2})-([0-9]{1,2}).*$/, '\\3/\\2/\\1', 0, 's')

Use parenthesis to encapsulate groups and refer to them via the double backslash and their index, starting from 1. Arguments

Argument Type Required Multiple
str string True False
pattern regex True False
replacement string True False
occurrence int64 False False
flags string False False

9.107 replace (Replace)

Description Replaces all instances of the substring in the input string with a provided value. Return type string Domain This is a scalar function (acts on values from each row individually.) Usage replace( value, substring, replacement )

Arguments

Argument Type Required Multiple value string True False substring string True False replacement string True False

9.108 right (Right)

Description Returns a rightmost substring of its input. Return type string Domain This is a scalar function (acts on values from each row individually.)

Usage right( value, length )

Examples colmake rightstring:right(somestring, 4)

Will make a ‘rightstring’ column with the right 4 characters of the text in the ‘somestring’ column. Arguments

Argument Type Required Multiple value string True False length int64 True False

9.109 round (Round)

Description Returns ‘val’ rounded to the given ‘precision’. Precision defaults to 0, meaning the value will be rounded to the nearest integer. Return type numeric Domain This is a scalar function (acts on values from each row individually.) Usage round( val, [ precision ] )

Examples colmake rounded:round(@.temperature, 2)

Return the rounded value of column temperature with 2 decimals Arguments

Argument Type Required Multiple val numeric True False precision int64 False False

9.110 row_endtime (Row End Time)

Description Returns the time at which the state in the row ended, or null for non-resource datasets. Return type timestamp Domain This is a scalar function (acts on values from each row individually.) Usage row_endtime()

Examples colmake time_taken:ifnull(row_endtime() - row_timestamp(), 0s)

Create a column time_taken that is the duration of the state within the row.

9.111 row_timestamp (Row Timestamp)

Description Returns the timestamp (start time) of the row. Return type timestamp Domain This is a scalar function (acts on values from each row individually.) Usage row_timestamp()

Examples colmake time_taken:ifnull(row_endtime() - row_timestamp(), 0s)

Create a column time_taken that is the duration of the state within the row.

9.112 rownumber (Row Number)

Description Return the window index of the row within its groupby, when ordered by the orderby. Row indexes start at 1. Return type int64 Domain

This is a window function (calculates over a group without aggregating rows in window.) Usage rownumber()

Examples colmake index:window(rownumber(), groupby(category_id), orderby(item_cost))

Assigns an index to items, when ordered by cost, grouped by category. The cheapest item in each category will be given the index 1.

9.113 rpad (Right Pad)

Description Right pads a string with characters from another string, default pad string is whitespace Return type string Domain This is a scalar function (acts on values from each row individually.) Usage rpad( str, length, [ pad ] )

Examples colmake padded_kind:rpad(bundle_kind, 100, '%')

Create a new column 'padded_kind' that is 100 characters long, padded on the right with '%' characters as needed. Arguments

Argument Type Required Multiple str string True False length int64 True False pad string False False

9.114 rtrim (Right Trim)

Description RTrim removes trailing characters from a string Return type string Domain

This is a scalar function (acts on values from each row individually.) Usage rtrim( str, [ chars ] )

Examples colmake trimmed_kind:rtrim(bundle_kind, '')

Removes trailing spaces from bundle kinds Arguments

Argument Type Required Multiple str string True False chars string False False

9.115 search (Search)

Description Return true if the ‘for’ text is matched to ‘in’ Return type bool Domain This is a scalar function (acts on values from each row individually.) Usage search( in, for, ..., value )

Arguments

Argument Type Required Multiple in string True False for string True True value any True False

9.116 seconds (Seconds)

Description Given a numeric value representing seconds since epoch, return a timestamp of that point in time. Return type timestamp Domain This is a scalar function (acts on values from each row individually.)

Usage seconds( number )

Examples colmake out_t_s:seconds(in_fld)

Treat in_fld as seconds since epoch, and set out_t_s to a timestamp representing that time. Arguments

Argument Type Required Multiple number numeric True False

9.117 split (Split)

Description Splits the string into an array, based on the separator. Return type array Domain This is a scalar function (acts on values from each row individually.) Usage split( value, separator )

Arguments

Argument Type Required Multiple value string True False separator string True False
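
For illustration (the column name path is hypothetical):

colmake segments:split(path, '/')

Create an array column 'segments' by splitting 'path' on the '/' separator.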

9.118 split_part (Split Part)

Description Splits a given string at a specified character and returns the requested part. Part is 1-based. If part is a negative value, the parts are counted backward from the end of the string. If any parameter is null, this function returns null. Return type string Domain This is a scalar function (acts on values from each row individually.) Usage

split_part( value, delimiter, part )

Examples split_part("03/04/2021", "/", 2)

Returns “04”. Arguments

Argument Type Required Multiple value string True False delimiter string True False part int64 True False

9.119 sqrt (Sqrt)

Description Returns the square root for a given input and null if input is negative. Return type numeric Domain This is a scalar function (acts on values from each row individually.) Usage sqrt( arg )

Examples colmake squareroot:sqrt(@.x)

Create a column ‘squareroot’ with the result of the square root of column ‘x’ Arguments

Argument Type Required Multiple arg numeric True False

9.120 startswith (Starts With)

Description Returns true if string starts with expr. Return type bool Domain

This is a scalar function (acts on values from each row individually.) Usage startswith( haystack, needle )

Examples filter startswith(@.bundle_kind, "kube")

Pass through all bundle kinds that start with the string ‘kube’. Arguments

Argument Type Required Multiple haystack string True False needle string True False

9.121 stddev (Standard Deviation)

Description Calculate the standard deviation across the group. Return type numeric Domain This is a window function (calculates over a group without aggregating rows in window.) This is also an aggregate function (aggregates rows over a group in statsby.) Usage stddev( value )

Arguments

Argument Type Required Multiple value numeric True False
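
For illustration (the column names latency and endpoint are hypothetical):

statsby latency_sd:stddev(latency), groupby(endpoint)

Compute the standard deviation of latency for each endpoint.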

9.122 strcat (String Concat)

Description Return the concatenation of all string arguments. Return type string Domain This is a scalar function (acts on values from each row individually.) Usage

strcat( str, ... )

Examples colmake foo:strcat(colstr1, " ", colstr2)

Make a new column ‘foo’ that is the concatenation of ‘colstr1’, ” “, and ‘colstr2’ Arguments

Argument Type Required Multiple str string True True

9.123 string (Make STRING)

Description Generate a string representation of the argument value. Return type string Domain This is a scalar function (acts on values from each row individually.) Usage string( value )

Arguments

Argument Type Required Multiple value any True False

9.124 string_agg (String/List Aggregation)

Description Returns concatenated input values, separated by the delimiter. The expression must be a string column. The delimiter must be a string constant (it may be an empty string). If no ordering is specified, the default ordering is by ‘valid_from’, ascending. Return type string Domain This is an aggregate function (aggregates rows over a group in statsby.) Usage string_agg( expr, delimiter, [ orderby ] )

Examples statsby nicknames:string_agg(nickname, ", ", orderby(email)), groupby(uid, fullname)

A list of all nicknames for each individual in the organization; for users with more than one nickname they will be sorted by email address. Arguments

Argument Type Required Multiple expr string True False delimiter string True False orderby ordering False False

9.125 string_null (String Null)

Description Returns a null value of type string. This is important, because some functions, like case(), return more convenient outputs if all their arguments are of the same type. Return type string Domain This is a scalar function (acts on values from each row individually.) Usage string_null()

Examples colmake positive_or_null:case(x > 0, x, true, string_null())

Create a column ‘positive_or_null’ which is either a positive string, or the null string.

9.126 strlen (String Length)

Description Compute the length of an input string. Return type int64 Domain This is a scalar function (acts on values from each row individually.) Usage strlen( value )

Arguments

Argument Type Required Multiple value string True False

9.127 substring (Substring)

Description Extracts characters from a string, starting at an index. Negative indices count from the end of the string. Positive indices start at 0. Takes an optional length parameter. Return type string Domain This is a scalar function (acts on values from each row individually.) Usage substring( value, start, [ length ] )

Arguments

Argument Type Required Multiple value string True False start int64 True False length int64 False False

9.128 sum (Sum)

Description Calculate the sum of the argument across the group, or of the scalar arguments if more than one. Return type any Domain This is a window function (calculates over a group without aggregating rows in window.) This is also an aggregate function (aggregates rows over a group in statsby.) Usage sum( item, ... )

Arguments

Argument Type Required Multiple item numeric True True
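
For illustration (the column names bytes and host are hypothetical):

statsby total_bytes:sum(bytes), groupby(host)

Compute the total number of bytes reported by each host.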

9.129 timestamp_ms (Milliseconds)

Description Given a numeric value representing milliseconds since epoch, return a timestamp of that point in time. Return type timestamp Domain This is a scalar function (acts on values from each row individually.) Usage timestamp_ms( number )

Examples colmake out_ms:timestamp_ms(in_fld)

Treat in_fld as milliseconds since epoch, and set out_ms to a timestamp representing that time. Arguments

Argument Type Required Multiple number numeric True False

9.130 timestamp_ns (Nanoseconds)

Description Given a numeric value representing nanoseconds since epoch, return a timestamp of that point in time. Return type timestamp Domain This is a scalar function (acts on values from each row individually.) Usage timestamp_ns( number )

Examples colmake out_ns:timestamp_ns(in_fld)

Treat in_fld as nanoseconds since epoch, and set out_ns to a timestamp representing that time. Arguments

Argument Type Required Multiple number numeric True False

9.131 timestamp_null (Timestamp Null)

Description Returns a null value of type timestamp. This is important, because some functions, like case(), return more convenient outputs if all their arguments are of the same type. Return type timestamp Domain This is a scalar function (acts on values from each row individually.) Usage timestamp_null()

Examples colmake positive_or_null:case(x > 0, x, true, timestamp_null())

Create a column ‘positive_or_null’ which is either a positive timestamp, or the null timestamp.

9.132 timestamp_s (Seconds)

Description Given a numeric value representing seconds since epoch, return a timestamp of that point in time. Return type timestamp Domain This is a scalar function (acts on values from each row individually.) Usage timestamp_s( number )

Examples colmake out_s:timestamp_s(in_fld)

Treat in_fld as seconds since epoch, and set out_s to a timestamp representing that time. Arguments

Argument Type Required Multiple number numeric True False

9.133 tokenize (Tokenize)

Description Splits the string into an array based on separator, which is treated as a set of characters. The default separator is ” “. Return type array Domain This is a scalar function (acts on values from each row individually.) Usage tokenize( value, [ separator ] )

Examples tokenize("a b c d")

Returns ["a", "b", "c", "d"]. tokenize("hello@example.com", "@.")

Returns [“hello”, “example”, “com”]. Arguments

Argument Type Required Multiple value string True False separator string False False

9.134 tokenize_part (Tokenize Part)

Description Tokenizes the input string using the delimiter and returns the requested part. The delimiter is treated as a set of characters. Each character in the delimiter string is a delimiter. The default delimiter is " ". If the delimiter is empty and the string is empty, then the function returns NULL. If the delimiter is empty and the string is non-empty, then the whole string will be treated as one token. The part argument is 1-based and the default value is 1. If the part number is out of range, then NULL is returned. Return type string Domain This is a scalar function (acts on values from each row individually.) Usage tokenize_part( value, [ delimiter ], [ part ] )

Examples

tokenize_part("a b c d")

Returns "a". tokenize_part("hello@example.com", "@.")

Returns "hello". tokenize_part("hello@example.com", "@.", 2)

Returns “example”. Arguments

Argument Type Required Multiple value string True False delimiter string False False part int64 False False

9.135 trim (Trim)

Description Trim removes leading and trailing characters from a string Return type string Domain This is a scalar function (acts on values from each row individually.) Usage trim( str, [ chars ] )

Examples colmake trimmed_kind:trim(bundle_kind, '')

Removes leading and trailing spaces from bundle kinds Arguments

Argument Type Required Multiple str string True False chars string False False

9.136 upper (Uppercase)

Description Return the input string in uppercase. Return type string Domain This is a scalar function (acts on values from each row individually.) Usage upper( value )

Arguments

Argument Type Required Multiple value string True False
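
For illustration, mirroring the lower() entry (the column name bundle_kind appears in other examples in this chapter):

colmake upper_kind:upper(bundle_kind)

Create a column 'upper_kind' containing the uppercase form of 'bundle_kind'.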

9.137 validfor (Validfor)

Description Specify the validity period for each event for some verbs Return type validfor Domain This is a scalar function (acts on values from each row individually.) Usage validfor( expression, ... )

Arguments

Argument Type Required Multiple expression expression True True

9.138 window (Window)

Description Evaluates its argument in windowed context, partitioned over the given grouping and ordered by the given ordering (by default, input dataset timestamp.) Include ‘frame()’ to evaluate the window function inside a sliding window frame. For queries, you may omit ‘frame()’ to use the current query time window. Note that this causes the resulting dataset to be unstreamable. Return type

any Domain This is a scalar function (acts on values from each row individually.) Usage window( expr, [ groupby ], [ orderby ], [ frame ] )

Examples colmake name:window(first(name), groupby(section), orderby(time))

Name each row with the first name that appears in the current query time window colmake avg:window(avg(load), groupby(host), orderby(time), frame(back:10m))

Compute the moving average of system load within the past 10 minutes of each event Arguments

Argument Type Required Multiple expr any True False groupby grouping False False orderby ordering False False frame frame False False


CHAPTER TEN

OBSERVE GLOSSARY

accelerate  Increase search performance for a dataset by proactively executing its transforms and saving the results. This process is managed automatically by Observe. A dataset may be accelerated if the OPAL used in its definition is streamable. If a dataset cannot be accelerated, its transforms are applied to the underlying parent dataset each time it is queried.
channel  A set of channel actions (destinations) to which an alert is sent. A channel may contain multiple actions of different types, and be used by more than one monitor.
channel action  A single kind of alert, such as an email to a particular recipient. A channel action defines the type (email or webhook), the destination (an email address or webhook URL), and the message template. A channel action may be used by multiple channels.
collector  An API endpoint that accepts incoming data. Different types of data use different endpoints, such as https://prometheus.collect.observeinc.com for Prometheus observations or https://collect.observeinc.com/v1/http for HTTP JSON.
console  The OPAL editor and inspector at the bottom of a worksheet. Execute OPAL commands or view the OPAL equivalent of UI actions in the Definition tab. Click on a field value to view it in the Inspect tab.
dataset  A structured representation of data from a source. Event streams, resource sets, and lookup tables are types of Observe datasets.
event stream  A collection of events that share a common event stream definition. All events with the same definition are part of the same event stream. An event stream is a type of dataset.
firehose  An event stream that includes all of a customer’s ingested data, also called the Observation table. Accessible from the Firehose bookmark or as an event stream named “Observation”.
landing page  An interactive dashboard automatically generated for a dataset.
link  The connection between two datasets, via a resource key.
metric  A set of measurements over time, reported as a time series. Example: the sets of measurements for disk_usage{"device": "sda1", "type": "used"} and disk_usage{"device": "sda1", "type": "free"} are two time series for one metric.
minimap  A visual graph found on the right rail of any dataset, resource page, or worksheet that shows how and what items are related.
moment  A specific point in time, as displayed in a landing page or worksheet stage.
moment selector  The vertical bar displaying a timestamp in the time scrubber. Move the selector to view different moments.
monitor  A monitor watches a dataset for a particular condition and sends an alert when it occurs. Monitors use channels to determine where the alert is sent.
observation  An individual event from a data source, such as a trace, metric, or log line.


OPAL  The Observe Processing and Analysis Language, a query language for searching and transforming data in Observe.
resource  A human-understandable thing, something you might want to ask a question about. Examples of resources are a user, a location, a server, or a CI job.
resource definition  The collection of fields that make up a particular resource set.
resource grid  Hexagons on a resource set’s landing page, each representing an individual resource. Click on a hexagon to highlight data from a single resource.
resource key  A field or set of fields that uniquely identify a single resource in a resource set. Used for linking related resources and datasets. One field of a multi-field key is a partial resource key.
resource set  A collection of resources that share a common resource definition or schema. All resources with the same definition/schema are part of the same resource set.
stage  An individual results table in a worksheet. Link a new stage to a previous one to iteratively refine your desired results.
step  An individual data transformation from a UI action or an OPAL statement. An OPAL script is made up of one or more steps.
streamable  A verb or function whose behavior is the same for any size query time window, such as filter. A dataset that only uses streamable verbs or functions can be accelerated. Also called “materializable.” See unstreamable.
time picker  A popup to select the time range to view, either relative to the current time or a specific range of dates or times.
time scrubber  The timeline view in a landing page or worksheet.
time series  A set of data points, with identifying metadata, in time order. Example: the counts of active users at one-minute intervals, for each of three services, are three time series. See also metric.
unstreamable  A verb or function whose behavior changes depending on the query time window it is applied to. Results and datasets that use unstreamable operations are unstreamable and cannot be accelerated. Examples of unstreamable verbs are statsby and exists. See streamable.
worksheet  A view where you can transform and manipulate data directly, “an infinite spreadsheet for your data.” Worksheets may contain multiple stages and visualizations.

CHAPTER ELEVEN

HELPFUL HINTS

Sometimes we have handy little tips that haven’t yet made it to a documentation page. The suggestions here may be updated or moved; if there’s something you are looking for, try Quick Search.

11.1 Account details

11.1.1 Customer ID

If you are logged in, your Customer ID is the subdomain of the URL you use to access Observe. Example: 1234567890.observeinc.com

11.2 OPAL

11.2.1 Change a field’s type

Change the type of an existing field by creating a new one with the desired type. You may keep both fields, or replace the existing one by giving it the same name.
colmake foo:float64(foo)

11.2.2 Customized metric aggregation

Do common metric aggregation operations with the aggregate verb:
rollup options(buckets:100), cpu_usage:metric("cpu_usage_total", rollup:"rate", type:"cumulativeCounter")
aggregate avg_cpu_usage:avg(cpu_usage), groupby(cluster_uid, node_name, cpu_id)

You can also build more advanced aggregations. For example, create a weighted average with:
rollup options(buckets:100), cpu_usage:metric("cpu_usage_total", rollup:"rate", type:"cumulativeCounter")
colmake weight:case(contains(cpu_type, "expensive"), 2.0, contains(cpu_type, "normal"), 1.0)
aggregate avg_cpu_usage:avg(cpu_usage * weight), groupby(cluster_uid, node_name)


11.2.3 Filter

Comparisons:
filter temperature > 60 and temperature < 80
filter temperature < 30 or temperature > 100
filter hostname="www" or (hostname="api" and user="root")
filter not severity="DEBUG"

Operators vs functions: Construct expressions with either operators or functions. For example, these two statements are equivalent:
filter abc < 100
filter lt(abc, 100)

11.2.4 ifnull

Example: A source error resulted in JSON data with similar values but different key names.
FIELDS
{"data":"abc123"}
{"payload":"def456"}
{"data":"ghi789"}

Use ifnull to get the value from payload if there is no value for data. Note: both values must be the same type.
colmake data:ifnull(string(FIELDS.data), string(FIELDS.payload))

11.3 Performance

11.3.1 Limit your query window to 1 hour or less while actively modeling

By default, worksheets read 4 hours of data. Depending on the input dataset, that can be a lot of data. Consider reducing the query window to 1 hour or less while actively modeling.

11.3.2 Create intermediate event datasets when shaping data

Where possible, create an intermediate event dataset by publishing partially shaped data as a new event dataset. Queries and further derived datasets will typically have to read much less data than if they were created directly on top of the original input dataset. This technique is especially effective if the intermediate dataset applies a selective filter to the input dataset, picks only a subset of input columns, or extracts JSON paths from an input column and then drops the original column. Avoid defining datasets directly on the Observation dataset.
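For example, a minimal sketch of such an intermediate dataset (the path value and the FIELDS keys are purely illustrative and depend on your own data, not on any shipped configuration):
filter path="kubernetes-logs"
colmake pod_name:string(FIELDS.metadata.name)
colmake namespace:string(FIELDS.metadata.namespace)
Publishing the result as a new event dataset means that downstream queries and datasets read only these extracted columns instead of the full input stream.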


11.3.3 Use options(expiry) to reduce the time range read by makeresource

By default, the makeresource verb reads a large time range of input events: 24 hours. The reason for this behavior is that makeresource must compute the state of each resource at the beginning of the query time range, and, by default, it looks for events up to 24 hours in the past. Thus, a query with makeresource that has a query time range of 4 hours actually reads at least 28 hours of input data. 24+ hours can be a lot of data, especially if the input dataset is the Observation dataset. So especially avoid defining resource datasets directly on the Observation dataset. Most resource types receive events much more frequently than every 24 hours. We recommend adding options(expiry:duration_hr(...)) to your makeresource command to reduce its lookback where appropriate. For example, if it is known that the live instances of some resource dataset receive events at least every 15 minutes, it would be appropriate to set the resource expiration to 1 hour, thereby greatly reducing the amount of data read by makeresource:
makeresource options(expiry:duration_hr(1)), col1:col1, primarykey(pk1, pk2)

11.4 Shaping data

11.4.1 Field name allowed characters

In most cases, field (column) names may contain any character except double quote ", period ., or colon :. Underscores are displayed as spaces in the UI.
colmake "T":float64(field3)
colmake "":float64(field4)
colmake "0_3 µm":float64(um03)

To reference a field with non-alphanumeric characters in an OPAL statement, use double quotes and prepend @. to the name.
colmake temp_difference:@."T"

Regex extracted columns (either Extract From Text or colregex) are limited to alphanumeric characters (A-Z, a-z, 0-9).

11.5 UI

11.5.1 Supported web browsers

Observe works best with the latest versions of Chrome, Edge, Firefox, and Safari.


11.5.2 Share worksheet URLs

To share a worksheet with someone else, use the Share Worksheet button to copy the link to the clipboard. A URL copied from the browser address bar won’t show the same contents.

11.5.3 Change the number of results displayed

By default, an events table shows the first 1000 rows of results. You can change the number displayed in the Limit tab of the Table Controls menu.

11.5.4 Hide, show, or reorder columns

Also use Table Controls to hide, show, or change the order of columns displayed. In the Columns tab, click to show or hide, and drag to reorder.



11.5.5 Bookmarks

Use the bookmark icon to save the current dataset as a bookmark.

New bookmarks are private to each user; make them public to be accessible by other users. You may also arrange them in folders.


CHAPTER TWELVE

OBSERVE DATASETS AND TIME

12.1 Foreign Keys

If a dataset contains one or more fields that together can be used to identify some resource in another dataset (or even another instance of the same resource), those fields taken together make up a “foreign key.” Foreign keys consist of:
• The field or fields in the source dataset that make up the key
• The target dataset that the key links to
• The fields in the destination dataset that should match up to the fields in the source dataset
When a foreign key exists in a dataset, the Observe user interface shows a clickable link to follow the key relationship to the target dataset. You can also use foreign keys to look up values from the target dataset by browsing the relations.

12.1.1 Related Keys

When some other dataset points into a dataset, that target dataset is also related to the other dataset. That relationship is called a “related key”: a “what links here” relationship. The reason this is not generally a foreign key is that many remote resources may link to a single resource instance (for example, a single host may have multiple disks in it), so following a related key may end up finding multiple remote resources for a single source resource.

12.2 Resource Primary Keys

Each resource is identified by a primary key. This may be a GUID assigned to the resource, a user ID assigned in some database, or a MAC address of a network interface; whatever makes sense for that particular resource. Primary keys may be composite, consisting of a number of fields taken together. For example, the primary key for a particular disk device may be the “host ID” of the host the disk is attached to, and the “disk index” within that host, such as host-3, /dev/sdc.
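As a hedged sketch (not taken from any shipped configuration), such a composite key could be declared when building the resource dataset, reusing the makeresource form shown in the Helpful Hints chapter; host_id, disk_index, and device_name are illustrative field names:
makeresource options(expiry:duration_hr(1)), device_name:device_name, primarykey(host_id, disk_index)
Here primarykey(host_id, disk_index) is the composite primary key, and device_name stands in for whatever other columns the resource carries.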


12.3 Resource Times

Every field value for a resource is valid over a time interval, with a start time and an end time. For values that have recently changed and will be valid “until later changed,” the end time is unknown and assumed to be in the distant future. For values inherited at the start of time, the start time is unknown and assumed to reach back to the dawn of time. Observe collects all data (system and application logs, metrics, and tracing spans) into observations, which are transformed into datasets. Datasets are structured with times or time intervals, as well as links (relations) to/from other datasets. Having these links (relations) between different parts of the system is what gives Observe superpowers when discovering the meaning hidden in the data.

12.4 Datasets

A dataset lives within a named project, and in turn has a name. Project names must be unique within your customer, and dataset names must be unique within their project. When you log into Observe, you are shown the “Explore” page, which lets you browse the different datasets that exist for your customer ID. A dataset has a schema (a set of named columns and the type of data stored in those columns) and a type: “table,” “event,” or “resource.” The type is determined mainly by how much time-related information is available about the dataset.

12.5 Table Datasets

If information is not related to time and changes over time are not tracked, a dataset is a “table.” This is like a normal look-up table in most systems, but it is the kind used least in Observe because it doesn’t allow tracking changes over time.

12.6 Event Datasets

If something happens “at a time” and has a well-defined timestamp, then the dataset is an “event dataset.” Events have a single point in time, and typically link (relate) to one or more other tables in the system. For example, “user X logged into system Y at time Z” is an event, which also links to the “user” dataset and the “system” dataset. See Foreign Keys.

12.7 Resource Datasets

Finally, objects that have permanence over time, and whose state changes over time, are stored in “resource datasets.” Any field value for a resource has a validity time interval: a start time and an end time. For a resource, you can ask questions like “what was the name at time T?” Additionally, a resource is identified by a primary key. See Resource Times and Resource Primary Keys.

CHAPTER THIRTEEN

OBSERVE BASIC DATA PROCESSING MODEL

The Observe platform is built to capture and visualize machine data from a wide variety of sources. On top of the Observe platform, we’ve built a task-specific application to manage IT infrastructure deployed on top of the AWS cloud infrastructure and the Kubernetes container orchestration system, as a first example of how to use the features of the platform. The Observe platform contains many affordances for capturing, processing, and presenting data, with special attention paid to the relationships between entities, and the evolution of such relationships over time. The main steps along this path are:
• Data Origin
• Data Capture and Forwarding
• Data Collection
• Data Buffering
• Data Loading
• Data Transformation
• Data Storage
• Data Querying
• Data Presentation

13.1 Data Origin

Originally, the data came from somewhere. This might be messages printed to the console from a batch process. These data may be log files on disk coming from a server such as a web, application, or database server. These may be events emitted by some small internet-of-things sensor or device, or by a machine on a factory floor, or by a cash register in a department store. Data may even already live in a database table somewhere, prepared by some other system (this is common for, for example, customer profile information). Which origin the data came from is important to capture as metadata, so that appropriate processing can be applied in steps further down. As an example, the time zone in effect at the time of data capture may be important later, when correlating events across a globally distributed enterprise. Additionally, which physical host, machine, container, or database the data come from is important metadata, as opposed to the entities mentioned within the data. For example, a web server may emit a log statement saying that user X logged in from IP Y to application area Z. The log message contains references to entities X, Y, and Z, but the identity of the web server that actually emitted the log statement is metadata about the origin, not found within the datum itself. The Observe model is to be agnostic about how data are generated: we don’t have a custom API that a customer has to use to generate data for Observe. Instead, we capture data through whatever means are already available. If a customer wants to use a rich data generation API (such as OpenTracing or Prometheus or logging full JSON-encoded objects), that’s easy to add using whatever mechanism works best for that customer.

13.2 Data Capture and Forwarding

Data are captured either using special collection services known as “agents,” or by pointing data producers directly at collection services. A customer can use industry standard agents like filebeat or fluent-bit to capture log files or other data inputs, and a customer can also choose to host one or more instances of the observe-agent for capturing data. observe-agent is especially useful on systems using Prometheus, as it can query Prometheus endpoints and push the results into the Observe system. The observe-agent also adds metadata about where it’s running and where it’s capturing data.

13.3 Data Collection

Observe runs a set of ingestion endpoints in the cloud. For data that doesn’t come in through Snowflake data sharing, this is the destination where the customer hands it off, and it can no longer be directly modified by the customer. At this point, information such as which registered customer provided the data, and through which customer-defined integration name, is attached as metadata. Service authentication is also done at this point: data that are not accompanied by proper customer-specific credentials are not accepted into the system.

13.4 Data Buffering

Customer data production may be bursty. This is especially true when new systems are onboarded and historical data are captured. Additionally, while Observe works to maintain industry-leading uptime, there exists the possibility of an outage on the Observe platform or the dependent Snowflake data processing side. To avoid having to reject data provided by customers, all data collected go through a buffer stage, with sufficient storage capacity for several days of data ingest. Under normal circumstances, the queuing latency in this buffer is negligible, but during ingest spikes or temporary capacity outages, this buffer makes sure that accepted data will eventually be processed.

13.5 Data Loading

Data are pulled from the buffer and loaded into the Snowflake data warehouse through a process known as the loader. The function of this loader is to collate data arriving for individual customers into per-customer load requests, as well as to format and forward data and metadata in a mode suitable for semi-structured SQL processing. All the stages until now clearly keep a separation between “the data” that the customer initially provided and “the metadata” that were captured around the data. Because the data are loaded in a form as untouched as possible into the first permanent store, it is always possible for the customer to change their mind about how to process data, and go back to the initial data store to apply new processing rules. We call this unmodified original data “evidence.”


13.6 Data Transformation

Once evidence is loaded into the base layer (which we call the “Observations table” or the “firehose”), the process of refining and shaping it into well-behaved entities with relations starts. When starting with Observe, a customer will get one or more pre-installed transformation configurations, for example for AWS infrastructure or Kubernetes clusters, but the platform allows customers to modify these initial configurations, to extend them with further derived configurations, and to create new basic configurations from scratch. Transformation is viewed as successive steps of refinement, where datasets are selected, filtered, and processed out of the raw observation stream. For example, a set of rules may select observations from a Kubernetes apiserver that talk about container creation, lifetime, and death, extract the container ID, cluster ID, and other relevant fields out of those log events, and create a dataset called “container events.” A further derived transform may take these container events, identify resource keys in the events (in this case, cluster ID + cluster-specific container ID), and make the system build a resource out of this set of updates. Those resources are then available to other processing streams that happen to have the same kind of identifier in them, so we can talk about “services running in containers” and so forth. The majority of the Observe platform implementation focuses on making all necessary data and metadata available for the transform step, and efficiently implementing the transform step both for pre-configured and user-configured datasets. Decisions made in this area include anything from how frequently to pre-process incoming data, to whether to process the data only on demand, or accelerate the result of a transform to make it immediately accessible to queries without further processing. Transforms are described using statements in OPAL, the temporal algebra query language we created. These transforms also run in an environment that is defined for the transforms in question: for example, if a transform joins four different datasets, that transform runs after the transforms creating those datasets have output their results. This is an implementation decision made by choosing to treat stream processing as a never-ending sequence of small batches, which makes processing more efficient than a pure stream-based system.
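As a hedged illustration of the shape such a transform takes (the filter predicate and the FIELDS paths below are invented for this sketch and do not match any shipped Observe configuration), a “container events” dataset could be carved out of the observation stream and then folded into a resource:
filter contains(string(FIELDS.source), "apiserver") and contains(string(FIELDS.kind), "container")
colmake cluster_uid:string(FIELDS.clusterUID)
colmake container_id:string(FIELDS.containerID)
colmake status:string(FIELDS.status)
makeresource options(expiry:duration_hr(1)), status:status, primarykey(cluster_uid, container_id)
The first four lines produce the event stream of container events; makeresource then builds a resource keyed by cluster and container ID, as described above.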

13.7 Data Storage

Once a dataset is defined, and if the system decides to accelerate its transform (rather than just remembering the transform rules and applying them on demand when a query is run), one or more tables are created for its results in the Snowflake data warehouse. Tables may be partitioned across attributes, across time, and across functional areas. Frequently changing attributes of a resource, such as metrics, may be stored separately from seldom-changing attributes, like the designated name or CPU type of a host. Datasets may be partitioned in time, to allow for larger datasets without exceeding particular built-in volume limitations of per-table size in the underlying Snowflake database. To the user, the specific storage chosen by the storage subsystem is not visible, as the datasets present themselves and behave as per their definitions in dataset schema and metadata. However, the correct choice of storage for each part of the dataset has significant efficiency impact.

13.8 Data Querying

Once the user wants to query some state of the system being observed, a query is formed on top of the existing datasets, using the OPAL query language. This query is additionally conditioned to be easily presented in the user interface. For example, an OPAL statement that runs as a transform will unconditionally process all matching data, whereas a UI may limit the number of rows presented to something like 1,000 rows, because the user will not be expected to scroll through millions of results, but will instead further aggregate and filter the query to find the results they are interested in. The queries formulated in the user interface generally do not come from direct user input of OPAL statements, but instead are built by the user using affordances in the user interface, such as “follow link to related dataset” and “show only values in the top-10 list,” or clicking to focus on a specific set of entities or time range.


Another user of the query language is the user interface created for browsing datasets and metadata.

13.9 Data Presentation

Interactive data exploration benefits from data further conditioned than what a raw processing query can provide. Thus, the presentation includes affordances such as “rolling up” resource states (returning one row per resource instance, with all the states of that resource over time merged into a single column) and “linking” key columns, showing the name of the target entity in place of the specific key value used to declare a foreign key relationship. The presentation layer also supports calculating summaries and statistics about the columns of data being presented, allowing the user interface to display sparklines, histograms, top-k displays, and other helpful affordances in context with the data being displayed. Because many such summaries cannot be efficiently stream transformed, they are rendered as part of the presentation layer, rather than as part of the underlying tree of transform streams.

CHAPTER FOURTEEN

FAQ

14.1 How do I create an access token to post data to Observe?

You can do this with a curl command like the following:
$ curl -s \
  https://${OBSERVE_CUSTOMER}.observeinc.com/v1/login/ingestToken -d \
  '{"user_email":"user@example.com", "user_password":"so secret"}'

Where OBSERVE_CUSTOMER is your numeric Customer ID (in this example, from an environment variable). The response looks like this:
{
  "ok":true,
  "access_key":"Ga21uay2vAGrzxfZHgJN4gNhuCBC9oKD",
  "expiration":"2021-04-03 20:08:39",
  "tokenName":"user@example.com"
}

You can also contact your Observe account manager and they will be happy to generate a token for you that will be sent to you via email.

14.2 How do I easily post some data to Observe?

The easiest way is to use a supported existing collector, like those for fluentd, filebeat, or Prometheus. If you want to post some JSON to Observe, you can make an access token, and then post an array of JSON payloads to the HTTPS endpoint. The path after the endpoint will show up as the path attribute of your collected observations:
$ curl -s \
  https://collect.observeinc.com/v1/http/your-path-here \
  -H "Authorization: Bearer ${OBSERVE_CUSTOMER} ${OBSERVE_TOKEN}" \
  -H 'Content-type: application/json' -d '[{"foo":"bar"}]'

Note that the Authorization header needs the word Bearer followed by both your customer ID and the access token, separated by a space.


14.3 How do I create an access token that can do more than just ingest data?

You can do this with curl:
$ curl -s \
  https://${OBSERVE_CUSTOMER}.observeinc.com/v1/login -d \
  '{"user_email":"user@example.com", "user_password":"so secret", "tokenName":"My token name"}'
{
  "ok":true,
  "access_key":"Ga21uay2vAGrzxfZHgJN4gNhuCBC9oKD",
  "expiration":"2021-04-03 20:08:39",
  "tokenName":"My token name"
}

Note that the curl URL in this case is /v1/login, not /v1/login/ingestToken. Also, it’s possible to provide a specific name to keep track of this token.
