eBook: The Essential DevOps Guide to Service Meshes

Synopsis

Service meshes recently emerged to provide an abstraction layer that streamlines service-to-service communication, addressing the shortcomings of conventional orchestration strategies. They are quickly becoming an industry of their own. Here is an in-depth view of service mesh capabilities and the current key players.

Index

Introduction

Why service mesh?

What is a service mesh ecosystem?

Who are the key service mesh players?

Istio

App Mesh

Linkerd

Consul Connect

Microsoft Service Mesh Interface (SMI)

Kong

Conclusion

Introduction

The term “Service Mesh” was coined to represent a network of applications or microservices, and the relationships and interactions between them. Relatively new on the scene, a service mesh is an infrastructure layer that controls the delivery of service requests and enables DevOps teams to abstract numerous application network functions from the service code when developing hybrid or cloud-native applications.

This infrastructure layer manages inter-service communications in the microservice architecture that is becoming the norm for cloud-native applications. A service mesh considerably reduces application architecture complexity by providing a unified way to manage traffic flow and enforce access policies across microservices, regardless of their location. Because all communications pass through a single layer, the service mesh layer, communication errors between applications and services become easier to diagnose.

Security features provided by the service mesh, such as encryption, authentication and authorization, further explain its fast-growing popularity.

A service mesh’s main attributes and features cover wide areas such as:

Resiliency features (retries, timeouts, deadlines, etc.)

Cascading failure prevention (circuit breaking and failover)

Robust load balancing algorithms managing automatic retries, circuit breaking, global rate limiting, request shadowing, zone local load balancing, etc.

Control over request routing (convenient for CI/CD release patterns)

Configuration and management of TLS and mTLS

Rich sets of metrics providing instrumentation at the service-to-service layer

Mesh expansion to VMs (not limited to Kubernetes clusters), including cross-cluster routing and encryption
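To make the resiliency features in this list more concrete, here is a minimal Python sketch of two of them, retries with backoff and circuit breaking. It is not taken from any particular mesh; the names `CircuitBreaker`, `CircuitOpenError` and `call_with_retries`, as well as the default thresholds, are assumptions chosen for illustration. In a real mesh this logic lives in the sidecar proxy, not in application code.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker refuses a call without contacting the service."""


class CircuitBreaker:
    """Hypothetical breaker that opens after `max_failures` consecutive failures."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # time at which the circuit opened, if open

    def call(self, func):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open, failing fast")
            # Cool-down elapsed: allow one trial call (half-open state).
            # A single further failure reopens the circuit immediately.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        return result


def call_with_retries(func, attempts=3, backoff=0.1, sleep=time.sleep):
    """Retry `func` up to `attempts` times with exponential backoff."""
    last_error = None
    for attempt in range(attempts):
        try:
            return func()
        except CircuitOpenError:
            raise  # never retry against an open circuit
        except Exception as err:
            last_error = err
            if attempt < attempts - 1:
                sleep(backoff * 2 ** attempt)
    raise last_error
```

The two pieces compose: wrapping a call as `call_with_retries(lambda: breaker.call(send_request))`, where `send_request` is whatever function performs the request, retries transient errors but stops as soon as the breaker opens, which is how a failing dependency is kept from dragging its callers down with it.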

According to Gartner, the global public cloud services market is set to reach $266.4 billion in 2020. Service meshes abstract cloud-native application architecture complexities, so it is no wonder that service mesh is taking the software industry by storm.

Why service mesh?

Digital transformation drives the adoption of cloud-native methodology, where applications are built as a collection of microservices that work with each other to deliver the expected user experience.

Such a microservices-based architecture shifts the complexity from within the monolithic application code to a distributed system. There are many more endpoints and interactions to scale, secure, and monitor, resulting in an exponential increase in debugging time and security vulnerabilities.

The service mesh is emerging as a way to address these requirements.

Traffic management - When adding or removing a microservice, the service mesh eliminates the gateway updates necessary when using API gateways to handle protocol transactions. Service meshes are location-agnostic, thereby providing a unified way for developers to manage traffic flow and enforce access policies across microservices, regardless of where they reside.

Reduce complexity - A service mesh leverages a proxy instance called a sidecar. Sidecar proxies reduce the complexity of the microservice code by moving common infrastructure-related functionality (encryption, endpoint discovery, failure recovery, etc.) to a different layer. Situated next to a container cluster, they effectively manage network services.

As enthusiastically expressed by Gabe Monroy, Director of Product at Microsoft Azure Application Platform, “Service mesh is obviously hot technology, and for good reasons. The cloud-native ecosystem is driving the need for smarter networks and smarter pipes and service mesh technology provides answers.”

What is a service mesh ecosystem?

Operating at the application level, a service mesh is a network communication infrastructure that handles service-to-service communication for cloud-native applications, which allows most application network functions to be decoupled and offloaded from the service code.

Currently the most popular capabilities that users are requesting from a service mesh are:

Traffic management - to connect and control the flow of traffic and API calls between services

Security - to enforce mutual TLS (mTLS) authentication so that traffic is both secure and trusted in both directions between a client and a server, and to enforce service-level authentication using either TLS or JSON Web Tokens (JWT).

Access Control - to apply policies and ensure that they’re enforced, and that resources are fairly distributed among users.

Observability - to infer the internal state of a system from knowledge of its external outputs.
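As an illustration of the service-level authentication with JSON Web Tokens mentioned in the list above, the following Python sketch signs and verifies a compact HS256 token using only the standard library. It rests on several assumptions: a shared secret between services (production meshes more commonly verify asymmetric signatures against published keys), and the helper names `sign_token` and `verify_token`, which are hypothetical and not part of any mesh API.

```python
import base64
import hashlib
import hmac
import json


def _b64url(data: bytes) -> str:
    # JWTs use URL-safe base64 without padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_token(claims: dict, secret: bytes) -> str:
    """Build a compact HS256 token: header.payload.signature."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + _b64url(signature)


def verify_token(token: str, secret: bytes, now: int) -> dict:
    """Return the claims if the signature and expiry check out, else raise ValueError."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    header = json.loads(_b64url_decode(header_b64))
    if header.get("alg") != "HS256":
        raise ValueError("unsupported algorithm")
    expected = hmac.new(
        secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
    ).digest()
    # Constant-time comparison avoids leaking signature bytes through timing.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    if claims.get("exp", 0) <= now:
        raise ValueError("token expired")
    return claims
```

In a mesh, a check of this kind runs in the proxy before the request reaches the service, so the service code itself never has to parse or validate tokens.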

The service mesh overlay is logically split between a control plane and a data plane. The data plane consists of an array of lightweight proxies known as sidecars that, deployed alongside application code, handle the ingress and egress traffic between services.

The control plane manages and configures proxies to route traffic. Its rich sets of metrics provide instrumentation at the service-to-service layer, facilitating the implementation of communication and security policies. It also aggregates telemetry data for monitoring.

By providing granular control over request routing, a service mesh streamlines the CI/CD process. During a canary or blue-green deployment, it can send only a portion of the traffic, or only traffic of a particular type, to the new service version, which limits the impact of a faulty release and helps prevent cascading failure. Regular traffic is optimized through robust load balancing algorithms. In addition, authentication, authorization, and encryption policies secure service-to-service communications, protecting the services and data in the mesh, with the possibility of expanding to VMs and across clusters.
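The canary routing described above can be sketched in a few lines of Python. This is only an illustration of the idea: the version names `v1` and `v2`, the `x-canary` header and the `route` function are assumptions made for the example, and real meshes express the same rule declaratively in their routing configuration.

```python
import random


def pick_version(weights, rng=random):
    """Pick a service version with probability proportional to its weight."""
    versions = list(weights)
    return rng.choices(versions, weights=[weights[v] for v in versions], k=1)[0]


def route(request, weights, rng=random):
    """Route one request to a version of the service.

    Traffic of a particular type (here, requests carrying the hypothetical
    "x-canary: true" header) always goes to the new version; everything
    else is split according to the configured weights.
    """
    if request.get("headers", {}).get("x-canary") == "true":
        return "v2"
    return pick_version(weights, rng)


if __name__ == "__main__":
    weights = {"v1": 90, "v2": 10}  # send roughly 10% of traffic to the canary
    sample = [route({"headers": {}}, weights) for _ in range(1000)]
    print("share of v2:", sample.count("v2") / len(sample))
```

Promoting the canary is then a matter of shifting the weights step by step (for example 90/10, 50/50, 0/100) without touching the application code.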

Troubleshooting service latency or errors gets a lot easier with the rich set of metrics, traces and logs aggregated by the service mesh, which help to rapidly pinpoint what went wrong and exactly where it happened, benefiting both developers and operators.

Developers no longer need to worry about embedding libraries for service discovery, Transport Layer Security (TLS), or metrics into the application code and can instead focus on delivering new features and business value.

Operators gain the visibility and control they need to manage microservices at scale. It helps them achieve a consistent operational model across all cloud-native applications by reducing the complexity of managing a distributed system.

In short, the main use cases of service meshes are:

Traffic Governance: configuring the mesh network to implement fine-grained traffic management policies, including connection and control of the traffic flow and API calls between services and all ingress and egress traffic to and from the mesh, without going back and changing the application.

Security: Configuring and managing mTLS authentication to ensure secure and trusted traffic between a client and a server, including the ability to enforce service-level authentication with TLS or JSON Web Tokens

Control: Configuring and enforcing policies with Policy Repositories (PR), Policy Decision Points (PDP) and Policy Enforcement Points (PEP), and managing load balancing

Observability: generating extensive detailed telemetry, including metrics to understand cluster statuses, accelerate debugging, and improve system architecture readability, system resilience and system stability.
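The split between a Policy Repository, a Policy Decision Point and a Policy Enforcement Point named in the Control use case can be illustrated with a small Python sketch. The class names mirror those three roles, but the rule format (source, destination, allowed methods) and the service names are assumptions invented for the example, not the policy model of any specific mesh.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    source: str          # calling service
    destination: str     # called service
    methods: frozenset   # HTTP methods the caller may use


class PolicyRepository:
    """Stores the rules (the PR role)."""

    def __init__(self, rules):
        self._rules = list(rules)

    def rules_for(self, destination):
        return [rule for rule in self._rules if rule.destination == destination]


class PolicyDecisionPoint:
    """Evaluates a request against the stored rules (the PDP role)."""

    def __init__(self, repository):
        self._repository = repository

    def is_allowed(self, source, destination, method):
        # Deny by default: a request passes only if some rule matches it.
        return any(
            rule.source == source and method in rule.methods
            for rule in self._repository.rules_for(destination)
        )


class PolicyEnforcementPoint:
    """Stands in for a sidecar proxy in front of one service (the PEP role)."""

    def __init__(self, pdp, service_name, handler):
        self._pdp = pdp
        self._service_name = service_name
        self._handler = handler

    def handle(self, source, method, path):
        if not self._pdp.is_allowed(source, self._service_name, method):
            return 403, "denied by policy"
        return 200, self._handler(method, path)


repository = PolicyRepository([Rule("frontend", "orders", frozenset({"GET", "POST"}))])
orders_proxy = PolicyEnforcementPoint(
    PolicyDecisionPoint(repository), "orders", lambda method, path: f"{method} {path} ok"
)
print(orders_proxy.handle("frontend", "GET", "/orders/1"))
print(orders_proxy.handle("billing", "GET", "/orders/1"))
```

Keeping the decision separate from the enforcement is what lets policies change centrally while every proxy keeps enforcing them in the same way.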

Who are the key service mesh players?

Istio

Cofounded by IBM, Google, and Lyft in 2017, Istio currently supports Kubernetes and is expanding into additional environments. An open-source project, Istio uses Envoy as a sidecar and is one of the most feature-rich service meshes available today. Its features include:

Fully open-source

Support for multi-cluster and mesh expansion

Automated policy-based inter-service load-balancing (currently only with Kubernetes)

Zero-Trust Security concept: Service communication is restricted to required services to minimize attack propagation

Authenticating services

Encrypted inter-service traffic

Security policies enforcement

Unified and language agnostic standard task management

Built in CA (Citadel) with the option to use custom CA

Integrated with observability tool Kiali

Istio provides statistics, traces, and logs for cluster ingress and egress, as well as automatic load balancing for many common types of traffic. Still, its control plane is extremely complex, and integrating Istio is reputedly more complicated than integrating other service meshes.

Service mesh architectural diagram for Istio: Service A and Service B communicate through their sidecar proxies over HTTP/1.1, HTTP/2, gRPC or TCP, with or without mTLS. The proxies send policy checks and telemetry to Mixer and its adapters, while the control plane (Pilot, Galley, Citadel) delivers configuration data and TLS certificates to the proxies. [Courtesy istio.io] https://istio.io/docs/ops/deployment/architecture/arch.svg

App Mesh

Hosted by AWS, App Mesh works with the AWS compute services AWS Fargate, Amazon EC2, Amazon ECS and Amazon EKS, as well as with Kubernetes running on AWS. It also integrates with AWS Outposts to run applications on-premises. App Mesh uses the open-source Envoy proxy as its sidecar, though in a modified form. Its features include:

Application-level networking support for compute services within AWS

Fully managed

Partially open source (injector & controller)

Relies on open source Envoy proxy sidecar within the mesh

API has similar routing concepts as the Istio control plane

Disadvantages:

Only runs within AWS – Cannot be migrated outside AWS

Its Envoy modifications and its "Pilot" equivalent are closed source

Service mesh architectural diagram for App Mesh: in the sample_app mesh, Service A (virtual node A) reaches Service B through a virtual router whose HTTP route with prefix / targets virtual node B. Each virtual node defines a listener, service discovery and backends. [Courtesy aws.amazon.com] https://aws.amazon.com/blogs/compute/introducing-aws-app-mesh-service-mesh-for-microservices-on-aws/

Linkerd

Created by Buoyant on top of Twitter's Finagle library, Linkerd was released as the open-source Linkerd v1 in 2016 and donated to the Cloud Native Computing Foundation (CNCF) in 2017. Released in 2018, Linkerd v2 merged with Conduit and was rewritten in Rust and Go. It uses linkerd-proxy as its sidecar proxy. It has the following features and advantages:

Open source

Observability using Prometheus and Grafana

Secure mTLS communication

Support for service traffic shifting

Provides an injector to inject proxies during a Kubernetes pod deployment based on an annotation to the Kubernetes pod specification

User interface dashboard to view and configure the mesh settings
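The annotation-driven injection listed among the features above can be pictured with the following Python sketch, which adds a proxy container to a pod specification only when the pod opts in. It is a simplification: the real injector runs as a Kubernetes admission webhook and adds considerably more than one container. The `linkerd.io/inject: enabled` annotation is the one Linkerd documents, while the container name `mesh-proxy`, the image reference and the function `inject_sidecar` are hypothetical.

```python
import copy

INJECT_ANNOTATION = "linkerd.io/inject"  # opt-in annotation checked by the injector
PROXY_NAME = "mesh-proxy"                # hypothetical sidecar container name


def inject_sidecar(pod, proxy_image="example.invalid/mesh-proxy:latest"):
    """Return a copy of `pod` with a proxy container added, if the pod opts in.

    `pod` is a plain dict shaped like a Kubernetes pod manifest. Pods without
    the annotation, or pods that already carry the proxy, are returned unchanged.
    """
    annotations = pod.get("metadata", {}).get("annotations", {})
    if annotations.get(INJECT_ANNOTATION) != "enabled":
        return pod
    containers = pod.get("spec", {}).get("containers", [])
    if any(container.get("name") == PROXY_NAME for container in containers):
        return pod  # already injected, keep the operation idempotent
    patched = copy.deepcopy(pod)  # never mutate the caller's manifest
    patched.setdefault("spec", {}).setdefault("containers", []).append(
        {"name": PROXY_NAME, "image": proxy_image}
    )
    return patched


pod = {
    "metadata": {"name": "orders", "annotations": {INJECT_ANNOTATION: "enabled"}},
    "spec": {"containers": [{"name": "app", "image": "example.invalid/orders:1.0"}]},
}
print([c["name"] for c in inject_sidecar(pod)["spec"]["containers"]])
```

Because the decision hinges on an annotation, teams can bring workloads into the mesh one deployment at a time.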

Key components:

The user interface comprises a command-line interface (CLI), linkerd, and a web UI. The CLI runs on a local machine; the web UI is hosted by the control plane.

The control plane is composed of a number of services that run on your cluster and drive the behavior of the data plane. It is responsible for aggregating telemetry data from data plane proxies.

The data plane comprises ultralight, transparent proxies that are deployed in front of each service. These proxies automatically handle all traffic to and from the service.

Service mesh architectural diagram for Linkerd: the CLI talks to the control plane, whose components include the controller, public-api, destination, identity, proxy-injector, sp-validator, tap, web, prometheus and grafana. In the data plane, a linkerd-proxy runs alongside each application. [Courtesy linkerd.io] https://linkerd.io/2/reference/architecture/

Consul Connect

Launched by HashiCorp in July 2018, Consul Connect is an extension to the Consul service networking solution that provides service mesh functionality. It ships with a default L4 proxy and can be configured to use Envoy, including with Mesh Gateway. Consul Connect is open source with a paid enterprise offering.

Features & Advantages:

A key-value store

Health checking

Service segmentation for secure TLS communication between services

Built-in mTLS proxy

Support for Envoy proxy with Mesh Gateway

Service Access Graph

Security certificate management with Vault

Open source

Disadvantages:

No tracing

No rate limiting

No metrics collection

Service mesh architectural diagram for Consul: Consul servers and Consul clients form the control plane and have a control path to the sidecar proxies, which carry data plane traffic between applications running in virtual machines, containers or pods. [Courtesy hashicorp.com] https://www.hashicorp.com/resources/service-mesh-microservices-networking

Microsoft Service Mesh Interface (SMI)

Spurred by Microsoft with the backing of partners like Linkerd, HashiCorp, Solo.io and VMware, SMI was announced at KubeCon Europe in May 2019. Its goal is to become a common interface or abstraction layer for other service mesh implementations.

Features:

Proposes "a standard interface for meshes on Kubernetes" which offers a basic and common feature set and the flexibility for different mesh services.

SMI can be used either directly through a set of APIs or customers can build operators to translate SMI to native APIs.

Service Mesh Interface provides:

A standard interface for meshes on Kubernetes

A basic feature set for the most common mesh use cases

Flexibility to support new mesh capabilities over time

Space for the ecosystem to innovate with mesh technology

SMI covers:

Traffic policy – apply policies like identity and transport encryption across services

Traffic telemetry – capture key metrics like error rate and latency between services

Traffic management – shift and weight traffic between different services
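To show what capturing error rate and latency between services amounts to, here is a small Python sketch of a telemetry collector keyed by (source, destination) pairs. It is a toy model, not the SMI metrics API: the `TelemetryCollector` class, the rule that status codes of 500 and above count as errors, and the nearest-rank percentile method are all assumptions made for the example.

```python
import math
from collections import defaultdict


class TelemetryCollector:
    """Hypothetical per-edge metrics store, as a proxy might report them."""

    def __init__(self):
        # (source, destination) -> list of (latency_ms, succeeded)
        self._samples = defaultdict(list)

    def record(self, source, destination, latency_ms, status_code):
        # Assumption: only 5xx responses count as errors.
        self._samples[(source, destination)].append((latency_ms, status_code < 500))

    def error_rate(self, source, destination):
        samples = self._samples.get((source, destination), [])
        if not samples:
            return 0.0
        failures = sum(1 for _, succeeded in samples if not succeeded)
        return failures / len(samples)

    def latency_percentile(self, source, destination, percentile):
        """Nearest-rank percentile of the recorded latencies, in milliseconds."""
        latencies = sorted(
            latency for latency, _ in self._samples.get((source, destination), [])
        )
        if not latencies:
            raise ValueError("no samples recorded for this pair of services")
        rank = max(1, math.ceil(percentile / 100 * len(latencies)))
        return latencies[rank - 1]


collector = TelemetryCollector()
for i in range(10):
    # Ten calls from "frontend" to "orders": 10 ms to 100 ms, the last two failing.
    collector.record("frontend", "orders", 10 * (i + 1), 503 if i >= 8 else 200)
print(collector.error_rate("frontend", "orders"))
print(collector.latency_percentile("frontend", "orders", 50))
```

Because the proxies observe every request, these figures are available for each pair of services without any instrumentation in the services themselves.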

Service mesh architectural diagram for Microsoft SMI: in the cloud, each microservice is paired with a sidecar proxy, and the proxies communicate with one another over HTTP and gRPC. [Courtesy docs.microsoft.com] https://docs.microsoft.com/en-us/dotnet/architecture/cloud-native/service-mesh-communication-infrastructure

Kong

Announced in September 2019, Kuma is the service mesh provided by Kong. It aims to address the limitations of earlier service meshes. Built on Envoy and compatible with both Kubernetes and VMs, it builds on Kong's edge capabilities for managing APIs and, because it can work with organizations’ legacy infrastructures, it facilitates their migration to a containerized environment. Designed to have a shallow learning curve, it provides policies to configure lower-level details at the data plane.

Features:

mTLS enabled for all L4 traffic

Service ACL (Access Control List): Restrict access to a service or a route by whitelisting or blacklisting consumers using arbitrary ACL group names

Multi-Mesh

Tracing to visualize latency

Logging: Log requests and responses to your system over TCP, UDP or to disk

Metrics

Custom Policies

Authentication: Manage consumer credentials via query string and header tokens

Rate-limiting: Block and throttle requests based on IP or authentication

Transformations: Add, remove or manipulate HTTP params and headers on-the-fly

CORS: Enable cross-origin requests to your APIs that would otherwise be blocked

Platform agnostic - runs on Kubernetes and VM

SDS - Software Defined Security - enables IP restriction: whitelist or blacklist the IPs authorized to make requests

OAuth 2.0: Easily add OAuth 2.0 authentication to your APIs

Monitoring: Live monitoring provides key load and performance server metrics
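The rate-limiting feature listed above, blocking and throttling requests based on IP, can be sketched in Python with a fixed-window counter. This is one possible algorithm among several (token bucket and sliding window are common alternatives) and is not a description of how Kong implements it; the class name `FixedWindowRateLimiter` and its parameters are assumptions for illustration.

```python
import time


class FixedWindowRateLimiter:
    """Allow at most `limit` requests per client key in each time window."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self._limit = limit
        self._window = window_seconds
        self._clock = clock
        self._windows = {}  # client key (e.g. an IP) -> (window_start, count)

    def allow(self, key):
        now = self._clock()
        start, count = self._windows.get(key, (now, 0))
        if now - start >= self._window:
            # The previous window has expired: start counting afresh.
            start, count = now, 0
        if count >= self._limit:
            self._windows[key] = (start, count)
            return False  # the caller would typically answer HTTP 429
        self._windows[key] = (start, count + 1)
        return True


if __name__ == "__main__":
    limiter = FixedWindowRateLimiter(limit=2, window_seconds=60)
    for attempt in range(3):
        print(attempt, limiter.allow("203.0.113.7"))
```

Keying the limiter on a consumer identifier instead of an IP address gives the authentication-based variant mentioned in the same feature.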

Service mesh architectural diagram for Kong: client requests pass through Kong, which applies rate-limiting, authentication, transformations, caching, logging and more before forwarding them to public, private and partner APIs. [Courtesy konghq.com]

Conclusion

Service meshes evolved as a facilitator for orchestration in the wake of Kubernetes and other container technologies, and are rapidly becoming an indispensable tool for containerization. With service meshes, DevOps teams can focus on building value-added services in distributed architectures that are ready to scale, with built-in predictability and consistency across platforms, and can increase container performance.

From a security perspective, service meshes are instrumental in enforcing compliance and best practices, alleviating SOC team workload and improving resilience while simplifying vulnerability identification and facilitating remediation.

In short, service meshes spell the end of black-box implementation with no visibility where DevOps teams focus on building pipes instead of creating value and where common functionalities are duplicated across services.

This was powerfully summarized by Bernard Golden, VP of Cloud Strategy at Capital One, at the first Service Mesh Day in San Francisco in March 2019: “Service Mesh has the potential to reduce the cognitive load. We have people running Kubernetes in our organization who end up having to run a lot of "plumbing," which is a challenge for their cognitive load. Service mesh level of abstraction gives them more ability to focus on developing the business side of the applications.”
