Distributed Data Management Thorsten Papenbrock Felix Naumann Akka Actor Programming F-2.03/F-2.04, Campus II Hasso Plattner Institut

Akka Actor Programming Hands-on

. Actor Model (Recap) . Basic Concepts . Runtime Architecture . Demo . Messaging . Parallelization . Remoting . Clustering . Patterns . Homework

Thorsten Papenbrock Slide 2 Actor Model (Recap) Actor Programming

Object-oriented Actor Task-oriented programming programming programming

. Objects encapsulate . Actors encapsulate . Application split down state and behavior state and behavior into task graph . Objects communicate . Actors communicate . Tasks are scheduled with each other with each other and executed . Actor activities are transparently  Separation of concerns scheduled and makes applications executed transparently  Decoupling of tasks Distributed Data Management easier to build and and resources allows maintain. for asynchronous and Akka Actor  Combines the Programming advantages of object- parallel programming.

and task-oriented Thorsten Papenbrock programming. Slide 3 Actor Model (Recap) Actor Model Actor 1 Actor 2

Actor Model
. A stricter message-passing model that treats actors as the universal primitives of concurrent computation
. Actor:
  . Computational entity (private state/behavior)
  . Owns exactly one mailbox
  . Reacts on messages it receives (one message at a time)
. Actor reactions:
  . Send a finite number of messages to other actors
  . Create a finite number of new actors
  . Change own state, i.e., behavior for the next message

“The actor model retained more of what I thought were good features of the object idea”
(Alan Kay, pioneer of object orientation)

. The actor model prevents many parallel programming issues (race conditions, locking, deadlocks, …)

Actor Model (Recap) Actor Model Actor 1 Actor 2

Advantages over pure RPC . Fault-tolerance: . “Let it crash!” philosophy to heal from unexpected errors . Automatic restart of failed actors; resend/re-route of failed messages  Errors are expected to happen and implemented into the model: . Deadlock/starvation prevention: . Asynchronous messaging and private state actors prevent many parallelization issues Distributed Data . Parallelization: Management Akka Actor . Actors process one message at a time but different actors operate Programming independently (parallelization between actors not within an actor)

. Actors may spawn new actors if needed (dynamic parallelization) Thorsten Papenbrock Slide 5 Actor Model (Recap) Actor Model Actor 1 Actor 2

Popular Actor Frameworks
. Erlang:
  . Actor framework already included in the language
  . First popular actor implementation
  . Special: Native language support and strong actor isolation
. Akka:
  . Actor framework for the JVM (Java and Scala)
  . Most popular actor implementation (at the moment)
  . Special: Actor hierarchies
. Orleans:
  . Actor framework for Microsoft .NET
  . Special: Virtual actors (persisted state and transparent location)

Akka Actor Programming Hands-on

. Actor Model (Recap) . Basic Concepts . Runtime Architecture . Demo . Messaging . Parallelization . Remoting . Clustering . Patterns . Homework

Thorsten Papenbrock Slide 7 Basic Concepts Akka Toolkit and Runtime

. A free and open-source toolkit and runtime for building concurrent and distributed applications on the JVM
. Supports multiple programming models for concurrency, but emphasizes actor-based concurrency
. Inspired by Erlang
. Written in Scala (included in the Scala standard library)
. Offers interfaces for Java and Scala
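To get a first impression of the Java interface before the detailed slides, here is a minimal sketch (not from the lecture) of creating an ActorSystem, starting one actor, and sending it a message; the Greeter class and the "hello" payload are made-up illustration names:

import akka.actor.AbstractActor;
import akka.actor.ActorRef;
import akka.actor.ActorSystem;
import akka.actor.Props;

public class QuickStart {

  // A small example actor that prints every String it receives
  static class Greeter extends AbstractActor {
    static Props props() { return Props.create(Greeter.class); }

    @Override
    public Receive createReceive() {
      return receiveBuilder()
          .match(String.class, message -> System.out.println("Received: " + message))
          .build();
    }
  }

  public static void main(String[] args) {
    ActorSystem system = ActorSystem.create("MyActorSystem");          // one ActorSystem per JVM process
    ActorRef greeter = system.actorOf(Greeter.props(), "greeter");     // create a top-level actor
    greeter.tell("hello", ActorRef.noSender());                        // asynchronous, fire-and-forget send
    system.terminate();                                                // shut the system down again (in a real program, later)
  }
}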

Basic Concepts Akka Modules

Akka Actors: Core actor model classes for concurrency and distribution
Akka Cluster: Classes for the resilient and elastic distribution over multiple nodes
Akka Streams: Asynchronous, non-blocking, backpressured, reactive stream classes
Akka Http: Asynchronous, streaming-first HTTP server and client classes
Cluster Sharding: Classes to decouple actors from their locations, referencing them by identity
Akka Persistence: Classes to persist actor state for fault tolerance and state restore after restarts
Distributed Data: Classes for an eventually consistent, distributed, replicated key-value store
Alpakka: Stream connector classes to other technologies

Basic Concepts Maven – pom.xml (Small Setup)
. com.typesafe.akka : akka-actor_${scala.version} : 2.5.3 (base actor library: actors, supervision, scheduling, …)
. com.typesafe.akka : akka-remote_${scala.version} : 2.5.3 (remoting library: remote actors, heartbeats, …)
. com.typesafe.akka : akka-slf4j_${scala.version} : 2.5.3 (logger library: logging event bus for Akka actors)
. com.typesafe.akka : akka-testkit_${scala.version} : 2.5.3 (testing library: TestKit class, expecting messages, …)
. com.twitter : chill-akka_${scala.version} : 0.9.2 (Kryo library: custom serialization with Kryo)

Basic Concepts Akka Actors

. Actor = State + Behavior + Mailbox
. Communication:
  . Sending messages to mailboxes
  . Non-blocking, fire-and-forget
. Messages:
  . Immutable, serializable objects (mutable messages are possible, but don’t use them!)
  . Object classes are known to both sender and receiver
  . Receiver interprets a message via pattern matching
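A minimal sketch of such an immutable, serializable message class (the class and field names are illustrative, not from the lecture):

import java.io.Serializable;

public class WorkMessage implements Serializable {
  private static final long serialVersionUID = 1L;

  // All fields are final, so the message cannot be mutated after construction
  private final int id;
  private final String payload;

  public WorkMessage(int id, String payload) {
    this.id = id;
    this.payload = payload;
  }

  public int getId() { return this.id; }
  public String getPayload() { return this.payload; }
}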

Basic Concepts Akka Actors

. Inherit default actor behavior, state, and mailbox implementation by extending AbstractActor
. createReceive() is called in the default actor constructor and its result is set as the actor’s behavior
. The Receive object performs pattern matching and de-serialization; receiveBuilder() is a builder pattern for constructing a Receive object that would otherwise need many constructor arguments

public class Worker extends AbstractActor {

  @Override
  public Receive createReceive() {
    return receiveBuilder()
        .match(String.class, this::respondTo)
        .matchAny(object -> System.out.println("Could not understand received message"))
        .build();
  }

  private void respondTo(String message) {
    System.out.println(message);
    // Send a response to the sender of the last message (asynchronously, non-blocking)
    this.sender().tell("Received your message, thank you!", this.self());
  }
}

Basic Concepts Akka Actors

. The message types (= classes) define how the actor reacts

public class Worker extends AbstractActor {

  @Override
  public Receive createReceive() {
    return receiveBuilder()
        .match(String.class, s -> this.sender().tell("Hello!", this.self()))
        .match(Integer.class, i -> this.sender().tell(i * i, this.self()))
        .match(Double.class, d -> this.sender().tell(d > 0 ? d : 0, this.self()))
        .match(MyMessage.class, s -> this.sender().tell(new YourMessage(), this.self()))
        .matchAny(object -> System.out.println("Could not understand received message"))
        .build();
  }
}

Basic Concepts Akka Actors

. AbstractLoggingActor provides proper logging

public class Worker extends AbstractLoggingActor {

  @Override
  public Receive createReceive() {
    return receiveBuilder()
        .match(String.class, s -> this.sender().tell("Hello!", this.self()))
        .match(Integer.class, i -> this.sender().tell(i * i, this.self()))
        .match(Double.class, d -> this.sender().tell(d > 0 ? d : 0, this.self()))
        .match(MyMessage.class, s -> this.sender().tell(new YourMessage(), this.self()))
        .matchAny(object -> this.log().error("Could not understand received message"))
        .build();
  }
}

Basic Concepts Akka Actors

State Mailbox Actor 1 Actor 2

Behavior public class Worker extends AbstractLoggingActor { Good practice: public static class MyMessage implements Serializable {} Actors define their messages (provides a kind of interface description) @Override public Receive createReceive() { Distributed Data return receiveBuilder() Management .match(MyMessage.class, s -> this.sender().tell(new OtherActor.YourMessage(), this.self())) Akka Actor .matchAny(object -> this.log().error("Could not understand received message")) Programming .build(); } Thorsten Papenbrock } Slide 15

Basic Concepts Testing

public class WorkerTest {

  private ActorSystem actorSystem;

  @Before
  public void setUp() {
    this.actorSystem = ActorSystem.create();
  }

  @Test
  public void shouldWorkAsExpected() {
    new TestKit(this.actorSystem) {{
      ActorRef worker = actorSystem.actorOf(Worker.props());
      worker.tell(new Worker.WorkMessage(73), this.getRef());

      Master.ResultMessage expectedMsg = new Master.ResultMessage(42);
      this.expectMsg(Duration.create(3, "secs"), expectedMsg);
    }};
  }

  @After
  public void tearDown() {
    this.actorSystem.terminate();
  }
}

Basic Concepts Some Further Notes

Redundant API Calls
. Due to the Java-Scala interface mix:
  . this.getContext() = this.context()
  . this.getSender() = this.sender()
  . …

Tell messaging (non-blocking, asynchronous)
. Java: someActor.tell(message, sender)
. Scala: someActor ! message

Ask pattern (blocking, synchronous; more on that pattern later!)
. Java: akka.pattern.PatternsCS.ask(someActor, message, timeout)
. Scala: someActor ? message

Actor Model: Akka
Akka Scala Interface: A Master-Worker Example (Message-Passing Dataflow)

case class Calculate(items: List[String])
case class Work(data: String)
case class Result(value: Int)

class Worker extends Actor {
  val log = Logging(context.system, this)

  def receive = {
    case Work(data) => sender ! Result(handle(data))
    case _ => log.info("received unknown message")
  }

  def handle(data: String): Int = { data.hashCode }
}

class Master(numWorkers: Int) extends Actor {
  val log = Logging(context.system, this)
  val worker = context.actorOf(Props[Worker], name = "worker")

  def receive = {
    case "Hello master" => sender ! "Hello sender"
    case Calculate(items) => for (i <- 0 until items.size) worker ! Work(items(i))
    case Result(value) => log.info(value.toString)
    case _ => log.info("received unknown message")
  }
}

Akka Actor Programming Hands-on

. Actor Model (Recap) . Basic Concepts . Runtime Architecture . Demo . Messaging . Parallelization . Remoting . Clustering . Patterns . Homework

Thorsten Papenbrock Slide 19 Runtime Architecture Actor Hierarchies

Task- and data-parallelism Delegate work! Parent . Actors can dynamically create new actors

Supervision hierarchy . Creating actor (parent) supervises created actor (child) Child Child 1 2 Fault-tolerance Distributed Data . If child fails, parent can choose: Management Akka Actor . restart, resume, Programming stop, or escalate Grandchild Grandchild Grandchild 1 2 3 Thorsten Papenbrock Slide 20

Runtime Architecture Actor Lifecycles

Actor Lifecycle
(Lifecycle diagram: actorOf() creates an actor that is reachable via its ActorRef; a supervisor can resume(), restart(), stop(), or escalate(); messages to a stopped actor end up in the DeadLetterBox.)

Runtime Architecture Actor Lifecycles

. preStart(): called before the actor is started; initialization
. preRestart(): called before the actor is restarted; free resources (keeping resources that can be re-used)
. postRestart(): called after the actor is restarted; re-initialization (re-using resources if possible)
. postStop(): called after the actor was stopped; free resources

public class MyActor extends AbstractLoggingActor {

  @Override
  public void preStart() throws Exception {
    super.preStart();
    // Listen to DisassociatedEvents
    this.context().system().eventStream()
        .subscribe(this.self(), DisassociatedEvent.class);
  }

  @Override
  public void postStop() throws Exception {
    super.postStop();
    // Log that MyActor was stopped
    this.log().info("Stopped {}.", this.self());
  }
}

Runtime Architecture Let It Crash

“Let it crash” philosophy
. Distributed systems are inherently prone to errors (because there is simply more to go wrong/break)
   Message loss, unreachable mailboxes, crashing actors, …
. Make sure that critical code is supervised by some entity that knows how errors can be handled
. Then, if an error occurs, do not (desperately) try to fix it: let it crash!
   Errors are propagated to supervisors that can deal better with them
. Example: an actor discovers a parsing error and crashes
   Its supervisor restarts the actor and resends a fixed message

Runtime Architecture Actor Hierarchies

public class Master extends AbstractLoggingActor {

  public Master() {
    ActorRef worker = this.context().actorOf(Worker.props());
    // Watch the worker to receive Terminated messages for it
    this.context().watch(worker);
  }

  @Override
  public SupervisorStrategy supervisorStrategy() {
    // Try 3 restarts within 10 seconds for IOExceptions; otherwise escalate
    return new OneForOneStrategy(3, Duration.create(10, TimeUnit.SECONDS),
        DeciderBuilder.match(IOException.class, e -> restart())
            .matchAny(e -> escalate())
            .build());
  }
}

public class Worker extends AbstractLoggingActor {

  // Create the Props that tell the context how to instantiate you (a factory pattern)
  public static Props props() {
    return Props.create(Worker.class);
  }
}

Runtime Architecture Actor Systems (each actor has a path/URL)

ActorSystem User actors / System and remote . A named hierarchy of actors reside here (root guardian) actors reside here . Runs within one JVM process . Configures: user system . Actor dispatchers (guardian) (guardian) (that schedule actors on threads) . Global actor settings (e.g. mailbox types) remote- controller broker Distributed Data . Remote actor access System Management (e.g. addresses) Akka Actor Programming . …

Thorsten Papenbrock bookkeeper merchant worker Slide 25

Runtime Architecture Actor Systems

Event stream
. Reacts on errors, new nodes, message sends, message loss, …

Dispatcher
. Assigns threads dynamically to actors (transparent multi-threading)
. # Threads ≈ # CPU cores; # Actors > # CPU cores (usually many hundreds)
   Over-provisioning! Idle actors don‘t bind resources

Remoting
. Resolves remote actor addresses
. Sends messages over the network (serialization + de-serialization)

. Actor Model (Recap) . Basic Concepts . Runtime Architecture . Demo . Messaging . Parallelization . Remoting . Clustering . Patterns . Homework

Thorsten Papenbrock Slide 27 Demo “akka-tutorial” and “octopus”


Demo “akka-tutorial”: started with master --host … and slave --host … --master …
(Architecture diagram: a MasterActorSystem and a SlaveActorSystem with Listener, Master, Worker, Slave, Reaper, and Shepherd actors; the systems exchange Range, Primes, LogPrimes, Validation, Subscription, Address, and Acknowledge messages, and the master creates remote Worker actors.)

Demo “octopus”: started with master --host … and slave --host … --master …

(Architecture diagram: a MasterActorSystem and a SlaveActorSystem with Profiler, Worker, and Metrics Listener actors; they exchange Work, Completion, Depth, and Registration messages and react to cluster events such as MemberUp, MemberEvent, and MetricsChanged.)

Akka Actor Programming Hands-on

. Actor Model (Recap) . Basic Concepts . Runtime Architecture . Messaging . Parallelization . Remoting . Clustering . Patterns . Demo . Homework

Thorsten Papenbrock Slide 31 Messaging Message Delivery Guarantees

Message delivery
. at-most-once: each message is delivered zero or one times
   no guaranteed delivery; no message duplication
   highest performance; no implementation overhead
   fire-and-forget
  (You can implement at-least-once and exactly-once on top of at-most-once!)
. at-least-once: each message is delivered one or more times
   guaranteed delivery; possibly message duplication
   okay-ish performance; state in sender
   send-and-acknowledge
. exactly-once: each message is delivered exactly once
   guaranteed delivery; no message duplication
   bad performance; state in sender and receiver
   send-and-acknowledge-and-deduplicate

https://doc.akka.io/docs/akka/2.5/general/message-delivery-reliability.html

Messaging Message Delivery Guarantees

Message ordering . no ordering: all messages can be arbitrarily out of order  no guaranteed ordering  highest performance; no implementation overhead . sender-receiver ordering: all messages between specific sender-receiver pairs are ordered (by send order)  ordered individual communications  good performance; message broker simply sustains received order Distributed Data . total ordering: all messages are ordered Management (by send timestamps) Akka Actor Programming  serialized communication (see total-order-broadcast later in lecture)

 bad performance; global ordering Thorsten Papenbrock Slide 33

https://doc.akka.io/docs/akka/2.5/general/message-delivery-reliability.html Messaging A1

Message Delivery Guarantees A2

Message ordering
. sender-receiver ordering: all messages between specific sender-receiver pairs are ordered (by send order)
. Example:
  . Actor A1 sends messages M1, M2, M3 to A2
  . Actor A3 sends messages M4, M5, M6 to A2
   If M1 is delivered, it must be delivered before M2 and M3
   If M2 is delivered, it must be delivered before M3
   If M4 is delivered, it must be delivered before M5 and M6
   If M5 is delivered, it must be delivered before M6
   A2 can see messages from A1 interleaved with messages from A3
  (“If X is delivered …”: there is no guaranteed delivery, i.e., messages may be dropped and never arrive at A2!)

https://doc.akka.io/docs/akka/2.5/general/message-delivery-reliability.html Messaging A1

Message Delivery Guarantees A2

A3 Message ordering . sender-receiver ordering: all messages between specific sender-receiver pairs are ordered (by send order)

. Send order is not transitive:
  . A1 sends M1 to A2
  . A1 sends M2 to A3
  . A2 forwards M1 to A3
   A3 may receive M1 and M2 in any order!
  (The framework does not know that M1 was forwarded, i.e., that M1 is “younger” than M2.)
. Failure communication uses a different channel:
  . A1 has child A2
  . A2 sends M1 to A1
  . A2 fails, causing failure message M2 to be sent to A1
   A1 may receive M1 and M2 in any order!
  (Although A2 causes M2, it is technically not the sender of M2.)

https://doc.akka.io/docs/akka/2.5/general/message-delivery-reliability.html

Messaging Message Delivery Guarantees

Does the message ordering guarantee also hold for non-TCP messaging?

No: "The rule that for a given pair of actors, messages sent directly from the first to the second will not be received out-of-order holds for messages sent over the network with the TCP based Akka remote transport protocol." [1] So if you switch away from TCP, you loose that guarantee. The documentation also warns that the ordering guarantee can be violated by various factors, so we should not rely too much on it. If ordering is important, we need to add and check sequence numbers.

[1] https://doc.akka.io/docs/akka/2.5/general/message-delivery-reliability.html#how-does-local-ordering-relate-to-network-ordering

Thorsten Papenbrock Slide 36 Messaging Pull vs. Push

Work Propagation
. Producer actors generate work for other consumer actors
. Push propagation:
  . Producers send work packages to their consumers immediately (in particular, data is copied over the network proactively)
  . Work is queued in the inboxes of the consumers
  . Fast work propagation; risk of message congestion/drops
  (You can have back-pressured mailboxes, but that kind of kills the non-blocking, fire-and-forget messaging.)
. Pull propagation:
  . Consumers ask producers for more work when they are ready
  . Work is queued in the producers’ state
  . Slower work propagation; no risk of message congestion

A producer that hands out one work package per request could look like this:

public class PullProducer extends AbstractLoggingActor {

  @Override
  public Receive createReceive() {
    return receiveBuilder()
        .match(NextMessage.class, message -> this.sender().tell(this.workPackages.remove(), this.self()))
        .matchAny(object -> this.log().info("Unknown message"))
        .build();
  }
}
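The consumer side of such a pull protocol is not on the slides; here is a sketch under the assumption that the producer understands a NextMessage request and answers with a String work package (both names are illustrative):

import akka.actor.AbstractLoggingActor;
import akka.actor.ActorRef;
import java.io.Serializable;

public class PullConsumer extends AbstractLoggingActor {

  // Request message understood by the producer (normally the producer would define this itself)
  public static class NextMessage implements Serializable {}

  private final ActorRef producer; // reference to the PullProducer, e.g. passed in via Props

  public PullConsumer(ActorRef producer) {
    this.producer = producer;
  }

  @Override
  public void preStart() throws Exception {
    super.preStart();
    // Pull the first work package as soon as this consumer is started
    this.producer.tell(new NextMessage(), this.self());
  }

  @Override
  public Receive createReceive() {
    return receiveBuilder()
        .match(String.class, workPackage -> {
          this.process(workPackage);
          // Only after the current package is done, pull the next one
          this.sender().tell(new NextMessage(), this.self());
        })
        .matchAny(object -> this.log().info("Unknown message"))
        .build();
  }

  private void process(String workPackage) { /* application logic */ }
}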

Messaging Large Messages
. Netty: a (non-)blocking I/O client-server framework for large file transfer

. Supports different Akka Messaging System transport protocols . Default configuration: (TCP, UDP, WebSockets, …)

Akka

Netty

TCP https://netty.io/wiki/ user- guide-for-4.x.html

 Capable of sending many small messages Distributed Data  Provides the delivery guarantees of TCP Management Akka Actor Programming  But: What if we need to send large amounts of data over the network?

Thorsten Papenbrock Slide 39 Large messages are broken down into frames that need to be re- assembled on the receiving side.

This blocks the TCP socket for other messages:

. Regular messages: risk of message congestion (sender) and idle times (receiver) . Heartbeat messages: risk of cluster partitions and split-brain scenarios

 But: What if we need to send large amounts of data over the network?

https://petabridge.com/blog/large-messages-and-sockets-in-akkadotnet/ Messaging Large Messages

Sending Large Messages . Use side channels for large data transfer  Different channel that does not block main channel messages  Transfer protocol that is optimized for large files (WebSockets, UDP, FTP, …) . Approach: 1. Send data via side channel 2. Send data references via Akka message when data is transferred

. Side channel examples: Distributed Data . Akka Streaming Management Akka Actor . Netty (one abstraction level deeper than Akka) Programming . Databases

Thorsten Papenbrock . … Slide 41

Akka Actor Programming Hands-on

. Actor Model (Recap) . Basic Concepts . Runtime Architecture . Demo . Messaging . Parallelization . Remoting . Clustering . Patterns . Homework

Thorsten Papenbrock Slide 42 Parallelization book Task- and Data-Parallelism Booking reserve Actor promote Task-parallelism pay bill . Distribute sub-tasks to different actors Reservation Payment Billing Promotion Actor Actor Actor Actor

book

Booking Data-parallelism Scheduler . Distribute chunks of data book Actor book book book to different actors

Booking Booking Booking Booking Actor Actor Actor Actor Parallelization Scheduler

Dynamic Parallelism
. Actors often delegate work if they are responsible for …
  . Many tasks (task-parallelism)
  . Compute-intensive tasks with many subtasks (task-parallelism)
  . Data-intensive tasks with independent partitions (data-parallelism)
. Work can be delegated to a dynamically managed pool of worker actors

Task Scheduling (push propagation!)
. Strategies (see package akka.routing):
  . RoundRobinRoutingLogic
  . SmallestMailboxRoutingLogic
  . BroadcastRoutingLogic
  . ConsistentHashingRoutingLogic
  . RandomRoutingLogic
  . BalancingRoutingLogic
  . …

Parallelization Scheduler

Router workerRouter = new Router(new SmallestMailboxRoutingLogic());

for (int i = 0; i < this.numberOfWorkers; i++) { workerRouter = workerRouter.addRoutee(this.context().actorOf(Worker.props())); } Scala world: All objects are immutable! for (WorkPackage workMessage : this.workPackages) { workerRouter.route(workMessage, this.self()); }

Strategy defines the worker to be chosen Task Scheduling Distributed Data . Strategies (see package akka.routing): Management . RoundRobinRoutingLogic . SmallestMailboxRoutingLogic Akka Actor Programming . BroadcastRoutingLogic . ConsistentHashingRoutingLogic

Thorsten Papenbrock . RandomRoutingLogic . BalancingRoutingLogic Slide 45 . … Akka Actor Programming Hands-on

. Actor Model (Recap) . Basic Concepts . Runtime Architecture . Demo . Messaging . Parallelization . Remoting . Clustering . Patterns . Homework

Thorsten Papenbrock Slide 46 Remoting Serialization

Serialization
. Only messages to remote actors are serialized
. Communication within one system: language-specific data types (pointers and primitive values)
. Communication across process boundaries: transparent serialization of object references into byte sequences (Serializable, Kryo, Protocol Buffers, …; configurable)

Remoting Serialization: application.conf (remote.conf, akka.conf, … however you call it)

akka {
  actor {
    provider = remote
    serializers {
      java = "akka.serialization.JavaSerializer"
      kryo = "com.twitter.chill.akka.ConfiguredAkkaSerializer"
    }
    serialization-bindings {
      "java.io.Serializable" = kryo   # sets the serialization class for remote messages
    }
  }
  remote {
    # configures transport protocol, hostname, and port for remote actor systems
    enabled-transports = ["akka.remote.netty.tcp"]
    netty.tcp {
      hostname = "192.168.0.5"
      port = 7787
    }
  }
}

A Java Serializable class must do the following (usually no no-arg constructor needed):
1. Implement the java.io.Serializable interface
2. Identify the fields that should be serializable
    Means: declare non-serializable fields as “transient”
3. Have access to the no-arg constructor of its first non-serializable superclass
    Means: a no-arg constructor is only required if a non-serializable superclass exists
https://docs.oracle.com/javase/8/docs/platform/serialization/spec/serial-arch.html#a4539

A Kryo-serialized class must do the following (no-arg constructor needed!):
"By default, if a class has a zero argument constructor then it is invoked via ReflectASM or reflection, otherwise an exception is thrown."
https://github.com/EsotericSoftware/kryo/blob/master/README.md
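A sketch of a message class that satisfies both rule sets (class and field names are illustrative): it implements Serializable, excludes a non-serializable field via transient, and keeps a no-arg constructor so that Kryo can instantiate it reflectively:

import java.io.InputStream;
import java.io.Serializable;

public class StatusMessage implements Serializable {
  private static final long serialVersionUID = 1L;

  private int progress;                    // serialized
  private transient InputStream source;    // not serializable, therefore excluded via transient

  // No-arg constructor so that Kryo can create the object via reflection
  public StatusMessage() {
  }

  public StatusMessage(int progress) {
    this.progress = progress;
  }

  public int getProgress() { return this.progress; }
}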

Remoting Actor Lookup
. ActorRefs serve as pointers to local/remote actors

public class Master extends AbstractLoggingActor {

  private ActorRef worker;

  public Master() {
    // 1. By construction: create a child actor (a factory pattern)
    this.worker = this.context().actorOf(Worker.props());
  }

  @Override
  public Receive createReceive() {
    return receiveBuilder()
        // 3. By message: a known actor sends us a reference to another actor
        .match(ActorRef.class, worker -> this.worker = worker)
        .matchAny(object -> this.log().error("Invalid message"))
        .build();
  }
}

How to obtain an ActorRef:
1. By construction: create a child actor
2. By application: ask for a reference in your constructor or provide a setter
3. By message: ask a known actor to send you a reference to another actor

Thorsten Papenbrock Slide 50 Remoting public class Master extends AbstractLoggingActor { private ActorRef worker; Actor Lookup ActorSelection is a logical view public Master() { to a subtree of an ActorSystems; Address address = new Address("akka.tcp", tell() broadcasts to that subtree “MyActorSystem", "localhost", 7877);

ActorSelection selection = this.context().system() .actorSelection(String.format( Actor 1 Actor 2 "%s/user/%s", address, "worker")); 4. selection.tell(new HelloMessage(), this.self()) ? } } 1. By construction: URL: . Create a child actor "akka.tcp://MyActorSystem@localhost:7877/user/worker" 2. By application: Distributed Data . Ask for a reference in your constructor or provide a setter Management Akka Actor 3. By message: Programming . Ask a known actor to send you a reference to another actor

Thorsten Papenbrock 4. By name (path/URL): Slide 51 . Ask the context to create a reference to an actor with a certain URL

Akka Actor Programming Hands-on

. Actor Model (Recap) . Basic Concepts . Runtime Architecture . Demo . Messaging . Parallelization . Remoting . Clustering . Patterns . Homework

Thorsten Papenbrock Slide 52 Clustering Cluster-Awareness

System1 System2

?

How does System1 know … Distributed Data . which other ActorSystems are available? Management (the number might even change at runtime!) Akka Actor Programming . what failures occurred in other ActorSystems? (single actors but also entire nodes might become unavailable!) Thorsten Papenbrock . what roles other ActorSystems take? Slide 53 (e.g. a master or worker or metrics collector or entirely different application!) Clustering Cluster-Awareness

(Akka modules overview as before: Akka Actors, Akka Cluster, Akka Streams, Akka Http, Cluster Sharding, Akka Persistence, Distributed Data, Alpakka; Akka Cluster provides the classes for resilient and elastic distribution over multiple nodes.)

Clustering Maven – pom.xml dependencies
. com.typesafe.akka : akka-actor_${scala.version} : 2.5.3
. com.typesafe.akka : akka-remote_${scala.version} : 2.5.3
. com.typesafe.akka : akka-cluster-tools_${scala.version} : 2.5.3 (clustering capabilities: cluster membership, singletons, publish/subscribe, cluster client, …)
. com.typesafe.akka : akka-cluster-metrics_${scala.version} : 2.5.3 (metrics collection: CPU load, memory consumption, …)
. com.typesafe.akka : akka-cluster-sharding_${scala.version} : 2.5.3 (transparent actors: logical references, distributed/persisted state, …)

Clustering Node Lifecycle for Cluster Membership

fd* = failure detector (see φ accrual failure detector later)

Distributed Data Management Akka Actor Programming

Thorsten Papenbrock Slide 56 Clustering Node Lifecycle for Cluster Membership

Who manages the cluster states of ActorSystems in the cluster exactly (joining, up, leaving, exiting, unreachable, down, removed)?

The "Cluster" is a distributed membership service inside our ActorSystems and it stores all membership information in a distributed, fully replicated key-value store (= Riak). This means that the states are basically stored on all nodes of the cluster. Changes to the state of one ActorSystem are propagated via gossipping [1]. The eldest node in a cluster automatically become the "leader" of the cluster and is then allowed to perform the state changes to "up", "exiting", and "removed" for members of the cluster [2]. The state changes to "joining" and "leaving" is something that the local Cluster service of each individual ActorSystem can probably do on its own (although I don't find a clear statement for that in the documentation). The state transition to Distributed Data Management "unreachable" is triggered by the failure detector, which also runs on all nodes (i.e. in each local Cluster service). Akka Actor Programming

[1] https://doc.akka.io/docs/akka/2.5/common/cluster.html#gossip
[2] https://doc.akka.io/docs/akka/2.5/common/cluster.html#membership-lifecycle

Clustering Configuration: application.conf

akka {
  actor {
    provider = cluster   # instead of local or remote
    serializers { ... }
  }
  remote {
    enabled-transports = ["akka.remote.netty.tcp"]
    netty.tcp {
      hostname = "192.168.0.5"
      port = 7787
    }
  }
  # the cluster membership waits for these numbers of members before it sets their status to up
  cluster {
    min-nr-of-members = 3
    role {
      master.min-nr-of-members = 1
      slave.min-nr-of-members = 2
    }
  }
}

Clustering Configuration

public static void start(String appName, String host, int port, String seedhost, int seedport) {

final Config config = ConfigFactory.parseString( Sets ActorSystem-specific "akka.remote.netty.tcp.hostname = \"" + host + "\"\n" + configuration parameters dynamically "akka.remote.netty.tcp.port = " + port + "\n" + Sets the seed node(s) for initial connect; "akka.remote.artery.canonical.hostname = \"" + host + "\"\n" + any connected node can be a seed node; "akka.remote.artery.canonical.port = " + port + "\n" + first node connects to itself creating the cluster "akka.cluster.roles = [slave]\n" + "akka.cluster.seed-nodes = [\"akka://" + appName + "@" + seedhost + ":" + seedport + "\"]") .withFallback(ConfigFactory.load("application"));

final ActorSystem system = ActorSystem.create(appName, config); system.registerOnTermination(new Runnable() { @Override Distributed Data Management public void run() { A runnable that is executed when the System.exit(0); Akka Actor ActorSystem is terminated } Programming }); End the application; } Thorsten Papenbrock allows the applications main thread to end Slide 59 Clustering Member Events

public static void start(String appName, String host, int port, String seedhost, int seedport) { […]

Cluster.get(system).registerOnMemberUp(new Runnable() { @Override public void run() { Run when this node has been set to status “up” for (int i = 0; i < 10; i++) system.actorOf(WorkerActor.props(), "worker" + i); } }); Create all local actors when “up” }

Retrieve the cluster object when needed Distributed Data Management (to access cluster events, failure detector, node status, …) Akka Actor Programming Cluster cluster = Cluster.get(this.context().system());

Thorsten Papenbrock Slide 60 Clustering Member Events

public static void start(String appName, String host, int port, String seedhost, int seedport) { […]

Cluster.get(system).registerOnMemberRemoved(new Runnable() { @Override public void run() { Run when this node has been “removed” system.terminate(); new Thread() { Terminate the ActorSystem @Override public void run() { try { Await.ready(system.whenTerminated(), Duration.create(10, TimeUnit.SECONDS)); } catch (Exception e) { System.exit(-1); Distributed Data Management } Let a different thread } Akka Actor wait for the ActorSystem to terminate Programming }.start(); and, then, terminate the application }

}); Thorsten Papenbrock } Slide 61 Clustering Member Events

What happens during cluster partitioning (i.e. in case one or many ActorSystems get separated from the others)?

If the cluster partitions, each partition will form its own cluster. From each partition's perspective, many other nodes appear to fail, so that only some nodes are left in the own cluster. Because the Cluster service is fully replicated, each partition will quickly find a new leader (= the oldest node in each partition) and continue operating. Consequently, a cluster partitioning only removes unreachable ActorSystems, and no onMemberRemoved() callback is actually triggered; the callback only serves a clean shutdown of the Cluster. Each Cluster keeps track of all removed ActorSystems so that "[...] the same actor system can never join a cluster again once it’s been removed from that cluster" [1]; otherwise, the cluster could run into split-brain situations (= two leaders). To re-unite the nodes, the application logic needs to identify the "main" cluster, let all "non-main" clusters shut themselves down, and restart the ActorSystems on all affected nodes.

Thorsten Papenbrock [1] https://doc.akka.io/docs/akka/2.5/common/cluster.html#cluster-specification Slide 62 public class WorkerActor extends AbstractActor {

@Override public void preStart() { Cluster.get(this.context().system()).subscribe(this.self(), MemberUp.class); } Subscribe to cluster events @Override public Receive createReceive() { return receiveBuilder() .match(CurrentClusterState.class, this::registerAll) .match(MemberUp.class, message -> this.register(message.member())) .match(WorkMessage.class, this::handle) .build(); Register at all masters after creation }

private void registerAll(CurrentClusterState message) { message.getMembers().forEach(member -> { if (member.status().equals(MemberStatus.up())) this.register(member); }); Distributed Data } Management Akka Actor private void register(Member member) { Register at any new master Programming if (member.hasRole("master")) this.getContext().actorSelection(member.address() + "/user/masteractor") .tell(new RegistrationMessage(), this.self()); Thorsten Papenbrock } Slide 63 } Clustering Member Events

public class MasterActor extends AbstractActor {

  @Override
  public Receive createReceive() {
    return receiveBuilder()
        .match(RegistrationMessage.class, this::handle)
        .match(Terminated.class, this::handle)
        .match(CompletionMessage.class, this::handle)
        .build();
  }

  private void handle(RegistrationMessage message) {
    // Simply watch the remote workers
    this.context().watch(this.sender());
    this.workers.add(this.sender());
    this.sender().tell(new WorkMessage(), this.self());
  }

  private void handle(Terminated message) {
    this.context().unwatch(message.getActor());
    this.workers.remove(message.getActor());
  }
}

Clustering Cluster-Aware Scheduler

int maxWorkersPerNode = 10;          // automatically spawns 10 workers per node ...
int maxWorkersPerCluster = 1000000;  // ... and not more than 1,000,000 per cluster
boolean allowLocalWorkers = true;
Set<String> roles = new HashSet<>(Arrays.asList("slave"));

// Scheduling strategy: average load
ActorRef router = system.actorOf(
    new ClusterRouterPool(
        new AdaptiveLoadBalancingPool(SystemLoadAverageMetricsSelector.getInstance(), 0),
        new ClusterRouterPoolSettings(maxWorkersPerCluster, maxWorkersPerNode, allowLocalWorkers, roles))
        .props(Props.create(WorkerActor.class)), "router");

application.conf:

akka {
  extensions = ["akka.cluster.metrics.ClusterMetricsExtension"]
  cluster.metrics.native-library-extract-folder=${user.dir}/target/native
}

Akka Actor Programming Hands-on

. Actor Model (Recap) . Basic Concepts . Runtime Architecture . Demo . Messaging . Parallelization . Remoting . Clustering . Patterns . Homework

Thorsten Papenbrock Slide 66 Patterns Actor Programming Patterns

Actor programming is a mathematical model that defines basic rules for communication (not a style guide for architecture)

Writing actor-based systems is based on patterns

Distributed Data Management Akka Actor Programming

Thorsten Papenbrock Slide 67 Patterns Master/Worker

Distributed Data Management Akka Actor Programming

Thorsten Papenbrock Slide 68 Patterns book Master/Worker Booking reserve Actor promote Task-parallelism pay bill . Distribute sub-tasks to different actors Reservation Payment Billing Promotion Actor Actor Actor Actor

book

Booking Data-parallelism Scheduler . Distribute chunks of data book Actor book book book to different actors

Booking Booking Booking Booking Actor Actor Actor Actor Patterns Master/Worker

Work Propagation

. Producer actors generate work for other consumer actors
. Push propagation: producers send work immediately; work is queued in inboxes; fast propagation, risk of congestion
. Pull propagation: consumers ask for more work; work is queued in the producer; slower propagation, no congestion

Patterns Master/Worker
A concept for 1. parallelization and 2. fault-tolerance

Master
. Splits the task into work packages
. Schedules the work packages to known workers
. Watches available workers (registers new workers; detects and unregisters failed workers)
. Monitors task completion (assigns pending work packages; re-assigns failed work packages)
. Assembles the final result (from partial work package results)
 Does not know how to solve the individual tasks!

Worker Distributed Data . Register at master Management Akka Actor . Accept and process work packages Programming . Does not know the overall task!

Thorsten Papenbrock Slide 71 Patterns Master/Worker

Master

public class OneTaskMasterActor extends AbstractActor {

  private final Queue<WorkMessage> unassignedWork = new LinkedList<>();   // work packages
  private final Queue<ActorRef> idleWorkers = new LinkedList<>();         // workers
  private final Map<ActorRef, WorkMessage> busyWorkers = new HashMap<>(); // work packages + workers

  private TaskMessage task;
  private Result result;

  @Override
  public Receive createReceive() {
    return receiveBuilder()
        .match(RegistrationMessage.class, this::handle)   // watch and try to assign work
        .match(Terminated.class, this::handle)            // un-watch and re-assign work
        .match(TaskMessage.class, this::handle)           // split and assign to workers
        .match(CompletionMessage.class, this::handle)     // collect and send new work
        .build();
  }

  […]
}

Patterns Master/Worker

Example

Distributed Data Management Akka Actor Programming

Thorsten Papenbrock Slide 73

http://www.akkaessentials.in/2012/03/implementing-master-slave-grid.html Patterns Proxy

Distributed Data Management Akka Actor Programming

Thorsten Papenbrock Slide 74 Patterns Proxy

Proxy Actor
. Acts as an agent or surrogate for some other actor
. Handles a certain (standard) task
. Serves to …
  . externalize behavior/state (e.g., prevent cluttering code in the real actor)
  . hide the real actor (e.g., protect against DOS attacks)
  . handle short-lived concepts (e.g., communications)
  . handle resource/time intensive actions (e.g., data transfer)
 Other actors “think” they were talking to the real actor!

Patterns Proxy

Example: Simple Proxy . Delegate a new communication to a proxy . If the communication returns a result, the proxy reports it to the real actor

public class MyRealActor extends AbstractActor {

@Override public Receive createReceive() { return receiveBuilder() .match(HelloMessage.class, message -> { ActorRef proxy = this.context().actorOf(Proxy.props()); Distributed Data this.sender().tell(new HelloBackMessage(), proxy); Management }) .match(ProxyResultMessage.class, this::handle) Akka Actor Programming .build(); } } Thorsten Papenbrock Slide 76 Patterns Proxy

Example: Reliable Communication Proxy . Provides exactly-once messaging on top of at-most-once messaging . Implements an ACK–RETRY protocol

System1 System2

Supervision ensures reliable Sender communication Receiver

All the magic! Distributed Data Management Akka Actor Reliable intra Sender Receiver Reliable intra Programming ActorSystem Proxy Proxy ActorSystem send send Thorsten Papenbrock Slide 77 https://doc.akka.io/docs/akka/2.2.4/contrib/reliable-proxy.html#reliable-proxy Patterns Proxy

Example: Reliable Communication Proxy

Sender Proxy
. Adds sequence numbers to messages
. Forwards messages to the Receiver Proxy
. Stores messages until they are successfully acknowledged by the Receiver Proxy
. Adds a timeout to each message; retries the send if the acknowledgment does not arrive within the timeout

Receiver Proxy
. Acknowledges received messages to the Sender Proxy
. Forwards acknowledged messages to the Receiver
. Detects missing/duplicate messages by checking the sequence number of the last forwarded message

(Diagram: the Sender Proxy keeps the messages awaiting acknowledgment together with their timeouts; the Receiver Proxy remembers the sequence number of the last forwarded message.)
https://doc.akka.io/docs/akka/2.2.4/contrib/reliable-proxy.html#reliable-proxy

Patterns Proxy

Example: Reliable Communication Proxy (continued)
. How does that work in actor programming, i.e., how does the Sender Proxy implement the per-message timeout and retry?
https://doc.akka.io/docs/akka/2.2.4/contrib/reliable-proxy.html#reliable-proxy

Patterns Proxy

Example: Reliable Communication Proxy . Quick digression: akka.actor.Scheduler and akka.actor.Cancellable . Useful to schedule future and periodic events (e.g. message sends)

public class SenderProxy extends AbstractActor {
  […]

  // On messageA receive: (re-)send messageA to the receiverProxy every 3 seconds
  Cancellable sendMessageA = this.getContext().system().scheduler().schedule(
      Duration.create(0, TimeUnit.SECONDS),
      Duration.create(3, TimeUnit.SECONDS),
      receiverProxy, messageA, this.getContext().dispatcher(), null);

  […]

  // On messageA acknowledge: stop resending messageA
  sendMessageA.cancel();

[…] Thorsten Papenbrock } Slide 80 Patterns Proxy

Example: Reliable Communication Proxy . Provides exactly-once messaging on top of at-most-once messaging . Implements an ACK–RETRY protocol

System1 System2

Still fire-and-forget Sender messaging for Sender Receiver

Proxies add some overhead to ensure exactly-once Distributed Data Management Akka Actor Sender Receiver Programming Proxy Proxy

Thorsten Papenbrock Slide 81 https://doc.akka.io/docs/akka/2.2.4/contrib/reliable-proxy.html#reliable-proxy Patterns Ask


Thorsten Papenbrock Slide 82 Patterns Ask

Tell messaging
. non-blocking, asynchronous, fire-and-forget
. Java: someActor.tell(message, sender)
. Scala: someActor ! message

Ask pattern
. blocking, synchronous
. Java: akka.pattern.PatternsCS.ask(someActor, message, timeout)
. Scala: someActor ? message
. Returns a Future that the calling entity can wait for
. Implemented as a library helper and not a default message send option

Patterns Ask

Ask pattern . Via ask, the Sender creates a Future that wraps a Proxy actor that tells the message to a Receiver with a timeout for that message . The Proxy completes the Future either when it receives a response or the timeout elapses

tell Sender complete Receiver Future Distributed Data Management Akka Actor ask tell Sender Programming Proxy

Thorsten Papenbrock timeout Slide 84 Patterns Ask

Ask pattern . Useful if … . the outside, non-actor world needs to communicate with an actor . an actor must not continue working until a response is received (very rare case) . Not a good solution to … . make the communication reliable, i.e., enable exactly-once messaging (use reliable proxy pattern) . implement timeouts for message sends Distributed Data (use scheduled tasks) Management Akka Actor Programming

Thorsten Papenbrock Slide 85 Patterns Ask

Why to avoid ask: . Paradigm violation . Synchronous calls break the strict decoupling of actors . Resource blocking . Actively waiting for other actors locks resources (in particular threads) . Inefficient messaging . Asking requires more effort than telling messages (e.g. timeouts)

Distributed Data Management Akka Actor Avoid ask if possible! Programming (the need to ask is usually the result of a bad architecture)

Thorsten Papenbrock Slide 86 Patterns Ask

Wait! We can use Futures with the Pipe pattern in a non-blocking way!
. Example from the official Akka documentation: https://doc.akka.io/docs/akka/2.5/actors.html
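Since the original code figure did not survive the slide export, here is a sketch in that spirit, assuming Akka 2.5's PatternsCS helpers (PatternsCS.ask and PatternsCS.pipe); the worker reference, the Request/Result message types, and the 5-second timeout are illustrative assumptions:

import akka.actor.AbstractActor;
import akka.actor.ActorRef;
import akka.pattern.PatternsCS;
import java.util.concurrent.CompletionStage;

public class PipingActor extends AbstractActor {

  public static class Request {}
  public static class Result {}

  private ActorRef worker; // hypothetical actor that answers Request messages with Result messages

  @Override
  public Receive createReceive() {
    return receiveBuilder()
        .match(Request.class, request -> {
          // ask returns a CompletionStage instead of blocking the actor ...
          CompletionStage<Object> answer = PatternsCS.ask(this.worker, request, 5000);
          // ... and pipe delivers the completed value as a normal message back to ourselves
          PatternsCS.pipe(answer, this.getContext().dispatcher()).to(this.self());
        })
        .match(Result.class, this::handle)
        .build();
  }

  private void handle(Result result) { /* continue processing inside the actor context */ }
}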

Thorsten Papenbrock Slide 87 Patterns Ask

Actors vs. Futures (+ Pipes) . Futures are an alternative model for parallelization . There are many discussions on “Actors vs. Futures” as means for parallelization control  The question here is more which pattern for parallelization you prefer!

. Actors + Futures is a bad combination, because …
  . mixing both models makes your code harder to understand and maintain
  . callbacks are needed to avoid blocking
  (For instance, Chris Stucchio’s blog: https://www.chrisstucchio.com/blog/2013/actors_vs_futures.html)
. Why are callbacks bad in actor programming?
  . Callbacks are executed by non-actor threads on the side (i.e., the callback thread might be completing a Future while the actor that created the Future is processing a different message at the same time)
     Bad for debugging, parallelization control, resource management, failure handling, …
     Prone to introduce shared mutable state and, hence, to destroy encapsulation
  . Failure handling gets messed up, because the asked actor needs to reply with certain ask-specific error messages to influence the Future’s completion

Patterns Singleton

Distributed Data Management Akka Actor Programming

Thorsten Papenbrock Slide 89 Patterns Singleton Register new worker, watch on failed worker, read cluster metrics, balance load, …

Persist or distribute state to restore after failure (see docu on “sharding”)

book Exist exactly once in the cluster!

Booking reserve Actor promote pay bill Distributed Data Management Akka Actor Reservation Payment Billing Promotion Programming Actor Actor Actor Actor

Thorsten Papenbrock Slide 90 Patterns Singleton

Motivation . Sometimes, there needs to be exactly one actor of some type, e.g., … . one Endpoint actor for external communication . one Leader actor for consensus enforcement . one Resource actor of some type First Idea: . Simply create that actor once in the cluster . Problems: 1. Requires a dedicated ActorSystem that is responsible for creating the singleton 2. If that ActorSystem dies, the singleton is unavailable until the ActorSystem is back 3. Starting the same dedicated ActorSystem twice might cause split brain

Thorsten Papenbrock 4. Every ActorSystem needs to know the address of the singleton Slide 91 . akka.cluster.singleton.ClusterSingletonManager Patterns Singleton . Runs in every ActorSystem (start early!) . Creates exactly one Singleton actor in the cluster (on the oldest node; singleton moves if node goes down) Cluster Singleton . akka.cluster.singleton.ClusterSingletonProxy

. Create one to communicate with singleton . Redirects messages to the current Singleton actor (buffering messages if singleton is temporarily unavailable)

Cluster- Cluster- Cluster- Cluster- Cluster- Singleton- Singleton- Singleton- Singleton- Singleton- Manager Manager Proxy Manager Proxy Distributed Data Management Akka Actor Programming Singleton ActorXY ActorYZ

Thorsten Papenbrock Slide 92 System1 System2 System3 . akka.cluster.singleton.ClusterSingletonManager Patterns . If an ActorSystem hosting the singleton dies, Singleton the singleton is re-created on the then oldest node

Cluster Singleton . akka.cluster.singleton.ClusterSingletonProxy

. Knows where the current singleton lives Note: and tracks singleton movements This is no “reliable proxy” so messages can get lost!

Cluster- Cluster- Cluster- Cluster- Cluster- Singleton- Singleton- Singleton- Singleton- Singleton- Manager Manager Proxy Manager Proxy Distributed Data Management Akka Actor Programming Singleton Singleton ActorXY ActorYZ

Thorsten Papenbrock Slide 93 System1 System2 System3 Patterns Singleton

Cluster Singleton

// On ActorSystem startup ActorRef manager = system.actorOf( ClusterSingletonManager.props( LeaderActor.props(), PoisonPill.class, ClusterSingletonManagerSettings.create(system).withRole("master")), "leaderManager");

// If an actor needs to talk to the singleton ActorRef proxy = system.actorOf( Distributed Data ClusterSingletonProxy.props( Management "/user/leaderManager", Akka Actor ClusterSingletonProxySettings.create(system).withRole("master")), Programming "leaderProxy");

proxy.tell(new HelloLeaderMessage(), this.self()); Thorsten Papenbrock Slide 94 Patterns Reaper

Distributed Data Management Akka Actor Programming

Thorsten Papenbrock Slide 95 Patterns Reaper Application Shutdown? Task vs. Actor shutdown . Tasks finish and vanish . Actors finish and wait for more work  Actors need to be notified to stop working

How to detect that an application has finished?
. All mailboxes empty?
   No: actors might still be working on messages (and produce new ones)
. All mailboxes empty and all actors idle?
   No: messages can still be in transfer, i.e., on the network
. All mailboxes empty and all actors idle for “a longer time”?
   No: actors might be idle for “longer times” if they wait for resources
 Only the application knows when it is done (e.g. a final result was produced)

Patterns Reaper Application Shutdown?
Problem
. ActorSystems stay alive when the main application thread ends

Forced Shutdown . Kill the JVM process . Problems: . Risk of resource corruption (e.g. corrupted file if actor was writing to it) . Many, distributed JVM processes that need to be killed individually Distributed Data

Management Actor System terminate() Akka Actor Programming . Calling terminate() on an Actor System will stop() all its actors

. Problem: Thorsten Papenbrock Slide 97 . Remote Actor Systems are still alive Patterns Reaper

PoisonPill Shutdown
. If an application is done, send a PoisonPill message to all actors
. Actors automatically forward the PoisonPill to all children (use postStop() to also forward a PoisonPill to other actors)
. The PoisonPill finally stops an actor
. Advantages:
  . Pending messages prior to the PoisonPill are properly processed
  . The PoisonPill propagates into all remote ActorSystems
(Diagram: the application sends a PoisonPill to its top-level actors, which forward it down the hierarchy.)

Thorsten Papenbrock actor actor Slide 98 3 5 Patterns Reaper

PoisonPill is a built-in Akka message that is handled by all actors:

import akka.actor.PoisonPill;

[…]

this.otherActor.tell(PoisonPill.getInstance(), ActorRef.noSender());
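A sketch of the postStop() idea mentioned above: forwarding the PoisonPill to an actor that is not a child of this actor (children are stopped automatically anyway); the otherActor reference is a placeholder:

import akka.actor.AbstractActor;
import akka.actor.ActorRef;
import akka.actor.PoisonPill;

public class ForwardingActor extends AbstractActor {

  private ActorRef otherActor; // placeholder for a non-child actor that should also be stopped

  @Override
  public void postStop() throws Exception {
    super.postStop();
    // When this actor is stopped (e.g. by a PoisonPill), pass the poison on
    this.otherActor.tell(PoisonPill.getInstance(), ActorRef.noSender());
  }

  @Override
  public Receive createReceive() {
    return receiveBuilder().build();
  }
}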

Patterns Reaper
PoisonPill Shutdown
. Problem: if all actors are stopped, the ActorSystem is still running!
. Solution: the Reaper pattern

Reaper
. A dedicated actor that “knows” all actors
   “Reaps actor souls and ends the world!”
. Listens to death events (Terminated events)
. Calls the terminate() function on the ActorSystem if all actors have stopped (e.g. due to PoisonPills)
See: http://letitcrash.com/post/30165507578/shutdown-patterns-in-akka-2

Akka Application Shutdown

public class Reaper extends AbstractLoggingActor {

  public static class WatchMeMessage implements Serializable { }

  // After creation, every actor needs to register at its local reaper
  public static void watchWithDefaultReaper(AbstractActor actor) {
    ActorSelection reaper = actor.context().system().actorSelection("/user/reaper");
    reaper.tell(new WatchMeMessage(), actor.self());
  }

  private final Set<ActorRef> watchees = new HashSet<>();

  @Override
  public Receive createReceive() {
    return receiveBuilder()
        .match(WatchMeMessage.class, message -> {
          // Watch a new actor
          if (this.watchees.add(this.sender()))
            this.context().watch(this.sender());
        })
        .match(Terminated.class, message -> {
          // Witness an actor dying
          this.watchees.remove(this.sender());
          if (this.watchees.isEmpty())
            this.context().system().terminate();   // end the world
        })
        .matchAny(object -> this.log().error("Unknown message"))
        .build();
  }
}

Reasons to die without a PoisonPill
. If an actor’s parent dies, the orphaned actor dies too
. If a client loses its master ActorSystem, it might decide to die
. If an error occurs, the supervisor might choose to let the failing actor die

Distributed Data Management Akka Actor Programming

Thorsten Papenbrock Slide 102 Patterns Reaper

Stop a running system . What if the system operates an endless stream of jobs and should be stopped? . Send a custom termination message . Upon receiving this termination message, an actor should … 1. refuse all incoming new jobs 2. finish all current jobs (i.e., wait for other actors that work on it) 3. let child actors finish their jobs

  4. stop child actors
  5. stop itself (see the sketch below)
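A sketch of how an actor could react to such a custom termination message; the ShutdownMessage/JobMessage classes and the bookkeeping are assumptions for illustration, not a prescribed implementation:

import akka.actor.AbstractLoggingActor;
import akka.actor.ActorRef;
import akka.actor.Terminated;
import java.io.Serializable;

public class GracefulWorker extends AbstractLoggingActor {

  public static class ShutdownMessage implements Serializable {}
  public static class JobMessage implements Serializable {}

  private boolean shuttingDown = false;

  @Override
  public Receive createReceive() {
    return receiveBuilder()
        .match(ShutdownMessage.class, message -> {
          this.shuttingDown = true; // 1. refuse all incoming new jobs from now on
          // 3./4. forward the shutdown to all children and wait for them to stop
          for (ActorRef child : this.getContext().getChildren()) {
            this.getContext().watch(child);
            child.tell(new ShutdownMessage(), this.self());
          }
          this.stopIfDone();
        })
        .match(JobMessage.class, job -> {
          if (!this.shuttingDown)
            this.process(job); // 2. jobs that arrived before the shutdown are still processed
        })
        .match(Terminated.class, message -> this.stopIfDone())
        .build();
  }

  private void stopIfDone() {
    // 5. stop yourself once the shutdown was requested and all children are gone
    if (this.shuttingDown && !this.getContext().getChildren().iterator().hasNext())
      this.getContext().stop(this.self());
  }

  private void process(JobMessage job) { /* application logic */ }
}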

Patterns Further Reading

Akka documentation Example code @ GitHub: https://github.com/akka/akka-samples . http://doc.akka.io/docs/akka/current/java/index.html . http://doc.akka.io/docs/akka/current/scala/index.html Experiences, best practices, and patterns . http://letitcrash.com . http://akka.io/blog . https://github.com/sksamuel/akka-patterns

. Akka actor programming Distributed Data literature: Management Akka Actor Programming

Thorsten Papenbrock Slide 104 Patterns Further Reading

Example code @ GitHub: https://github.com/akka/akka-samples

Distributed Data Management Akka Actor Programming

Thorsten Papenbrock Slide 105 Akka Actor Programming Hands-on

. Actor Model (Recap) . Basic Concepts . Runtime Architecture . Demo . Messaging . Parallelization . Remoting . Clustering . Patterns . Homework

Thorsten Papenbrock Slide 106 Password Gene Cracking Analysis

passwords partners

Hash Mining prefixes hashes Linear Combination Homework Assignment

Password Cracking
. Given:
  . 42 passwords (SHA-256 hashed)
  . All passwords are numeric and have a fixed length of six digits
. Task:
  . Find the clear text password for each hash (brute force approach, i.e., test all numbers)
. Hint:

private String hash(String password) throws Exception {
  MessageDigest digest = MessageDigest.getInstance("SHA-256");
  byte[] hashedBytes = digest.digest(password.getBytes("UTF-8"));

  StringBuffer stringBuffer = new StringBuffer();
  for (int i = 0; i < hashedBytes.length; i++)
    stringBuffer.append(Integer.toString((hashedBytes[i] & 0xff) + 0x100, 16).substring(1));
  return stringBuffer.toString();
}
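A sketch of the brute-force idea using the hash() hint above (the method name crack and its return convention are illustrative, not part of the assignment):

private String crack(String hashedPassword) throws Exception {
  // Try all 1,000,000 possible six-digit passwords
  for (int i = 0; i < 1000000; i++) {
    String candidate = String.format("%06d", i); // zero-padded to the fixed length of six digits
    if (this.hash(candidate).equals(hashedPassword))
      return candidate;
  }
  return null; // no six-digit number hashes to this value
}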

Homework Assignment

Gene Analysis
. Given:
  . 42 RNA gene sequences
. Task:
  . For each person, find the other person with the longest common gene subsequence (compare all RNA sequences pair-wise)
. Hints: (a) Apriori-gen, (b) dynamic programming, (c) suffix tree, (d) …
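A sketch of hint (b), computing the length of the longest common subsequence of two sequences with the textbook dynamic-programming recurrence (not a prescribed solution for the assignment):

private int longestCommonSubsequenceLength(String a, String b) {
  // dp[i][j] = length of the LCS of the first i characters of a and the first j characters of b
  int[][] dp = new int[a.length() + 1][b.length() + 1];
  for (int i = 1; i <= a.length(); i++) {
    for (int j = 1; j <= b.length(); j++) {
      if (a.charAt(i - 1) == b.charAt(j - 1))
        dp[i][j] = dp[i - 1][j - 1] + 1;
      else
        dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
    }
  }
  return dp[a.length()][b.length()];
}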

Thorsten Papenbrock Slide 109 Homework Assignment

Linear Combination
. Given:
  . 42 numeric passwords
. Task:
  . Find a linear combination of all numbers that resolves to 0
  . In other words, solve:

    a1 ∗ p1 + a2 ∗ p2 + ⋯ + a42 ∗ p42 = 0

    where ai ∈ {−1, 1} and pi is the password of person i
  . Store each person’s prefix ai

Homework Assignment

Hash Mining
. Given:
  . 42 prefixes and partners
. Task:
  . For each person, hash the partner number in such a way that the resulting hash starts with
    . five “1”s, if the person’s prefix ai is “1”
    . five “0”s, if the person’s prefix ai is “-1”
  . The hash can be altered by adding a (random) nonce to the partner number
. Hint:

  while (true) {
    int nonce = rand.nextInt();
    String hash = this.hash(content + nonce);
    if (hash.startsWith(fullPrefix))
      return hash;
  }

Homework Assignment: Requirements to pass the assignment!

Algorithm
. Implemented in Java or Scala
. Supports cluster deployment, i.e., one master and arbitrarily many slave ActorSystems
. Schedules work to all slave ActorSystems
. Starts with:
  . “java -jar YourAlgorithmName.jar master --workers 4 --slaves 4 --input students.csv”
  . “java -jar YourAlgorithmName.jar slave --workers 4 --host 198.172.0.24”
. As a master:
  1. Starts “workers” many local workers
  2. Waits for “slaves” ActorSystems to join
  3. Starts the processing of the “input” CSV file
  4. Outputs the final result to the console
  5. Ends all slave ActorSystems and its own ActorSystem
. Measure the time from “start processing” to “print final result” (printing excluded) in milliseconds and write it to the console:

  long t1 = System.currentTimeMillis();
  […]
  long t2 = System.currentTimeMillis();
  System.out.println("Time: " + (t2 - t1));

Homework Solution (Note: IDs start with 1)

Hashes may be different in your solution (depending on your nonce) but the prefixes should be the same

Password Cracking: 42 sec
Gene Analysis: 42 sec
Linear Combination: 101 sec
Hash Mining: 104 sec

Thorsten Papenbrock Slide 113 Homework Rules

Assignment . Submission deadline . 27.11.2018 09:15:00 (before the lecture) . Submission channel . Email to [email protected] . Submission artifacts 1. Source code as zip (Maven project; Java or Scala)

2. Jar file as zip Distributed Data 3. Architecture pdf or ppt slide Management Akka Actor . As email attachments or links to dropbox/owncloud/… folders Programming . Teams

Thorsten Papenbrock . Please solve the homework in teams of two students Slide 114 . Provide the names of both students in your submission email

Homework Evaluation
. The best team wins a prize!
(Evaluation cluster: one node with 8 cores and four nodes with 20 cores each)

Thorsten Papenbrock Slide 115 “I wait for green”

“Road ahead is free!”

“I wait for crossing traffic”

“I accelerate!”

“You are not in my path!” Distributed Data Analytics Thorsten Papenbrock Introduction G-3.1.09, Campus III “Attention, I break!” Hasso Plattner Institut