Distributed Programming and Remote Procedure Calls (RPC)

George Porter CSE 124 February 17, 2015 Announcements

■ Project 2 checkpoint – use submission site or email if having difficulties.

■ Project 1 recap

■ SQL vs. Hadoop

■ Graphing tools • Log scale

■ Change in schedule: • Thursday will be an in-class tutorial on RPC with Facebook’s Thrift framework Project 1 recap Linear scale

Logarithmic scale Part 1: RPC Concepts Remote Procedure Call (RPC)

■ Distributed programming is challenging • Need common primitives/abstraction to hide complexity • E.g., file system abstraction to hide block layout, process abstraction for scheduling/fault isolation

■ In early 1980’s, researchers at PARC noticed most distributed programming took form of remote procedure call

■ Popular variant: Remote Method Invocation (RMI) • Obtain handle to remote object, invoke method on object • Transparency goal: remote object appear as if local object RPC Example

Local computing Remote/RPC computing

X = 3 * 10; server = connectToServer(S); print(X) Try: > 30 X = server.mult(3,10); print(X) Except e: print “Error!” > 30

or > Error! Why RPC?

■ Offload computation from devices to servers • Phones, raspberry pi, sensors, Nest thermostat, …

■ Hide the implementation of functionality • Proprietary algorithms

■ Functions that just can’t run locally Why RPC?

■ Functions that just can’t run locally • tags = Facebook.MatchFacesInPhoto(photo) • Print tags > [“Chuck Thacker”, “Leslie Lamport”] What makes RPC hard?

■ Network vs. computer backplane • Message loss, message reorder, …

■ Heterogeneity • Client and Server might have different: OS versions Languages Endian-ness Hardware architectures … Leslie Lamport

“A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable” RPC Components

■ End-to-end RPC protocol • Defines messages, message exchange behavior, …

■ Programming language support • Turn “local” functions/methods into RPC • Package up arguments to the method/function, unpackage on the server, … • Called a “stub compiler” • Process of packaging and unpackaging arguments is called “Marshalling” and “Unmarshalling” High-level overview Overview

■ RPC Protocol itself

■ Stub compiler / mashalling

■ RPC Frameworks Remote Procedure Call Issues

■ Underlying protocol • UDP vs. TCP • Advantages/Disadvantages?

■ Semantics • What happens on network/host failure? • No fate sharing (compared to local machine case)

■ Transparency • Hide network communication/failure from programmer • With language support, can make remote procedure call “look just like” local procedure call Identifying remote function/methods

■ Namespace vs flat • Issues w.r.t. flat: how to assign?

■ Matching requests with responses • Can’t just rely on TCP Why?? • What if the client or server fail during execution?

■ Main idea: • Message ID • Client blocked until response returned w/ Message ID When things go wrong

■ Let’s assume client ■ Server gets first and server use TCP request(0), executes it, send response ■ Client issues request(0) ■ Server get the second request, thinks it is a ■ Client fails, reboots duplicate, and so ■ Client issues unrelated ACKs it but doesn’t request(0) execute

■ Client never gets a response to either request. Boot ID + Message ID

■ Idea is to keep non-volatile state that is updated every time the machine boots

■ Incremented during the boot-up process

■ Each request/response is uniquely identified by a (bootID,MessageID) pair Client/Server Interaction

Client Server ● Client sends request SendReq() RecvReq() i Client waits for response (blocking) RecvResp() ● Server wakes up SendResp() h Computes, sends response

h Response wakes up client ● Server structure?

h Process, thread, FIFO Reliable RPC (explicit acknowledgement) Reliable RPC (implicit acknowledgement) What about failures?

■ Different options: 1. Server can keep track of the (BootId,MessageId) and ignore any duplicate requests 2. Server doesn’t keep state—might execute duplicate requests

■ What are advantages/disadvantages? RPC Semantics

Delivery Guarantees RPC Call Retry Duplicate Retransmit Semantics Request Filtering Response

No NA NA Maybe

Re-execute Yes No At-least once Procedure Retransmit Yes Yes At-most once reply Remote Procedure Call Issues

■ Idempotent operations • Can you re-execute operations without harmful side effects • Idempotency means that executing the same procedure does not have visible side effects

■ Timeout value • How long to wait before re-transmitting request? Protocol-to-Protocol Interface

■ Send/Receive semantics • Synchronous vs. Asynchronous

■ Process Model • Single process vs. multiple process • Avoid context switches

■ Buffer Model • Avoid data copies Part 2: RPC Implementations Request / Response Flow

Name Server

RPC 3. Request RPC

Client Stub 4. Response Stub Server

■ Client stub indicates which procedure should run at server • Marshals arguments

■ Server stub unmarshals arguments • Big switch statement to determine local procedure to run RPC Mechanics

■ Client issues request by calling stub procedure • Stubs can be automatically generated with compiler support

■ Stub procedure marshals arguments, transmits requests, blocks waiting for response • RPC layer deals with network issues (e.g., TCP vs. UDP)

■ Server stub • Unmarshals arguments • Determines correct local procedure to invoke, computes • Marshals results, transmits to client • Can also be automatically generated with language support RPC Binding

■ Binding • Servers register the service they provide to a name server • Clients lookup services they need from server • Bind to server for a particular set of remote procedures • Operation informs RPC layer which machine to transmit requests to

■ How to locate the RPC name service? Presentation Formatting

■ Marshalling (encoding) application data into messages

■ Unmarshalling (decoding) Application Application messages into application data data data

Presentation Presentation encoding decoding ■ Data types to consider: … • integers Message Message Message • floats • strings Types of data we do not consider • arrays images • structs video multimedia documents Difficulties

■ Representation of base types • Floating point: IEEE 754 versus non-standard • Integer: big-endian versus little-endian (e.g., 34,677,374)

(2) (17) (34) (126) Big-endian 00000010 00 0 1 0 0 0 1 00 1 0 0 0 1 0 01 1 1 1 1 1 0

(126) (34) (17) (2) Little-endian 01 1 1 1 1 1 0 00 1 0 0 0 1 0 00 0 1 0 0 0 1 00 0 0 0 0 1 0

High Low address address

■ Compiler layout of structures Taxonomy

■ Data types • Base types (e.g., ints, floats); must convert • Flat types (e.g., structures, arrays); must pack • Complex types (e.g., pointers); must linearize

Application data structure

Marshaller RPC Interface vs. Implementation

■ RPC Interface: • High-level procedure invocation with arguments, return type • Asynchronous, Synchronous, ‘void’, Pipelined...

■ RPC Implementation: • SunRPC • Java RMI • XML RPC • • Google * • SunRPC

0 31 0 31 ■ Originally implemented for popular NFS (network file service) XID XID ■ XID (transaction id) uniquely MsgType = CALL MsgType = REPLY identifies request RPCVersion = 2 Status = ACCEPTED ■ Server does not remember last XID it Program Data serviced Version • Problem if client retransmits request while reply is in transit Procedure • Provides at-least once semantics Credentials (variable) Verifier (variable) Data XML-RPC

■ XML is a standard for describing structured documents • Uses tags to define structure: demarcates an element Tags have no predefined semantics … … except when document refers to a specific namespace • Elements can have attributes, which are encoded as name-value pairs A well-formed XML document corresponds to an element tree

SumAndDifference 40 10

(thanks to Vijay Karamcheti) XML-RPC Wire Format

■ Scalar values • Represented by a block

■ Integer • 12

■ Boolean • 0

■ String • Hello world

■ Double • 11.4368

■ Also Base64 (binary), DateTime, etc. XML-RPC Wire Format (struct)

■ Structures • Represented as a set of s • Each member contains a and a

lowerBound 18 upperBound 139 XML-RPC Wire Format (Arrays)

■ Arrays • A single element, which • contains any number of elements

12 Egypt 0 -31 XML-RPC Request

■ HTTP POST message • URI interpreted in an implementation-specific fashion • Method name passed to the server program ■ POST /RPC2 HTTP/1.1 Content-Type: text/xml User-Agent: XML-RPC.NET Content-Length: 278 Expect: 100-continue Connection: Keep-Alive Host: localhost:8080 SumAndDifference 40 10 XML-RPC Response

■ HTTP Response • Lower-level error returned as an HTTP error code • Application-level errors returned as a element (next slide)

■ HTTP/1.1 200 OK Date: Mon, 22 Sep 2003 21:52:34 GMT Server: Microsoft-IIS/6.0 Content-Type: text/xml Content-Length: 467 sum50 diff30 XML-RPC Fault Handling

■ Another kind of a MethodResponse

faultCode 500 faultString Arg `a’ out of range RMI Architecture

Java Client Java Server

Invoke Method A Object B on Object B Method A

RMI Object Registry Stub Skeleton Object B Maps object Object B names to locations Distributed Computing Services Services

RMI Transport Protocol

(thanks to David Del Vecchio) RMI Example (Server) Hello.java import java.rmi.*; public interface HelloInterface extends Remote { public String say() throws RemoteException; } HelloInterface.java import java.rmi.*; import java.rmi.server.*; public class Hello extends UnicastRemoteObject implements HelloInterface { private String message;

public Hello (String msg) throws RemoteException { message = msg; }

public String say() throws RemoteException { return message; }

public static void main (String[] args) { System.setSecurityManager(new RMISecurityManager()); try { Naming.bind ("Hello", new Hello ("Hello, world!")); } catch (Exception e) { System.out.println ("Server failed”); } } } RMI Example (Client)

HelloClient.java import java.rmi.*; public class HelloClient { public static void main (String[] args) { System.setSecurityManager(new RMISecurityManager()); try { HelloInterface hello = (HelloInterface) Naming.lookup ("//(hostname)/Hello"); System.out.println (hello.say()); } catch (Exception e) { System.out.println ("HelloClient exception: " + e); } } }

■ The rmic utility generates stubs and skeletons ■ Once started, server will stay active ■ RMI daemon (rmid) can be used to create (activate) objects on the fly Apache Thrift