Distributed Programming and Remote Procedure Calls (RPC)
George Porter CSE 124 February 17, 2015 Announcements
■ Project 2 checkpoint – use submission site or email if having difficulties.
■ Project 1 recap
■ SQL vs. Hadoop
■ Graphing tools • Log scale
■ Change in schedule: • Thursday will be an in-class tutorial on RPC with Facebook’s Thrift framework Project 1 recap Linear scale
Logarithmic scale Part 1: RPC Concepts Remote Procedure Call (RPC)
■ Distributed programming is challenging • Need common primitives/abstraction to hide complexity • E.g., file system abstraction to hide block layout, process abstraction for scheduling/fault isolation
■ In early 1980’s, researchers at PARC noticed most distributed programming took form of remote procedure call
■ Popular variant: Remote Method Invocation (RMI) • Obtain handle to remote object, invoke method on object • Transparency goal: remote object appear as if local object RPC Example
Local computing Remote/RPC computing
X = 3 * 10; server = connectToServer(S); print(X) Try: > 30 X = server.mult(3,10); print(X) Except e: print “Error!” > 30
or > Error! Why RPC?
■ Offload computation from devices to servers • Phones, raspberry pi, sensors, Nest thermostat, …
■ Hide the implementation of functionality • Proprietary algorithms
■ Functions that just can’t run locally Why RPC?
■ Functions that just can’t run locally • tags = Facebook.MatchFacesInPhoto(photo) • Print tags > [“Chuck Thacker”, “Leslie Lamport”] What makes RPC hard?
■ Network vs. computer backplane • Message loss, message reorder, …
■ Heterogeneity • Client and Server might have different: OS versions Languages Endian-ness Hardware architectures … Leslie Lamport
“A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable” RPC Components
■ End-to-end RPC protocol • Defines messages, message exchange behavior, …
■ Programming language support • Turn “local” functions/methods into RPC • Package up arguments to the method/function, unpackage on the server, … • Called a “stub compiler” • Process of packaging and unpackaging arguments is called “Marshalling” and “Unmarshalling” High-level overview Overview
■ RPC Protocol itself
■ Stub compiler / mashalling
■ RPC Frameworks Remote Procedure Call Issues
■ Underlying protocol • UDP vs. TCP • Advantages/Disadvantages?
■ Semantics • What happens on network/host failure? • No fate sharing (compared to local machine case)
■ Transparency • Hide network communication/failure from programmer • With language support, can make remote procedure call “look just like” local procedure call Identifying remote function/methods
■ Namespace vs flat • Issues w.r.t. flat: how to assign?
■ Matching requests with responses • Can’t just rely on TCP Why?? • What if the client or server fail during execution?
■ Main idea: • Message ID • Client blocked until response returned w/ Message ID When things go wrong
■ Let’s assume client ■ Server gets first and server use TCP request(0), executes it, send response ■ Client issues request(0) ■ Server get the second request, thinks it is a ■ Client fails, reboots duplicate, and so ■ Client issues unrelated ACKs it but doesn’t request(0) execute
■ Client never gets a response to either request. Boot ID + Message ID
■ Idea is to keep non-volatile state that is updated every time the machine boots
■ Incremented during the boot-up process
■ Each request/response is uniquely identified by a (bootID,MessageID) pair Client/Server Interaction
Client Server ● Client sends request SendReq() RecvReq() i Client waits for response (blocking) RecvResp() ● Server wakes up SendResp() h Computes, sends response
h Response wakes up client ● Server structure?
h Process, thread, FIFO Reliable RPC (explicit acknowledgement) Reliable RPC (implicit acknowledgement) What about failures?
■ Different options: 1. Server can keep track of the (BootId,MessageId) and ignore any duplicate requests 2. Server doesn’t keep state—might execute duplicate requests
■ What are advantages/disadvantages? RPC Semantics
Delivery Guarantees RPC Call Retry Duplicate Retransmit Semantics Request Filtering Response
No NA NA Maybe
Re-execute Yes No At-least once Procedure Retransmit Yes Yes At-most once reply Remote Procedure Call Issues
■ Idempotent operations • Can you re-execute operations without harmful side effects • Idempotency means that executing the same procedure does not have visible side effects
■ Timeout value • How long to wait before re-transmitting request? Protocol-to-Protocol Interface
■ Send/Receive semantics • Synchronous vs. Asynchronous
■ Process Model • Single process vs. multiple process • Avoid context switches
■ Buffer Model • Avoid data copies Part 2: RPC Implementations Request / Response Flow
Name Server
RPC 3. Request RPC
Client Stub 4. Response Stub Server
■ Client stub indicates which procedure should run at server • Marshals arguments
■ Server stub unmarshals arguments • Big switch statement to determine local procedure to run RPC Mechanics
■ Client issues request by calling stub procedure • Stubs can be automatically generated with compiler support
■ Stub procedure marshals arguments, transmits requests, blocks waiting for response • RPC layer deals with network issues (e.g., TCP vs. UDP)
■ Server stub • Unmarshals arguments • Determines correct local procedure to invoke, computes • Marshals results, transmits to client • Can also be automatically generated with language support RPC Binding
■ Binding • Servers register the service they provide to a name server • Clients lookup services they need from server • Bind to server for a particular set of remote procedures • Operation informs RPC layer which machine to transmit requests to
■ How to locate the RPC name service? Presentation Formatting
■ Marshalling (encoding) application data into messages
■ Unmarshalling (decoding) Application Application messages into application data data data
Presentation Presentation encoding decoding ■ Data types to consider: … • integers Message Message Message • floats • strings Types of data we do not consider • arrays images • structs video multimedia documents Difficulties
■ Representation of base types • Floating point: IEEE 754 versus non-standard • Integer: big-endian versus little-endian (e.g., 34,677,374)
(2) (17) (34) (126) Big-endian 00000010 00 0 1 0 0 0 1 00 1 0 0 0 1 0 01 1 1 1 1 1 0
(126) (34) (17) (2) Little-endian 01 1 1 1 1 1 0 00 1 0 0 0 1 0 00 0 1 0 0 0 1 00 0 0 0 0 1 0
High Low address address
■ Compiler layout of structures Taxonomy
■ Data types • Base types (e.g., ints, floats); must convert • Flat types (e.g., structures, arrays); must pack • Complex types (e.g., pointers); must linearize
Application data structure
Marshaller RPC Interface vs. Implementation
■ RPC Interface: • High-level procedure invocation with arguments, return type • Asynchronous, Synchronous, ‘void’, Pipelined...
■ RPC Implementation: • SunRPC • Java RMI • XML RPC • Apache Thrift • Google Protocol Buffers* • Apache Avro SunRPC
0 31 0 31 ■ Originally implemented for popular NFS (network file service) XID XID ■ XID (transaction id) uniquely MsgType = CALL MsgType = REPLY identifies request RPCVersion = 2 Status = ACCEPTED ■ Server does not remember last XID it Program Data serviced Version • Problem if client retransmits request while reply is in transit Procedure • Provides at-least once semantics Credentials (variable) Verifier (variable) Data XML-RPC
■ XML is a standard for describing structured documents • Uses tags to define structure:
■
(thanks to Vijay Karamcheti) XML-RPC Wire Format
■ Scalar values • Represented by a
■ Integer •
■ Boolean •
■ String •
■ Double •
■ Also Base64 (binary), DateTime, etc. XML-RPC Wire Format (struct)
■ Structures • Represented as a set of
■
■ Arrays • A single element, which • contains any number of
■
■ HTTP POST message • URI interpreted in an implementation-specific fashion • Method name passed to the server program ■ POST /RPC2 HTTP/1.1 Content-Type: text/xml User-Agent: XML-RPC.NET Content-Length: 278 Expect: 100-continue Connection: Keep-Alive Host: localhost:8080
■ HTTP Response • Lower-level error returned as an HTTP error code • Application-level errors returned as a
■ HTTP/1.1 200 OK Date: Mon, 22 Sep 2003 21:52:34 GMT Server: Microsoft-IIS/6.0 Content-Type: text/xml Content-Length: 467
■ Another kind of a MethodResponse
■
Java Client Java Server
Invoke Method A Object B on Object B Method A
RMI Object Registry Stub Skeleton Object B Maps object Object B names to locations Distributed Distributed Computing Computing Services Services
RMI Transport Protocol
(thanks to David Del Vecchio) RMI Example (Server) Hello.java import java.rmi.*; public interface HelloInterface extends Remote { public String say() throws RemoteException; } HelloInterface.java import java.rmi.*; import java.rmi.server.*; public class Hello extends UnicastRemoteObject implements HelloInterface { private String message;
public Hello (String msg) throws RemoteException { message = msg; }
public String say() throws RemoteException { return message; }
public static void main (String[] args) { System.setSecurityManager(new RMISecurityManager()); try { Naming.bind ("Hello", new Hello ("Hello, world!")); } catch (Exception e) { System.out.println ("Server failed”); } } } RMI Example (Client)
HelloClient.java import java.rmi.*; public class HelloClient { public static void main (String[] args) { System.setSecurityManager(new RMISecurityManager()); try { HelloInterface hello = (HelloInterface) Naming.lookup ("//(hostname)/Hello"); System.out.println (hello.say()); } catch (Exception e) { System.out.println ("HelloClient exception: " + e); } } }
■ The rmic utility generates stubs and skeletons ■ Once started, server will stay active ■ RMI daemon (rmid) can be used to create (activate) objects on the fly Apache Thrift