Distributed Programming and Remote Procedure Calls (RPC)
Total Page:16
File Type:pdf, Size:1020Kb
Distributed Programming and Remote Procedure Calls (RPC) George Porter CSE 124 February 17, 2015 Announcements ■ Project 2 checkpoint – use submission site or email if having difficulties. ■ Project 1 recap ■ SQL vs. Hadoop ■ Graphing tools • Log scale ■ Change in schedule: • Thursday will be an in-class tutorial on RPC with Facebook’s Thrift framework Project 1 recap Linear scale Logarithmic scale Part 1: RPC Concepts Remote Procedure Call (RPC) ■ Distributed programming is challenging • Need common primitives/abstraction to hide complexity • E.g., file system abstraction to hide block layout, process abstraction for scheduling/fault isolation ■ In early 1980’s, researchers at PARC noticed most distributed programming took form of remote procedure call ■ Popular variant: Remote Method Invocation (RMI) • Obtain handle to remote object, invoke method on object • Transparency goal: remote object appear as if local object RPC Example Local computing Remote/RPC computing X = 3 * 10; server = connectToServer(S); print(X) Try: > 30 X = server.mult(3,10); print(X) Except e: print “Error!” > 30 or > Error! Why RPC? ■ Offload computation from devices to servers • Phones, raspberry pi, sensors, Nest thermostat, … ■ Hide the implementation of functionality • Proprietary algorithms ■ Functions that just can’t run locally Why RPC? ■ Functions that just can’t run locally • tags = Facebook.MatchFacesInPhoto(photo) • Print tags > [“Chuck Thacker”, “Leslie Lamport”] What makes RPC hard? ■ Network vs. computer backplane • Message loss, message reorder, … ■ Heterogeneity • Client and Server might have different: OS versions Languages Endian-ness Hardware architectures … Leslie Lamport “A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable” RPC Components ■ End-to-end RPC protocol • Defines messages, message exchange behavior, … ■ Programming language support • Turn “local” functions/methods into RPC • Package up arguments to the method/function, unpackage on the server, … • Called a “stub compiler” • Process of packaging and unpackaging arguments is called “Marshalling” and “Unmarshalling” High-level overview Overview ■ RPC Protocol itself ■ Stub compiler / mashalling ■ RPC Frameworks Remote Procedure Call Issues ■ Underlying protocol • UDP vs. TCP • Advantages/Disadvantages? ■ Semantics • What happens on network/host failure? • No fate sharing (compared to local machine case) ■ Transparency • Hide network communication/failure from programmer • With language support, can make remote procedure call “look just like” local procedure call Identifying remote function/methods ■ Namespace vs flat • Issues w.r.t. flat: how to assign? ■ Matching requests with responses • Can’t just rely on TCP Why?? • What if the client or server fail during execution? ■ Main idea: • Message ID • Client blocked until response returned w/ Message ID When things go wrong ■ Let’s assume client ■ Server gets first and server use TCP request(0), executes it, send response ■ Client issues request(0) ■ Server get the second request, thinks it is a ■ Client fails, reboots duplicate, and so ■ Client issues unrelated ACKs it but doesn’t request(0) execute ■ Client never gets a response to either request. Boot ID + Message ID ■ Idea is to keep non-volatile state that is updated every time the machine boots ■ Incremented during the boot-up process ■ Each request/response is uniquely identified by a (bootID,MessageID) pair Client/Server Interaction Client Server ● Client sends request SendReq() RecvReq() i Client waits for response (blocking) RecvResp() ● Server wakes up SendResp() h Computes, sends response h Response wakes up client ● Server structure? h Process, thread, FIFO Reliable RPC (explicit acknowledgement) Reliable RPC (implicit acknowledgement) What about failures? ■ Different options: 1. Server can keep track of the (BootId,MessageId) and ignore any duplicate requests 2. Server doesn’t keep state—might execute duplicate requests ■ What are advantages/disadvantages? RPC Semantics Delivery Guarantees RPC Call Retry Duplicate Retransmit Semantics Request Filtering Response No NA NA Maybe Re-execute Yes No At-least once Procedure Retransmit Yes Yes At-most once reply Remote Procedure Call Issues ■ Idempotent operations • Can you re-execute operations without harmful side effects • Idempotency means that executing the same procedure does not have visible side effects ■ Timeout value • How long to wait before re-transmitting request? Protocol-to-Protocol Interface ■ Send/Receive semantics • Synchronous vs. Asynchronous ■ Process Model • Single process vs. multiple process • Avoid context switches ■ Buffer Model • Avoid data copies Part 2: RPC Implementations Request / Response Flow Name Server RPC 3. Request RPC Client Stub 4. Response Stub Server ■ Client stub indicates which procedure should run at server • Marshals arguments ■ Server stub unmarshals arguments • Big switch statement to determine local procedure to run RPC Mechanics ■ Client issues request by calling stub procedure • Stubs can be automatically generated with compiler support ■ Stub procedure marshals arguments, transmits requests, blocks waiting for response • RPC layer deals with network issues (e.g., TCP vs. UDP) ■ Server stub • Unmarshals arguments • Determines correct local procedure to invoke, computes • Marshals results, transmits to client • Can also be automatically generated with language support RPC Binding ■ Binding • Servers register the service they provide to a name server • Clients lookup services they need from server • Bind to server for a particular set of remote procedures • Operation informs RPC layer which machine to transmit requests to ■ How to locate the RPC name service? Presentation Formatting ■ Marshalling (encoding) application data into messages ■ Unmarshalling (decoding) Application Application messages into application data data data Presentation Presentation encoding decoding ■ Data types to consider: … • integers Message Message Message • floats • strings Types of data we do not consider • arrays images • structs video multimedia documents Difficulties ■ Representation of base types • Floating point: IEEE 754 versus non-standard • Integer: big-endian versus little-endian (e.g., 34,677,374) (2) (17) (34) (126) Big-endian 00000010 00 0 1 0 0 0 1 00 1 0 0 0 1 0 01 1 1 1 1 1 0 (126) (34) (17) (2) Little-endian 01 1 1 1 1 1 0 00 1 0 0 0 1 0 00 0 1 0 0 0 1 00 0 0 0 0 1 0 High Low address address ■ Compiler layout of structures Taxonomy ■ Data types • Base types (e.g., ints, floats); must convert • Flat types (e.g., structures, arrays); must pack • Complex types (e.g., pointers); must linearize Application data structure Marshaller RPC Interface vs. Implementation ■ RPC Interface: • High-level procedure invocation with arguments, return type • Asynchronous, Synchronous, ‘void’, Pipelined... ■ RPC Implementation: • SunRPC • Java RMI • XML RPC • Apache Thrift • Google Protocol Buffers* • Apache Avro SunRPC 0 31 0 31 ■ Originally implemented for popular NFS (network file service) XID XID ■ XID (transaction id) uniquely MsgType = CALL MsgType = REPLY identifies request RPCVersion = 2 Status = ACCEPTED ■ Server does not remember last XID it Program Data serviced Version • Problem if client retransmits request while reply is in transit Procedure • Provides at-least once semantics Credentials (variable) Verifier (variable) Data XML-RPC ■ XML is a standard for describing structured documents • Uses tags to define structure: <tag> … </tag> demarcates an element Tags have no predefined semantics … … except when document refers to a specific namespace • Elements can have attributes, which are encoded as name-value pairs A well-formed XML document corresponds to an element tree ■ <?xml version="1.0"?> <methodCall> <methodName>SumAndDifference</methodName> <params> <param><value><i4>40</i4></value></param> <param><value><i4>10</i4></value></param> </params> </methodCall> (thanks to Vijay Karamcheti) XML-RPC Wire Format ■ Scalar values • Represented by a <value><type> … </type></value> block ■ Integer • <i4>12</i4> ■ Boolean • <boolean>0</boolean> ■ String • <string>Hello world</string> ■ Double • <double>11.4368</double> ■ Also Base64 (binary), DateTime, etc. XML-RPC Wire Format (struct) ■ Structures • Represented as a set of <member>s • Each member contains a <name> and a <value> ■ <struct> <member> <name>lowerBound</name> <value><i4>18</i4></value> </member> <member> <name>upperBound</name> <value><i4>139</i4></value> </member> </struct> XML-RPC Wire Format (Arrays) ■ Arrays • A single <data> element, which • contains any number of <value> elements ■ <array> <data> <value><i4>12</i4></value> <value><string>Egypt</string></value> <value><boolean>0</boolean></value> <value><i4>-31</i4></value> </data> </array> XML-RPC Request ■ HTTP POST message • URI interpreted in an implementation-specific fashion • Method name passed to the server program ■ POST /RPC2 HTTP/1.1 Content-Type: text/xml User-Agent: XML-RPC.NET Content-Length: 278 Expect: 100-continue Connection: Keep-Alive Host: localhost:8080 <?xml version="1.0"?> <methodCall> <methodName>SumAndDifference</methodName> <params> <param><value><i4>40</i4></value></param> <param><value><i4>10</i4></value></param> </params> </methodCall> XML-RPC Response ■ HTTP Response • Lower-level error returned as an HTTP error code • Application-level errors returned as a <fault> element (next slide) ■ HTTP/1.1 200 OK Date: Mon, 22 Sep 2003 21:52:34 GMT Server: Microsoft-IIS/6.0