Yes, No, Maybe? Error Handling with gRPC Examples
Agenda

● Hello world with Protocol buffers and gRPC
● What's done by "magic"?
● Error codes
● Did it work? Yes, no, and maybe?
● Should I retry?
● TL;DR guidelines

Protocol buffers and gRPC
In 5-ish mins...

The Greeter Service

[Diagram: the Client calls SayHello("Fred", "de_DE") on the Greeter, which calls Translate("Hello", "de_DE") on the Translator. The Translator returns "Guten Tag" and the Greeter returns "Guten Tag Fred!".]

1. define data structure schemas and programming interfaces
2. implementation code
3. remote procedure call (RPC) for the distributed client and server

1. The service definition (.proto)

Protocol Buffers is a simple, language-neutral and platform-neutral Interface Definition Language (IDL).

    // Hello world
    service Greeter {
      // a service method
      rpc SayHello (HelloRequest) returns (HelloReply) {}
    }

    // Who to greet? (a request message type)
    message HelloRequest {
      string name = 1;
      string locale = 2;
    }

    // The greeting. (a response message type)
    message HelloReply {
      string greeting = 1;
    }

1. The generated helper code

The protocol buffer compiler generates code that has:
● a remote interface stub for the Client to call with the methods
● an abstract interface for the Server code to implement

Protocol buffer code will populate, serialize, and retrieve our request and response message types.

2. The implementation code

greeter_client.cc

    #include <memory>
    #include <string>

    #include <grpcpp/grpcpp.h>
    #include "greeter.grpc.pb.h"  // generated by protoc

    class GreeterClient {
     public:
      explicit GreeterClient(std::shared_ptr<grpc::Channel> channel)
          : stub_(Greeter::NewStub(channel)) {}

      std::string SayHello(const std::string& user) {
        HelloRequest request;
        request.set_name(user);
        HelloReply reply;
        grpc::ClientContext context;
        // The returned Status is ignored at this stage.
        stub_->SayHello(&context, request, &reply);
        return reply.greeting();  // the field is named `greeting` in the .proto
      }

     private:
      std::unique_ptr<Greeter::Stub> stub_;
    };

greeter_server.cc

    #include <string>

    #include <grpcpp/grpcpp.h>
    #include "greeter.grpc.pb.h"  // generated by protoc

    class GreeterServiceImpl final : public Greeter::Service {
      grpc::Status SayHello(grpc::ServerContext* context,
                            const HelloRequest* request,
                            HelloReply* reply) override {
        std::string prefix("Hello ");
        reply->set_greeting(prefix + request->name());
        return grpc::Status::OK;
      }
    };

3. ...and the gRPC core

● Exposes core API to language API
● Filters
  ○ Implements RPC deadlines
  ○ Performs authentication
● Reconnects automatically with exponential backoff
● Takes care of socket creation, timers etc.

[Diagram: the Client calls SayHello through gRPC and the transport; the Server sets Status OK and the Client receives OK.]

Yes. It's done, ship it!

But... what happens if something fails?
gRPC Core Status Codes
Status Error: yes, no, maybe

- OK
- CANCELLED
- UNKNOWN
- INVALID_ARGUMENT
- DEADLINE_EXCEEDED
- NOT_FOUND
- ALREADY_EXISTS
- PERMISSION_DENIED
- UNAUTHENTICATED
- RESOURCE_EXHAUSTED
- FAILED_PRECONDITION
- ABORTED
- OUT_OF_RANGE
- UNIMPLEMENTED
- INTERNAL
- UNAVAILABLE
- DATA_LOSS

OK - It worked (as implemented)

Status Error: yes, no, maybe

greeter_client.cc

Before:

    stub_->SayHello(&context, request, &reply);
    return reply.greeting();

After, checking the returned status:

    Status status = stub_->SayHello(&context, request, &reply);
    if (status.ok()) {
      return reply.greeting();
    } else {
      // do something useful and cheap
      LOG_EVERY_N(ERROR, 10) << "Error " << google::COUNTER
                             << " with status " << status.error_code()
                             << " " << status.error_message();
    }
    return "no hello available";

gRPC Core Status Codes
Status Error: yes, no, maybe

OK - It worked (as implemented)
It wasn't the gRPC library.

Maybe

Deadline exceeded… or did it?

    Client                 Server
    - DEADLINE_EXCEEDED    - ??
                           - CANCELLED
                           - DEADLINE_EXCEEDED
                           - OK

DEADLINE_EXCEEDED

Client is slow, the deadline expires on the internal queues.

[Diagram: Client and Server connected through gRPC and the network; the Client sees D_E (DEADLINE_EXCEEDED) for SayHello.]

Do the client and server agree? No.

DEADLINE_EXCEEDED

Transport flaps. Stubs reconnect automatically. Like magic, but not actually magic!

[Diagram: the network between the Client and the Server fails (☠) during SayHello.]
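A deadline is an absolute point in time, so any time a request spends waiting in a client-side queue counts against it. The sketch below only illustrates that bookkeeping with `std::chrono`; `TimeLeft`, `DeadlineExpired` and `QueueDelayExample` are hypothetical names chosen for this example, not gRPC APIs. In gRPC C++ the deadline itself would typically be set with `ClientContext::set_deadline` on the client and read with `ServerContext::deadline()` on the server.

```cpp
#include <cassert>
#include <chrono>

using Clock = std::chrono::system_clock;

// Hypothetical helper (not a gRPC API): time remaining until an absolute
// deadline, clamped at zero once the deadline has passed.
inline std::chrono::milliseconds TimeLeft(Clock::time_point deadline,
                                          Clock::time_point now) {
  if (now >= deadline) return std::chrono::milliseconds(0);
  return std::chrono::duration_cast<std::chrono::milliseconds>(deadline - now);
}

// Hypothetical helper: true once `now` has reached or passed the deadline.
inline bool DeadlineExpired(Clock::time_point deadline,
                            Clock::time_point now) {
  return now >= deadline;
}

// Illustrates the "client is slow" case: the call is created with a 100 ms
// deadline but waits 150 ms in a queue, so it has expired before it is sent.
inline void QueueDelayExample() {
  const Clock::time_point start{};  // fixed start keeps the example deterministic
  const Clock::time_point deadline = start + std::chrono::milliseconds(100);
  const Clock::time_point sent = start + std::chrono::milliseconds(150);
  assert(DeadlineExpired(deadline, sent));
  assert(TimeLeft(deadline, sent) == std::chrono::milliseconds(0));
}
```

In this situation the server never sees the request at all, which is one reason the client and server can disagree about what happened.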
DEADLINE_EXCEEDED

[Diagram: the network fails (☠) during SayHello; the Client sees D_E.]

Do the client and server agree? No.

DEADLINE_EXCEEDED

Client's deadline reached before the response from the server.

[Diagram: the Server sets OK, but the Client sees D_E.]

Do the client and server agree? No.
- Server did wasted work.
- Client had already received deadline_exceeded from gRPC.

DEADLINE_EXCEEDED CAN MEAN IT WORKED!

DEADLINE_EXCEEDED

Servers. Cascading outages happen when servers spend resources handling requests that will exceed their deadlines on the client.

Clients. Think carefully about whether your request is idempotent before considering retries. Servers can succeed while clients are still retrying the requests!

greeter_server.cc

    // Check whether the client deadline has expired before processing.
    if (context->IsCancelled()) {
      LOG(INFO) << "Deadline exceeded or Client cancelled, abandoning.";
      return Status::CANCELLED;
    }

DEADLINE_EXCEEDED

Client's deadline reached before the response from the server.

[Diagram: the Server sets CANCELLED; the Client sees D_E.]

- Server did no wasted work.
- Client receives deadline_exceeded.

Do the client and server agree? No.

greeter_server.cc

    // Avoid expensive backend calls for clients who won't wait for results.
    // time_left is the time remaining before the client's deadline, in ms.
    if (time_left < FLAGS_too_little_time_ms) {
      LOG(INFO) << "Don't call the backends, and set deadline exceeded.";
      return Status(grpc::DEADLINE_EXCEEDED, "Greeter service needs more time.");
    }

The proto

    // Hello world
    service Greeter {
      // If less than FLAGS_too_little_time_ms of the request deadline remains,
      // returns DEADLINE_EXCEEDED.
      rpc SayHello (HelloRequest) returns (HelloReply) {}
    }

    message HelloRequest {
      string name = 1;
      string locale = 2;
    }

    message HelloReply {
      string greeting = 1;
    }

DEADLINE_EXCEEDED

[Diagram: the Server sets D_E.]

Do the client and server agree? Maybe!

[Diagram: the Server sets D_E and the Client sees D_E.]

Do the client and server agree? Yes, for the same reason.

[Diagram: the Server sets D_E and the Client sees D_E.]

Do the client and server agree? Yes, but for different reasons.

gRPC Core Status Codes
Status Error: yes, no, maybe

[Diagram: Client and Server, each with a gRPC library, connected by the network.]

    CLIENT                  SERVER
    - CANCELLED             - CANCELLED
    - UNKNOWN               - UNKNOWN
    - DEADLINE_EXCEEDED     - DEADLINE_EXCEEDED
    - UNAUTHENTICATED       - UNAUTHENTICATED
    - RESOURCE_EXHAUSTED    - RESOURCE_EXHAUSTED
    - UNIMPLEMENTED         - UNIMPLEMENTED
    - INTERNAL              - INTERNAL
    - UNAVAILABLE           - UNAVAILABLE

gRPC Core Status Codes
Status Error: yes, no, maybe

- OK
- CANCELLED
- UNKNOWN
- INVALID_ARGUMENT
- DEADLINE_EXCEEDED
- NOT_FOUND
- ALREADY_EXISTS
- PERMISSION_DENIED
- UNAUTHENTICATED
- RESOURCE_EXHAUSTED
- FAILED_PRECONDITION
- ABORTED
- OUT_OF_RANGE
- UNIMPLEMENTED
- INTERNAL
- UNAVAILABLE
- DATA_LOSS

OK - It worked (as implemented)
MAYBE - It _might_ have WORKED
NO - It _probably_ didn't WORK.

Error codes are conventions, not rules!

Is there any source of "truth"?
No. But services can set expectations.

Status Error: yes, no, maybe

google.rpc.Codes

    Client errors             Server errors
    - INVALID_ARGUMENT        - DATA_LOSS
    - FAILED_PRECONDITION     - UNKNOWN
    - OUT_OF_RANGE            - INTERNAL
    - UNAUTHENTICATED         - UNIMPLEMENTED
    - PERMISSION_DENIED       - UNAVAILABLE
    - NOT_FOUND               - DEADLINE_EXCEEDED
    - ABORTED
    - ALREADY_EXISTS
    - RESOURCE_EXHAUSTED
    - CANCELLED

https://cloud.google.com/apis/design/errors#error_retries

Clients should retry on UNKNOWN and UNAVAILABLE errors with exponential backoff. The minimum delay should be 1s unless it is documented otherwise.

For RESOURCE_EXHAUSTED errors, the client may retry with a minimum 30s delay.

For all other errors: retry may not be applicable. First ensure your request is idempotent, and see the error message for guidance.
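The retry guidance above can be sketched as a small decision function plus a backoff calculation. This is a minimal illustration under stated assumptions, not a definitive policy: the `Code` enum is a local stand-in for a few `grpc::StatusCode` values, `DecideRetry` and `BackoffDelay` are hypothetical names, the doubling factor and the 5 minute cap are arbitrary choices, and requiring idempotency before any automatic retry is a conservative reading of the advice to clients. Jitter, which real backoff implementations usually add, is left out to keep the result deterministic.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

// Local stand-in for a subset of the status codes (assumption for this
// sketch); real code would use grpc::StatusCode instead.
enum class Code {
  kOk, kCancelled, kUnknown, kInvalidArgument,
  kDeadlineExceeded, kResourceExhausted, kUnavailable
};

struct RetryDecision {
  bool retry;                           // whether an automatic retry is suggested
  std::chrono::milliseconds min_delay;  // minimum delay before the first retry
};

// Hypothetical helper: UNKNOWN and UNAVAILABLE are retried with a 1 s minimum
// delay, RESOURCE_EXHAUSTED with a 30 s minimum delay, and everything else is
// left to the caller. Non-idempotent requests are never retried automatically.
inline RetryDecision DecideRetry(Code code, bool idempotent) {
  const RetryDecision no_retry{false, std::chrono::milliseconds(0)};
  if (!idempotent) return no_retry;
  switch (code) {
    case Code::kUnknown:
    case Code::kUnavailable:
      return {true, std::chrono::milliseconds(1000)};
    case Code::kResourceExhausted:
      return {true, std::chrono::milliseconds(30000)};
    default:
      return no_retry;
  }
}

// Hypothetical helper: exponential backoff without jitter. Attempt 0 waits
// min_delay, and each later attempt doubles the delay up to max_delay.
inline std::chrono::milliseconds BackoffDelay(
    std::chrono::milliseconds min_delay, int attempt,
    std::chrono::milliseconds max_delay = std::chrono::minutes(5)) {
  assert(attempt >= 0);
  std::chrono::milliseconds delay = min_delay;
  for (int i = 0; i < attempt && delay < max_delay; ++i) delay *= 2;
  return std::min(delay, max_delay);
}
```

A caller would combine the two: if `DecideRetry(code, idempotent).retry` is true, sleep for `BackoffDelay(decision.min_delay, attempt)` before trying again, and give up after a bounded number of attempts or once the overall deadline has passed.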