A File System for Serverless Computing
Johann Schleier-Smith and Joseph M. Hellerstein
Presented at HPTS, November 5, 2019
©2019 RISELab

Outline
- Serverless computing background
- Limitations
- The Cloud Function File System (CFFS)
- Evaluation and learnings

Serverless computing in 2014
- AWS announced Lambda, Function as a Service (FaaS), in 2014; other clouds followed quickly
- Write code, upload it to the cloud, run at any scale
- Successful for web APIs and event processing, but limited in stateful applications
[Figure: thumbnail-generation example]

Defining characteristics of serverless abstractions
- Hiding the servers and the complexity of programming them
- Consumption-based costs, with no charge for idle resources
- Excellent autoscaling, so resources closely match demand

Understanding serverless computing's impact
[Figure: resources and people across on-premise, serverful cloud, and serverless cloud. The serverful cloud outsources infrastructure administration, simplifies system administration, and changes software development little; the serverless cloud also outsources system administration and simplifies software development, with resources allocated and charged to closely track demand over time.]

Serverless as the next phase of cloud computing
Serverless abstractions offer:
- Simplified programming
- Outsourced operations
- Improved resource utilization

Serverless is much more than FaaS
- Function as a Service: AWS Lambda, Google Cloud Functions, Google Cloud Run, Azure Functions
- Object Storage: AWS S3, Azure Blobs, Google Cloud Storage
- Big Data Processing: Google Cloud Dataflow, AWS Glue, AWS Athena, AWS Redshift
- Queue Service: AWS SQS, Google Cloud Pub/Sub
- Key-Value Store: AWS DynamoDB, Azure CosmosDB, Google Cloud Datastore
- Mobile back end: Google Firebase, AWS AppSync
- Other: AWS Serverless Aurora, Google App Engine
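The FaaS entries above all share one programming model: the developer supplies a single handler function and the platform invokes it once per event. A minimal sketch, modeled on the common `handler(event, context)` convention (the event shape here is invented for illustration):

```python
import json

def handler(event, context=None):
    """Entry point a FaaS platform would invoke once per event.

    The platform handles provisioning, scaling, and billing;
    the developer supplies only this function.
    """
    # Illustrative event-processing task: summarize uploaded records.
    records = event.get("records", [])
    total_bytes = sum(r.get("size", 0) for r in records)
    return {
        "statusCode": 200,
        "body": json.dumps({"count": len(records), "bytes": total_bytes}),
    }

# Invoke locally with a sample event, as the platform would do remotely.
result = handler({"records": [{"size": 512}, {"size": 1024}]})
```

Note that the handler keeps no state between invocations; anything durable must live in one of the storage services listed above.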
Fertile ground for research
[Figure: serverless computing publications per year, 2012–2019, actual and extrapolated, rising toward roughly 120. Source: dblp computer science bibliography.]

Limitations of FaaS
- No inbound network connections
- Limited runtime
- No specialized hardware, e.g., GPUs
- Ephemeral state

Plenty of complementary stateful serverless services
- Object Storage: AWS S3, Azure Blobs, Google Cloud Storage
- Key-Value Store: AWS DynamoDB, Azure CosmosDB, Google Cloud Datastore, Anna KVS (Berkeley)
- Others: AWS Aurora Serverless, Google Firebase

Combine with FaaS to build applications
- Compute: functions (λ) hold isolated, ephemeral state
- Storage: object storage, key-value stores, etc. hold shared, durable state
- Compute and storage scale independently

How happy are we? Two main problems
- Latency
- API
"Can I please have something like local disk, but for the cloud?"

File systems let us run so much software
- Data analysis with Pandas
- Machine learning with TensorFlow
- Software builds with Make
- Search indexes with Sphinx
- Image rendering with Radiance
- Databases with SQLite
- Web serving with Nginx
- Email with Postfix

Objections
- "It won't scale"
  - Need a simpler data model (key-value store, immutable objects)
  - Need a weaker consistency model (eventual consistency)
  - Performance and reliability will suffer otherwise
- "You don't need it"
  - You should be rewriting your software for the cloud anyhow
  - Why not just use a key-value store?

How does this make me feel?
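The "file systems let us run so much software" point is easy to make concrete: SQLite, for one, stores an entire database in a single ordinary file and needs nothing from its environment beyond POSIX file operations. A small self-contained illustration:

```python
import os
import sqlite3
import tempfile

# SQLite needs only POSIX file operations (open, read, write,
# seek, locking); give it a file path and it just works.
path = os.path.join(tempfile.mkdtemp(), "inventory.db")
conn = sqlite3.connect(path)
conn.execute("CREATE TABLE items (name TEXT, qty INTEGER)")
conn.execute("INSERT INTO items VALUES ('widget', 3)")
conn.commit()
conn.close()

# A second connection reopening the same file sees the committed data,
# exactly as a second cloud function would under a shared file system.
conn2 = sqlite3.connect(path)
qty = conn2.execute("SELECT qty FROM items WHERE name='widget'").fetchone()[0]
conn2.close()
```

This is the class of unmodified software a POSIX-compatible serverless file system would unlock, with no rewrite to a key-value API.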
Introducing the Cloud Function File System (CFFS)
- POSIX semantics, including strong consistency
- Local caches for local-disk performance
- Works under autoscaling, extreme elasticity, and FaaS limitations
- Implemented as a transaction system

What is special about the FaaS environment?
- Function invocations have a well-defined beginning and end
- At-least-once execution, which expects idempotent code
- Constrained execution model:
  - Clients frozen between invocations
  - No inbound network connections
  - No root access

CFFS architecture
[Figure: many cloud functions, each embedding a CFFS client (CFFSλ), connected to a shared back-end transactional system.]

FaaS instance caching
- "Function as a Service" suggests statelessness, but most implementations reuse instances and preserve their state
- Setup of the sandboxed environment takes time:
  - Loads the selected runtime (e.g., JavaScript, Python, C#)
  - Configures network endpoint and IAM privileges
  - Loads user code
  - Runs user initialization
- Caching is key to amortizing instance setup

Cloud Function File System (CFFS)
[Figure: the application issues POSIX API calls to the CFFS client library (our emphasis), which maintains a transaction buffer and cache; logs and transaction commits flow to the back-end transactional system.]

Core API implemented in CFFS
- open: new descriptor (handle) for a file or directory
- close: close descriptor
- read / pread: read / positioned read
- write / pwrite: write / positioned write
- seek: set descriptor position
- dup / dup2: copy descriptor
- truncate: set file size
- flock: byte-range lock and unlock
- stat: get size, ownership, access permissions, last-modified time
- mkdir: create directory
- rename: rename file / directory
- unlink: delete file / directory
- chmod: set access permissions
- chown: set ownership
- utimes: update modified / accessed timestamps
- chdir: set working directory
- getcwd: get working directory
- clock_gettime: get current time
- begin: start transaction
- commit / abort: end transaction

POSIX guarantees: language from the spec
- "Writes can be serialized with respect to other reads and writes."
- "If a read() of file data can be proven (by any means) to occur after a write() of the data, it must reflect that write()…"
- "A similar requirement applies to multiple write operations to the same file position…"
- "This requirement is particularly significant for networked file systems, where some caching schemes violate these semantics."
Source: https://pubs.opengroup.org/onlinepubs/9699919799/
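The spec language above can be exercised directly with the positioned read and write calls from the API table: on any POSIX-compliant file system, a `pread` that provably follows a `pwrite` must reflect it. A minimal demonstration using the standard library's thin wrappers over those system calls:

```python
import os
import tempfile

# Create a scratch file and obtain a raw POSIX file descriptor.
fd = os.open(os.path.join(tempfile.mkdtemp(), "data"),
             os.O_CREAT | os.O_RDWR, 0o600)

os.pwrite(fd, b"hello", 0)   # positioned write at offset 0
data = os.pread(fd, 5, 0)    # a read that occurs after the write
os.close(fd)                 # must reflect that write
```

It is exactly this read-after-write guarantee that naive caching in networked file systems can violate, and that CFFS preserves by running the calls inside transactions.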
POSIX guarantees in database terms
- Atomic operations: each operation (at the API level) is observed entirely or not at all; some violations in practice
- Consistency model: the spec references time, technically requiring strict consistency (a shared global clock); implementations actually provide sequential consistency (a global order exists, consistent with the order at each processor)
- We use serializability to provide isolation and atomicity at function granularity
- Open question: what guarantees do applications actually rely on?

Implementation highlights
- The choice of transaction mechanism is not fundamental; we implemented timestamp-order serializability
- Optimistic protocols can be a good fit, since FaaS side effects must be idempotent anyway
- State-of-the-art protocols promise lower abort rates and more effective local caches (e.g., Yu et al., VLDB 2018)
- Cache updates arrive through on-demand filtered log shipping:
  - Check for updates when a function starts execution
  - Eviction messages help the back end track client cache contents

CFFS in context: transactional file systems
- QuickSilver distributed system (IBM, 1991): very close in spirit, but no FaaS and no caching
- Inversion File System (Berkeley, 1993): built on top of PostgreSQL, accessed through a custom library
- Transactional NTFS (TxF) (Microsoft, 2006): shipped in Windows, deprecated on account of complexity
- There are many non-transactional shared and distributed file systems

CFFS in context: shared file systems
- Must choose between consistency and latency: eventual consistency, delegation/lock-based caching, or no caching
- HPC: Lustre, GPFS (IBM), GlusterFS (Red Hat / IBM), BeeGFS
- Big data: HDFS, GFS (Google), Ceph, MooseFS, MapR-FS, LizardFS, Alluxio
- Client-server: NFS, SMB
- Mainframe: zFS
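The optimistic, timestamp-ordered approach from the implementation highlights can be sketched as backward validation over read and write sets. Everything below (the `Backend` class, the single-version conflict rule) is a simplified stand-in for CFFS's actual protocol, not its implementation:

```python
class Backend:
    """Toy commit validator: serializes transactions by commit order."""

    def __init__(self):
        self.data = {}       # key -> value
        self.version = {}    # key -> commit timestamp of last writer
        self.next_ts = 1

    def commit(self, read_set, write_set, start_ts):
        # Reject if any key we read was overwritten after we started:
        # our reads would not fit any serial order.
        for key in read_set:
            if self.version.get(key, 0) > start_ts:
                return None  # abort; idempotent FaaS code can safely retry
        ts = self.next_ts
        self.next_ts += 1
        for key, value in write_set.items():
            self.data[key] = value
            self.version[key] = ts
        return ts

backend = Backend()
# Two functions start concurrently and both read "balance".
t1_reads = {"balance": backend.data.get("balance")}
t2_reads = {"balance": backend.data.get("balance")}
start = backend.next_ts - 1

ok1 = backend.commit(t1_reads, {"balance": 100}, start)  # first wins
ok2 = backend.commit(t2_reads, {"balance": 200}, start)  # conflicts, aborts
```

The abort-and-retry path is what makes optimistic protocols a good match for FaaS: at-least-once execution already forces functions to be idempotent, so re-running an aborted transaction is free of surprises.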
Sample workload call frequencies
[Figure: call counts for two workloads, make on OpenSSH and TPC-C on SQLite, across stat, dup, read, seek, write, chdir, close, open, unlink, chmod, getdents, and rename, on a scale up to 120,000 calls.]

Caching benefits (TPC-C / SQLite)
[Figure: TPC-C throughput in tpmC for Local, NFS, and CFFS; local disk reaches 8,105, with 817 and 142 for the remote configurations.]

Scaling in AWS Lambda
[Figure: 4 KB random-read throughput, uncached vs. cached, as the number of AWS Lambda instances scales from 10 to 100.]

Returning to objections
- "It won't scale": contention risks
  - File length must be checked on every read
  - stat is mainly used to check file length or permissions, but also returns modification and access times
  - Challenges here, but we are optimistic they will be overcome
- "You don't need it"
  - I think it will be pretty useful

How happy are we?

CFFS summary
- Transactions are a natural fit for FaaS
  - BEGIN and END come from the function context
  - At-least-once execution goes well with optimistic transactions
  - On-demand filtered log shipping allows asynchronous cache updates
- Overcomes limitations of FaaS and traditional shared file systems
  - Allows caching for lower latency while preserving consistency
  - Highly scalable, especially with snapshot reads
  - The POSIX API enables a vast range of tools and libraries
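The "BEGIN and END from the function context" idea in the summary can be sketched as a decorator that brackets each invocation in a transaction. The `TxnClient` interface below is invented for illustration (it is not the CFFS client API), but it shows how the function's own start and end supply the transaction boundaries:

```python
import functools

class TxnClient:
    """Minimal stand-in for a transactional file-system client."""

    def __init__(self):
        self.committed = {}   # durable, shared state
        self.buffer = None    # per-invocation write buffer

    def begin(self):
        self.buffer = {}

    def write(self, key, value):
        self.buffer[key] = value

    def commit(self):
        # All buffered writes become visible atomically at function end.
        self.committed.update(self.buffer)
        self.buffer = None

def transactional(client):
    """Derive BEGIN/COMMIT from the wrapped function's start and end."""
    def wrap(fn):
        @functools.wraps(fn)
        def run(event):
            client.begin()
            result = fn(event)
            client.commit()
            return result
        return run
    return wrap

client = TxnClient()

@transactional(client)
def update_counter(event):
    client.write("count", event["n"])
    return "ok"

status = update_counter({"n": 7})
```

Because the platform already delivers at-least-once execution, an aborted transaction can simply be retried by re-invoking the function, with no extra recovery machinery.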