Scaling Uber with Node.js
Amos Barreto (@amos_barreto)
Uber is everyone’s private driver.

REQUEST!
Tap to select location.

RIDE!
Sit back and relax, tell your driver your destination.

RATE!
Help us maintain a quality service by rating your experience.

YOUR DRIVERS

UBER QUALIFIED
Uber only partners with drivers who have a keen eye for customer service and a passion for the trade.

RIDER RATED
Tell us what you think. Your feedback helps us work with drivers to constantly improve the Uber experience.

LICENSED & INSURED
From insurance to background checks, every driver meets or beats local regulations.

LOGISTICS

#OMGUBERICECREAM
UberChopper #OMGUBERCHOPPER
#UBERVALENTINES
#ICANHASUBERKITTENS

Trip State Machine (Simplified)
[Diagram: Request → Dispatch → Accept → Arrive → Begin → End]

Trip State Machine (Extended)
[Diagram: Request → Dispatch (1) → Dispatch (2) → Accept → Arrive → Begin → End, with Reject and Expire transitions]

OUR STORY

Version 1
• PHP dispatch
• Outsourced to remote contractors in the Midwest
• Half the code in Spanish
• Flat file
• Lifetime: 6-9 months
[Diagram: PHP, Cron]

“I read an article on HackerNews about a new framework called Node.js” (Jason Roberts)

Tradeoffs
• Learning curve
• Database drivers
• Scalability
• Documentation
• Performance
• Monitoring
• Library ecosystem
• Production operations

Version 2
• Lifetime: 9 months
• Developed in house
• Node.js application
• Prototyped on 0.2
• Launched in production with 0.4
• MongoDB datastore

“I really don’t see dispatch changing much in the next three years”

Expect the unexpected

Version 3
• Mongo did not scale with the volume of GPS logs (global write lock)
• Swapped Mongo for Redis and flat files
[Diagram: CN nodes serving SF, NYC, SEA, CHI]

Decoupling storage of different types of data

Version 3 (continued)
• Node.js Mongo client failed to recognize replica set topology changes
[Diagram: CN nodes serving SF, NYC, SEA, CHI]

Be wary of immature client libraries

[Chart: Commits to client modules over time]

Version 3 (continued)
[Diagram: SF, NYC, SEA, CHI, BOS, PAR]
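The extended trip state machine from the Logistics section can be read as a transition table. The sketch below is one hypothetical reading of that diagram in JavaScript: the lowercase state and event names are illustrative rather than Uber's identifiers, and it assumes that a rejected or expired dispatch returns the trip to the requested state so it can be offered to the next driver (Dispatch (1) to Dispatch (2)).

```javascript
// Hypothetical transition table for the extended trip state machine.
// State and event names are illustrative, not Uber's identifiers.
const TRANSITIONS = {
  requested: { dispatch: "dispatched" },
  // Assumption: a rejected or expired offer returns the trip to
  // "requested" so it can be dispatched to the next driver.
  dispatched: { accept: "accepted", reject: "requested", expire: "requested" },
  accepted: { arrive: "arrived" },
  arrived: { begin: "begun" },
  begun: { end: "ended" },
  ended: {}, // terminal state: no outgoing transitions
};

// Own-property check, so inherited names like "toString" are not valid events.
const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Returns the next state, or throws if `event` is not allowed in `state`.
function transition(state, event) {
  if (!has(TRANSITIONS, state) || !has(TRANSITIONS[state], event)) {
    throw new Error(`invalid transition: "${event}" from "${state}"`);
  }
  return TRANSITIONS[state][event];
}

// Replays a sequence of events starting from a freshly requested trip.
function replay(events, start = "requested") {
  return events.reduce((state, event) => transition(state, event), start);
}

// A trip whose first dispatch is rejected, then completes normally.
console.log(replay(["dispatch", "reject", "dispatch", "accept", "arrive", "begin", "end"]));
// prints: ended
```

Keeping every transition in one table makes invalid moves (for example, beginning a trip that was never accepted) fail loudly instead of silently corrupting trip state.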
Focus on driving business value

Capacity planning, forecasting, and load testing are your friends

Measure everything

Version 4
• Nickname: The Grid
• Multi-process dispatch
• Peer assignment
• Redis is now considered the source of truth
• Use the Lua interpreter for atomic operations
• Fan out to all city peers to find nearby cars
[Diagram: CN nodes, each serving a subset of SF, NYC, SEA, CHI]

    -- Atomic dispatch check-and-assign, run inside Redis (EVAL).
    -- Assumption: the slide does not show where these names come from;
    -- the KEYS/ARGV binding below is illustrative. Redis rejects scripts
    -- that create global variables, so everything is declared local.
    local clientHash, driverHash = KEYS[1], KEYS[2]
    local clientAssignmentHash, countKey = KEYS[3], KEYS[4]
    local clientToken, driverPeerId = ARGV[1], ARGV[2]

    local clientStatus = redis.call('hget', clientHash, 'status')
    local driverStatus = redis.call('hget', driverHash, 'status')
    if clientStatus == 'WaitingForPickup' and driverStatus == 'Open' then
      -- Peer the client was previously assigned to (false if none).
      local clientPeerId = redis.call('hget', clientAssignmentHash, clientToken)
      redis.call('hset', driverHash, 'status', 'DispatchPending')
      redis.call('hset', clientAssignmentHash, clientToken, driverPeerId)
      -- Move the client's count from its old peer to the driver's peer.
      if clientPeerId then
        redis.call('zincrby', countKey, -1, clientPeerId)
      end
      redis.call('zincrby', countKey, 1, driverPeerId)
      return redis.status_reply('SUCCESS')
    else
      return redis.error_reply('ERROR - clientStatus: ' .. tostring(clientStatus) ..
        ', driverStatus: ' .. tostring(driverStatus))
    end
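The Version 4 bullet "Fan out to all city peers to find nearby cars" describes a scatter-gather query. A minimal JavaScript sketch, assuming each peer is exposed as a hypothetical async function that resolves to `{ id, distanceKm }` candidates (the deck does not show the real transport): query every peer in parallel, skip peers that fail, and merge the answers into the k nearest cars.

```javascript
// Hypothetical scatter-gather lookup for the "fan out to all city peers"
// step. Each peer is modeled as an async function
// (location) => [{ id, distanceKm }]; the real transport is not shown.
async function findNearbyCars(peers, location, k) {
  // Query every peer in parallel. allSettled lets a failed peer drop out
  // instead of failing the whole lookup (it still waits for every peer).
  const results = await Promise.allSettled(peers.map((lookup) => lookup(location)));
  const candidates = [];
  for (const result of results) {
    if (result.status === "fulfilled") candidates.push(...result.value);
  }
  return nearestK(candidates, k);
}

// Merge step: sort by distance, keep each car once, return the k closest.
function nearestK(candidates, k) {
  const seen = new Set();
  const nearest = [];
  const sorted = [...candidates].sort((a, b) => a.distanceKm - b.distanceKm);
  for (const car of sorted) {
    if (nearest.length === k) break;
    if (seen.has(car.id)) continue; // the same car reported by two peers
    seen.add(car.id);
    nearest.push(car);
  }
  return nearest;
}

// Usage with three fake peers, one of which is down.
const peers = [
  async () => [{ id: "car-1", distanceKm: 1.2 }, { id: "car-2", distanceKm: 3.4 }],
  async () => { throw new Error("peer unavailable"); },
  async () => [{ id: "car-3", distanceKm: 0.4 }, { id: "car-1", distanceKm: 1.2 }],
];
findNearbyCars(peers, { lat: 37.7749, lng: -122.4194 }, 2).then((cars) =>
  console.log(cars.map((car) => car.id))
);
// prints: [ 'car-3', 'car-1' ]
```

Because every lookup touches every peer, per-query work grows with the number of peers; `Promise.allSettled` also waits for the slowest peer, so a production version would want per-peer timeouts.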
Version 4 (continued)
[Diagram: peer nodes SF1, SF2, NY1, NY2, SEA1, SEA2, CHI1, BOS1, BOS2, PAR1, SF3, NY3, NY4, SEA3, SEA4, CHI2, CHI3]

Version 5
[Diagram: nine SF nodes]

Version 5
[Chart: max # of location queries vs. # of nodes]

Version 5
[Diagram: CN nodes serving SF, NYC, SEA, CHI, alongside ncar nodes]

Break out services as needed

Understand v8 to optimize Node.js applications

Don’t take vacation ;)

Don’t live in Chicago!
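Version 5 breaks location queries out into separate `ncar` nodes, and its chart plots the maximum number of location queries against the number of nodes. The deck does not describe how those nodes index cars, so the following is only a generic, hypothetical sketch of one way a dedicated service can answer nearby-car queries without fanning out: bucket cars into fixed-size latitude/longitude cells and scan only the query's cell and its eight neighbors. `CarGrid` and the 0.01° default cell size are assumptions.

```javascript
// Hypothetical in-memory index for a broken-out nearby-car service.
// Cars are bucketed into square lat/lng cells of `cellDeg` degrees; a lookup
// scans only the 3x3 block of cells around the query point.
class CarGrid {
  constructor(cellDeg = 0.01) {
    this.cellDeg = cellDeg;
    this.cells = new Map(); // "row:col" -> Set of car ids
    this.positions = new Map(); // car id -> { lat, lng, key }
  }

  cellKey(row, col) {
    return `${row}:${col}`;
  }

  cellOf(lat, lng) {
    return [Math.floor(lat / this.cellDeg), Math.floor(lng / this.cellDeg)];
  }

  // Inserts a car, or moves it to its new cell if it already exists.
  upsert(id, lat, lng) {
    this.remove(id);
    const [row, col] = this.cellOf(lat, lng);
    const key = this.cellKey(row, col);
    if (!this.cells.has(key)) this.cells.set(key, new Set());
    this.cells.get(key).add(id);
    this.positions.set(id, { lat, lng, key });
  }

  remove(id) {
    const pos = this.positions.get(id);
    if (!pos) return;
    const cell = this.cells.get(pos.key);
    cell.delete(id);
    if (cell.size === 0) this.cells.delete(pos.key); // drop empty cells
    this.positions.delete(id);
  }

  // Candidate car ids in the query cell and its eight neighbors (unsorted).
  nearby(lat, lng) {
    const [row, col] = this.cellOf(lat, lng);
    const ids = [];
    for (let dr = -1; dr <= 1; dr++) {
      for (let dc = -1; dc <= 1; dc++) {
        const cell = this.cells.get(this.cellKey(row + dr, col + dc));
        if (cell) ids.push(...cell);
      }
    }
    return ids;
  }
}

const grid = new CarGrid();
grid.upsert("car-1", 37.7749, -122.4194); // San Francisco
grid.upsert("car-2", 40.7128, -74.006); // New York
console.log(grid.nearby(37.775, -122.419)); // prints: [ 'car-1' ]
```

Degree-based cells get narrower east to west away from the equator, and the sketch ignores the ±180° longitude seam; it returns unsorted candidates that a caller would still rank by real distance.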
Stateless applications…
No single points of failure…
Replicated data stores…
Dynamic application topology…

Version 6
[Diagram: three Grid Manager instances over peer nodes SF1, SEA3, NY2, PAR1, CHI1, NY3, BOS1, BOS2, NY1, SF3, SEA1, NY4, SEA4, CHI2, CHI3, SEA2, SF2]

Version 7
[Diagram: haproxy]

Do the obvious

Pros
• every application is horizontally scalable
• flexible, partially dynamic topology
• failure recovery is manual in the worst case
• supports primary business case very well
• conservative estimates: 1-2 years of runway

Never be satisfied

Cons
• what happens when a city outscales the capacity of a single Redis instance?
• who wants to wake up in the middle of the night for server crashes?
• what about future business use cases?

#WORLDCLASS

World Class
• city-agnostic dispatch application
• “stateless” applications
• scale to 100x current load
• flexible data model

Every now and then it’s okay to bend the rules

Realtime Analytics

So why did we stick with Node.js?
• JavaScript is easy to learn
• Simple interface with thorough documentation
• Lends itself to fast prototyping
• Asynchronous, nimble
• Avoid concurrency challenges
• Increasingly mature module ecosystem

How to win with Node.js?
• measure everything, particularly response times and event loop lag!
• learn to take heap dumps to debug memory issues!
• strace, perf, and flame graphs are necessary tools for improving performance
• small, reusable components to reduce duplication

The Human Factor

Thank you. Questions?
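For the advice to measure event loop lag under How to win with Node.js, Node's built-in `perf_hooks.monitorEventLoopDelay()` (Node 11.10+) samples timer delay into a histogram whose values are in nanoseconds. A minimal sketch of a periodic reporter; `startLagMonitor`, its options, and the choice of p50/p99/max are illustrative assumptions, and `report` would normally feed a metrics pipeline rather than the console.

```javascript
const { monitorEventLoopDelay } = require("perf_hooks");

// Hypothetical helper: samples event loop delay with Node's built-in
// histogram and reports a summary every `reportEveryMs` milliseconds.
// Histogram values are in nanoseconds; the report converts them to ms.
function startLagMonitor({ resolution = 10, reportEveryMs = 10000, report = console.log } = {}) {
  const histogram = monitorEventLoopDelay({ resolution });
  histogram.enable();
  const timer = setInterval(() => {
    report({
      p50Ms: histogram.percentile(50) / 1e6,
      p99Ms: histogram.percentile(99) / 1e6,
      maxMs: histogram.max / 1e6,
    });
    histogram.reset(); // start a fresh window for the next report
  }, reportEveryMs);
  timer.unref(); // the reporting timer alone should not keep the process alive
  return function stop() {
    clearInterval(timer);
    histogram.disable();
  };
}
```

For example, `const stop = startLagMonitor({ reportEveryMs: 1000 });` logs a lag summary each second while the process is running, until `stop()` is called. For the heap-dump bullet, `v8.writeHeapSnapshot()` (Node 11.13+) writes a `.heapsnapshot` file that can be opened in the Memory tab of Chrome DevTools.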