
The Challenges Tomcat Faces in High Throughput Production Systems
May 17, 2017 © 2017 Alibaba Middleware Group

Biography
• Huxing Zhang (张乎兴)
• Software Engineer @ Alibaba Middleware Group since 2014
• Previously worked for DeNA, Tokyo, Japan
• Apache Tomcat Committer since 2016
• Core Tomcat maintainer @ Alibaba Inc.

Outline
• How we use Tomcat at Alibaba
• High throughput web applications
• The challenges
• Case studies
• Summary
• Q & A

Tomcat in Alibaba
• Small trial since 2009
• Migration from JBoss to Tomcat
  • JBoss uses Tomcat as its servlet container
  • Officially started in 2013
  • Became the dominant servlet container in 2015
• Thousands of web applications
• Hundreds of thousands of instances
• Status quo
  • 95% Tomcat, 5% other
  • Mainly 7.0.x (supports JDK 6+)
  • A few 8.0.x
  • Emerging 8.5.x because of microservices (e.g. spring-boot)

Deployment Standard
One web application per Tomcat instance
[Diagram labels: Deployment Platform, with "publish" and "rollback" actions. Application Specific Configuration: Nginx conf, Tomcat conf, Startup Hooks, AppName.war, JVM Args, Health Check. Runtime Configuration (CATALINA_BASE): AppName.war, Tomcat conf, Nginx conf, symbolic link. Standardized Configuration (CATALINA_HOME): Nginx conf, Tomcat conf, Startup script.]
• Startup script features
  • Per-startup log rotation for catalina.out
  • Smart auto-fix for incorrect JVM args
  • CATALINA_BASE preparation
  • Tomcat startup status detection

Monitoring, diagnostics and development tools
• Tomcat Monitor (RESTful service)
  • Metrics for OS, JVM, Tomcat, middleware, and application
  • Classloading: locate which jar file a class is loaded from
  • Thread: per-thread CPU usage, thread state, etc.
  • Connector stats: Tomcat thread pool statistics
• Diagnostics in the production environment
  • https://github.com/oldmanpushcart/greys-anatomy
  • Much easier to use than BTrace
• Development tools
  • Package-free Tomcat Eclipse plugin

Other usages worth mentioning
• Cluster session management: Taobao Session
  • Cookie + distributed key-value storage (Tair)
• Database connection pool: Druid
  • https://github.com/alibaba/druid
  • Performance, monitoring, extensibility
• Multi-tenancy
  • High-density deployment for legacy web applications
  • Provides CPU/memory isolation
  • Based on a multi-tenant JVM

High throughput web application
• Taobao API Gateway for mobile devices
• Interacting with hundreds of backend services
• Handling millions of requests per second
[Diagram labels: Mobile Devices, API Gateway, and backend services: Shopping Cart, Inventory, Order, Shipping, …]

Overall Architecture
[Diagram labels: PC and Mobile clients; a "Connection & Load Balance" tier of Nginx instances; an API Gateway and several WebApp instances, each shown as Nginx + Tomcat; Databases. Link labels: HTTP, HTTPS, RPC.]
• Nginx + Tomcat on the same machine
• Security
• Rate limiting

The challenges
• Configuration tuning
  • Connector selection
  • Configuration tweaking
• NIO connection & concurrency issues
  • Case study 1
• Security issues
  • Case study 2
[Diagram labels around Tomcat: Memory Leak, Security, Concurrency, Connection, Asynchronization, GC]

Architecture: BIO + Sync RPC
[Diagram labels: User Request, HTTP, API Gateway (Nginx, Tomcat with BIO Connector and Servlet), RPC, Remote Services]
• One worker thread per request
• Cannot scale up to thousands of requests per second
• Low CPU utilization

Architecture: APR + Async RPC
[Diagram labels: User Request, HTTP, API Gateway (Nginx, Tomcat with APR Connector and Async Servlet), Async RPC, Async HTTP, Remote Services]
• Blocking on reading request headers
• Non-blocking on waiting for the next request
• Cons:
  • Unexpected JVM crashes
  • Maintenance cost: requires a separate tcnative library

Architecture: NIO + Async RPC
[Diagram labels: User Request, HTTP, API Gateway (Nginx, Tomcat with NIO Connector and Async Servlet), Async RPC, Async HTTP, Remote Services]
• Non-blocking on reading request headers
• Non-blocking on waiting for the next request
• Fully asynchronous

Tweaking Tomcat
• Connector configuration
  • Do not change it unless you really encounter an issue
  • Read the documentation carefully

Configuration          Default  Recommended  Note
connectionTimeout      20s      Decrease
maxThreads             200      Increase
acceptCount (backlog)  100      Increase     min(acceptCount, /proc/sys/net/core/somaxconn)
processorCache         200      Increase     max(maxThreads, concurrent connections)

• Access log disabled
  • Available on the Nginx side
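
The two formulas in the Note column of the Tweaking Tomcat table can be made concrete with a small calculation. The sketch below is illustrative only: `ConnectorSizing`, its method names, and the fallback value 128 are assumptions made for this example and are not part of Tomcat.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper (not a Tomcat class) that applies the two Note-column formulas. */
final class ConnectorSizing {
    private ConnectorSizing() {}

    /** Backlog that actually takes effect: min(acceptCount, somaxconn). */
    static int effectiveBacklog(int acceptCount, int somaxconn) {
        return Math.min(acceptCount, somaxconn);
    }

    /** Suggested processorCache: max(maxThreads, expected concurrent connections). */
    static int suggestedProcessorCache(int maxThreads, int concurrentConnections) {
        return Math.max(maxThreads, concurrentConnections);
    }

    /** Reads somaxconn on Linux; returns the fallback if the file is missing or unreadable. */
    static int readSomaxconn(int fallback) {
        Path path = Paths.get("/proc/sys/net/core/somaxconn");
        try {
            String text = new String(Files.readAllBytes(path), StandardCharsets.US_ASCII);
            return Integer.parseInt(text.trim());
        } catch (IOException | NumberFormatException | SecurityException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        int somaxconn = readSomaxconn(128);   // 128 is an assumed fallback, not a measured value
        int acceptCount = 1024;               // example value only
        int maxThreads = 400;                 // example value only
        int concurrentConnections = 5000;     // example value only

        System.out.println("effective backlog = " + effectiveBacklog(acceptCount, somaxconn));
        System.out.println("processorCache    = "
                + suggestedProcessorCache(maxThreads, concurrentConnections));
    }
}
```

If the effective backlog comes out smaller than the configured acceptCount, raising acceptCount alone has no effect; the kernel limit has to be raised as well.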

Tomcat NIO revisit
1. Acceptor accepts a socket
2. Acceptor retrieves a nioChannel object from the cache
3. Poller thread registers the nioChannel to its selector and selects for IO events
4. Poller assigns the nioChannel to one of the worker threads to process the request
5. SocketProcessor finishes processing the request and returns the nioChannel
[Diagram labels: User Request, Acceptor, PollerEvent queues, Poller0 and Poller1 each with its own selector, nioChannel, SocketProcessor worker threads]

Common NIO Issues
• Acceptor stops accepting new requests
• Connection leakage
• Connection count leakage
• nioChannel cache pollution due to concurrency issues
• NPE during request processing
• Caused by sendfile, which is on by default

Case study 1
• Tomcat 7.0.x
• NIO connector
• Throughput: 1000+ requests/s
• Tomcat Poller thread throws ConcurrentModificationException
• After a while Tomcat stops accepting new requests

Troubleshooting
• Poller thread has its own selector
• Should not have concurrency issues
• Who is modifying the SelectionKeys?

    protected void timeout(int keyCount, boolean hasEvents) {
        ...
        //timeout
        Set<SelectionKey> keys = selector.keys();
        ...
        // Iterating the key set; a change made by another thread during this loop
        // surfaces as ConcurrentModificationException.
        for (Iterator<SelectionKey> iter = keys.iterator(); iter.hasNext();) {
            SelectionKey key = iter.next();
            ...
        }
        ...
    }

Digging into the JVM
• The only entrance: sun.nio.ch.EPollSelectorImpl

      protected void implRegister(SelectionKeyImpl ski) {
          ...
          keys.add(ski);
    +     System.out.println("> " + Thread.currentThread().getName() + ", " + this);
      }

      protected void implDereg(SelectionKeyImpl ski) throws IOException {
          ...
          keys.remove(ski);
    +     System.out.println("> " + Thread.currentThread().getName() + ", " + this);
          ...
      }

Issue NOT reproducible with this instrumentation!

Minimize the overhead

      protected void implRegister(SelectionKeyImpl ski) {
          ...
          keys.add(ski);
    +     System.out.println(Thread.currentThread().getName() + "> " + this);
      }

      protected void implDereg(SelectionKeyImpl ski) throws IOException {
          ...
          keys.remove(ski);
    +     System.out.println(Thread.currentThread().getName() + "> " + this);
          ...
      }

    http-nio-7001-ClientPoller-1> sun.nio.ch.EPollSelectorImpl@cc98f2c
    http-nio-7001-ClientPoller-1> sun.nio.ch.EPollSelectorImpl@5787ae61
    http-nio-7001-ClientPoller-0> sun.nio.ch.EPollSelectorImpl@cc98f2c

• 2 poller threads are modifying the same selector instance!

JVM bug?
• Updating to JDK 1.7 does not make the issue go away
• Setting the poller thread number to 1 prevents the issue from being reproduced
• NOT a JVM bug!
• Where is the selection key being modified?

    @Override
    public void run() {
        if ( interestOps == OP_REGISTER ) {
            try {
                socket.getIOChannel().register(
                    socket.getPoller().getSelector(), SelectionKey.OP_READ, key);
            } catch (Exception x) {
                log.error("", x);
            }
        } else {
            // …
        }//end if
    }//run

[Callouts on the code: "Can't be changed during runtime"; "nioChannel registered to the wrong selector?"]

    java.lang.Exception: Stack trace
        at sun.nio.ch.EPollSelectorImpl.implRegister(EPollSelectorImpl.java:171)
        at sun.nio.ch.SelectorImpl.register(SelectorImpl.java:133)
        at java.nio.channels.spi.AbstractSelectableChannel.register(AbstractSelectableChannel.java:180)
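
The diagnostic idea used in this case study (record which thread touches a selector's key set on every register and deregister) can be tried outside the JDK. The sketch below is a simplified stand-in, not JDK or Tomcat code: `AuditedKeySet`, `AuditDemo`, and the thread and channel names are hypothetical. It records thread names in memory instead of printing them, which is one possible way to keep the probe cheap; the slides do not say this is how it was done.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a selector's key set that remembers who mutated it. */
final class AuditedKeySet<K> {
    private final Set<K> keys = new HashSet<>();
    private final Set<String> mutators = ConcurrentHashMap.newKeySet();

    /** Counterpart of implRegister: add the key and note the calling thread. */
    synchronized void register(K key) {
        keys.add(key);
        mutators.add(Thread.currentThread().getName());
    }

    /** Counterpart of implDereg: remove the key and note the calling thread. */
    synchronized void deregister(K key) {
        keys.remove(key);
        mutators.add(Thread.currentThread().getName());
    }

    synchronized int size() {
        return keys.size();
    }

    /** Sorted snapshot of every thread name that has mutated this key set. */
    Set<String> mutatingThreads() {
        return new TreeSet<>(mutators);
    }

    /** True when more than one thread has mutated the set, the symptom seen in the slides. */
    boolean sharedAcrossThreads() {
        return mutators.size() > 1;
    }
}

final class AuditDemo {
    private AuditDemo() {}

    /** Two threads register on the SAME key set, as the two pollers did in the case study. */
    static AuditedKeySet<String> run() {
        AuditedKeySet<String> selectorKeys = new AuditedKeySet<>();
        Thread poller0 = new Thread(() -> selectorKeys.register("channel-A"), "ClientPoller-0");
        Thread poller1 = new Thread(() -> selectorKeys.register("channel-B"), "ClientPoller-1");
        poller0.start();
        poller1.start();
        try {
            poller0.join();
            poller1.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for demo threads", e);
        }
        return selectorKeys;
    }

    public static void main(String[] args) {
        AuditedKeySet<String> audited = run();
        System.out.println("mutating threads: " + audited.mutatingThreads());
        System.out.println("shared across threads: " + audited.sharedAcrossThreads());
    }
}
```

The `synchronized` methods keep this demo itself safe; they also change timing, so a probe like this can hide a race just as the first println version did.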