The Challenges Tomcat Faces in High Throughput Production Systems

The Challenges Tomcat Faces in High Throughput Production Systems

The Challenges Tomcat Faces in High Throughput Production Systems May 17, 2017 © 2017 Alibaba Middleware Group 1 Biography • Huxing Zhang (张乎兴) • Software Engineer @ Alibaba Middleware Group since 2014 • Previously worked for DeNA, Tokyo, Japan • Apache Tomcat Committer since 2016 • Core tomcat maintainer @ Alibaba Inc. May 17, 2017 © 2017 Alibaba Middleware Group 2 Outline • How we use Tomcat at Alibaba • High throughput web applications • The challenges • Case studies • Summary • Q & A May 17, 2017 © 2017 Alibaba Middleware Group 3 Outline • How we use Tomcat at Alibaba • High throughput web applications • The challenges • Case studies • Summary • Q & A May 17, 2017 © 2017 Alibaba Middleware Group 4 Tomcat in Alibaba • Small trial since 2009 • Migration From JBoss to Tomcat • Jboss uses Tomcat as its servlet container • Officially started in 2013 • Becomes the dominant servlet container in 2015 • Thousands of web applications • Hundreds of thousands of instances • Status Quo • 95% tomcat, 5% other • Mainly 7.0.x (support JDK 6+) • Few 8.0.x • Emerging 8.5.x because of micro-service (e.g. spring-boot) May 17, 2017 © 2017 Alibaba Middleware Group 5 Deployment Standard One web application per Tomcat instance 5XQWLPH&RQƉJXUDWLRQ &$7$/,1$B%$6( AppName. Tomcat Nginx war conf conf V\PEROLFOLQN • Startup script features $SSOLFDWLRQ6SHFLƉF&RQƉJXUDWLRQ • Per startup log rotation for catalina.out Nginx Tomcat Startup • Smart auto-fix for incorrect JVM args conf conf Hooks Deploy UROOEDFN • CATALINA_BASE preparation ment AppName. JVM Health war Args Check • Tomcat startup status detection Platform SXEOLVK 6WDQGDUGL]HG&RQƉJXUDWLRQ &$7$/,1$B+20( Nginx Tomcat Startup Conf Conf script May 17, 2017 © 2017 Alibaba Middleware Group 6 Monitoring, diagnostics and development tools • Tomcat Monitor (RESTful service) • Metrics for OS, JVM, Tomcat, Middleware,and application • Classloading: locate which jar file a class is loaded from • Thread: per thread CPU usage, thread state, etc. • Connector stats: tomcat thread pool statistics • Diagnostics on production environment • https://github.com/oldmanpushcart/greys-anatomy • Much easier to use than BTrace • Development tools • Package free tomcat Eclipse plugin May 17, 2017 © 2017 Alibaba Middleware Group 7 Other Usages that worth mentioning • Cluster Session Management: Taobao Session • Cookie + distributedkey-value storage(Tair) • Database Connection Pool: Druid • https://github.com/alibaba/druid • Performance, Monitoring, Extensibility • Multi-tenancy • High density deploymentfor legacy web applications • Providing CPU/Memory isolation • Based on multi-tenantJVM May 17, 2017 © 2017 Alibaba Middleware Group 8 Outline • How we use Tomcat at Alibaba • High throughput web applications • The challenges • Case studies • Summary • Q & A May 17, 2017 © 2017 Alibaba Middleware Group 9 High throughput web application • Taobao API Gateway for mobile devices Shopping Cart • Interacting with hundreds of backend services • Handling millions of request per second Inventory Mobile API Order Devices Gateway Shipping … May 17, 2017 © 2017 Alibaba Middleware Group 10 Overall Architecture Connection 'DWDEDVH & Load Balance $3,*DWHZD\ +773 Nginx Nginx Tomcat Database +7736 PC +773 53& :HE$SS Nginx Database :HE$SS Nginx Tomcat +7736 ŏ ŏ :HE$SS Mobile Nginx Tomcat Nginx Nginx Tomcat Database • Nginx + Tomcat on the same machine • Security May 17, 2017• Rate Limiting © 2017 Alibaba Middleware Group 11 Outline • How we use Tomcat at Alibaba • High throughput web applications • The challenges • Case studies • Summary • Q & A May 17, 2017 © 2017 Alibaba Middleware Group 12 The challenges Configuration tuning Memory Security • Connector Selection Leak • Configuration Tweaking Tomcat • NIO Connection & Concurrency Issues Concurrency • Case study 1 Connection • Security Issues • Case study 2 Asynchron GC ization May 17, 2017 © 2017 Alibaba Middleware Group 13 The challenges Configuration tuning Memory Security • Connector Selection Leak • Configuration Tweaking Tomcat • NIO Connection & Concurrency Issues Concurrency • Case study 1 Connection • Security Issues • Case study 2 Asynchron GC ization May 17, 2017 © 2017 Alibaba Middleware Group 14 Architecture: BIO + Sync RPC $3,*DWHZD\ Remote Service 1JLQ[ 7RPFDW User +773 BIO 53& Remote Nginx Servlet Request Connector Service +773 Remote Service • One working thread per request • Can not scale up to thousands of requests per second • Low cpu utilization May 17, 2017 © 2017 Alibaba Middleware Group 15 Architecture: APR + Async RPC $3,*DWHZD\ Remote Service 1JLQ[ 7RPFDW User +773 APR Async $V\QF53& Remote Nginx Request Connector Servlet Service $V\QF+773 Remote Service • Blocking on reading request headers • Non-blocking on waiting for the next request • Cons: • Unexpected JVM crash • Maintenance cost: requires separatetcnative library May 17, 2017 © 2017 Alibaba Middleware Group 16 Architecture: NIO + Async RPC $3,*DWHZD\ Remote Service 1JLQ[ 7RPFDW User +773 NIO Async $V\QF53& Remote Nginx Request Connector Servlet Service $V\QF+773 Remote Service • Non-blocking on reading request headers • Non-blocking on waiting for next request • Fully asynchronized May 17, 2017 © 2017 Alibaba Middleware Group 17 The challenges Configuration tuning Memory Security • Connector Selection Leak • Configuration Tweaking Tomcat • NIO Connection & Concurrency Issues Concurrency • Case study 1 Connection • Security Issues • Case study 2 Asynchron GC ization May 17, 2017 © 2017 Alibaba Middleware Group 18 Tweaking Tomcat • Connector configuration • Do not change if you really encounter some issue • Read the documentationcarefully Configuration Default Recommended Note ConnectionTimeout 20s Decrease maxThreads 200 Increase acceptCount (backlog) 100 Increase min(acceptCount , /proc/sys/net/core/somaxconn) processorCache 200 Increase max(maxThreads, concurrent connection) • access log disabled • Availableon Nginx side May 17, 2017 © 2017 Alibaba Middleware Group 19 The challenges Configuration tuning Memory Security • Connector Selection Leak • Configuration Tweaking Tomcat • NIO Connection & Concurrency Issues Concurrency • Case study 1 Connection • Security Issues • Case study 2 Asynchron GC ization May 17, 2017 © 2017 Alibaba Middleware Group 20 Tomcat NIO revisit User Request 1. Acceptor accepts a socket Acceptor 2. Acceptor retrieves a nioChannelobject from cache 3. Poller thread registers nioChannelto its selector 3ROOHU(YHQW 3ROOHU(YHQW and selects for IO events nioChannel 4. Poller assigns the nioChannelto one of the worker … Poller0 Poller1 threads to process the request … VHOHFWRU VHOHFWRU 5. SocketProcessor finishes processing the request … and returns the nioChanel SocketProcessor SocketProcessor SocketProcessor May 17, 2017 © 2017 Alibaba Middleware Group 21 Common NIO Issues • Acceptor stops accepting new request • Connection leakage • Connection count leakage • nioChannelCache pollutiondue to concurrency issues • NPE during request processing • Caused by sendfile, default is on May 17, 2017 © 2017 Alibaba Middleware Group 22 Case study 1 • Tomcat 7.0.x • NIO connector • Throughput: 1000+ request/s • Tomcat Poller thread throws ConcurrentModificationException • After a while Tomcat stops accepting new request May 17, 2017 © 2017 Alibaba Middleware Group 23 Troubleshooting protected void timeout(int keyCount, boolean hasEvents) { ... • Poller thread has its own selector //timeout Set<SelectionKey> keys = selector.keys(); • Should not have concurrency issues ... for (Iterator<SelectionKey> iter = keys.iterator(); • Who is modifying the SelectionKeys? iter.hasNext();) { SelectionKey key = iter.next(); ... } ... } May 17, 2017 © 2017 Alibaba Middleware Group 24 Digging into the JVM • The only entrance: sun.nio.ch.EpollSelectorImpl protected void implRegister(SelectionKeyImpl ski) { ... keys.add(ski); + System.out.println("> " + Thread.currentThread().getName() + ", " + this); } protected void implDereg(SelectionKeyImpl ski) throws IOException { ... keys.remove(ski); + System.out.println("> " + Thread.currentThread().getName() + ", " + this); ... } Issue NOT reproducible! May 17, 2017 © 2017 Alibaba Middleware Group 25 Minimize the overhead protected void implRegister(SelectionKeyImpl ski) { ... keys.add(ski); + System.out.println(Thread.currentThread().getName() + "> " + this); } protected void implDereg(SelectionKeyImpl ski) throws IOException { ... keys.remove(ski); + System.out.println(Thread.currentThread().getName() + "> " + this); ... } http-nio-7001-ClientPoller-1> sun.nio.ch.EPollSelectorImpl@cc98f2c http-nio-7001-ClientPoller-1> sun.nio.ch.EPollSelectorImpl@5787ae61 http-nio-7001-ClientPoller-0> sun.nio.ch.EPollSelectorImpl@cc98f2c • 2 poller thread are modifying the same selector instance! May 17, 2017 © 2017 Alibaba Middleware Group 26 JVM bug? @Override Can’t be changed during runtime public void run() { if ( interestOps == OP_REGISTER ) { try { • socket.getIOChannel().register( Update to JDK 1.7 and the issue is remaining socket.getPoller().getSelector(), SelectionKey.OP_READ, key); • Setting poller thread number to 1 prevents} catchthe (Exceptionissue from x) { being log.error(“”, x); reproduced } } else { nioChannel registered to • NOT a JVM bug! // … }//end if the wrong selector? • Where is the selection key being modified? }//run java.lang.Exception: Stack trace at sun.nio.ch.EPollSelectorImpl.implRegister(EPollSelectorImpl.java:171) at sun.nio.ch.SelectorImpl.register(SelectorImpl.java:133) at java.nio.channels.spi.AbstractSelectableChannel.register(AbstractSelectableChannel.java:180)

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    48 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us