CPL 2016, week 7 Performance considerations
Oleg Batrashev
Institute of Computer Science, Tartu, Estonia
March 21, 2016

Overview
Studied so far:
1. Inter-thread visibility: JMM
2. Inter-thread synchronization: locks and monitors
3. Thread management: executors, tasks, cancelation
4. Inter-thread communication: confinements, queues, back pressure
5. Inter-thread collaboration: actors, inboxes, state diagrams
6. Asynchronous execution: callbacks, Pyramid of Doom, Java 8 promises
Today:
- Performance considerations: asynchronous IO, Java 8 streams

Performance considerations 140/160 Context switch - Outline
Performance considerations
- Context switch
- Green threads
- Asynchronous IO
  - Java NIO
Declarative concurrency
- Java 8 streams

Performance considerations 141/160 Context switch - Variants of context switch
Context switch may refer to different things:
- application changes the CPU privilege level (kernel/user) of the running code
  - system calls: the set of basic operations supported by the OS that applications use to open a file/socket, write/read it, ...
  - CPU registers, stack, ... are reloaded in the core
- OS changes the thread that runs on a core
- OS changes the process that runs on a core
Our main interest is in switching threads, e.g.:
- if a lock is taken, the thread must be suspended until the lock is released
- if a queue is empty, the consumer must be suspended
- if there is no more data in a socket, the reader must be suspended
Too many context switches may degrade performance!

Performance considerations 142/160 Context switch - Context switch test
Thread/process context switch is 1-10 microseconds (system dependent).
1. Two actors with own thread: producer writes 1 million integer values to the consumer actor, which sums them up.
2. Two actors with own thread: ping-pong of 0.5 million values.
3. Two actors with shared thread: ping-pong of 0.5 million values.

Case                             Total time
Two actors: producer-consumer    0.41 s
Two actors with own threads      6 s
Two actors with shared thread    0.18 s
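A minimal sketch of how the two ping-pong cases could be measured. This is not the code behind the numbers above: the class name PingPong, the use of SynchronousQueue for the own-thread case, and the single-thread executor for the shared-thread case are all assumptions made for illustration. Timings depend on the system.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.SynchronousQueue;

class PingPong {
    // Own threads: each hand-off usually suspends one thread and wakes the other.
    static int ownThreads(int rounds) {
        SynchronousQueue<Integer> ping = new SynchronousQueue<>();
        SynchronousQueue<Integer> pong = new SynchronousQueue<>();
        Thread echo = new Thread(() -> {
            try {
                for (int i = 0; i < rounds; i++) {
                    pong.put(ping.take() + 1);   // receive a ping, answer with a pong
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        echo.start();
        try {
            int value = 0;
            for (int i = 0; i < rounds; i++) {
                ping.put(value);
                value = pong.take();
            }
            echo.join();
            return value;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Shared thread: both "actors" are tasks queued on one executor thread.
    static int sharedThread(int rounds) {
        ExecutorService shared = Executors.newSingleThreadExecutor();
        CompletableFuture<Integer> done = new CompletableFuture<>();
        shared.execute(() -> ping(shared, 0, rounds, done));
        try {
            return done.join();
        } finally {
            shared.shutdown();
        }
    }

    static void ping(Executor ex, int value, int left, CompletableFuture<Integer> done) {
        if (left == 0) { done.complete(value); return; }
        ex.execute(() -> pong(ex, value, left, done));          // message to the pong actor
    }

    static void pong(Executor ex, int value, int left, CompletableFuture<Integer> done) {
        ex.execute(() -> ping(ex, value + 1, left - 1, done));  // reply to the ping actor
    }

    public static void main(String[] args) {
        int rounds = 500_000;
        long t0 = System.nanoTime();
        ownThreads(rounds);
        long t1 = System.nanoTime();
        sharedThread(rounds);
        long t2 = System.nanoTime();
        System.out.printf("own threads:   %.2f s%n", (t1 - t0) / 1e9);
        System.out.printf("shared thread: %.2f s%n", (t2 - t1) / 1e9);
    }
}
```

Both methods return the number of completed rounds, so the two variants can be checked against each other before comparing their times.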
- ping-pong between two threads causes the expected decline in efficiency: 0.5 million values means 1 million pings and pongs, so 6 s gives about 6 µs per context switch, i.e. per p(io)ng

Performance considerations 143/160 Context switch - Solutions to context switch
1. Let the same thread do most of the work
   - from queue/actor model back to wandering threads
2. Make sure a single thread does enough work before switching
   - make message processing work expensive (in terms of computation)
   - keep queues full enough for consumers/transducers/actors: handle several messages in a row before switching to another thread
   - not always possible
3. Do not switch thread when switching actors, consumers, and/or transducers
   - use green threads
This problem is only relevant when there are many actors and/or many messages that cannot be processed in batches!

Performance considerations 145/160 Green threads - Idea
Green threads (library threads, user-level threads):
- a user-level thread is maintained outside the OS, on the user level
- implemented by a library or VM
- 1 kernel-level (OS) thread per n user-level threads
  - OS resources are allocated for 1 thread
  - cheaper scheduling: no context switch needed
- m kernel threads per n user threads
Problems:
- need a way to suspend execution and save/restore the thread stack
  - i.e. preempt the executing thread
  - non-preemptable threads need to yield periodically
- IO may block the OS thread, which is needed by other green threads

Performance considerations 146/160 Green threads - Implementations
Languages/VMs:
- Java 1.1 had green threads as the main implementation
- Erlang VM uses green threads with no shared state
- Go, Smalltalk
Libraries/frameworks/engines:
- Akka (Java) uses the m-n model (specify a dispatcher for an actor)
- CPython: greenlet, eventlet, gevent
- Quasar (Java) modifies your code to save the stack (location and local variables)
See also:
- fibers, coroutines

Performance considerations 148/160 Asynchronous IO - Blocking IO problem
- IO may block an OS thread that is used for many green threads
Solutions:
1. Use a dedicated thread pool for blocking IO (Clojure)
2. Use asynchronous IO (Erlang)
Some frameworks:
- Netty is a non-blocking I/O (NIO) client-server framework for the development of Java network applications
- Asynchronous servlets in Servlet 3.0

Performance considerations 149/160 Asynchronous IO - Ideas
- Synchronous IO suspends the thread if no data is available yet
- Asynchronous IO: use callbacks that are executed when IO is readable/writeable
  - does not block on IO operations
  - may read multiple sockets with a single thread (selectors)
Advantages:
- avoids context switches when reading from multiple sockets
- solves the green thread blocking IO problem
Disadvantages:
- requires more code to handle IO
- code becomes more scattered

Performance considerations 150/160 Asynchronous IO - Java NIO Buffers and channels
http://tutorials.jenkov.com/java-nio/index.html
- buffers are much like arrays
  - provide the typical write-flip-read sequence
  - used for Java NIO channels
  - ByteBuffer.allocate(100)
- channels are much like streams, but
  - both readable/writeable
  - support asynchronous operation, e.g. read in AsynchronousByteChannel:
      Future<Integer> read(ByteBuffer dst)
      <A> void read(ByteBuffer dst, A attachment, CompletionHandler<Integer,? super A> handler)
- write also supports these 2 forms: future and callback

Performance considerations 151/160 Asynchronous IO - Java NIO Selectors
- may register a callback for each channel we are interested in
- an easier way is to use selectors
- register as many channels as we want, select the desired operation:
    channel.configureBlocking(false);
    SelectionKey key = channel.register(selector, SelectionKey.OP_READ);
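A minimal sketch of registering a channel with a selector and reading from it, together with the buffer write-flip-read sequence. To stay self-contained it uses an in-process java.nio.channels.Pipe instead of a socket; the class name SelectorSketch, the method roundTrip, and the 100-byte buffer size are assumptions for illustration only.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;

class SelectorSketch {
    // Sends a short message through a pipe and reads it back via a selector.
    static String roundTrip(String message) {
        try (Selector selector = Selector.open()) {
            Pipe pipe = Pipe.open();
            try (Pipe.SinkChannel sink = pipe.sink();
                 Pipe.SourceChannel source = pipe.source()) {

                // Sending side: write into the buffer, flip, let the channel read it.
                ByteBuffer out = ByteBuffer.allocate(100);   // assumes the message fits in 100 bytes
                out.put(message.getBytes(StandardCharsets.UTF_8));
                out.flip();
                while (out.hasRemaining()) {
                    sink.write(out);
                }

                // Receiving side: non-blocking channel registered for OP_READ.
                source.configureBlocking(false);
                source.register(selector, SelectionKey.OP_READ);

                selector.select();                           // blocks until a channel is ready
                StringBuilder received = new StringBuilder();
                for (SelectionKey key : selector.selectedKeys()) {
                    if (key.isReadable()) {
                        ReadableByteChannel channel = (ReadableByteChannel) key.channel();
                        ByteBuffer in = ByteBuffer.allocate(100);
                        channel.read(in);                    // channel writes into the buffer
                        in.flip();                           // switch the buffer to reading
                        received.append(StandardCharsets.UTF_8.decode(in));
                    }
                }
                selector.selectedKeys().clear();             // selected keys are not removed automatically
                return received.toString();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello selector"));
    }
}
```

With sockets the structure stays the same; only the channel type and the registered operations (e.g. OP_ACCEPT on a server channel) change.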
- supported operations: OP_CONNECT, OP_ACCEPT, OP_READ, OP_WRITE
- use selector.select(): blocks until at least one channel is ready for the events you registered for
- selector.selectedKeys(): returns the keys of the channels that are ready

Performance considerations 152/160 Summary
- context switch is a change of the executing mode, thread or process
- context switch is quite expensive on the OS (kernel) level
- green threads (user-level threads) may mitigate the cost
- green threads have problems with preemption, saving the stack and blocking IO
- the blocking IO problem may be solved by:
  - using a dedicated thread pool
  - using asynchronous IO

Declarative concurrency 153/160 Ideas
- Java before version 8 lacked functional style
- declarative = pure functional (see later Erlang, Clojure)
  - single assignment variables, lock-step execution
  - deterministic, no side effects, no race conditions
  - laziness, dataflow programming
- interest in performance (utilizing cores)
- structured declarative concurrency
- parallel map/filter/reduce

Declarative concurrency 155/160 Java 8 streams - Java 8 streams
Like usual streams:
- sequence of values
Unlike usual streams:
- do not hold state, used only for data transformation
- support map/filter/reduce transformations
- lazy: do not execute until data is needed
Create stream:
Stream
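A minimal sketch of common ways to create a stream with the standard java.util.stream API. The selection and order of the four sources are an assumption, not a copy of the slide's listing, and the class name StreamSources is made up for illustration. Stream.iterate and Stream.generate are infinite, so they are cut with limit().

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StreamSources {
    // Finite: from explicit values.
    static List<Integer> fromValues() {
        return Stream.of(1, 2, 3).collect(Collectors.toList());
    }

    // Finite: from an existing collection.
    static List<Integer> fromCollection() {
        return Arrays.asList(4, 5, 6).stream().collect(Collectors.toList());
    }

    // Infinite: seed and a function applied repeatedly; limit() cuts it.
    static List<Integer> fromIterate() {
        return Stream.iterate(1, x -> x * 2).limit(5).collect(Collectors.toList());
    }

    // Infinite: a supplier called for every element; limit() cuts it.
    static List<String> fromGenerate() {
        return Stream.generate(() -> "tick").limit(3).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(fromValues());      // [1, 2, 3]
        System.out.println(fromCollection());  // [4, 5, 6]
        System.out.println(fromIterate());     // [1, 2, 4, 8, 16]
        System.out.println(fromGenerate());    // [tick, tick, tick]
    }
}
```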
- last 2 produce infinite streams

Declarative concurrency 156/160 Java 8 streams - Collecting stream
- streams are not executed until their results are needed
- terminal operation: one that produces the result
Some terminal operations:
long count() Optional
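A minimal sketch of a few terminal operations from the standard API: count(), max() returning an Optional, and collect() with collectors from the Collectors class. The word list and the class name TerminalOps are made up for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.stream.Collectors;

class TerminalOps {
    static final List<String> WORDS = Arrays.asList("lock", "monitor", "actor", "io");

    // count(): number of elements that passed the filter.
    static long longWords() {
        return WORDS.stream().filter(w -> w.length() > 3).count();
    }

    // max(): Optional, because the stream may be empty.
    static Optional<String> longest() {
        return WORDS.stream().max(Comparator.comparingInt(String::length));
    }

    // collect() with a standard collector that builds a Set.
    static Set<Integer> lengths() {
        return WORDS.stream().map(String::length).collect(Collectors.toSet());
    }

    // collect() with a collector that concatenates strings.
    static String joined() {
        return WORDS.stream().collect(Collectors.joining(", "));
    }

    public static void main(String[] args) {
        System.out.println(longWords());       // 3
        System.out.println(longest().get());   // monitor
        System.out.println(lengths());
        System.out.println(joined());          // lock, monitor, actor, io
    }
}
```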
- the Collector interface is very general
- the Collectors class contains a lot of standard implementations
  - toList(), toSet(), ...

Declarative concurrency 157/160 Java 8 streams - Transforming stream
- map: transform each element and return a new stream
- filter: select only some elements from the stream
Stream
- reduce: aggregate the stream into the final result
Optional
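A minimal sketch of the transformations on this slide (map, filter, reduce, flatMap) using the standard Stream API. The class name Transformations and the sample data are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

class Transformations {
    // filter keeps the even numbers, map squares each of them.
    static List<Integer> evenSquares(List<Integer> xs) {
        return xs.stream()
                 .filter(x -> x % 2 == 0)
                 .map(x -> x * x)
                 .collect(Collectors.toList());
    }

    // reduce without an identity value returns Optional (empty for an empty stream).
    static Optional<Integer> sum(List<Integer> xs) {
        return xs.stream().reduce(Integer::sum);
    }

    // flatMap turns every line into a stream of words and concatenates the results.
    static List<String> words(List<String> lines) {
        return lines.stream()
                    .flatMap(line -> Arrays.stream(line.split(" ")))
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(evenSquares(Arrays.asList(1, 2, 3, 4)));        // [4, 16]
        System.out.println(sum(Arrays.asList(1, 2, 3, 4)));                // Optional[10]
        System.out.println(words(Arrays.asList("green threads", "java nio")));
    }
}
```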
- flatMap: like map, but combining the resulting streams
  - analogue of thenCompose in CompletableFuture

Declarative concurrency 158/160 Java 8 streams - Example: explicit
Collect travelers that have speed>20, take their max and min temperatures
- types of intermediate streams are given explicitly
- see next slide for more compact version
- limit() takes first n elements
Stream
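A sketch of what such an explicit version might look like, with every intermediate stream assigned to a typed variable. It is not the original listing: the Traveler class below is hypothetical (only the names Traveler::generate, speed and temperature appear in the slides), and its value ranges and the fixed random seed are assumptions.

```java
import java.util.Random;
import java.util.stream.DoubleStream;
import java.util.stream.Stream;

// Hypothetical data class; field names follow the slides, ranges are made up.
class Traveler {
    static final Random RANDOM = new Random(42);
    final double speed;
    final double temperature;

    Traveler(double speed, double temperature) {
        this.speed = speed;
        this.temperature = temperature;
    }

    static Traveler generate() {
        return new Traveler(RANDOM.nextDouble() * 40.0,           // speed in [0, 40)
                            -10.0 + RANDOM.nextDouble() * 40.0);  // temperature in [-10, 30)
    }
}

class ExplicitExample {
    static double[] fastTemperatures(int n) {
        Stream<Traveler> all = Stream.generate(Traveler::generate);     // infinite source
        Stream<Traveler> firstN = all.limit(n);                         // first n elements
        Stream<Traveler> fast = firstN.filter(t -> t.speed > 20.0);     // fast moving
        DoubleStream temps = fast.mapToDouble(t -> t.temperature);      // take temperature
        return temps.toArray();                                         // terminal operation
    }

    public static void main(String[] args) {
        double[] temps = fastTemperatures(10000);
        System.out.println(DoubleStream.of(temps).min() + " " + DoubleStream.of(temps).max());
    }
}
```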
- more readable than the explicit version
    double[] temps2 = Stream.generate(Traveler::generate)
        .limit(10000)
        .filter(t -> t.speed > 20.0)      // fast moving
        .mapToDouble(t -> t.temperature)  // take temperature
        .toArray();
    System.out.println(DoubleStream.of(temps2).min() + " " + DoubleStream.of(temps2).max());
- do not overuse: many anonymous intermediate results may obscure what is actually happening

Declarative concurrency 160/160 Java 8 streams - Parallel execution
Advantages over the usual for loop:
- easily parallelizable, e.g. run the example in parallel
    double[] temps3 = Stream.generate(Traveler::generate)
        .parallel()
        .limit(10000)
        .filter(t -> t.speed > 20.0)      // fast moving
        .mapToDouble(t -> t.temperature)  // take temperature
        .toArray();
    System.out.println(DoubleStream.of(temps3).min() + " " + DoubleStream.of(temps3).max());
- streams are composable, e.g. may write
Stream
  - combine it in different contexts
  - no operation is executed until a terminal operation for the whole stream is defined
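A minimal sketch of composability and laziness: a reusable pipeline stage is written once as a method and then combined with different sources, sequential and parallel. The class name ComposableStreams, the method evenSquares, and the counter used to observe execution are assumptions for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class ComposableStreams {
    // Reusable stage: only describes the transformation, executes nothing by itself.
    static Stream<Integer> evenSquares(Stream<Integer> in, AtomicInteger visited) {
        return in.peek(x -> visited.incrementAndGet())   // counts elements actually processed
                 .filter(x -> x % 2 == 0)
                 .map(x -> x * x);
    }

    // Terminal operation: this is where the whole pipeline runs.
    static long total(Stream<Integer> in, AtomicInteger visited) {
        return evenSquares(in, visited).mapToLong(Integer::longValue).sum();
    }

    public static void main(String[] args) {
        AtomicInteger visited = new AtomicInteger();
        Stream<Integer> pipeline = evenSquares(Stream.of(1, 2, 3, 4, 5, 6), visited);
        System.out.println("visited before terminal operation: " + visited.get()); // 0
        System.out.println("sum: " + pipeline.mapToLong(Integer::longValue).sum()); // 56
        System.out.println("visited after terminal operation: " + visited.get());  // 6

        // The same stage combined with a sequential and a parallel source.
        long sequential = total(IntStream.rangeClosed(1, 1000).boxed(), new AtomicInteger());
        long parallel = total(IntStream.rangeClosed(1, 1000).boxed().parallel(), new AtomicInteger());
        System.out.println(sequential == parallel);
    }
}
```

The stage has no side effects apart from the thread-safe counter, which is why running it in parallel gives the same result as running it sequentially.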