CPL 2016, week 7 Performance considerations

Oleg Batrashev

Institute of Computer Science, Tartu, Estonia

March 21, 2016

Overview

Studied so far:

1. Inter-thread visibility: JMM
2. Inter-thread synchronization: locks and monitors
3. Thread management: executors, tasks, cancellation
4. Inter-thread communication: confinement, queues, back pressure
5. Inter-thread collaboration: actors, inboxes, state diagrams
6. Asynchronous execution: callbacks, Pyramid of Doom, Java 8 promises

Today:

I Performance considerations: asynchronous IO, Java 8 streams

Performance considerations 140/160 - Outline

Performance considerations
  Context switch
  Green threads
  Asynchronous IO
  Java NIO
Declarative concurrency
  Java 8 streams

Performance considerations 141/160 Context switch - Variants of context switch

Context switch may refer to different things

I application changes the CPU privilege level (kernel/user) of the running code
  I system calls – the set of basic operations provided by the OS that applications use to open a file/socket, write/read it, ...
  I CPU registers, stack, ... are reloaded on the core
I OS changes the thread that runs on a core
I OS changes the process that runs on a core

Our main interest is in switching threads, e.g.:

I if lock is taken, the thread must be suspended until it is released

I if queue is empty, then consumer must be suspended

I if there is no more data in a socket, the reader must be suspended

Too many context switches may degrade performance!

Performance considerations 142/160 Context switch - Context switch test

A thread/process context switch takes 1-10 microseconds (system dependent).

1. Two actors with own threads: the producer writes 1 million integer values to the consumer actor, which sums them up.
2. Two actors with own threads: ping-pong of 0.5 million values.
3. Two actors with a shared thread: ping-pong of 0.5 million values.

Case                                        Total time
Two actors: producer-consumer               0.41 s
Two actors with own threads (ping-pong)     6 s
Two actors with shared thread (ping-pong)   0.18 s
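A minimal sketch of case 2 (an assumed implementation with SynchronousQueue, not the lecture's actual benchmark): every value is handed to the other thread and bounced back, so each round trip pays for roughly two context switches.

import java.util.concurrent.SynchronousQueue;

public class PingPong {
    public static void main(String[] args) throws InterruptedException {
        SynchronousQueue<Integer> ping = new SynchronousQueue<>();
        SynchronousQueue<Integer> pong = new SynchronousQueue<>();
        final int N = 500_000;

        Thread echo = new Thread(() -> {
            try {
                for (int i = 0; i < N; i++)
                    pong.put(ping.take());   // bounce every value back
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        echo.start();

        long start = System.nanoTime();
        for (int i = 0; i < N; i++) {
            ping.put(i);    // hand the value to the other thread
            pong.take();    // wait until it comes back
        }
        echo.join();
        System.out.printf("%.2f s%n", (System.nanoTime() - start) / 1e9);
    }
}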

I ping-pong between two threads causes the expected drop in efficiency: 0.5 million round trips mean about 1 million context switches, i.e. 6 s / 10^6 ≈ 6 µs per switch

Performance considerations 143/160 Context switch - Solutions to context switch

1. Let the same thread do most of the work
  I from the queue/actor model back to wandering threads
2. Make sure a single thread does enough work before switching
  I make message-processing work expensive (in terms of computation)
  I keep queues full enough for consumers/transducers/actors – handle several messages in a row before switching to another thread
  I not always possible
3. Do not switch threads when switching actors, consumers, and/or transducers
  I use green threads

This problem is only relevant with many actors and/or many messages to which batching is not applicable!

Performance considerations 144/160 Green threads - Outline

Performance considerations
  Context switch
  Green threads
  Asynchronous IO
  Java NIO
Declarative concurrency
  Java 8 streams

Performance considerations 145/160 Green threads - Idea

Green threads (user-level threads):

I user-level thread is maintained outside OS, on the user level

I implemented by library or VM

I 1 kernel-level (OS) thread per n user-level threads

  I OS resources are allocated for 1 thread
  I cheaper scheduling – no OS context switch needed

I m kernel threads per n user threads

Problems:

I need a way to suspend execution and save/restore thread stack

  I i.e. preempt the executing thread
  I non-preemptable threads need to yield periodically

I IO may block the OS thread, which is needed by other green threads

Performance considerations 146/160 Green threads - Implementations

Languages/VMs:

I Java 1.1 had green threads as the main implementation

I Erlang VM uses green threads with no shared state

I Go

Libraries/frameworks/engines:

I Akka (Java) uses m-n model (specify dispatcher for an actor)

I CPython greenlet, eventlet, gevent

I Quasar (Java) modifies your code to save the stack (location and local variables)

See also:

I fibers

Performance considerations 147/160 Asynchronous IO - Outline

Performance considerations
  Context switch
  Green threads
  Asynchronous IO
  Java NIO
Declarative concurrency
  Java 8 streams

Performance considerations 148/160 Asynchronous IO - Blocking IO problem

I IO may block an OS thread that is used for many green threads

Solutions:

1. Use a dedicated thread pool for blocking IO
2. Use asynchronous IO (Erlang)
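A minimal sketch of solution 1 (the pool name, pool size, and file name are illustrative assumptions): blocking calls are shifted to a pool reserved for IO, so the threads that drive green threads/actors never block; the result arrives as a CompletableFuture, as with the promises covered earlier.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// a pool reserved for blocking calls only (the size 16 is an arbitrary choice)
ExecutorService ioPool = Executors.newFixedThreadPool(16);

CompletableFuture<byte[]> data = CompletableFuture.supplyAsync(() -> {
    try {
        return Files.readAllBytes(Paths.get("data.bin"));   // blocking read, off the main threads
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}, ioPool);

data.thenAccept(bytes -> System.out.println("read " + bytes.length + " bytes"));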

Some frameworks:

I Netty is a non-blocking I/O (NIO) client-server framework for the development of Java network applications

I Asynchronous servlets in Servlet 3.0

Performance considerations 149/160 Asynchronous IO - Ideas

I Synchronous IO suspends if no data is yet available

I Asynchronous IO – use callbacks that are executed when IO is readable/writeable

  I does not block on IO operations
  I a single thread may read multiple sockets (selectors)

Advantages:

I avoids context switch when reading from multiple sockets

I solves green thread blocking IO problem Disadvantages:

I requires more code to handle IO

I code becomes more scattered

Performance considerations 150/160 Asynchronous IO - Java NIO Buffers and channels

http://tutorials.jenkov.com/java-nio/index.html

I buffers are much like arrays

  I provide the typical write-flip-read sequence
  I used for Java NIO channels
  I ByteBuffer.allocate(100)
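The write-flip-read cycle in isolation (a small sketch, independent of any channel):

ByteBuffer buf = ByteBuffer.allocate(100);
buf.put((byte) 42);     // write mode: position advances towards the limit
buf.putInt(2016);
buf.flip();             // switch to read mode: limit = old position, position = 0
byte b   = buf.get();   // read back in the same order
int year = buf.getInt();
buf.clear();            // ready for the next write cycle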

I channels are much like streams, but

  I both readable and writable
  I support asynchronous operation, e.g. read in AsynchronousByteChannel:

Future<Integer> read(ByteBuffer dst)
<A> void read(ByteBuffer dst, A attachment, CompletionHandler<Integer, ? super A> handler)

I write also supports these 2 forms: future and callback
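A minimal sketch of both forms on an AsynchronousSocketChannel (host, port, and buffer size are assumptions; exception handling is omitted):

AsynchronousSocketChannel channel = AsynchronousSocketChannel.open();
channel.connect(new InetSocketAddress("example.com", 80)).get();   // wait for the connection

ByteBuffer buf = ByteBuffer.allocate(1024);

// 1) future form: the caller decides when to block or poll
Future<Integer> pending = channel.read(buf);
int n = pending.get();                       // blocks only this caller

// 2) callback form: no thread waits, the handler runs when data arrives
buf.clear();
channel.read(buf, null, new CompletionHandler<Integer, Void>() {
    public void completed(Integer bytesRead, Void att) {
        System.out.println("read " + bytesRead + " bytes");
    }
    public void failed(Throwable exc, Void att) {
        exc.printStackTrace();
    }
});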

Performance considerations 151/160 Asynchronous IO - Java NIO Selectors

I may register a callback for each channel we are interested in

I easier way is to use selectors

I register as many channels as we want, select desired operation:

channel.configureBlocking(false);
SelectionKey key = channel.register(selector, SelectionKey.OP_READ);

I supported operations OP_CONNECT, OP_ACCEPT, OP_READ, OP_WRITE

I use selector.select() – blocks until at least one channel is ready for the events you registered for

I selector.selectedKeys() – returns the channels that are ready
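A sketch of the usual selector loop (assuming channel is an already connected, non-blocking SocketChannel):

Selector selector = Selector.open();
channel.configureBlocking(false);
channel.register(selector, SelectionKey.OP_READ);

while (true) {
    selector.select();                           // blocks until some channel is ready
    Iterator<SelectionKey> it = selector.selectedKeys().iterator();
    while (it.hasNext()) {
        SelectionKey key = it.next();
        it.remove();                             // selected keys must be removed manually
        if (key.isReadable()) {
            SocketChannel ch = (SocketChannel) key.channel();
            ByteBuffer buf = ByteBuffer.allocate(1024);
            int n = ch.read(buf);                // will not block: data is ready
            // ... process buf; deregister the key when n == -1
        }
    }
}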

Performance considerations 152/160 Summary

I context switch means changing the execution mode, thread, or process

I context switch is quite expensive on OS (kernel) level

I green threads (user-level threads) may mitigate the cost

I green threads have problems with preemption, saving the stack, and blocking IO

I blocking IO may be solved by:

  I using a dedicated thread pool
  I using asynchronous IO

Declarative concurrency 153/160 - Ideas

I Java <8 lacked functional style

I declarative = pure functional (see later: Erlang, Clojure)

  I single-assignment variables, lock-step execution
  I deterministic, no side effects, no race conditions
  I laziness, dataflow programming

I interest in performance (utilizing cores)

I structured declarative concurrency

I parallel map/filter/reduce

Declarative concurrency 154/160 Java 8 streams - Outline

Performance considerations
  Context switch
  Green threads
  Asynchronous IO
  Java NIO
Declarative concurrency
  Java 8 streams

Declarative concurrency 155/160 Java 8 streams - Java 8 streams

Like usual streams:

I a sequence of values

Unlike usual streams:

I do not have state, they are used only for data transformation

I support map/filter/reduce transformations

I lazy – do not execute until data is needed

Create a stream:

Stream<E> Collection.stream()
static <T> Stream<T> Arrays.stream(T[] array)
static <T> Stream<T> Stream.of(T... values)
static <T> Stream<T> Stream.generate(Supplier<T> s)
static <T> Stream<T> Stream.iterate(T seed, UnaryOperator<T> f)

I the last 2 produce infinite streams
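For illustration (made-up values, not from the lecture):

Stream<String> letters = Arrays.asList("a", "b", "c").stream();
Stream<Integer> values = Stream.of(1, 2, 3);

// the last two create infinite streams, so they only make sense together with limit()
Stream<Double> randoms = Stream.generate(Math::random).limit(5);
Stream<Integer> powers = Stream.iterate(1, x -> x * 2).limit(10);   // 1, 2, 4, 8, ...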

Declarative concurrency 156/160 Java 8 streams - Collecting stream

I streams are not executed until their results are needed

I terminal operation – one that produces the result

Some terminal operations:

long count()
Optional<T> max(Comparator<? super T> comparator)
Optional<T> reduce(BinaryOperator<T> accumulator)
void forEach(Consumer<? super T> action)
Object[] toArray()
<R, A> R collect(Collector<? super T, A, R> collector)

I Collector interface is very general

I Collectors class contains a lot of standard implementations

  I toList(), toSet(), ...
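Small examples of terminal operations (made-up data):

List<Integer> squares = Stream.of(1, 2, 3, 4)
                              .map(x -> x * x)
                              .collect(Collectors.toList());          // [1, 4, 9, 16]

long evens = Stream.of(1, 2, 3, 4).filter(x -> x % 2 == 0).count();   // 2

Optional<Integer> largest = Stream.of(3, 1, 2).max(Integer::compare); // Optional[3]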

Declarative concurrency 157/160 Java 8 streams - Transforming stream

I map – transform each element and return a new stream

<R> Stream<R> map(Function<? super T, ? extends R> mapper)

I filter – select only some elements from the stream

Stream<T> filter(Predicate<? super T> predicate)

I reduce – aggregate stream into the final result

Optional<T> reduce(BinaryOperator<T> accumulator)
T reduce(T identity, BinaryOperator<T> accumulator)

I flatMap – like map but combining resulting streams

<R> Stream<R> flatMap(Function<? super T, ? extends Stream<? extends R>> mapper)

I analogue of thenCompose in CompletableFuture
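Small illustrations of reduce and flatMap (made-up data):

// reduce: fold the whole stream into a single value
int sum = Stream.of(1, 2, 3, 4).reduce(0, Integer::sum);              // 10

// flatMap: each element maps to a stream, the results are concatenated
List<Integer> flat = Stream.of(Arrays.asList(1, 2), Arrays.asList(3, 4))
                           .flatMap(List::stream)
                           .collect(Collectors.toList());             // [1, 2, 3, 4]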

Declarative concurrency 158/160 Java 8 streams - Example: explicit

Collect travelers that have speed > 20, then take their min and max temperatures

I types of intermediate streams are given explicitly

I see next slide for more compact version

I limit(n) takes the first n elements

Stream<Traveler> travelers = Stream.generate(Traveler::generate);
Stream<Traveler> trav10000 = travelers.limit(10000);
Stream<Traveler> carTrav = trav10000.filter(t -> t.speed > 20.0);
List<Double> carTemps = carTrav.map(t -> t.temperature)
                               .collect(Collectors.toList());
Optional<Double> minT = carTemps.stream().min(Double::compare);
Optional<Double> maxT = carTemps.stream().max(Double::compare);
System.out.println(minT + " " + maxT);

Declarative concurrency 159/160 Java 8 streams - Example: inline

I more readable than explicit version

double[] temps2 = Stream.generate(Traveler::generate)
    .limit(10000)
    .filter(t -> t.speed > 20.0)         // fast moving
    .mapToDouble(t -> t.temperature)     // take temperature
    .toArray();
System.out.println(DoubleStream.of(temps2).min() + " "
                   + DoubleStream.of(temps2).max());

I do not overuse – many anonymous intermediate results may obscure what is actually happening

Declarative concurrency 160/160 Java 8 streams - Parallel execution

Advantages over a usual for loop:

I easily parallelizable, e.g. run the example in parallel

double[] temps3 = Stream.generate(Traveler::generate)
    .parallel()
    .limit(10000)
    .filter(t -> t.speed > 20.0)         // fast moving
    .mapToDouble(t -> t.temperature)     // take temperature
    .toArray();
System.out.println(DoubleStream.of(temps3).min() + " "
                   + DoubleStream.of(temps3).max());

I streams are composable, e.g. may write

Stream<Double> collectTempOfCarTravelers(Stream<Traveler> travelers)

I combine it in different contexts (see the sketch below)
I no operation is executed until a terminal operation for the whole stream is invoked
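A sketch of how such a composable piece could look (the method body, the generic types, and the 30.0 threshold are assumptions; only the method name comes from the slide). The fragment stays lazy, so each caller decides how to finish the pipeline:

// reusable, lazy pipeline fragment: nothing runs until a terminal operation
static Stream<Double> collectTempOfCarTravelers(Stream<Traveler> travelers) {
    return travelers
        .filter(t -> t.speed > 20.0)       // fast moving
        .map(t -> t.temperature);          // take temperature
}

// the same fragment used in two different contexts
double[] all = collectTempOfCarTravelers(Stream.generate(Traveler::generate).limit(10000))
                   .mapToDouble(Double::doubleValue)
                   .toArray();

long hot = collectTempOfCarTravelers(Stream.generate(Traveler::generate).limit(10000))
               .filter(temp -> temp > 30.0)
               .count();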