Immutable Collections

@PaulSandoz

JavaOne 2017 Immutable Collections CON6079

Safe Harbor Statement

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.

Agenda

• A recap of unmodifiable collections in the JDK

• A brief overview of immutable collections in external Java libraries and JVM-based platforms

• Immutable collections leveraging persistent data structures

When referring to immutable collections, no claims are made as to the immutability of the collection’s elements.

Advantages of immutability

• Don’t need to think about concurrency and data races

• Resistant to misbehaving libraries

• Are constants that may be optimized at runtime

• Implementations can optimize for time and space, in both representation and transformation

Immutable collections wish list

• Manifests immutability (of the collections, not their elements)

• Sealed (not publicly extensible)

• Provide a bridge to mutable collections (not extension of)

• Efficient construction, updates, and copying

Unmodifiable in the JDK

• The JDK has the notion of unmodifiable collections

• Unmodifiable is a runtime property of a collection

• Modifying (add, put, remove, …) methods throw UnsupportedOperationException

• There is no way to query this property directly
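Because unmodifiability is only observable at runtime, the closest thing to a query is to attempt a modification and see whether it is rejected. The sketch below illustrates that; `UnmodifiableProbe` and `rejectsAdd` are hypothetical names, and the probe only exercises `add`, so it is not a general immutability test.

```java
import java.util.List;

final class UnmodifiableProbe {
    // Hypothetical helper: returns true if the list rejects add() with
    // UnsupportedOperationException. A modifiable list is restored afterwards.
    static boolean rejectsAdd(List<Integer> list) {
        try {
            list.add(42);
            list.remove(list.size() - 1); // remove(int index): undo the probe
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }
}
```

The static type is `List<Integer>` in every case; nothing in the type tells the caller which behaviour to expect.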

Two forms of unmodifiable

• Unmodifiable view or wrapper to a source or backing collection

List uvl = Collections.unmodifiableList(sourceList);

• Directly unmodifiable

List dul = List.of(1, 2, 3, …);
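The difference between the two forms shows up when the backing collection changes. A minimal sketch follows; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class TwoFormsDemo {
    // An unmodifiable view still reflects changes made to its source.
    static int viewSizeAfterSourceChange() {
        List<Integer> source = new ArrayList<>(List.of(1, 2, 3));
        List<Integer> view = Collections.unmodifiableList(source);
        source.add(4);      // allowed: the source itself is modifiable
        return view.size(); // the new element is visible through the view
    }

    // A directly unmodifiable list has no separate source to change.
    static int directSize() {
        List<Integer> direct = List.of(1, 2, 3);
        return direct.size();
    }
}
```

The view is unmodifiable but not immutable: its contents can change for as long as anyone holds the source.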

Immutability with unmodifiable collections

• When wrapping, ensure the source collection is never accessible*

List uvl = Collections.unmodifiableList(new ArrayList<>(source));
source = null;

List dul = stream.collect(collectingAndThen(toList(), Collections::unmodifiableList));

• List.of and friends behave as if the source is never accessible

* Except, of course, to the unmodifiable wrapper
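The two idioms above can be packaged so that the private copy never escapes. This is a sketch under that assumption; `DefensiveCopy`, `snapshot`, and `collectUnmodifiable` are illustrative names, not JDK API.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class DefensiveCopy {
    // Copies the source, then wraps the copy. Only the wrapper can reach the
    // copy, so later changes to the caller's collection are not visible.
    static <T> List<T> snapshot(Collection<? extends T> source) {
        return Collections.unmodifiableList(new ArrayList<>(source));
    }

    // Same idea for a stream: the intermediate list is never exposed.
    static <T> List<T> collectUnmodifiable(Stream<T> stream) {
        return stream.collect(
                Collectors.collectingAndThen(Collectors.toList(),
                                             Collections::unmodifiableList));
    }
}
```

Both methods pay for a full copy, which is the cost the later slides on persistent data structures aim to avoid.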

JDK collections as immutable collections

✗ Manifests immutability

✗ Sealed

• Provide a bridge to mutable collections

✗ Efficient construction, updates and copying

Unmodifiable is a reasonable abstraction for mutable collections but not for immutable collections

Guava’s immutable collections

• Defines sealed types such as ImmutableList, ImmutableMap, …

• These implement the corresponding JDK mutable collection type (ImmutableList implements List)

• Copying is smart: ImmutableList.copyOf(otherCollection)

Guava’s collections are a good compromise

✔ Manifests immutability

✔ Sealed

✘ Provide a bridge to mutable collections

✘ Efficient construction (✔✘), updates (✘), and copying (✔✘)

Eclipse Collections: something for everyone

✔ Manifests immutability

✘ Sealed

✔ Provide a bridge to mutable collections

✘ Efficient construction, updates, and copying

Vavr (Java), Clojure, Scala

✔ Manifests immutability

✔ Sealed*

✔ Provide a bridge to mutable collections*

✔ Efficient construction, updates, and copying

* Not completely verified but believed to be mostly true

Vavr (Java), Clojure, Scala

✔ Efficient updates (addition, removal, replace, merge)

• The immutable collection implementations leverage persistent data structures for maps, sets and vectors (non-linked lists)

Persistent data structures

• A persistent data structure preserves the previous version of itself when modified

• Hash Array Mapped Tries (HAMTs) are the basis of efficient persistent (immutable) maps, sets, and vectors

• Provide structural sharing between a new and previous version of a collection

• Effectively constant time for many operations

• Cache friendly
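Persistence and structural sharing are easiest to see in the simplest persistent structure, a singly linked list. The sketch below is not a HAMT and is not taken from the talk; `PList` is a hypothetical name. Prepending creates one new cell and reuses the entire previous list as its tail.

```java
final class PList<T> {
    final T head;        // null only in the empty list
    final PList<T> tail; // shared with the previous version, never copied
    final int size;

    private static final PList<?> EMPTY = new PList<>(null, null, 0);

    private PList(T head, PList<T> tail, int size) {
        this.head = head;
        this.tail = tail;
        this.size = size;
    }

    @SuppressWarnings("unchecked")
    static <T> PList<T> empty() {
        return (PList<T>) EMPTY;
    }

    // Returns a new version; the receiver is left untouched and still valid.
    PList<T> prepend(T value) {
        return new PList<>(value, this, size + 1);
    }
}
```

After `b = a.prepend("y")`, both `a` and `b` remain usable and `b.tail` is the very same object as `a`. A HAMT applies the same idea to a wide tree, copying only the nodes on the path to the change.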


In computer science, a trie, also called digital tree and sometimes radix tree or prefix tree (as they can be searched by prefixes), is a kind of search tree: an ordered tree data structure that is used to store a dynamic set or associative array where the keys are usually strings.

Figure: a trie for keys "A", "to", "tea", "ted", "ten", "i", "in", and "inn".

Hash Array Mapped Trie

• Symbol is a 5 bit sequence

• String is fixed in size, 32 bits, consisting of 7 symbols (last symbol is truncated to 2 bits)

• String is the hashCode of an Object (the key)

Hash Array Mapped Trie

0xCAFEBABE

bit:     32-31  30-26  25-21  20-16  15-11  10-06  05-01
value:   11     00101  01111  11101  01110  10101  11110
symbol:  s7     s6     s5     s4     s3     s2     s1
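The split of a hash code into 5-bit symbols can be reproduced with a shift and a mask. The sketch below uses the same arithmetic as the `symbolAtDepth` method shown later in the deck; `HamtSymbols` and `symbols` are hypothetical names.

```java
final class HamtSymbols {
    static final int BITS = 5;                // width of one symbol
    static final int MASK = (1 << BITS) - 1;  // 0b11111
    static final int SYMBOLS = 7;             // 6 full symbols plus a 2-bit one

    // Symbol at the given depth; depth 0 is s1, the lowest 5 bits.
    static int symbolAtDepth(int hash, int depth) {
        return (hash >>> (depth * BITS)) & MASK;
    }

    // All seven symbols, lowest first: {s1, s2, ..., s7}.
    static int[] symbols(int hash) {
        int[] result = new int[SYMBOLS];
        for (int d = 0; d < SYMBOLS; d++) {
            result[d] = symbolAtDepth(hash, d);
        }
        return result;
    }
}
```

For `0xCAFEBABE` this yields `{30, 21, 14, 29, 15, 5, 3}`, which are the bit groups s1 through s7 read as integers.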

HAMT properties

• Wide branching factor, 32

• Limited tree depth, 6

• Effectively constant time lookup

$O(\log_{32} N) = O(\log_2 N / \log_2 32) = O(\log_2 N / 5)$
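As a worked instance of the bound, take a map of about a billion entries, a size chosen here purely for illustration:

```latex
\[
N = 2^{30} \approx 1.07 \times 10^{9}
\quad\Longrightarrow\quad
\log_{32} N = \frac{\log_2 N}{\log_2 32} = \frac{30}{5} = 6
\]
```

A well-distributed hash therefore needs only a handful of levels even at that size, and because the hash code is 32 bits wide, the number of symbols available can never exceed $\lceil 32/5 \rceil = 7$.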

HAMT properties

• Good structural sharing (for updates, merging and splitting) but also good memory usage and cache coherency

• The basis for vectors, where index is the hash code (see also Relaxed Radix Balanced trees), and multi-maps

• Can be applied to mutable collections, for efficient construction of an immutable collection

• Efficiently implemented in Java

A naive implementation

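import java.util.Optional;

public class PMap<K, V> {
    // 32 slots, each a (key, value) pair or a (SUB_LAYER_NODE, PMap) pair.
    // hash(k) and SUB_LAYER_NODE are defined elsewhere in the class (not shown).
    Object[] nodes = new Object[32 * 2];

    public Optional<V> get(K k) {
        return get(k, hash(k), 0);
    }

    private Optional<V> get(K k, int h, int d) {
        int symbol = symbolAtDepth(h, d);
        Object _k = nodes[symbol * 2];

        if (_k == SUB_LAYER_NODE) {
            // descend one level, using the next 5 bits of the hash
            PMap<K, V> n = (PMap<K, V>) nodes[symbol * 2 + 1];
            return n.get(k, h, d + 1);
        } else if (k.equals(_k))
            return Optional.of((V) nodes[symbol * 2 + 1]);
        else
            return Optional.empty();
    }

    static int symbolAtDepth(int h, int d) {
        return (h >>> (d * 5)) & (32 - 1);
    }
}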


A compact representation

public class PMap<K, V> {
    // bit map of symbols
    @Stable
    private final int bitmap;

    // [..., k, v, ....] or
    // [..., SUB_LAYER_NODE, PMap, ...] or
    // [..., COLLISION_NODE, CollisionNode, ...]
    // invariant: a sub-layer will not consist of a single mapping node
    @Stable
    private final Object[] nodes;

A better representation

private Optional<V> get(K k, int h, int dShift) {
    int symbol = symbolAtDepth(h, dShift);
    if (bitmapGet(bitmap, symbol) == 0)
        return Optional.empty();

    // index into the compact array = number of present symbols below this one
    int nodeCount = bitmapCountFrom(bitmap, symbol);
    Object _k = nodes[nodeCount * 2];
    if (_k == SUB_LAYER_NODE) {
        PMap<K, V> s = (PMap<K, V>) nodes[nodeCount * 2 + 1];
        return s.get(k, h, dShift + 5);
    } else if (_k.equals(k))
        return Optional.of((V) nodes[nodeCount * 2 + 1]);
    else
        return Optional.empty();
}

private static int bitmapCountFrom(int bitmap, int symbol) {
    return Integer.bitCount(bitmap & ((1 << symbol) - 1));
}


Integer.bitCount compiles to POPCNT on x64
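The bitmap turns a sparse 32-way node into a dense array: a symbol's position in the array is the number of present symbols below it. The sketch below isolates that arithmetic. `bitmapCountFrom` follows the slide; `bitmapGet` and `bitmapSet` are assumed helpers whose bodies are not shown in the deck.

```java
final class BitmapIndex {
    // Assumed helper: non-zero if the symbol's bit is set in the bitmap.
    static int bitmapGet(int bitmap, int symbol) {
        return bitmap & (1 << symbol);
    }

    // Assumed helper: returns a bitmap with the symbol's bit set.
    static int bitmapSet(int bitmap, int symbol) {
        return bitmap | (1 << symbol);
    }

    // As on the slide: counts the set bits strictly below the symbol,
    // which is the symbol's index in the compact node array.
    static int bitmapCountFrom(int bitmap, int symbol) {
        return Integer.bitCount(bitmap & ((1 << symbol) - 1));
    }
}
```

With only symbols 3, 14, and 30 present, they map to compact indices 0, 1, and 2, so the node needs three slot pairs instead of thirty-two.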

A better representation

• Space is required only for present nodes, using the bitmap (made possible with HotSpot optimizations)

• Further refinements (and tradeoffs) possible

• Sub nodes and entries could be separated for more cache friendly traversal (see Steindorfer’s work on compressed HAMTs, aka CHAMP)

• Hash codes could be cached

Persistent Map API

public void forEach(BiConsumer<? super K, ? super V> action);

public Optional<V> get(K k);

public PMap<K, V> put(K k, V v);

public PMap<K, V> remove(K k);
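One way the `get` and `put` signatures could be realised with path copying is sketched below. It is a simplified illustration, not the PMap from the talk: it uses the uncompressed 32-slot layout of the naive implementation, omits `remove` and `forEach`, and rejects full hash collisions instead of handling them. The name `PersistentMapSketch` is hypothetical.

```java
import java.util.Objects;
import java.util.Optional;

@SuppressWarnings("unchecked")
final class PersistentMapSketch<K, V> {
    private static final Object SUB = new Object(); // marks a slot holding a sub-node
    private final Object[] slots;                   // 32 (key, value) pairs

    private PersistentMapSketch(Object[] slots) { this.slots = slots; }

    static <K, V> PersistentMapSketch<K, V> empty() {
        return new PersistentMapSketch<>(new Object[32 * 2]);
    }

    Optional<V> get(K k) { return get(k, k.hashCode(), 0); }

    private Optional<V> get(K k, int h, int shift) {
        int i = 2 * ((h >>> shift) & 31);
        Object key = slots[i];
        if (key == SUB)
            return ((PersistentMapSketch<K, V>) slots[i + 1]).get(k, h, shift + 5);
        return k.equals(key) ? Optional.of((V) slots[i + 1]) : Optional.empty();
    }

    // Returns a new map; this map is unchanged.
    PersistentMapSketch<K, V> put(K k, V v) {
        return put(Objects.requireNonNull(k), Objects.requireNonNull(v), k.hashCode(), 0);
    }

    private PersistentMapSketch<K, V> put(K k, V v, int h, int shift) {
        int i = 2 * ((h >>> shift) & 31);
        Object key = slots[i];
        Object[] copy = slots.clone(); // copy this node only; other sub-nodes stay shared
        if (key == null || key.equals(k)) {
            copy[i] = k;
            copy[i + 1] = v;
        } else if (key == SUB) {
            copy[i + 1] = ((PersistentMapSketch<K, V>) slots[i + 1]).put(k, v, h, shift + 5);
        } else {
            // Slot taken by a different key: push both entries one level down.
            if (key.hashCode() == h)
                throw new UnsupportedOperationException("full hash collisions are not handled in this sketch");
            copy[i] = SUB;
            copy[i + 1] = PersistentMapSketch.<K, V>empty()
                    .put((K) key, (V) slots[i + 1], key.hashCode(), shift + 5)
                    .put(k, v, h, shift + 5);
        }
        return new PersistentMapSketch<>(copy);
    }
}
```

Each `put` clones only the nodes on the path from the root to the changed slot; every sub-node off that path is referenced by both the old and the new map.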

Persistent collections API

• Modifying methods return a new collection

• An implementation shares unmodified structure with the previous collection

• Require mutable builders to efficiently construct in a confined manner

• For example, closure/thread confined construction then freezing
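Confined construction followed by freezing can be illustrated with a plain list builder. This is a sketch of the general idea only, using an unmodifiable wrapper rather than a persistent structure; `FreezingListBuilder` is a hypothetical name, and the builder assumes it is used by a single thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class FreezingListBuilder<T> {
    private ArrayList<T> items = new ArrayList<>(); // confined to this builder

    // Mutates in place, which is cheap because nothing else can see the list yet.
    FreezingListBuilder<T> add(T item) {
        if (items == null) throw new IllegalStateException("builder already frozen");
        items.add(item);
        return this;
    }

    // Wraps the list without copying and drops the builder's own reference,
    // so the source is no longer reachable except through the wrapper.
    List<T> freeze() {
        if (items == null) throw new IllegalStateException("builder already frozen");
        List<T> frozen = Collections.unmodifiableList(items);
        items = null;
        return frozen;
    }
}
```

No copy is made at `freeze()`: the result behaves as immutable only because the builder gives up its sole reference to the backing list.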

Demo: Visualizing HAMT-based persistent maps https://github.com/PaulSandoz/per/

Summary

• Unmodifiable is a reasonable abstraction for mutable but not immutable

• For efficient immutable collections we need persistent collections

• Sets, maps and vectors using HAMTs have proven to be effective in many libraries and platforms

What about Java?

• We shall continue to improve on unmodifiable in the JDK

• Selective sedimentation of persistent collections into the Java platform?

• Claim: it is possible to optimize such collections very aggressively with internal APIs, HotSpot, and safely contained unsafe mechanisms

References

• Fast And Space Efficient Trie Searches, Bagwell https://pdfs.semanticscholar.org/93a1/fe7f226cfbc7cb2bceac39308a66c8aef0b0.pdf

• Ideal Hash Trees, Bagwell http://lampwww.epfl.ch/papers/idealhashtrees.pdf

• RRB-Trees: Efficient Immutable Vectors, Bagwell and Rompf https://infoscience.epfl.ch/record/169879/files/RMTrees.pdf

• Optimizing Hash-Array Mapped Tries for Fast Lean Immutable JVM Collections, Steindorfer and Vinju https://michael.steindorfer.name/publications/oopsla15.pdf

• Efficient Immutable Collections - PhD Thesis - Steindorfer https://michael.steindorfer.name/publications/phd-thesis-efficient-immutable-collections.pdf

• Cache-Aware Lock-Free Concurrent Hash Tries, Prokopec, Bagwell, Odersky https://infoscience.epfl.ch/record/166908/files/-techreport.pdf
