Partial Sums on the Ultra-Wide Word RAM∗

Philip Bille, Inge Li Gørtz, Frederik Rye Skjoldjensen
[email protected] [email protected] [email protected]

arXiv:1908.10159v2 [cs.DS] 30 Sep 2020

Abstract

We consider the classic partial sums problem on the ultra-wide word RAM model of computation. This model extends the classic $w$-bit word RAM model with special ultrawords of length $w^2$ bits that support standard arithmetic and boolean operations, together with scattered memory access operations that can access $w$ (non-contiguous) locations in memory. The ultra-wide word RAM model captures (and idealizes) modern vector processor architectures. Our main result is a new in-place data structure for the partial sums problem that only stores a constant number of ultrawords in addition to the input and supports operations in doubly logarithmic time. This matches the best known time bounds for the problem (among polynomial space data structures) while improving the space from superlinear to a constant number of ultrawords. Our results are based on a simple and elegant in-place word RAM data structure, known as the Fenwick tree. Our main technical contribution is a new efficient parallel ultra-wide word RAM implementation of the Fenwick tree, which is likely of independent interest.

1 Introduction

Let $A[1, \ldots, n]$ be an array of integers of length $n$. The partial sums problem is to maintain a data structure for $A$ under the following operations:

• sum($i$): return $\sum_{k=1}^{i} A[k]$.
• update($i, \Delta$): set $A[i] \leftarrow A[i] + \Delta$.

The partial sums problem is a classic and well-studied data structure problem [1,2,3,4,10,13,15,17,18,19,20,22,23,24,25,32,33,39].
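To make the two operations concrete, the following is a minimal sketch of the classic sequential Fenwick tree on the standard word RAM, which the abstract names as the starting point. It is not the parallel ultra-wide word RAM implementation that is the contribution here. The class name `FenwickTree`, the field `tree`, and the linear-time construction are illustrative assumptions, and the sketch copies the input into its own list rather than operating strictly in place.

```python
class FenwickTree:
    """Classic sequential Fenwick tree over a 1-indexed array A[1..n].

    Illustrative sketch only: names and layout are assumptions, and this
    does not model ultrawords or scattered memory access.
    """

    def __init__(self, values):
        self.n = len(values)
        # tree[i] will hold the sum of A[i - lowbit(i) + 1 .. i],
        # where lowbit(i) = i & -i. Index 0 is unused.
        self.tree = [0] + list(values)
        # Linear-time build: push each node's total to the next node
        # whose range covers it.
        for i in range(1, self.n + 1):
            parent = i + (i & -i)
            if parent <= self.n:
                self.tree[parent] += self.tree[i]

    def sum(self, i):
        """Return A[1] + ... + A[i]; returns 0 for i = 0."""
        total = 0
        while i > 0:
            total += self.tree[i]
            i -= i & -i  # clear the lowest set bit
        return total

    def update(self, i, delta):
        """Set A[i] <- A[i] + delta."""
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i  # move to the next node whose range contains i
```

For example, with `ft = FenwickTree([3, 1, 4, 1, 5])`, calling `ft.sum(3)` returns 8, and after `ft.update(2, 10)` it returns 18. Both loops run for O(log n) iterations in this sequential form; the doubly logarithmic bound stated in the abstract relies on the ultra-wide model and is not reflected in this sketch.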