
arkouda Release 2020.07.07

Michael Merrill and William Reus

Sep 30, 2021

CONTENTS:

1 Prerequisites
    1.1 Chapel
    1.2 Python 3 (Anaconda recommended)
    1.3 HDF5 and ZMQ (included with Anaconda)
    1.4 NumPy (included with Anaconda)
    1.5 Pandas (recommended; included with Anaconda)

2 Installation
    2.1 Download
    2.2 Environment Setup
    2.3 Build the Server
    2.4 Install the Client
    2.5 Troubleshooting

3 Performance Testing
    3.1 Argsort
    3.2 Gather
    3.3 Reduce
    3.4 Scan
    3.5 Scatter
    3.6 Stream

4 Quickstart
    4.1 Launch Arkouda Server
    4.2 Connect the Python 3 Client
    4.3 Simple Computations
    4.4 Shutdown the server (optional)

5 Usage
    5.1 Startup
    5.2 The pdarray class
    5.3 Creating Arrays
    5.4 Data I/O
    5.5 Arithmetic and Numeric Operations
    5.6 Indexing and Assignment
    5.7 Summarizing Data
    5.8 Sorting
    5.9 Array Operations
    5.10 GroupBy
    5.11 Strings in Arkouda
    5.12 Categoricals

6 Examples
    6.1 DataFrame-like Patterns
    6.2 Graphs

7 Contributing
    7.1 Adding Python Functionality
    7.2 Adding Functionality to the Arkouda Server

8 API Reference
    8.1 arkouda

9 Chapel API Reference

10 Indices and tables

Python Module Index

Index

CHAPTER ONE

PREREQUISITES

1.1 Chapel

(version 1.24.1 or greater) The arkouda server application is written in Chapel, a productive and performant parallel programming language. In order to use arkouda, you must first download the current version of Chapel and build it according to the instructions for your platform(s). Below are tips for building Chapel to support arkouda.

1.1.1 Multi-Locale

Chapel and Arkouda are both designed to be portable, running with minimal reconfiguration on a laptop and a supercomputer. In fact, the developers of arkouda typically implement new functionality on a workstation, test performance on a small cluster, and support users on a processing architecture. The Chapel documentation has detailed instructions for multilocale Chapel execution, which must be followed carefully on multi-node systems.

For an individual machine (e.g. a laptop or a workstation), you have two options. The default is single-locale mode, which is also the easiest and most performant. You do not need any special settings to enable this mode; simply build Chapel according to the above instructions. However, if you want your single machine to emulate a multi-node system (e.g. to test multi-node functionality on your laptop before moving to a larger system), you can enable multilocale execution on a single machine by setting these environment variables:

    export CHPL_COMM=gasnet
    export CHPL_LAUNCHER=smp

and (re)running make within $CHPL_HOME. Single- and multi-locale Chapel builds can coexist side by side. If you have built Chapel with both configurations, you can switch between them by setting

    export CHPL_COMM=none

or

    export CHPL_COMM=gasnet

before compiling your Chapel program (e.g. the arkouda server).

1.2 Python 3 (Anaconda recommended)

(version 3.6 or greater) Currently, the arkouda client is written in Python 3. We recommend using the Anaconda Python 3 distribution, with Python 3.6 or greater, because it automatically satisfies the remaining prerequisites.


1.3 HDF5 and ZMQ (included with Anaconda)

Arkouda uses HDF5 for file I/O and ZMQ for server-client communication. Both libraries can either be downloaded and built manually or acquired via a Python package manager. For example, both libraries come pre-installed with the Anaconda Python distribution and can be found in the include, bin, and lib subdirectories of the Anaconda root directory. Alternatively, running pip3 install arkouda will also install these dependencies from PyPI.

1.4 NumPy (included with Anaconda)

Arkouda interoperates with the numerical Python package NumPy, using NumPy data types and supporting conversion between NumPy ndarray and arkouda pdarray classes. The best way to get NumPy is via the Anaconda distribution or from PyPI via pip3 install arkouda.

1.5 Pandas (recommended; included with Anaconda)

While Pandas is not required by the arkouda client, some of the arkouda tests use Pandas as a standard to check the correctness of arkouda operations. As with NumPy, the best way to get Pandas is via the Anaconda distribution or from PyPI.

CHAPTER TWO

INSTALLATION

Before installing arkouda, make sure to satisfy all the Prerequisites, including setting up your environment for Chapel.

2.1 Download

The easiest way to get arkouda is to download, clone, or fork the arkouda GitHub repo.

2.2 Environment Setup

1. Ensure that CHPL_HOME is set and $CHPL_HOME/bin is in your PATH (consider adding to your .*rc file).
2. Tell arkouda where to find the HDF5 and ZMQ libraries. Do this by creating or modifying the Makefile.paths file in the arkouda root directory and adding one or more lines of the form

    $(eval $(call add-path,/path/to/HDF5/root))
    $(eval $(call add-path,/path/to/ZMQ/root))

However, if you have the Anaconda Python distribution, the HDF5 and ZMQ libraries will be in subdirectories of the Anaconda root directory, so your Makefile.paths need only contain one line:

    $(eval $(call add-path,/path/to/Anaconda/root))

Be sure to customize these paths appropriately for your system.
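If you prefer to generate Makefile.paths from a script, the following is a minimal sketch of one way to do it. The helper names makefile_paths_lines and write_makefile_paths are hypothetical (they are not part of arkouda), and the demonstration writes into a temporary directory instead of a real arkouda checkout.

```python
from pathlib import Path
import tempfile


def makefile_paths_lines(roots):
    """Return one add-path line per library root, in the form shown above."""
    return [f"$(eval $(call add-path,{root}))" for root in roots]


def write_makefile_paths(arkouda_root, roots):
    """Write Makefile.paths into the given directory, overwriting any old copy."""
    target = Path(arkouda_root) / "Makefile.paths"
    target.write_text("\n".join(makefile_paths_lines(roots)) + "\n")
    return target


# Demonstrate against a temporary directory rather than a real checkout.
with tempfile.TemporaryDirectory() as tmp:
    written = write_makefile_paths(tmp, ["/path/to/Anaconda/root"])
    contents = written.read_text()

print(contents, end="")
```

Replace the placeholder root with the real location of your libraries before pointing the helper at the arkouda root directory.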

2.3 Build the Server

Run make in the arkouda root directory to build the arkouda server program. Note: this will take 10-20 minutes, depending on your processor. We recommend adding the arkouda root directory to your PATH environment variable.


2.4 Install the Client

There are two ways to install the Python client. It is available from the Python Package Index (PyPI) with:

    pip3 install arkouda

If you are planning to contribute to arkouda as a developer, you may wish to install an editable version linked to your local copy of the GitHub repo:

    pip3 install -e path/to/local/arkouda/repo

2.5 Troubleshooting

2.5.1 Chapel not built for this configuration

Error: Build fails with a message stating Chapel was not built for this configuration

Solution: While a full rebuild of Chapel is not required, some additional components must be built with the current environment settings. Do this by setting all your arkouda environment variables as above and running:

    cd $CHPL_HOME
    make

This should build the extra components needed by Chapel to compile arkouda.

2.5.2 Unable to find HDF5 or ZMQ

Error: Cannot find -lzmq or -lhdf5

Solution: Ensure the path(s) in the arkouda Makefile.paths file are valid, and that the files lib/libzmq.so and lib/libhdf5.so appear there. If not, try reinstalling HDF5 and/or ZMQ at those locations, or install the Anaconda distribution and place the Anaconda root directory in Makefile.paths.

CHAPTER THREE

PERFORMANCE TESTING

The benchmarks directory contains scripts for measuring the performance of arkouda, optionally compared to the (single-node) performance of NumPy.

3.1 Argsort

Measure performance of sorting an array of random values.

    usage: argsort.py [-h] [-n SIZE] [-t TRIALS] [-d DTYPE] [--numpy] [--correctness-only] [-s SEED] hostname port

3.1.1 Positional Arguments

hostname    Hostname of arkouda server
port        Port of arkouda server

3.1.2 Named Arguments

-n, --size            Problem size: length of array to argsort. Default: 100000000
-t, --trials          Number of times to run the benchmark. Default: 3
-d, --dtype           Dtype of array (int64, float64). Default: “int64”
--numpy               Run the same operation in NumPy to compare performance. Default: False
--correctness-only    Only check correctness, not performance. Default: False
-s, --seed            Value to initialize random number generator

3.2 Gather

Measure the performance of random gather: C = V[I]

    usage: gather.py [-h] [-n SIZE] [-i INDEX_SIZE] [-v VALUE_SIZE] [-t TRIALS] [-d DTYPE] [-r] [--numpy] [--correctness-only] [-s SEED] hostname port

3.2.1 Positional Arguments

hostname    Hostname of arkouda server
port        Port of arkouda server

3.2.2 Named Arguments

-n, --size            Problem size: length of index and gather arrays. Default: 100000000
-i, --index-size      Length of index array (number of gathers to perform)
-v, --value-size      Length of array from which values are gathered
-t, --trials          Number of times to run the benchmark. Default: 6
-d, --dtype           Dtype of value array (int64, float64, bool). Default: “int64”
-r, --randomize       Use random values instead of ones. Default: False
--numpy               Run the same operation in NumPy to compare performance. Default: False
--correctness-only    Only check correctness, not performance. Default: False
-s, --seed            Value to initialize random number generator
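To make the gather pattern concrete, here is a small pure-Python sketch of C = V[I] on ordinary lists. It illustrates the access pattern only and is not the benchmark's implementation; the gather helper and the tiny sizes are assumptions chosen for readability.

```python
import random


def gather(values, index):
    """Return C where C[k] = values[index[k]] (the C = V[I] pattern)."""
    return [values[i] for i in index]


rng = random.Random(0)                          # fixed seed for a repeatable run
V = [10, 20, 30, 40, 50]                        # value array
I = [rng.randrange(len(V)) for _ in range(8)]   # random index array
C = gather(V, I)

print(I)
print(C)
```

The result has one element per index, which is why the benchmark exposes the index size and the value size as separate options.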

3.3 Reduce

Measure performance of reductions over arrays.

    usage: reduce.py [-h] [-n SIZE] [-t TRIALS] [-d DTYPE] [-r] [--numpy] [--correctness-only] [-s SEED] hostname port

3.3.1 Positional Arguments

hostname    Hostname of arkouda server
port        Port of arkouda server

3.3.2 Named Arguments

-n, --size            Problem size: length of array to reduce. Default: 100000000
-t, --trials          Number of times to run the benchmark. Default: 6
-d, --dtype           Dtype of array (int64, float64). Default: “int64”
-r, --randomize       Fill array with random values instead of range. Default: False
--numpy               Run the same operation in NumPy to compare performance. Default: False
--correctness-only    Only check correctness, not performance. Default: False
-s, --seed            Value to initialize random number generator

3.4 Scan

Measure the performance of scans (cumulative reductions) over arrays.

    usage: scan.py [-h] [-n SIZE] [-t TRIALS] [-d DTYPE] [-r] [--numpy] [--correctness-only] [-s SEED] hostname port

3.4.1 Positional Arguments

hostname    Hostname of arkouda server
port        Port of arkouda server

3.4.2 Named Arguments

-n, --size            Problem size: length of array. Default: 100000000
-t, --trials          Number of times to run the benchmark. Default: 6
-d, --dtype           Dtype of array (int64, float64). Default: “int64”
-r, --randomize       Fill array with random values instead of range. Default: False
--numpy               Run the same operation in NumPy to compare performance. Default: False
--correctness-only    Only check correctness, not performance. Default: False
-s, --seed            Value to initialize random number generator
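As a rough illustration of the difference between a reduction and a scan, the sketch below uses only the Python standard library on a five-element list. It shows the semantics of the two patterns and says nothing about which specific operations the benchmark scripts time.

```python
from itertools import accumulate
import operator

data = [3, 1, 4, 1, 5]

# A reduction collapses the array to a single scalar.
total = sum(data)

# A scan (cumulative reduction) keeps every intermediate result,
# so the output has the same length as the input.
running_sum = list(accumulate(data))
running_max = list(accumulate(data, max))
running_prod = list(accumulate(data, operator.mul))

print(total)
print(running_sum)
print(running_max)
print(running_prod)
```

The last element of a cumulative sum equals the corresponding reduction, which is a handy correctness check.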

3.5 Scatter

Measure performance of random scatter: C[I] = V

    usage: scatter.py [-h] [-n SIZE] [-i INDEX_SIZE] [-v VALUE_SIZE] [-t TRIALS] [-d DTYPE] [-r] [--numpy] [--correctness-only] [-s SEED] hostname port

3.5.1 Positional Arguments

hostname    Hostname of arkouda server
port        Port of arkouda server

3.5.2 Named Arguments

-n, --size            Problem size: length of index and scatter arrays. Default: 100000000
-i, --index-size      Length of index array (number of scatters to perform)
-v, --value-size      Length of array from which values are scattered
-t, --trials          Number of times to run the benchmark. Default: 6
-d, --dtype           Dtype of value array (int64, float64, bool). Default: “int64”
-r, --randomize       Use random values instead of ones. Default: False
--numpy               Run the same operation in NumPy to compare performance. Default: False
--correctness-only    Only check correctness, not performance. Default: False
-s, --seed            Value to initialize random number generator
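The scatter pattern is the mirror image of gather. The following pure-Python sketch performs C[I] = V on ordinary lists; the scatter helper is hypothetical and exists only to show the access pattern.

```python
def scatter(target, index, values):
    """Perform target[index[k]] = values[k] for every k (the C[I] = V pattern)."""
    if len(index) != len(values):
        raise ValueError("index and values must have the same length")
    for i, v in zip(index, values):
        target[i] = v
    return target


C = [0] * 6          # destination array
I = [4, 0, 2]        # positions to write to
V = [7, 8, 9]        # values to write
scatter(C, I, V)

print(C)
```

In this sequential sketch a repeated index simply keeps the last value written; the sketch makes no claim about how arkouda resolves duplicate indices.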

3.6 Stream

Run the stream benchmark: C = A + alpha*B

    usage: stream.py [-h] [-n SIZE] [-t TRIALS] [-d DTYPE] [-r] [-a ALPHA] [--numpy] [--correctness-only] [-s SEED] hostname port

3.6.1 Positional Arguments

hostname    Hostname of arkouda server
port        Port of arkouda server

3.6.2 Named Arguments

-n, --size            Problem size: length of arrays A and B. Default: 100000000
-t, --trials          Number of times to run the benchmark. Default: 6
-d, --dtype           Dtype of arrays (int64, float64). Default: “float64”
-r, --randomize       Fill arrays with random values instead of ones. Default: False
-a, --alpha           Scalar multiple. Default: 1.0
--numpy               Run the same operation in NumPy to compare performance. Default: False
--correctness-only    Only check correctness, not performance. Default: False
-s, --seed            Value to initialize random number generator
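For reference, the kernel C = A + alpha*B can be written in a few lines of plain Python. This is a sketch of the arithmetic only, with a hypothetical helper name and tiny arrays filled with ones; it is not how the benchmark measures performance.

```python
def stream_triad(a, b, alpha=1.0):
    """Return C with C[k] = a[k] + alpha * b[k]."""
    if len(a) != len(b):
        raise ValueError("A and B must have the same length")
    return [x + alpha * y for x, y in zip(a, b)]


A = [1.0] * 4
B = [1.0] * 4
C = stream_triad(A, B, alpha=2.0)

print(C)
```

Each output element depends on exactly one element of A and one of B, so the kernel mostly exercises memory bandwidth, not computation.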

CHAPTER FOUR

QUICKSTART

This guide assumes you have satisfied the Prerequisites and followed the Installation to build the arkouda server. Also, both your PATH and PYTHONPATH environment variables should contain the arkouda root directory.

4.1 Launch Arkouda Server

In a terminal, run the arkouda server program with one locale. You should see a startup message like

    $ ./arkouda_server -nl 1
    arkouda server version = 2020.07.07
    memory tracking = true
    initialized the .arkouda directory /Your/PATH/arkouda/.arkouda
    getMemLimit() = 123695058124
    bytes of memoryUsed() = 2462
    server listening on tcp://node01:5555

or, with authentication turned on,

    $ ./arkouda_server -nl 1 --authenticate
    arkouda server version = 2020.07.07
    memory tracking = true
    initialized the .arkouda directory /Your/PATH/arkouda/.arkouda
    getMemLimit() = 123695058124
    bytes of memoryUsed() = 2462
    server listening on tcp://node01:5555?token=vikq8Co2fqv20usbrRnRtFsLr9nNbad

The last line is the most important, because it contains the connection url with the hostname and port required for the client to connect to the server.
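If you need the individual pieces of the connection url (for example to log them or pass them separately), the standard library can split it. This is a minimal sketch; parse_connect_url is a hypothetical helper, not an arkouda function, and the url is the example value from the startup message above.

```python
from urllib.parse import urlsplit, parse_qs


def parse_connect_url(url):
    """Split a url like tcp://host:port?token=... into (hostname, port, token)."""
    parts = urlsplit(url)
    # The token is optional; parse_qs returns an empty dict when no query is present.
    token = parse_qs(parts.query).get("token", [None])[0]
    return parts.hostname, parts.port, token


host, port, token = parse_connect_url(
    "tcp://node01:5555?token=vikq8Co2fqv20usbrRnRtFsLr9nNbad"
)
print(host, port, token)
```

When the server runs without authentication the url has no query string, and the sketch returns None for the token.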


4.2 Connect the Python 3 Client

In another terminal window, launch an interactive Python 3 session, such as ipython or jupyter notebook (both included with the Anaconda distribution). To connect to the arkouda server, you must import the arkouda module and call connect with the connection url from the server startup messages. In Python, run

    >>> import arkouda as ak
    >>> # default way to connect is
    >>> ak.connect(connect_url='tcp://node01:5555')
    ...
    connected to tcp://node01:5555

substituting the hostname and port appropriately (defaults are ‘localhost’ and 5555).

4.3 Simple Computations

4.3.1 Create and sum an array

The following code creates an arkouda pdarray that resides on the arkouda server and performs a server-side computation, returning the result to the Python client.

    # Create a server-side array with integers from 1 to N inclusive
    # This syntax is from NumPy
    >>> N = 10**6
    >>> A = ak.arange(1, N+1, 1)
    # Sum the array, returning the result to Python
    >>> print(A.sum())
    # Check the result
    >>> assert A.sum() == (N*(N+1))//2

4.3.2 Array arithmetic

Now, we will perform an operation on two pdarray objects to create a new pdarray. This time, the result will not be returned to the Python client, but will be stored on the server. In general, only scalar results are automatically returned to Python; pdarray results remain on the server unless explicitly transferred by the user (see arkouda.pdarray.to_ndarray()).

    # Generate two (server-side) arrays of random integers 0-9
    >>> B = ak.randint(0, 10, N)
    >>> C = ak.randint(0, 10, N)
    # Multiply them (server-side)
    >>> D = B * C
    # Print a small representation of the array
    # This does NOT move the array to the client
    >>> print(D)
    # Get the min and max values
    # Because these are scalars, they live in Python
    >>> minVal = D.min()
    >>> maxVal = D.max()
    >>> print(minVal, maxVal)

4.3.3 Indexing

Arkouda pdarray objects support most of the same indexing and assignment syntax as 1-dimensional NumPy ndarrays (arkouda currently only supports 1-D arrays). This code shows two ways to get the even elements of A from above: with a slice, and with logical indexing.

    # Use a slice
    >>> evens1 = A[1::2]
    # Create a logical index
    # Bool pdarray of same size as A
    >>> evenInds = ((A % 2) == 0)
    # Use it to get the evens
    >>> evens2 = A[evenInds]
    # Compare the two (server-side) arrays
    >>> assert (evens1 == evens2).all()

4.3.4 Sorting

Sorting arrays is a ubiquitous operation, and it is often useful to use the sorting of one array to order other arrays. Like NumPy, arkouda provides this functionality via the argsort function, which returns a vector that can be used as an index to order other arrays. Here, we will order the arrays B and C from above according to the product of their elements (D).

    # Compute the permutation that sorts the product array
    >>> perm = ak.argsort(D)
    # Reorder B, C, and D
    >>> B = B[perm]
    >>> C = C[perm]
    >>> D = D[perm]
    # Check that D is monotonically non-decreasing
    >>> assert (D[:-1] <= D[1:]).all()
    # Check that reordered B and C still produce D
    >>> assert ((B*C) == D).all()
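To see what the sorting permutation contains, it can help to trace the same idea on four elements without a server. The sketch below re-creates the pattern with plain Python lists; the argsort helper here is a local stand-in written for this illustration, not arkouda's implementation.

```python
def argsort(values):
    """Return the indices that would sort values in non-decreasing order."""
    return sorted(range(len(values)), key=values.__getitem__)


B = [3, 0, 2, 5]
C = [2, 9, 1, 1]
D = [b * c for b, c in zip(B, C)]      # element-wise product

perm = argsort(D)                      # permutation that sorts D

# Apply the same permutation to all three arrays.
B_sorted = [B[i] for i in perm]
C_sorted = [C[i] for i in perm]
D_sorted = [D[i] for i in perm]

print(perm)
print(B_sorted, C_sorted, D_sorted)
```

Because one permutation is applied to all three arrays, corresponding elements stay aligned after reordering.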

4.3.5 And More

See the Usage section for the full list of operations supported on arkouda arrays. These operations are quite composable and can be used to implement more complex algorithms, as in the Examples section.

4.4 Shutdown the server (optional)

If desired, you can shut down the arkouda server from a connected client with

    >>> ak.shutdown()

This command will delete all server-side arrays and cause the arkouda_server in the first terminal to exit.

CHAPTER FIVE

USAGE

5.1 Startup

5.1.1 Launch arkouda server

Follow the Installation instructions to build the arkouda server program. In a terminal, launch it with

    arkouda_server -nl

Choose a number of locales that is right for your system and data. The -h flag gives detailed usage information, including additional command-line options added by Chapel. The last line of output from the arkouda_server command should look like

    server listening on tcp://node01:5555

Use this hostname and port in the next step to connect to the server.

5.1.2 Connect a Python 3 client

In Python 3, connect to the arkouda server using the hostname and port shown by the server program (example values shown here)

    >>> import arkouda as ak
    >>> ak.connect(connect_url='tcp://node01:5555')
    ...
    connected to node01:5555

If the output does not say “connected”, then something went wrong (even if the command executes). Check that the hostname and port match what the server printed, and that the hostname is reachable from the machine on which the client is running (e.g. not “localhost” for a remote server).

arkouda.connect(server: str = 'localhost', port: int = 5555, timeout: int = 0, access_token: Optional[str] = None, connect_url=None) → None
    Connect to a running arkouda server.

    Parameters
        • server (str, optional) – The hostname of the server (must be visible to the current machine). Defaults to localhost.
        • port (int, optional) – The port of the server. Defaults to 5555.
        • timeout (int, optional) – The timeout in seconds for client send and receive operations. Defaults to 0 seconds, which is interpreted as no timeout.
        • access_token (str, optional) – The token used to connect to an existing socket to enable access to an Arkouda server where authentication is enabled. Defaults to None.
        • connect_url (str, optional) – The complete url in the format of tcp://server:port?token= where the token is optional
    Return type
        None
    Raises
        • ConnectionError – Raised if there’s an error in connecting to the Arkouda server
        • ValueError – Raised if there’s an error in parsing the connect_url parameter
        • RuntimeError – Raised if there is a server-side error

Notes

On success, prints the connected address, as seen by the server. If called with an existing connection, the socket will be re-initialized.

5.2 The pdarray class

Just as the backbone of NumPy is the ndarray, the backbone of arkouda is an array class called pdarray. And just as the ndarray object is a Python wrapper for C-style data with C and Fortran methods, the pdarray object is a Python wrapper for distributed data with parallel methods written in Chapel. The API of pdarray is similar, but not identical, to that of ndarray.

class arkouda.pdarray(name: str, mydtype: numpy.dtype, size: Union[int, numpy.int64], ndim: Union[int, numpy.int64], shape: Sequence[int], itemsize: Union[int, numpy.int64])
    The basic arkouda array class. This class contains only the attributes of the array; the data resides on the arkouda server. When a server operation results in a new array, arkouda will create a pdarray instance that points to the array data on the server. As such, the user should not initialize pdarray instances directly.

    name
        The server-side identifier for the array
        Type: str
    dtype
        The element type of the array
        Type: dtype
    size
        The number of elements in the array
        Type: int_scalars
    ndim
        The rank of the array (currently only rank 1 arrays supported)
        Type: int_scalars
    shape
        A list or tuple containing the sizes of each dimension of the array
        Type: Sequence[int]
    itemsize
        The size in bytes of each element
        Type: int_scalars

5.2.1 Data Type

Currently, pdarray supports three user-facing data types (strings are exposed via a separate class, see Strings in Arkouda):

• int64: 64-bit signed integer
• float64: IEEE 64-bit floating point number
• bool: 8-bit boolean value

Arkouda inherits all of its data types from NumPy. For example, ak.int64 is derived from np.int64.

5.2.2 Rank

Currently, a pdarray can only have rank 1. We plan to support sparse, multi-dimensional arrays via data structures incorporating rank-1 pdarray objects.

5.2.3 Name

The name attribute of an array is a string used by the arkouda server to identify the pdarray object in its symbol table. This name is chosen by the server, and the user should not overwrite it.

5.2.4 Operators

The pdarray class supports most Python special methods, including arithmetic, bitwise, and comparison operators.

5.2.5 Iteration

Iterating directly over a pdarray with for x in array is not supported. This discourages transferring all array data from the arkouda server to the Python client, since there is almost always a more array-oriented way to express an iterator-based computation. To force this transfer, use the to_ndarray function to return the pdarray as a numpy.ndarray. This transfer will raise an error if it exceeds the byte limit defined in arkouda.maxTransferBytes.

arkouda.pdarray.to_ndarray(self) → numpy.ndarray
    Convert the array to a np.ndarray, transferring array data from the Arkouda server to client-side Python. Note: if the pdarray size exceeds client.maxTransferBytes, a RuntimeError is raised.

    Returns
        A numpy ndarray with the same attributes and data as the pdarray
    Return type
        np.ndarray
    Raises
        RuntimeError – Raised if there is a server-side error thrown, if the pdarray size exceeds the built-in client.maxTransferBytes size limit, or if the number of bytes received does not match the expected number of bytes

Notes

The number of bytes in the array cannot exceed client.maxTransferBytes, otherwise a RuntimeError will be raised. This is to protect the user from overflowing the memory of the system on which the Python client is running, under the assumption that the server is running on a distributed system with much more memory than the client. The user may override this limit by setting client.maxTransferBytes to a larger value, but proceed with caution.

See also: array
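The size check described in these notes amounts to comparing the array's byte count against the limit. The sketch below shows that arithmetic in isolation; check_transfer is a hypothetical helper written for illustration, and the 2**30 limit is an arbitrary example value, not necessarily arkouda's default.

```python
def check_transfer(size, itemsize, max_transfer_bytes):
    """Return the byte count of an array, or raise if it exceeds the limit."""
    nbytes = size * itemsize
    if nbytes > max_transfer_bytes:
        raise RuntimeError(
            f"array of {nbytes} bytes exceeds limit of {max_transfer_bytes} bytes"
        )
    return nbytes


LIMIT = 2**30   # illustrative limit only (an assumption of this sketch)

# Five int64 elements at 8 bytes each fit easily.
small = check_transfer(size=5, itemsize=8, max_transfer_bytes=LIMIT)
print(small)

# A billion int64 elements (8 * 10**9 bytes) would be rejected.
try:
    check_transfer(size=10**9, itemsize=8, max_transfer_bytes=LIMIT)
    rejected = False
except RuntimeError as err:
    rejected = True
    print(err)
```

Estimating size times itemsize before calling to_ndarray is a cheap way to anticipate whether a transfer will be refused.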

Examples

>>> a = ak.arange(0, 5, 1)
>>> a.to_ndarray()
array([0, 1, 2, 3, 4])

>>> type(a.to_ndarray())
numpy.ndarray

5.2.6 Type Casting

Conversion between dtypes is sometimes implicit, as in the following example:

    >>> a = ak.arange(10)
    >>> b = 1.0 * a
    >>> b.dtype
    dtype('float64')

Explicit conversion is supported via the cast function.

arkouda.cast(pda: Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings], dt: Union[numpy.dtype, str]) → Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings]
    Cast an array to another dtype.

    Parameters
        • pda (pdarray or Strings) – The array of values to cast
        • dt (np.dtype or str) – The target dtype to cast values to
    Returns
        Array of values cast to desired dtype
    Return type
        pdarray or Strings

Notes

The cast is performed according to Chapel’s casting rules and is NOT safe from overflows or underflows. The user must ensure that the target dtype has the precision and capacity to hold the desired result.

Examples

>>> ak.cast(ak.linspace(1.0, 5.0, 5), dt=ak.int64)
array([1, 2, 3, 4, 5])

>>> ak.cast(ak.arange(0, 5), dt=ak.float64).dtype
dtype('float64')

>>> ak.cast(ak.arange(0, 5), dt=ak.bool)
array([False, True, True, True, True])

>>> ak.cast(ak.linspace(0, 4, 5), dt=ak.bool)
array([False, True, True, True, True])

5.3 Creating Arrays

There are several ways to initialize arkouda pdarray objects, most of which come from NumPy.

5.3.1 Constant

arkouda.zeros(size: Union[int, numpy.int64], dtype: type = ) → arkouda.pdarrayclass.pdarray
    Create a pdarray filled with zeros.

    Parameters
        • size (int_scalars) – Size of the array (only rank-1 arrays supported)
        • dtype (all_scalars) – Type of resulting array, default float64
    Returns
        Zeros of the requested size and dtype
    Return type
        pdarray
    Raises
        TypeError – Raised if the supplied dtype is not supported or if the size parameter is neither an int nor a str that is parseable to an int.

    See also: ones, zeros_like

Examples

>>> ak.zeros(5, dtype=ak.int64)
array([0, 0, 0, 0, 0])

>>> ak.zeros(5, dtype=ak.float64)
array([0, 0, 0, 0, 0])

>>> ak.zeros(5, dtype=ak.bool)
array([False, False, False, False, False])

arkouda.ones(size: Union[int, numpy.int64], dtype: type = dtype('float64')) → arkouda.pdarrayclass.pdarray
    Create a pdarray filled with ones.

    Parameters
        • size (int_scalars) – Size of the array (only rank-1 arrays supported)
        • dtype (Union[float64, int64, bool]) – Resulting array type, default float64
    Returns
        Ones of the requested size and dtype
    Return type
        pdarray
    Raises
        TypeError – Raised if the supplied dtype is not supported or if the size parameter is neither an int nor a str that is parseable to an int.

    See also: zeros, ones_like

Examples

>>> ak.ones(5, dtype=ak.int64)
array([1, 1, 1, 1, 1])

>>> ak.ones(5, dtype=ak.float64)
array([1, 1, 1, 1, 1])

>>> ak.ones(5, dtype=ak.bool)
array([True, True, True, True, True])

arkouda.zeros_like(pda: arkouda.pdarrayclass.pdarray) → arkouda.pdarrayclass.pdarray
    Create a zero-filled pdarray of the same size and dtype as an existing pdarray.

    Parameters
        pda (pdarray) – Array to use for size and dtype
    Returns
        Equivalent to ak.zeros(pda.size, pda.dtype)
    Return type
        pdarray
    Raises
        TypeError – Raised if the pda parameter is not a pdarray.

    See also: zeros, ones_like

Examples

>>> zeros = ak.zeros(5, dtype=ak.int64)
>>> ak.zeros_like(zeros)
array([0, 0, 0, 0, 0])

>>> zeros = ak.zeros(5, dtype=ak.float64)
>>> ak.zeros_like(zeros)
array([0, 0, 0, 0, 0])

>>> zeros = ak.zeros(5, dtype=ak.bool)
>>> ak.zeros_like(zeros)
array([False, False, False, False, False])

arkouda.ones_like(pda: arkouda.pdarrayclass.pdarray) → arkouda.pdarrayclass.pdarray
    Create a one-filled pdarray of the same size and dtype as an existing pdarray.

    Parameters
        pda (pdarray) – Array to use for size and dtype
    Returns
        Equivalent to ak.ones(pda.size, pda.dtype)
    Return type
        pdarray
    Raises
        TypeError – Raised if the pda parameter is not a pdarray.

    See also: ones, zeros_like

Notes

Logic for generating the pdarray is delegated to the ak.ones method. Accordingly, the supported dtypes are those defined by the ak.ones method.

Examples

>>> ones = ak.ones(5, dtype=ak.int64)
>>> ak.ones_like(ones)
array([1, 1, 1, 1, 1])

>>> ones = ak.ones(5, dtype=ak.float64)
>>> ak.ones_like(ones)
array([1, 1, 1, 1, 1])

>>> ones = ak.ones(5, dtype=ak.bool)
>>> ak.ones_like(ones)
array([True, True, True, True, True])

5.3.2 Regular

arkouda.arange([start], stop[, stride])
    Create a pdarray of consecutive integers within the interval [start, stop). If only one arg is given, then that arg is the stop parameter. If two args are given, then the first arg is start and the second is stop. If three args are given, then the first arg is start, the second is stop, and the third is stride.

    Parameters
        • start (int_scalars, optional) – Starting value (inclusive)
        • stop (int_scalars) – Stopping value (exclusive)
        • stride (int_scalars, optional) – The difference between consecutive elements. The default stride is 1. If stride is specified, then start must also be specified.
    Returns
        Integers from start (inclusive) to stop (exclusive) by stride
    Return type
        pdarray, int64
    Raises
        • TypeError – Raised if start, stop, or stride is not an int object
        • ZeroDivisionError – Raised if stride == 0

    See also: linspace, zeros, ones, randint

Notes

Negative strides result in decreasing values. Currently, only int64 pdarrays can be created with this method. For float64 arrays, use the linspace method.
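Python's built-in range follows the same half-open [start, stop) convention, so it can serve as a quick client-side model of the argument forms described above. The following is a sketch only: arange_like is a hypothetical stand-in that returns a plain list, and it mirrors the documented error cases without claiming to match arkouda's behavior in every corner case.

```python
def arange_like(*args):
    """Model the one-, two-, and three-argument forms using Python's range."""
    if not all(isinstance(a, int) for a in args):
        raise TypeError("start, stop, and stride must be ints")
    if len(args) == 3 and args[2] == 0:
        raise ZeroDivisionError("stride must be nonzero")
    return list(range(*args))


one_arg = arange_like(5)            # stop only
two_args = arange_like(2, 5)        # start, stop
ascending = arange_like(0, 10, 2)   # start, stop, stride
descending = arange_like(5, 0, -1)  # negative stride counts down

print(one_arg, two_args, ascending, descending)
```

Note that the stop value is never included, in either direction.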

Examples

>>> ak.arange(0, 5, 1)
array([0, 1, 2, 3, 4])

>>> ak.arange(5, 0, -1)
array([5, 4, 3, 2, 1])

>>> ak.arange(0, 10, 2)
array([0, 2, 4, 6, 8])

>>> ak.arange(-5, -10, -1)
array([-5, -6, -7, -8, -9])

arkouda.linspace(start: Union[float, numpy.float64, int, numpy.int64], stop: Union[float, numpy.float64, int, numpy.int64], length: Union[int, numpy.int64]) → arkouda.pdarrayclass.pdarray
    Create a pdarray of linearly-spaced floats in a closed interval.

    Parameters
        • start (numeric_scalars) – Start of interval (inclusive)
        • stop (numeric_scalars) – End of interval (inclusive)
        • length (int_scalars) – Number of points
    Returns
        Array of evenly spaced float values along the interval
    Return type
        pdarray, float64
    Raises
        TypeError – Raised if start or stop is not a float or int or if length is not an int

    See also: arange

Notes

If start is greater than stop, the pdarray values are generated in descending order.
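The spacing of a closed-interval linspace follows from dividing the interval into length - 1 equal steps. The sketch below works through that arithmetic in plain Python; linspace_like is a hypothetical helper, and restricting it to length >= 2 is an assumption of this sketch, not a documented arkouda rule.

```python
def linspace_like(start, stop, length):
    """Return `length` evenly spaced floats from start to stop, both inclusive."""
    if length < 2:
        raise ValueError("this sketch only handles length >= 2")
    step = (stop - start) / (length - 1)   # negative when start > stop
    return [start + i * step for i in range(length)]


up = linspace_like(0, 1, 5)
down = linspace_like(1, 0, 5)
negative = linspace_like(-5, 0, 5)

print(up)
print(down)
print(negative)
```

Unlike arange, both endpoints appear in the result, and a start greater than stop simply produces a negative step.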

Examples

>>> ak.linspace(0, 1, 5)
array([0, 0.25, 0.5, 0.75, 1])

>>> ak.linspace(start=1, stop=0, length=5)
array([1, 0.75, 0.5, 0.25, 0])

>>> ak.linspace(start=-5, stop=0, length=5)
array([-5, -3.75, -2.5, -1.25, 0])

5.3.3 Random

arkouda.randint(low: Union[float, numpy.float64, int, numpy.int64], high: Union[float, numpy.float64, int, numpy.int64], size: Union[int, numpy.int64], dtype=dtype('int64'), seed: Union[int, numpy.int64] = None) → arkouda.pdarrayclass.pdarray
    Generate a pdarray of randomized int, float, or bool values in a specified range bounded by the low and high parameters.

    Parameters
        • low (numeric_scalars) – The low value (inclusive) of the range
        • high (numeric_scalars) – The high value (exclusive for int, inclusive for float) of the range
        • size (int_scalars) – The length of the returned array
        • dtype (Union[int64, float64, bool]) – The dtype of the array
        • seed (int_scalars) – Index for where to pull the first returned value
    Returns
        Values drawn uniformly from the specified range having the desired dtype
    Return type
        pdarray
    Raises
        • TypeError – Raised if dtype.name not in DTypes, size is not an int, low or high is not an int or float, or seed is not an int
        • ValueError – Raised if size < 0 or if high < low

Notes

Calling randint with dtype=float64 will result in uniform non-integral floating point values.

Examples

>>> ak.randint(0, 10, 5)
array([5, 7, 4, 8, 3])

>>> ak.randint(0, 1, 3, dtype=ak.float64)
array([0.92176432277231968, 0.083130710959903542, 0.68894208386667544])

>>> ak.randint(0, 1, 5, dtype=ak.bool)
array([True, False, True, True, True])

>>> ak.randint(1, 5, 10, seed=2)
array([4, 3, 1, 3, 4, 4, 2, 4, 3, 2])

>>> ak.randint(1, 5, 3, dtype=ak.float64, seed=2)
array([2.9160772326374946, 4.353429832157099, 4.5392023718621486])

>>> ak.randint(1, 5, 10, dtype=ak.bool, seed=2)
array([False, True, True, True, True, False, True, True, True, True])

5.3.4 Concatenation

Performance note: in multi-locale settings, the default (ordered) mode of concatenate is very communication-intensive because the distributions of the original and resulting arrays are unrelated and most data must be moved non-locally. If the application does not require the concatenated array to be ordered (e.g. if the result is simply going to be sorted anyway), then using the keyword ordered=False will greatly speed up concatenation by minimizing non-local data movement.

arkouda.concatenate(arrays: Sequence[Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, Categorical]], ordered: bool = True) → Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, Categorical]
    Concatenate a list or tuple of pdarray or Strings objects into one pdarray or Strings object, respectively.

    Parameters
        • arrays (Sequence[Union[pdarray,Strings,Categorical]]) – The arrays to concatenate. Must all have same dtype.
        • ordered (bool) – If True (default), the arrays will be appended in the order given. If False, array data may be interleaved in blocks, which can greatly improve performance but results in non-deterministic ordering of elements.
    Returns
        Single pdarray or Strings object containing all values, returned in the original order
    Return type
        Union[pdarray,Strings,Categorical]
    Raises
        • ValueError – Raised if arrays is empty or if 1..n pdarrays have differing dtypes
        • TypeError – Raised if arrays is not a Python Sequence (such as a list or tuple) of pdarray or Strings objects
        • RuntimeError – Raised if 1..n array elements are dtypes for which concatenate has not been implemented.

Examples

>>> ak.concatenate([ak.array([1,2,3]), ak.array([4,5,6])])
array([1, 2, 3, 4, 5, 6])

>>> ak.concatenate([ak.array([True,False,True]), ak.array([False,True,True])])
array([True, False, True, False, True, True])

>>> ak.concatenate([ak.array(['one','two']), ak.array(['three','four','five'])])
array(['one', 'two', 'three', 'four', 'five'])
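The difference between the ordered and unordered modes can be pictured with a small pure-Python model. This is only a sketch under stated assumptions, not arkouda's implementation: each array is assumed to be split into contiguous blocks, one per locale, and the helper names split_blocks and concatenate_model are hypothetical.

```python
from typing import List, Sequence


def split_blocks(values: Sequence[int], num_locales: int) -> List[List[int]]:
    """Split values into num_locales contiguous blocks (assumed block distribution)."""
    size, extra = divmod(len(values), num_locales)
    blocks, start = [], 0
    for loc in range(num_locales):
        stop = start + size + (1 if loc < extra else 0)
        blocks.append(list(values[start:stop]))
        start = stop
    return blocks


def concatenate_model(arrays: Sequence[Sequence[int]], num_locales: int,
                      ordered: bool = True) -> List[int]:
    """Toy model of ordered vs. unordered concatenation (hypothetical helper)."""
    if ordered:
        # Arrays are appended in the order given.
        return [v for arr in arrays for v in arr]
    # Unordered: each locale appends only the blocks it already holds,
    # so blocks from different arrays end up interleaved.
    per_array = [split_blocks(arr, num_locales) for arr in arrays]
    result: List[int] = []
    for loc in range(num_locales):
        for blocks in per_array:
            result.extend(blocks[loc])
    return result


a, b = [1, 2, 3, 4], [5, 6, 7, 8]
ordered = concatenate_model([a, b], num_locales=2)
unordered = concatenate_model([a, b], num_locales=2, ordered=False)
print(ordered)    # [1, 2, 3, 4, 5, 6, 7, 8]
print(unordered)  # [1, 2, 5, 6, 3, 4, 7, 8]
```

In this model both results contain the same values, so a later sort or set-like operation is unaffected by the choice of mode.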

5.4 Data I/O

5.4.1 Between client and server

Arkouda is designed to integrate with NumPy and Pandas, with arkouda handling large, distributed data in parallel while receiving and sending smaller input and output data to/from Python as NumPy ndarray objects. A common arkouda workflow looks like
1. Load in a large dataset with arkouda
2. Enter or create a small NumPy array with user data to compare against the large dataset
3. Convert the NumPy array to an arkouda array (transferring the data to the server)
4. Run computations that filter or summarize the large dataset
5. Pass the smaller result set back to Python as a NumPy array for plotting or inspection
Below are the functions that enable both sides of this transfer.

arkouda.array(a: Union[arkouda.pdarrayclass.pdarray, numpy.ndarray, Iterable]) → Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings]
Convert a Python or NumPy Iterable to a pdarray or Strings object, sending the corresponding data to the arkouda server.
Parameters a (Union[pdarray, np.ndarray]) – Rank-1 array of a supported dtype
Returns A pdarray instance stored on arkouda server or Strings instance, which is composed of two pdarrays stored on arkouda server
Return type pdarray or Strings
Raises
• TypeError – Raised if a is not a pdarray, np.ndarray, or Python Iterable such as a list, array, tuple, or deque
• RuntimeError – Raised if a is not one-dimensional, if a.dtype is not supported (not in DTypes), or if the product of a.size and a.itemsize exceeds maxTransferBytes
• ValueError – Raised if the returned message is malformed or does not contain the fields required to generate the array.
See also: pdarray.to_ndarray

Notes

The number of bytes in the input array cannot exceed arkouda.maxTransferBytes, otherwise a RuntimeError will be raised. This is to protect the user from overwhelming the connection between the Python client and the arkouda server, under the assumption that it is a low-bandwidth connection. The user may override this limit by setting ak.maxTransferBytes to a larger value, but should proceed with caution. If the pdarray or ndarray is of type U, this method is called twice recursively to create the Strings object and the two corresponding pdarrays for string bytes and offsets, respectively.
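The size check described above amounts to comparing the element count times the item size against the limit. The following is a minimal pure-Python sketch of that rule, not the client's actual code: check_transfer_size and the 64-byte limit are hypothetical and chosen only to keep the example small.

```python
from array import array

# Hypothetical limit for illustration; the real value is arkouda.maxTransferBytes.
MAX_TRANSFER_BYTES = 64


def check_transfer_size(size: int, itemsize: int,
                        limit: int = MAX_TRANSFER_BYTES) -> int:
    """Return the payload size in bytes, or raise RuntimeError if it exceeds limit."""
    nbytes = size * itemsize
    if nbytes > limit:
        raise RuntimeError(
            f"Array exceeds allowed transfer size ({nbytes} > {limit} bytes)"
        )
    return nbytes


small = array("q", range(8))  # eight signed 64-bit integers
large = array("q", range(9))  # one element more than the limit allows

print(check_transfer_size(len(small), small.itemsize))
try:
    check_transfer_size(len(large), large.itemsize)
except RuntimeError as err:
    print("rejected:", err)
```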

Examples

>>> ak.array(np.arange(1,10))
array([1, 2, 3, 4, 5, 6, 7, 8, 9])

>>> ak.array(range(1,10))
array([1, 2, 3, 4, 5, 6, 7, 8, 9])

>>> strings = ak.array(['string {}'.format(i) for i in range(0,5)])
>>> type(strings)
<class 'arkouda.strings.Strings'>

arkouda.pdarray.to_ndarray(self) → numpy.ndarray
Convert the array to a np.ndarray, transferring array data from the Arkouda server to client-side Python. Note: if the pdarray size exceeds client.maxTransferBytes, a RuntimeError is raised.
Returns A numpy ndarray with the same attributes and data as the pdarray
Return type np.ndarray
Raises RuntimeError – Raised if there is a server-side error thrown, if the pdarray size exceeds the built-in client.maxTransferBytes size limit, or if the number of bytes received does not match the expected number of bytes

Notes

The number of bytes in the array cannot exceed client.maxTransferBytes, otherwise a RuntimeError will be raised. This is to protect the user from overflowing the memory of the system on which the Python client is running, under the assumption that the server is running on a distributed system with much more memory than the client. The user may override this limit by setting client.maxTransferBytes to a larger value, but should proceed with caution.
See also: array


Examples

>>> a = ak.arange(0,5,1)
>>> a.to_ndarray()
array([0, 1, 2, 3, 4])

>>> type(a.to_ndarray())
numpy.ndarray

arkouda.Strings.to_ndarray(self) → numpy.ndarray
Convert the array to a np.ndarray, transferring array data from the arkouda server to Python. If the array exceeds a built-in size limit, a RuntimeError is raised.
Returns A numpy ndarray with the same strings as this array
Return type np.ndarray

Notes

The number of bytes in the array cannot exceed arkouda.maxTransferBytes, otherwise a RuntimeError will be raised. This is to protect the user from overflowing the memory of the system on which the Python client is running, under the assumption that the server is running on a distributed system with much more memory than the client. The user may override this limit by setting ak.maxTransferBytes to a larger value, but proceed with caution. See also: array

Examples

>>> a = ak.array(["hello", "my", "world"])
>>> a.to_ndarray()
array(['hello', 'my', 'world'], dtype='<U5')
>>> type(a.to_ndarray())
numpy.ndarray

5.4.2 Large Datasets

Data Preprocessing

Arkouda is designed to work primarily with columnar data spread across multiple files of non-uniform size. All disk-based I/O uses the HDF5 file format and associates each column of data with an HDF5 dataset present at the root level of all files. Files are processed in parallel with one file per locale. While HDF5 has an MPI layer for concurrent reading and writing of a single file from multiple nodes, arkouda does not yet support this functionality.
Because most data does not come in HDF5 format, the arkouda developers use arkouda in conjunction with several data preprocessing pipelines. While each dataset requires a unique conversion strategy, all preprocessing should:
• Transpose row-based formats (e.g. CSV) to columns and output each column as an HDF5 dataset
• NOT aggregate input files too aggressively, but keep them separate to enable parallel I/O (hundreds or thousands of files is appropriate, in our experience)
• Convert text to numeric types where possible
Much of this preprocessing can be accomplished with the Pandas read* functions for ingest and the h5py module for output. See this example for ideas.
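The transpose and type-conversion steps listed above can be sketched with only the Python standard library. This is an illustrative sketch, not the developers' pipeline: csv_to_columns and to_numeric are hypothetical helpers, and the final step of writing each column as an HDF5 dataset (for example with h5py) is deliberately left out.

```python
import csv
import io
from typing import Dict, List, Union

Column = List[Union[int, float, str]]


def to_numeric(values: List[str]) -> Column:
    """Convert a column to int or float when every value parses, else keep text."""
    for caster in (int, float):
        try:
            return [caster(v) for v in values]
        except ValueError:
            continue
    return list(values)


def csv_to_columns(text: str) -> Dict[str, Column]:
    """Transpose row-based CSV text into a dict of typed columns."""
    rows = list(csv.reader(io.StringIO(text)))
    header, records = rows[0], rows[1:]
    # zip(*records) turns a list of rows into a sequence of columns.
    return {name: to_numeric(list(col)) for name, col in zip(header, zip(*records))}


sample = "id,price,label\n1,2.5,a\n2,3.0,b\n3,4.25,c\n"
columns = csv_to_columns(sample)
print(columns["id"])     # [1, 2, 3]
print(columns["price"])  # [2.5, 3.0, 4.25]
print(columns["label"])  # ['a', 'b', 'c']
```

Each entry of the resulting dictionary corresponds to one column that would be stored as a single dataset at the root of an HDF5 file.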

Reading HDF5 data from disk

arkouda.read_hdf(dsetName: str, filenames: Union[str, List[str]], strictTypes: bool = True, allow_errors: bool = False, calc_string_offsets: bool = False) → Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings]
Read a single dataset from multiple HDF5 files into an Arkouda pdarray or Strings object.
Parameters
• dsetName (str) – The name of the dataset (must be the same across all files)
• filenames (list or str) – Either a list of filenames or shell expression
• strictTypes (bool) – If True (default), require all dtypes in all files to have the same precision and sign. If False, allow dtypes of different precision and sign across different files. For example, if one file contains a uint32 dataset and another contains an int64 dataset, the contents of both will be read into an int64 pdarray.
• allow_errors (bool) – Default False. If True, files with read errors are skipped instead of causing a failure. A warning will be included in the return containing the total number of files skipped due to failure and up to 10 filenames.
• calc_string_offsets (bool) – Default False. If True, the server calculates the offsets/segments array itself instead of loading it from the HDF5 files. In the future this option may be set to True as the default.
Returns A pdarray or Strings instance pointing to the server-side data
Return type Union[pdarray,Strings]
Raises
• TypeError – Raised if dsetName is not a str or if filenames is neither a string nor a list of strings
• ValueError – Raised if all datasets are not present in all hdf5 files
• RuntimeError – Raised if one or more of the specified files cannot be opened
See also: get_datasets, ls_hdf, read_all, load, save


Notes

If filenames is a string, it is interpreted as a shell expression (a single filename is a valid expression, so it will work) and is expanded with glob to read all matching files. Use get_datasets to show the names of datasets in HDF5 files. If dsetName is not present in all files, a TypeError is raised.

For convenience, multiple datasets can be read in to create a dictionary of pdarrays.

arkouda.read_all(filenames: Union[str, List[str]], datasets: Optional[Union[str, List[str]]] = None, iterative: bool = False, strictTypes: bool = True, allow_errors: bool = False, calc_string_offsets=False) → Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, Mapping[str, Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings]]]
Read datasets from HDF5 files.
Parameters
• filenames (list or str) – Either a list of filenames or shell expression
• datasets (list or str or None) – (List of) name(s) of dataset(s) to read (default: all available)
• iterative (bool) – Iterative (True) or Single (False) function call(s) to server
• strictTypes (bool) – If True (default), require all dtypes of a given dataset to have the same precision and sign. If False, allow dtypes of different precision and sign across different files. For example, if one file contains a uint32 dataset and another contains an int64 dataset with the same name, the contents of both will be read into an int64 pdarray.
• allow_errors (bool) – Default False. If True, files with read errors are skipped instead of causing a failure. A warning will be included in the return containing the total number of files skipped due to failure and up to 10 filenames.
• calc_string_offsets (bool) – Default False. If True, the server calculates the offsets/segments array itself instead of loading it from the HDF5 files. In the future this option may be set to True as the default.
Returns For a single dataset, an Arkouda pdarray or Arkouda Strings object; for multiple datasets, a dictionary of {datasetName: pdarray or Strings}
Raises
• ValueError – Raised if all datasets are not present in all hdf5 files or if one or more of the specified files do not exist
• RuntimeError – Raised if one or more of the specified files cannot be opened. If allow_errors is True, this may be raised if no values are returned from the server.
• TypeError – Raised if we receive an unknown arkouda_type returned from the server
See also: read_hdf, get_datasets, ls_hdf


Notes

If filenames is a string, it is interpreted as a shell expression (a single filename is a valid expression, so it will work) and is expanded with glob to read all matching files. If iterative == True, each dataset name and file name is passed to the server as an independent sequential string, whereas if iterative == False all dataset names and file names are passed to the server in a single string. If datasets is None, the names of datasets are inferred from the first file and all of them are read. Use get_datasets to show the names of datasets in HDF5 files.

HDF5 files can be queried via the server for dataset names and sizes.

arkouda.get_datasets(filename: str) → List[str]
Get the names of datasets in an HDF5 file.
Parameters filename (str) – Name of an HDF5 file visible to the arkouda server
Returns Names of the datasets in the file
Return type List[str]
Raises
• TypeError – Raised if filename is not a str
• ValueError – Raised if filename is empty or contains only whitespace
• RuntimeError – Raised if error occurs in executing ls on an HDF5 file
See also: ls_hdf

arkouda.ls_hdf(filename: str) → str
This function calls the h5ls utility on a filename visible to the arkouda server.
Parameters filename (str) – The name of the file to pass to h5ls
Returns The string output of h5ls from the server
Return type str
Raises
• TypeError – Raised if filename is not a str
• ValueError – Raised if filename is empty or contains only whitespace
• RuntimeError – Raised if error occurs in executing ls on an HDF5 file
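Both read_hdf and read_all accept filenames either as a list or as a single shell expression. The rule can be illustrated locally with the standard glob module. This is a sketch under assumptions: expand_filenames is a hypothetical helper, the files are empty placeholders rather than real HDF5 files, and in an actual deployment the paths must be visible to the arkouda server.

```python
import glob
import os
import tempfile
from typing import List, Union


def expand_filenames(filenames: Union[str, List[str]]) -> List[str]:
    """Model of the documented rule: a str is globbed, a list is used as given."""
    if isinstance(filenames, str):
        return sorted(glob.glob(filenames))
    return list(filenames)


with tempfile.TemporaryDirectory() as tmp:
    # Create three placeholder data files and one unrelated file.
    for i in range(3):
        open(os.path.join(tmp, f"part_{i}.hdf"), "w").close()
    open(os.path.join(tmp, "notes.txt"), "w").close()

    matched = expand_filenames(os.path.join(tmp, "part_*.hdf"))
    single = expand_filenames(os.path.join(tmp, "part_0.hdf"))
    print([os.path.basename(p) for p in matched])
    print([os.path.basename(p) for p in single])
```

A plain filename contains no wildcard characters, so it expands to itself when the file exists, which is why a single filename is also a valid expression.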

Persisting pdarray data to disk

Arkouda supports saving pdarrays to HDF5 files. Unfortunately, arkouda does not yet support writing to a single HDF5 file from multiple locales and must create one output file per locale.

arkouda.pdarray.save(self, prefix_path: str, dataset: str = 'array', mode: str = 'truncate') → str
Save the pdarray to HDF5. The result is a collection of HDF5 files, one file per locale of the arkouda server, where each filename starts with prefix_path. Each locale saves its chunk of the array to its corresponding file.
Parameters
• prefix_path (str) – Directory and filename prefix that all output files share
• dataset (str) – Name of the dataset to create in HDF5 files (must not already exist)
• mode (str {'truncate' | 'append'}) – By default, truncate (overwrite) output files, if they exist. If ‘append’, attempt to create new dataset in existing files.
Returns String message indicating result of save operation
Return type str
Raises
• RuntimeError – Raised if a server-side error is thrown saving the pdarray
• ValueError – Raised if there is an error in parsing the prefix path pointing to file write location or if the mode parameter is neither truncate nor append
• TypeError – Raised if any one of the prefix_path, dataset, or mode parameters is not a string
See also: save_all, load, read_hdf, read_all

Notes

The prefix_path must be visible to the arkouda server and the user must have write permission. Output files have names of the form <prefix_path>_LOCALE<i>.hdf, where <i> ranges from 0 to numLocales - 1. If any of the output files already exist and the mode is ‘truncate’, they will be overwritten. If the mode is ‘append’ and the number of output files is less than the number of locales or a dataset with the same name already exists, a RuntimeError will result.
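As a rough illustration of the one-file-per-locale naming, the sketch below builds the expected output names for a given prefix. It assumes the pattern of prefix_path followed by _LOCALE, the locale index starting at 0, and the .hdf extension; locale_filenames is a hypothetical helper and is not part of arkouda.

```python
from typing import List


def locale_filenames(prefix_path: str, num_locales: int) -> List[str]:
    """Build one output filename per locale, assuming indices 0..num_locales-1."""
    return [f"{prefix_path}_LOCALE{i}.hdf" for i in range(num_locales)]


names = locale_filenames("arkouda_range", 4)
for name in names:
    print(name)
```

Listing the expected names this way can help when checking that an ‘append’ will find the same number of files as there are locales.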

Examples

>>> a = ak.arange(0, 100, 1)
>>> a.save('arkouda_range', dataset='array')

The array is saved in numLocales files with names like arkouda_range_LOCALE0.hdf. The array can be read back in as follows:
>>> b = ak.load('arkouda_range', dataset='array')
>>> (a == b).all()
True

arkouda.save_all(columns: Union[Mapping[str, arkouda.pdarrayclass.pdarray], List[arkouda.pdarrayclass.pdarray]], prefix_path: str, names: Optional[List[str]] = None, mode: str = 'truncate') → None
Save multiple named pdarrays to HDF5 files.
Parameters
• columns (dict or list of pdarrays) – Collection of arrays to save
• prefix_path (str) – Directory and filename prefix for output files
• names (list of str) – Dataset names for the pdarrays
• mode ({'truncate' | 'append'}) – By default, truncate (overwrite) the output files if they exist. If ‘append’, attempt to create new dataset in existing files.
Returns None
Return type None
Raises ValueError – Raised if (1) the lengths of columns and names differ or (2) the mode is not ‘truncate’ or ‘append’
See also: save, load_all

Notes

Creates one file per locale containing that locale’s chunk of each pdarray. If columns is a dictionary, the keys are used as the HDF5 dataset names. Otherwise, if no names are supplied, 0-up integers are used. By default, any existing files at prefix_path will be overwritten, unless the user specifies the ‘append’ mode, in which case arkouda will attempt to add the arrays as new datasets to the existing files. If the wrong number of files is present or dataset names already exist, a RuntimeError is raised.
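The naming rules for save_all can be summarized in a few lines of plain Python. This is a sketch of the documented behavior only: resolve_dataset_names is a hypothetical helper, the columns are ordinary lists standing in for pdarrays, and rendering the 0-up integers as strings is an assumption.

```python
from collections import abc
from typing import Dict, List, Mapping, Optional, Sequence, Union


def resolve_dataset_names(columns: Union[Mapping[str, object], Sequence[object]],
                          names: Optional[List[str]] = None) -> Dict[str, object]:
    """Pair each column with a dataset name following the rules described above."""
    if isinstance(columns, abc.Mapping):
        # Dictionary keys become the HDF5 dataset names.
        return dict(columns)
    if names is None:
        # No names supplied: fall back to 0-up integers (as strings here).
        names = [str(i) for i in range(len(columns))]
    if len(names) != len(columns):
        raise ValueError("Number of names does not match number of columns")
    return dict(zip(names, columns))


print(resolve_dataset_names({"x": [1, 2], "y": [3, 4]}))
print(resolve_dataset_names([[1, 2], [3, 4]]))
print(resolve_dataset_names([[1, 2], [3, 4]], names=["a", "b"]))
```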

Loading persisted arrays from disk

These functions allow loading pdarray data persisted with save() and save_all().

arkouda.load(path_prefix: str, dataset: str = 'array', calc_string_offsets: bool = False) → Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings]
Load a pdarray previously saved with pdarray.save().
Parameters
• path_prefix (str) – Filename prefix used to save the original pdarray
• dataset (str) – Dataset name where the pdarray was saved, defaults to ‘array’
• calc_string_offsets (bool) – If True, the server will ignore the Segmented Strings ‘offsets’ array and derive it from the null-byte terminators. Currently defaults to False.
Returns The pdarray or Strings that was previously saved
Return type Union[pdarray, Strings]
Raises
• TypeError – Raised if either path_prefix or dataset is not a str
• ValueError – Raised if the dataset is not present in all hdf5 files or if the path_prefix does not correspond to files accessible to Arkouda
• RuntimeError – Raised if the hdf5 files are present but there is an error in opening one or more of them
See also: save, load_all, read_hdf, read_all

arkouda.load_all(path_prefix: str) → Mapping[str, Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, arkouda.categorical.Categorical]]
Load multiple pdarrays or Strings previously saved with save_all().
Parameters path_prefix (str) – Filename prefix used to save the original pdarray
Returns Dictionary of {datasetName: pdarray} with the previously saved pdarrays
Return type Mapping[str,pdarray]
Raises
• TypeError – Raised if path_prefix is not a str
• ValueError – Raised if all datasets are not present in all hdf5 files or if the path_prefix does not correspond to files accessible to Arkouda
• RuntimeError – Raised if the hdf5 files are present but there is an error in opening one or more of them
See also: save_all, load, read_hdf, read_all

5.5 Arithmetic and Numeric Operations

5.5.1 Vector and Scalar Arithmetic

A large subset of Python’s binary and in-place operators is supported on pdarray objects. Where supported, the behavior of these operators is identical to that of NumPy ndarray objects.
>>> A = ak.arange(10)
>>> A += 2
>>> A
array([2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
>>> A + A
array([4, 6, 8, 10, 12, 14, 16, 18, 20, 22])
>>> 2 * A
array([4, 6, 8, 10, 12, 14, 16, 18, 20, 22])
>>> A == A
array([True, True, True, True, True, True, True, True, True, True])

Operations that are not implemented will raise a RuntimeError. In-place operations that would change the dtype of the pdarray are not implemented.

5.5.2 Element-wise Functions

Arrays support several mathematical functions that operate element-wise and return a pdarray of the same length.

arkouda.abs(pda: arkouda.pdarrayclass.pdarray) → arkouda.pdarrayclass.pdarray
Return the element-wise absolute value of the array.
Parameters pda (pdarray) –
Returns A pdarray containing absolute values of the input array elements
Return type pdarray
Raises TypeError – Raised if the parameter is not a pdarray


Examples

>>> ak.abs(ak.arange(-5,-1))
array([5, 4, 3, 2])

>>> ak.abs(ak.linspace(-5,-1,5))
array([5, 4, 3, 2, 1])

arkouda.log(pda: arkouda.pdarrayclass.pdarray) → arkouda.pdarrayclass.pdarray
Return the element-wise natural log of the array.
Parameters pda (pdarray) –
Returns A pdarray containing natural log values of the input array elements
Return type pdarray
Raises TypeError – Raised if the parameter is not a pdarray

Notes

Logarithms with other bases can be computed as follows:

Examples

>>> A = ak.array([1, 10, 100])
# Natural log
>>> ak.log(A)
array([0, 2.3025850929940459, 4.6051701859880918])
# Log base 10
>>> ak.log(A) / np.log(10)
array([0, 1, 2])
# Log base 2
>>> ak.log(A) / np.log(2)
array([0, 3.3219280948873626, 6.6438561897747253])

arkouda.exp(pda: arkouda.pdarrayclass.pdarray) → arkouda.pdarrayclass.pdarray
Return the element-wise exponential of the array.
Parameters pda (pdarray) –
Returns A pdarray containing exponential values of the input array elements
Return type pdarray
Raises TypeError – Raised if the parameter is not a pdarray


Examples

>>> ak.exp(ak.arange(1,5))
array([2.7182818284590451, 7.3890560989306504, 20.085536923187668, 54.598150033144236])

>>> ak.exp(ak.uniform(5,1.0,5.0))
array([11.84010843172504, 46.454368507659211, 5.5571769623557188, 33.494295836924771, 13.478894913238722])

arkouda.sin(pda: arkouda.pdarrayclass.pdarray) → arkouda.pdarrayclass.pdarray
Return the element-wise sine of the array.
Parameters pda (pdarray) –
Returns A pdarray containing sin for each element of the original pdarray
Return type pdarray
Raises TypeError – Raised if the parameter is not a pdarray

arkouda.cos(pda: arkouda.pdarrayclass.pdarray) → arkouda.pdarrayclass.pdarray
Return the element-wise cosine of the array.
Parameters pda (pdarray) –
Returns A pdarray containing cosine for each element of the original pdarray
Return type pdarray
Raises TypeError – Raised if the parameter is not a pdarray

5.5.3 Scans

Scans perform a cumulative reduction over a pdarray, returning a pdarray of the same size.

arkouda.cumsum(pda: arkouda.pdarrayclass.pdarray) → arkouda.pdarrayclass.pdarray
Return the cumulative sum over the array. The sum is inclusive, such that the i-th element of the result is the sum of elements up to and including i.
Parameters pda (pdarray) –
Returns A pdarray containing cumulative sums for each element of the original pdarray
Return type pdarray
Raises TypeError – Raised if the parameter is not a pdarray

Examples

>>> ak.cumsum(ak.arange(1,5))
array([1, 3, 6, 10])

>>> ak.cumsum(ak.uniform(5,1.0,5.0))
array([3.1598310770203937, 5.4110385860243131, 9.1622479306453748, 12.710615785506533, 13.945880905466208])

>>> ak.cumsum(ak.randint(0,1,5, dtype=ak.bool))
array([0, 1, 1, 2, 3])

arkouda.cumprod(pda: arkouda.pdarrayclass.pdarray) → arkouda.pdarrayclass.pdarray
Return the cumulative product over the array. The product is inclusive, such that the i-th element of the result is the product of elements up to and including i.
Parameters pda (pdarray) –
Returns A pdarray containing cumulative products for each element of the original pdarray
Return type pdarray
Raises TypeError – Raised if the parameter is not a pdarray

Examples

>>> ak.cumprod(ak.arange(1,5))
array([1, 2, 6, 24])

>>> ak.cumprod(ak.uniform(5,1.0,5.0))
array([1.5728783400481925, 7.0472855509390593, 33.78523998586553, 134.05309592737584, 450.21589865655358])
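The inclusive semantics of cumsum and cumprod can be checked without a server by using itertools.accumulate from the Python standard library. This is a plain-Python analogy rather than arkouda code, and the input values are chosen only for illustration.

```python
from itertools import accumulate
from operator import mul

values = [1, 2, 3, 4]

# Inclusive scan: element i combines values[0] through values[i].
cumsum = list(accumulate(values))
cumprod = list(accumulate(values, mul))

print(cumsum)   # [1, 3, 6, 10]
print(cumprod)  # [1, 2, 6, 24]

# The last element of an inclusive scan equals the full reduction.
print(cumsum[-1] == sum(values))  # True
```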

5.5.4 Reductions

Reductions return a scalar value.

arkouda.any(pda: arkouda.pdarrayclass.pdarray) → numpy.bool_
Return True iff any element of the array evaluates to True.
Parameters pda (pdarray) – The pdarray instance to be evaluated
Returns Indicates if 1..n pdarray elements evaluate to True
Return type bool
Raises
• TypeError – Raised if pda is not a pdarray instance
• RuntimeError – Raised if there’s a server-side error thrown

arkouda.all(pda: arkouda.pdarrayclass.pdarray) → numpy.bool_
Return True iff all elements of the array evaluate to True.
Parameters pda (pdarray) – The pdarray instance to be evaluated
Returns Indicates if all pdarray elements evaluate to True
Return type bool
Raises
• TypeError – Raised if pda is not a pdarray instance
• RuntimeError – Raised if there’s a server-side error thrown

arkouda.is_sorted(pda: arkouda.pdarrayclass.pdarray) → numpy.bool_
Return True iff the array is monotonically non-decreasing.
Parameters pda (pdarray) – The pdarray instance to be evaluated
Returns Indicates if the array is monotonically non-decreasing
Return type bool
Raises
• TypeError – Raised if pda is not a pdarray instance
• RuntimeError – Raised if there’s a server-side error thrown

arkouda.sum(pda: arkouda.pdarrayclass.pdarray) → numpy.float64
Return the sum of all elements in the array.
Parameters pda (pdarray) – Values for which to calculate the sum
Returns The sum of all elements in the array
Return type np.float64
Raises
• TypeError – Raised if pda is not a pdarray instance
• RuntimeError – Raised if there’s a server-side error thrown

arkouda.prod(pda: arkouda.pdarrayclass.pdarray) → numpy.float64
Return the product of all elements in the array. Return value is always a np.float64 or np.int64
Parameters pda (pdarray) – Values for which to calculate the product
Returns The product calculated from the pda
Return type numpy_scalars
Raises
• TypeError – Raised if pda is not a pdarray instance
• RuntimeError – Raised if there’s a server-side error thrown

arkouda.min(pda: arkouda.pdarrayclass.pdarray) → Union[numpy.float64, numpy.int64, bool, numpy.uint8, str, numpy.str_]
Return the minimum value of the array.
Parameters pda (pdarray) – Values for which to calculate the min
Returns The min calculated from the pda
Return type numpy_scalars
Raises
• TypeError – Raised if pda is not a pdarray instance
• RuntimeError – Raised if there’s a server-side error thrown

arkouda.max(pda: arkouda.pdarrayclass.pdarray) → Union[numpy.float64, numpy.int64, bool, numpy.uint8, str, numpy.str_]
Return the maximum value of the array.
Parameters pda (pdarray) – Values for which to calculate the max
Returns The max calculated from the pda
Return type numpy_scalars
Raises
• TypeError – Raised if pda is not a pdarray instance
• RuntimeError – Raised if there’s a server-side error thrown

arkouda.argmin(pda: arkouda.pdarrayclass.pdarray) → numpy.int64
Return the index of the first occurrence of the array min value.
Parameters pda (pdarray) – Values for which to calculate the argmin
Returns The index of the argmin calculated from the pda
Return type np.int64
Raises
• TypeError – Raised if pda is not a pdarray instance
• RuntimeError – Raised if there’s a server-side error thrown

arkouda.argmax(pda: arkouda.pdarrayclass.pdarray) → numpy.int64
Return the index of the first occurrence of the array max value.
Parameters pda (pdarray) – Values for which to calculate the argmax
Returns The index of the argmax calculated from the pda
Return type np.int64
Raises
• TypeError – Raised if pda is not a pdarray instance
• RuntimeError – Raised if there’s a server-side error thrown

arkouda.mean(pda: arkouda.pdarrayclass.pdarray) → numpy.float64
Return the mean of the array.
Parameters pda (pdarray) – Values for which to calculate the mean
Returns The mean calculated from the pda sum and size
Return type np.float64
Raises
• TypeError – Raised if pda is not a pdarray instance
• RuntimeError – Raised if there’s a server-side error thrown

arkouda.var(pda: arkouda.pdarrayclass.pdarray, ddof: Union[int, numpy.int64] = 0) → numpy.float64
Return the variance of values in the array.
Parameters
• pda (pdarray) – Values for which to calculate the variance
• ddof (int_scalars) – “Delta Degrees of Freedom” used in calculating var
Returns The scalar variance of the array
Return type np.float64
Raises
• TypeError – Raised if pda is not a pdarray instance
• ValueError – Raised if the ddof >= pdarray size
• RuntimeError – Raised if there’s a server-side error thrown
See also: mean, std

Notes

The variance is the average of the squared deviations from the mean, i.e., var = mean((x - x.mean())**2). The mean is normally calculated as x.sum() / N, where N = len(x). If, however, ddof is specified, the divisor N - ddof is used instead. In standard statistical practice, ddof=1 provides an unbiased estimator of the variance of a hypothetical infinite population. ddof=0 provides a maximum likelihood estimate of the variance for normally distributed variables.

arkouda.std(pda: arkouda.pdarrayclass.pdarray, ddof: Union[int, numpy.int64] = 0) → numpy.float64
Return the standard deviation of values in the array. The standard deviation is implemented as the square root of the variance.
Parameters
• pda (pdarray) – values for which to calculate the standard deviation
• ddof (int_scalars) – “Delta Degrees of Freedom” used in calculating std
Returns The scalar standard deviation of the array
Return type np.float64
Raises
• TypeError – Raised if pda is not a pdarray instance or ddof is not an integer
• ValueError – Raised if ddof is an integer < 0
• RuntimeError – Raised if there’s a server-side error thrown
See also: mean, var

Notes

The standard deviation is the square root of the average of the squared deviations from the mean, i.e., std = sqrt(mean((x - x.mean())**2)). The average squared deviation is normally calculated as x.sum() / N, where N = len(x). If, however, ddof is specified, the divisor N - ddof is used instead. In standard statistical practice, ddof=1 provides an unbiased estimator of the variance of the infinite population. ddof=0 provides a maximum likelihood estimate of the variance for normally distributed variables. The standard deviation computed in this function is the square root of the estimated variance, so even with ddof=1, it will not be an unbiased estimate of the standard deviation per se.

arkouda.mink(pda: arkouda.pdarrayclass.pdarray, k: Union[int, numpy.int64]) → arkouda.pdarrayclass.pdarray
Find the k minimum values of an array. Returns the smallest k values of an array, sorted
Parameters
• pda (pdarray) – Input array.
• k (int_scalars) – The desired count of minimum values to be returned by the output.
Returns The minimum k values from pda, sorted
Return type pdarray
Raises
• TypeError – Raised if pda is not a pdarray
• ValueError – Raised if the pda is empty or k < 1

Notes

This call is equivalent in value to: a[ak.argsort(a)[:k]] and generally outperforms this operation. This reduction will see a significant drop in performance as k grows beyond a certain value. This value is system dependent, but generally about a k of 5 million is where performance degradation has been observed.

Examples

>>> A = ak.array([10,5,1,3,7,2,9,0])
>>> ak.mink(A, 3)
array([0, 1, 2])
>>> ak.mink(A, 4)
array([0, 1, 2, 3])

arkouda.maxk(pda: arkouda.pdarrayclass.pdarray, k: Union[int, numpy.int64]) → arkouda.pdarrayclass.pdarray
Find the k maximum values of an array. Returns the largest k values of an array, sorted
Parameters
• pda (pdarray) – Input array.
• k (int_scalars) – The desired count of maximum values to be returned by the output.
Returns The maximum k values from pda, sorted
Return type pdarray, int
Raises
• TypeError – Raised if pda is not a pdarray or k is not an integer
• ValueError – Raised if the pda is empty or k < 1

Notes

This call is equivalent in value to: a[ak.argsort(a)[-k:]] and generally outperforms this operation. This reduction will see a significant drop in performance as k grows beyond a certain value. This value is system dependent, but generally about a k of 5 million is where performance degradation has been observed.


Examples

>>> A = ak.array([10,5,1,3,7,2,9,0])
>>> ak.maxk(A, 3)
array([7, 9, 10])
>>> ak.maxk(A, 4)
array([5, 7, 9, 10])

arkouda.argmink(pda: arkouda.pdarrayclass.pdarray, k: Union[int, numpy.int64]) → arkouda.pdarrayclass.pdarray
Find the indices corresponding to the k minimum values of an array.
Parameters
• pda (pdarray) – Input array.
• k (int_scalars) – The desired count of indices corresponding to minimum array values
Returns The indices of the minimum k values from the pda, sorted
Return type pdarray, int
Raises
• TypeError – Raised if pda is not a pdarray or k is not an integer
• ValueError – Raised if the pda is empty or k < 1

Notes

This call is equivalent in value to: ak.argsort(a)[:k] and generally outperforms this operation. This reduction will see a significant drop in performance as k grows beyond a certain value. This value is system dependent, but generally about a k of 5 million is where performance degradation has been observed.

Examples

>>> A= ak.array([10,5,1,3,7,2,9,0]) >>> ak.argmink(A,3) array([7, 2, 5]) >>> ak.argmink(A,4) array([7, 2, 5, 3]) arkouda.argmaxk(pda: arkouda.pdarrayclass.pdarray, k: Union[int, numpy.int64]) → arkouda.pdarrayclass.pdarray Find the indices corresponding to the k maximum values of an array. Returns the largest k values of an array, sorted Parameters • pda (pdarray) – Input array. • k (int_scalars) – The desired count of indices corresponding to maxmum array values Returns The indices of the maximum k values from the pda, sorted


Return type pdarray, int Raises • TypeError – Raised if pda is not a pdarray or k is not an integer • ValueError – Raised if the pda is empty or k < 1

Notes

This call is equivalent in value to: ak.argsort(a)[-k:] and generally outperforms this operation. This reduction will see a significant drop in performance as k grows beyond a certain value. This value is system dependent, but generally about a k of 5 million is where performance degradation has been observed.

Examples

>>> A= ak.array([10,5,1,3,7,2,9,0]) >>> ak.argmaxk(A,3) array([4, 6, 0]) >>> ak.argmaxk(A,4) array([1, 4, 6, 0])
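
The relationship between the four top-k functions and a full sort, as stated in their Notes, can be sketched in plain Python. This is only a reference for the semantics, not how Arkouda computes the result on the server; the helper names are hypothetical and the sketch assumes 1 <= k <= len(values).

```python
# Plain-Python reference for the semantics of mink/maxk/argmink/argmaxk.
# Helper names are hypothetical; assumes 1 <= k <= len(values).

def sorted_indices(values):
    """Indices that put `values` in ascending order (stable)."""
    return sorted(range(len(values)), key=values.__getitem__)

def argmink(values, k):
    # First k entries of the sorting permutation
    return sorted_indices(values)[:k]

def argmaxk(values, k):
    # Last k entries of the sorting permutation
    return sorted_indices(values)[-k:]

def mink(values, k):
    return [values[i] for i in argmink(values, k)]

def maxk(values, k):
    return [values[i] for i in argmaxk(values, k)]

A = [10, 5, 1, 3, 7, 2, 9, 0]
print(mink(A, 3), maxk(A, 3))        # [0, 1, 2] [7, 9, 10]
print(argmink(A, 3), argmaxk(A, 3))  # [7, 2, 5] [4, 6, 0]
```

The outputs match the documented examples, which is why the maximum variants slice the last k entries of the sorted order rather than everything from position k onward.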

5.5.5 Where

The where function is a way to multiplex two pdarray objects (or a pdarray and a scalar) based on a condition:
arkouda.where(condition: arkouda.pdarrayclass.pdarray, A: Union[float, numpy.float64, int, numpy.int64, arkouda.pdarrayclass.pdarray], B: Union[float, numpy.float64, int, numpy.int64, arkouda.pdarrayclass.pdarray]) → arkouda.pdarrayclass.pdarray
Returns an array with elements chosen from A and B based upon a conditioning array. As is the case with numpy.where, the return array consists of values from the first array (A) where the conditioning array elements are True and from the second array (B) where the conditioning array elements are False.
Parameters
• condition (pdarray) – Used to choose values from A or B
• A (Union[numeric_scalars, pdarray]) – Value(s) used when condition is True
• B (Union[numeric_scalars, pdarray]) – Value(s) used when condition is False
Returns Values chosen from A where the condition is True and B where the condition is False
Return type pdarray
Raises
• TypeError – Raised if the condition object is not a pdarray, if A or B is not an int, np.int64, float, np.float64, or pdarray, if pdarray dtypes are not supported or do not match, or if multiple condition clauses (see Notes section) are applied
• ValueError – Raised if the shapes of the condition, A, and B pdarrays are unequal


Examples

>>> a1 = ak.arange(1,10)
>>> a2 = ak.ones(9, dtype=np.int64)
>>> cond = a1 < 5
>>> ak.where(cond,a1,a2)
array([1, 2, 3, 4, 1, 1, 1, 1, 1])

>>> a1 = ak.arange(1,10)
>>> a2 = ak.ones(9, dtype=np.int64)
>>> cond = a1 == 5
>>> ak.where(cond,a1,a2)
array([1, 1, 1, 1, 5, 1, 1, 1, 1])

>>> a1 = ak.arange(1,10)
>>> a2 = 10
>>> cond = a1 < 5
>>> ak.where(cond,a1,a2)
array([1, 2, 3, 4, 10, 10, 10, 10, 10])

Notes

A and B must have the same dtype, and only one conditional clause is supported. For example, a compound condition such as n < 5, n > 1, which NumPy supports, is not currently supported in Arkouda.
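
As a reading aid, the element-wise selection that where performs can be sketched in plain Python lists. The function below is a hypothetical stand-in, not Arkouda's implementation; it repeats a scalar argument across the whole length, as the third documented example does with a2 = 10.

```python
# Plain-Python sketch of the selection performed by where().
# Lists stand in for pdarrays; the function name is illustrative only.

def where(condition, a, b):
    """Pick a[i] where condition[i] is True, otherwise b[i]."""
    n = len(condition)
    # A scalar argument is repeated for every position
    a_vals = a if isinstance(a, list) else [a] * n
    b_vals = b if isinstance(b, list) else [b] * n
    if len(a_vals) != n or len(b_vals) != n:
        raise ValueError("condition, a, and b must have the same length")
    return [x if c else y for c, x, y in zip(condition, a_vals, b_vals)]

a1 = list(range(1, 10))
cond = [x < 5 for x in a1]
print(where(cond, a1, 10))  # [1, 2, 3, 4, 10, 10, 10, 10, 10]
```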

5.6 Indexing and Assignment

Arkouda pdarray objects support the same indexing and assignment syntax as rank-1 NumPy arrays.

5.6.1 Integer

Indexing and assignment with a single integer work the same as in Python.
>>> A = ak.arange(0, 10, 1)
>>> A[5]
5
>>> A[5] = 42
>>> A[5]
42


5.6.2 Slice

Indexing and assignment are also supported via Python-like slices. A Python slice has a start (inclusive), stop (exclusive), and stride. All three of these parameters can be implied; the default start is the beginning of the array (0 for positive strides, -1 for negative), the default stop is the end of the array (len for positive strides, -1 for negative), and the default stride is 1.
>>> A = ak.arange(0, 10, 1)
>>> A[2:6]
array([2, 3, 4, 5])
>>> A[::2]
array([0, 2, 4, 6, 8])
>>> A[3::-1]
array([3, 2, 1, 0])
>>> A[1::2] = ak.zeros(5)
>>> A
array([0, 0, 2, 0, 4, 0, 6, 0, 8, 0])
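
The default rules for start, stop, and stride can be inspected with Python's built-in slice.indices, which resolves implied values for a sequence of a given length. The sketch below assumes, as the text states, that pdarray slices follow the same convention as rank-1 Python sequences; resolve is a hypothetical helper.

```python
# Resolve implied slice parameters for a sequence of length n using the
# standard slice.indices method, then list the positions the slice touches.

def resolve(s, n):
    start, stop, stride = s.indices(n)
    return (start, stop, stride), list(range(start, stop, stride))

n = 10
print(resolve(slice(2, 6), n))         # ((2, 6, 1), [2, 3, 4, 5])
print(resolve(slice(None, None, 2), n))  # ((0, 10, 2), [0, 2, 4, 6, 8])
print(resolve(slice(3, None, -1), n))  # ((3, -1, -1), [3, 2, 1, 0])
print(resolve(slice(1, None, 2), n))   # ((1, 10, 2), [1, 3, 5, 7, 9])
```

For a negative stride, the resolved stop of -1 means "one position before index 0", which is why A[3::-1] includes element 0.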

5.6.3 Gather/Scatter (pdarray)

Gather and scatter operations can be expressed using a pdarray as an index to another pdarray.

Integer pdarray index

With an integer pdarray, you can gather a list of indices from the target array. The indices can be out of order and non-unique. For assignment, the right-hand side must be a pdarray the same size as the index array. >>> A= ak.arange(10, 20,1) >>> inds= ak.array([8,2,5]) >>> A[inds] array([18, 12, 15]) >>> A[inds]= ak.zeros(3) >>> A array([10, 11, 0, 13, 14, 0, 16, 17, 0, 19])
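
The gather and scatter semantics described above can be written out as explicit loops. This is a sequential plain-Python sketch with hypothetical function names; it says nothing about how Arkouda distributes the work.

```python
# Gather: read target[i] for each index. Scatter: write one value per index.
# Lists stand in for pdarrays; names are illustrative only.

def gather(target, inds):
    return [target[i] for i in inds]

def scatter(target, inds, values):
    if len(inds) != len(values):
        raise ValueError("values must be the same size as the index array")
    for i, v in zip(inds, values):
        target[i] = v

A = list(range(10, 20))
inds = [8, 2, 5]
print(gather(A, inds))       # [18, 12, 15]
scatter(A, inds, [0, 0, 0])
print(A)                     # [10, 11, 0, 13, 14, 0, 16, 17, 0, 19]
```

With repeated indices, this sequential loop keeps the last value written; the text does not specify which value wins in a distributed scatter, so that case is best avoided.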

Logical indexing

Logical indexing is a powerful construct from NumPy (and Matlab). In logical indexing, the index must be a pdarray of type bool that is the same size as the outer pdarray being indexed. The indexing only touches those elements of the outer pdarray where the corresponding element of the index pdarray is True.
>>> A = ak.arange(0, 10, 1)
>>> inds = ak.zeros(10, dtype=ak.bool)
>>> inds[2] = True
>>> inds[5] = True
>>> A[inds]  # boolean-compression indexing values where inds is True
array([2, 5])
>>> A[inds] = 42  # boolean-expansion indexing with scalar sets values where inds is True
>>> A
array([0, 1, 42, 3, 4, 42, 6, 7, 8, 9])
>>> B = ak.arange(0, 10, 1)
>>> lim = 10//2
>>> B[B < lim] = B[:lim] * -1  # boolean-expansion indexing with array sets values where True
>>> B
array([0, -1, -2, -3, -4, 5, 6, 7, 8, 9])
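
The two directions of logical indexing, compression on read and expansion on assignment, can be sketched with plain lists. The function names are hypothetical and the code only mirrors the semantics shown in the examples above.

```python
# Boolean compression keeps the elements where the mask is True.
# Boolean expansion writes into those positions, consuming one value per
# True entry (or repeating a scalar). Names are illustrative only.

def compress(target, mask):
    return [x for x, m in zip(target, mask) if m]

def expand(target, mask, values):
    n_true = sum(mask)
    vals = values if isinstance(values, list) else [values] * n_true
    if len(vals) != n_true:
        raise ValueError("need one value per True element of the mask")
    it = iter(vals)
    for i, m in enumerate(mask):
        if m:
            target[i] = next(it)

B = list(range(10))
lim = 10 // 2
mask = [x < lim for x in B]
expand(B, mask, [-x for x in B[:lim]])
print(B)  # [0, -1, -2, -3, -4, 5, 6, 7, 8, 9]
```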

5.7 Summarizing Data

5.7.1 Descriptive Statistics

Simple descriptive statistics are available as reduction methods on pdarray objects.
>>> A = ak.randint(-10, 11, 1000)
>>> A.min()
-10
>>> A.max()
10
>>> A.sum()
13
>>> A.mean()
0.013
>>> A.var()
36.934176000000015
>>> A.std()
6.07734942223993

The list of reductions supported on pdarray objects is: pdarray.any() → numpy.bool_ Return True iff any element of the array evaluates to True. pdarray.all() → numpy.bool_ Return True iff all elements of the array evaluate to True. pdarray.is_sorted() → numpy.bool_ Return True iff the array is monotonically non-decreasing. Parameters None – Returns Indicates if the array is monotonically non-decreasing Return type bool Raises • TypeError – Raised if pda is not a pdarray instance • RuntimeError – Raised if there’s a server-side error thrown pdarray.sum() → Union[numpy.float64, numpy.int64, bool, numpy.uint8, str, numpy.str_] Return the sum of all elements in the array.

pdarray.prod() → numpy.float64
Return the product of all elements in the array. Return value is always a np.float64 or np.int64.

pdarray.min() → Union[numpy.float64, numpy.int64, bool, numpy.uint8, str, numpy.str_]
Return the minimum value of the array.

pdarray.max() → Union[numpy.float64, numpy.int64, bool, numpy.uint8, str, numpy.str_]
Return the maximum value of the array.

pdarray.argmin() → numpy.int64
Return the index of the first occurrence of the array min value.

pdarray.argmax() → numpy.int64
Return the index of the first occurrence of the array max value.

pdarray.mean() → numpy.float64
Return the mean of the array.

pdarray.var(ddof: Union[int, numpy.int64] = 0) → numpy.float64
Compute the variance. See arkouda.var for details.
Parameters ddof (int_scalars) – “Delta Degrees of Freedom” used in calculating var
Returns The scalar variance of the array
Return type np.float64
Raises
• TypeError – Raised if pda is not a pdarray instance
• ValueError – Raised if the ddof >= pdarray size
• RuntimeError – Raised if there’s a server-side error thrown

pdarray.std(ddof: Union[int, numpy.int64] = 0) → numpy.float64
Compute the standard deviation. See arkouda.std for details.
Parameters ddof (int_scalars) – “Delta Degrees of Freedom” used in calculating std
Returns The scalar standard deviation of the array
Return type np.float64
Raises
• TypeError – Raised if pda is not a pdarray instance
• RuntimeError – Raised if there’s a server-side error thrown

pdarray.mink(k: Union[int, numpy.int64]) → arkouda.pdarrayclass.pdarray
Compute the minimum “k” values.
Parameters k (int_scalars) – The desired count of minimum values to be returned by the output.
Returns The minimum k values from pda
Return type pdarray, int
Raises TypeError – Raised if pda is not a pdarray

pdarray.maxk(k: Union[int, numpy.int64]) → arkouda.pdarrayclass.pdarray
Compute the maximum “k” values.
Parameters k (int_scalars) – The desired count of maximum values to be returned by the output.
Returns The maximum k values from pda
Return type pdarray, int
Raises TypeError – Raised if pda is not a pdarray

pdarray.argmink(k: Union[int, numpy.int64]) → arkouda.pdarrayclass.pdarray
Finds the indices corresponding to the minimum “k” values.
Parameters k (int_scalars) – The desired count of minimum values to be returned by the output.
Returns Indices corresponding to the minimum k values from pda
Return type pdarray, int
Raises TypeError – Raised if pda is not a pdarray

pdarray.argmaxk(k: Union[int, numpy.int64]) → arkouda.pdarrayclass.pdarray
Finds the indices corresponding to the maximum “k” values.
Parameters k (int_scalars) – The desired count of maximum values to be returned by the output.
Returns Indices corresponding to the maximum k values, sorted
Return type pdarray, int
Raises TypeError – Raised if pda is not a pdarray
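
To make the role of ddof in var and std concrete, here is a small plain-Python sketch. It assumes the usual NumPy convention in which the sum of squared deviations is divided by N - ddof; the functions are illustrative stand-ins, not the Arkouda methods.

```python
import math

# Variance with "Delta Degrees of Freedom": divide by (N - ddof).
# Assumes the NumPy convention; lists stand in for pdarrays.

def var(values, ddof=0):
    n = len(values)
    if ddof >= n:
        raise ValueError("ddof must be smaller than the array size")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - ddof)

def std(values, ddof=0):
    return math.sqrt(var(values, ddof))

data = [1, 2, 3, 4]
print(var(data))          # population variance (ddof=0)
print(var(data, ddof=1))  # sample variance (ddof=1)
```

The ValueError branch mirrors the documented condition that ddof must be smaller than the array size, since the divisor would otherwise be zero or negative.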

5.7.2 Histogram

Arkouda can compute simple histograms on pdarray data. Currently, this function can only create histograms over evenly spaced bins between the min and max of the data. In the future, we plan to support using a pdarray to define custom bin edges. arkouda.histogram(pda: arkouda.pdarrayclass.pdarray, bins: Union[int, numpy.int64] = 10) → arkouda.pdarrayclass.pdarray Compute a histogram of evenly spaced bins over the range of an array. Parameters • pda (pdarray) – The values to histogram • bins (int_scalars) – The number of equal-size bins to use (default: 10) Returns The number of values present in each bin Return type pdarray, int64 or float64 Raises • TypeError – Raised if the parameter is not a pdarray or if bins is not an int. • ValueError – Raised if bins < 1 • NotImplementedError – Raised if pdarray dtype is bool or uint8 See also: value_counts


Notes

The bins are evenly spaced in the interval [pda.min(), pda.max()]. Currently, the user must re-compute the bin edges, e.g. with np.linspace (see below) in order to plot the histogram.

Examples

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> A = ak.arange(0, 10, 1)
>>> nbins = 3
>>> h = ak.histogram(A, bins=nbins)
>>> h
array([3, 3, 4])
# Recreate the bin edges in NumPy
>>> binEdges = np.linspace(A.min(), A.max(), nbins+1)
>>> binEdges
array([0., 3., 6., 9.])
# To plot, use only the left edges, and export the histogram to NumPy
>>> plt.plot(binEdges[:-1], h.to_ndarray())

Since the histogram function currently does not return the bin edges, only the counts, the user can recreate the bin edges (e.g. for plotting) using: >>> binEdges= np.linspace(myarray.min(), myarray.max(), nbins+1)
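
The binning rule, evenly spaced bins over [min, max] with the maximum counted in the last bin, can be sketched in plain Python. This is an illustration of the semantics only; function names are hypothetical, the sketch requires max > min, and floating-point rounding at bin boundaries may differ from the server's result.

```python
# Evenly spaced histogram over [min, max]; the right edge of the last bin
# is inclusive so that the maximum value is counted. Illustrative only.

def histogram(values, bins=10):
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("this sketch requires max > min")
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in values:
        # Values equal to the maximum would land in bin `bins`; clamp them
        idx = min(int((x - lo) / width), bins - 1)
        counts[idx] += 1
    return counts

def bin_edges(values, bins):
    lo, hi = min(values), max(values)
    return [lo + i * (hi - lo) / bins for i in range(bins + 1)]

A = list(range(10))
print(histogram(A, bins=3))  # [3, 3, 4]
print(bin_edges(A, 3))       # [0.0, 3.0, 6.0, 9.0]
```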

5.7.3 Value Counts

For int64 pdarray objects, it is often useful to count only the unique values that appear. This function finds all unique values and their counts. arkouda.value_counts(pda: arkouda.pdarrayclass.pdarray) → Union[Categorical, Tuple[Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings], Optional[arkouda.pdarrayclass.pdarray]]] Count the occurrences of the unique values of an array. Parameters pda (pdarray, int64) – The array of values to count Returns • unique_values (pdarray, int64 or Strings) – The unique values, sorted in ascending order • counts (pdarray, int64) – The number of times the corresponding unique value occurs Raises TypeError – Raised if the parameter is not a pdarray See also: unique, histogram


Notes

This function differs from histogram() in that it only returns counts for values that are present, leaving out empty “bins”. This function delegates all logic to the unique() method where the return_counts parameter is set to True.

Examples

>>> A= ak.array([2,0,2,4,0,0]) >>> ak.value_counts(A) (array([0, 2, 4]), array([3, 2, 1]))
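
For comparison, the same result can be produced in plain Python with collections.Counter. This sketch only mirrors the output shape (sorted unique values plus their counts); the function is a stand-in for ak.value_counts, not its implementation.

```python
from collections import Counter

# Count occurrences, then report unique values in ascending order together
# with their counts. Values that never occur produce no entry.

def value_counts(values):
    counts = Counter(values)
    unique_values = sorted(counts)
    return unique_values, [counts[v] for v in unique_values]

print(value_counts([2, 0, 2, 4, 0, 0]))  # ([0, 2, 4], [3, 2, 1])
```

Unlike a histogram over the range 0 to 4, no entries appear for the absent values 1 and 3.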

5.8 Sorting

Note: The sorting in arkouda is currently optimized for a Cray interconnect with a high message rate. For now, sorting runs slowly on Infiniband because of the lower message rate, but upcoming changes to the Chapel runtime involving message buffering should greatly improve sorting speed. arkouda.argsort(pda: Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, Categorical]) → arkouda.pdarrayclass.pdarray Return the permutation that sorts the array. Parameters pda (pdarray or Strings or Categorical) – The array to sort (int64 or float64) Returns The indices such that pda[indices] is sorted Return type pdarray, int64 Raises TypeError – Raised if the parameter is other than a pdarray or Strings See also: coargsort

Notes

Uses a least-significant-digit radix sort, which is stable and resilient to non-uniformity in data but communication intensive.
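
To illustrate the least-significant-digit radix sort named in the note above, here is a single-process sketch that returns a sorting permutation. It handles only non-negative integers, and the 8-bit digit width is an arbitrary choice for the example; Arkouda's distributed server-side implementation is not shown here.

```python
# LSD radix argsort for non-negative integers. Each pass distributes the
# current permutation into buckets by one digit, starting with the least
# significant digit. Appending in order keeps every pass stable.

def lsd_radix_argsort(values, bits_per_digit=8):
    perm = list(range(len(values)))
    if not values:
        return perm
    radix = 1 << bits_per_digit
    mask = radix - 1
    max_val = max(values)
    shift = 0
    while (max_val >> shift) > 0:
        buckets = [[] for _ in range(radix)]
        for idx in perm:
            buckets[(values[idx] >> shift) & mask].append(idx)
        perm = [idx for bucket in buckets for idx in bucket]
        shift += bits_per_digit
    return perm

vals = [3, 1, 2, 1, 0, 300, 256]
perm = lsd_radix_argsort(vals)
print(perm)                     # [4, 1, 3, 2, 0, 6, 5]
print([vals[i] for i in perm])  # [0, 1, 1, 2, 3, 256, 300]
```

Stability shows in the two equal values of 1, which keep their original relative order (index 1 before index 3).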

Examples

>>> a = ak.randint(0, 10, 10)
>>> perm = ak.argsort(a)
>>> a[perm]
array([0, 1, 1, 3, 4, 5, 7, 8, 8, 9])

arkouda.coargsort(arrays: Sequence[Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, Categorical]]) → arkouda.pdarrayclass.pdarray
Return the permutation that groups the rows (left-to-right), if the input arrays are treated as columns. The permutation sorts numeric columns, but not strings/Categoricals; strings/Categoricals are grouped, but not ordered.
Parameters arrays (Sequence[Union[Strings, pdarray, Categorical]]) – The columns (int64, float64, Strings, or Categorical) to sort by row
Returns The indices that permute the rows to grouped order
Return type pdarray, int64
Raises ValueError – Raised if the pdarrays are not of the same size or if the parameter is not an Iterable containing pdarrays, Strings, or Categoricals
See also: argsort

Notes

Uses a least-significant-digit radix sort, which is stable and resilient to non-uniformity in data but communication intensive. Starts with the last array and moves forward. This sort operates directly on numeric types, but for Strings, it operates on a hash. Thus, while grouping of equivalent strings is guaranteed, lexicographic ordering of the groups is not. For Categoricals, coargsort sorts based on Categorical.codes which guarantees grouping of equivalent categories but not lexicographic ordering of those groups.

Examples

>>> a= ak.array([0,1,0,1]) >>> b= ak.array([1,1,0,0]) >>> perm= ak.coargsort([a, b]) >>> perm array([2, 0, 3, 1]) >>> a[perm] array([0, 0, 1, 1]) >>> b[perm] array([0, 1, 0, 1])
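
The "starts with the last array and moves forward" strategy from the Notes works because each pass is stable. A plain-Python sketch for numeric columns, using the built-in stable sorted in place of a radix pass, reproduces the example above; the function is a hypothetical stand-in and does not model the hashing used for Strings.

```python
# Multi-column argsort via successive stable sorts, last column first.
# After the final pass, rows are ordered by the first column, with ties
# broken by the second column, and so on.

def coargsort(columns):
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all columns must have the same size")
    perm = list(range(n))
    for col in reversed(columns):
        perm = sorted(perm, key=col.__getitem__)  # sorted() is stable
    return perm

a = [0, 1, 0, 1]
b = [1, 1, 0, 0]
perm = coargsort([a, b])
print(perm)                  # [2, 0, 3, 1]
print([a[i] for i in perm])  # [0, 0, 1, 1]
print([b[i] for i in perm])  # [0, 1, 0, 1]
```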

5.9 Array Set Operations

Following numpy.lib.arraysetops, arkouda supports parallel, distributed set operations using pdarray objects. The unique function effectively converts a pdarray to a set: arkouda.unique(pda: Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, Categorical], return_counts: bool = False) → Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, Categorical, Tuple[Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, Categorical], Optional[arkouda.pdarrayclass.pdarray]]] Find the unique elements of an array. Returns the unique elements of an array, sorted if the values are integers. There is an optional output in addition to the unique elements: the number of times each unique value comes up in the input array. Parameters • pda (pdarray or Strings or Categorical) – Input array. • return_counts (bool, optional) – If True, also return the number of times each unique item appears in pda. Returns


• unique (pdarray or Strings) – The unique values. If input dtype is int64, return values will be sorted. • unique_counts (pdarray, optional) – The number of times each of the unique values comes up in the original array. Only provided if return_counts is True. Raises • TypeError – Raised if pda is not a pdarray or Strings object • RuntimeError – Raised if the pdarray or Strings dtype is unsupported

Notes

For integer arrays, this function checks to see whether pda is sorted and, if so, whether it is already unique. This step can save considerable computation. Otherwise, this function will sort pda.

Examples

>>> A = ak.array([3,2,1,1,2,3])
>>> ak.unique(A)
array([1, 2, 3])

arkouda.in1d(pda1: Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, Categorical], pda2: Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, Categorical], invert: bool = False) → arkouda.pdarrayclass.pdarray
Test whether each element of a 1-D array is also present in a second array. Returns a boolean array the same length as pda1 that is True where an element of pda1 is in pda2 and False otherwise.
Parameters
• pda1 (pdarray or Strings or Categorical) – Input array.
• pda2 (pdarray or Strings or Categorical) – The values against which to test each value of pda1. Must be the same type as pda1.
• invert (bool, optional) – If True, the values in the returned array are inverted (that is, False where an element of pda1 is in pda2 and True otherwise). Default is False. ak.in1d(a, b, invert=True) is equivalent to (but is faster than) ~ak.in1d(a, b).
Returns The values pda1[in1d] are in pda2.
Return type pdarray, bool
Raises
• TypeError – Raised if either pda1 or pda2 is not a pdarray, Strings, or Categorical object or if invert is not a bool
• RuntimeError – Raised if the dtype of either array is not supported
See also: unique, intersect1d, union1d


Notes

in1d can be considered as an element-wise function version of the python keyword in, for 1-D sequences. in1d(a, b) is logically equivalent to ak.array([item in b for item in a]), but is much faster and scales to arbitrarily large a. ak.in1d is not supported for bool or float64 pdarrays
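
The logical equivalence stated above, including the effect of invert, can be checked with a short plain-Python sketch. It uses a set for the membership test, which is an implementation choice of this example rather than a description of Arkouda's algorithm.

```python
# Element-wise membership test: result[i] is True when a[i] occurs in b.
# With invert=True the answer is flipped, matching the negated result.

def in1d(a, b, invert=False):
    members = set(b)
    return [(item in members) != invert for item in a]

print(in1d([-1, 0, 1], [-2, 0, 2]))               # [False, True, False]
print(in1d([-1, 0, 1], [-2, 0, 2], invert=True))  # [True, False, True]
```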

Examples

>>> ak.in1d(ak.array([-1,0,1]), ak.array([-2,0,2])) array([False, True, False])

>>> ak.in1d(ak.array(['one','two']),ak.array(['two', 'three','four','five']))
array([False, True])

arkouda.union1d(pda1: arkouda.pdarrayclass.pdarray, pda2: arkouda.pdarrayclass.pdarray) → arkouda.pdarrayclass.pdarray
Find the union of two arrays. Return the unique, sorted array of values that are in either of the two input arrays.
Parameters
• pda1 (pdarray) – Input array
• pda2 (pdarray) – Input array
Returns Unique, sorted union of the input arrays.
Return type pdarray
Raises
• TypeError – Raised if either pda1 or pda2 is not a pdarray
• RuntimeError – Raised if the dtype of either array is not supported
See also: intersect1d, unique

Notes

ak.union1d is not supported for bool or float64 pdarrays

Examples

>>> ak.union1d(ak.array([-1,0,1]), ak.array([-2,0,2])) array([-2, -1, 0, 1, 2]) arkouda.intersect1d(pda1: arkouda.pdarrayclass.pdarray, pda2: arkouda.pdarrayclass.pdarray, assume_unique: bool = False) → arkouda.pdarrayclass.pdarray Find the intersection of two arrays. Return the sorted, unique values that are in both of the input arrays. Parameters


• pda1 (pdarray) – Input array • pda2 (pdarray) – Input array • assume_unique (bool) – If True, the input arrays are both assumed to be unique, which can speed up the calculation. Default is False. Returns Sorted 1D array of common and unique elements. Return type pdarray Raises • TypeError – Raised if either pda1 or pda2 is not a pdarray • RuntimeError – Raised if the dtype of either pdarray is not supported See also: unique, union1d

Notes

ak.intersect1d is not supported for bool or float64 pdarrays

Examples

>>> ak.intersect1d(ak.array([1,3,4,3]), ak.array([3,1,2,1]))
array([1, 3])

arkouda.setdiff1d(pda1: arkouda.pdarrayclass.pdarray, pda2: arkouda.pdarrayclass.pdarray, assume_unique: bool = False) → arkouda.pdarrayclass.pdarray
Find the set difference of two arrays. Return the sorted, unique values in pda1 that are not in pda2.
Parameters
• pda1 (pdarray) – Input array.
• pda2 (pdarray) – Input comparison array.
• assume_unique (bool) – If True, the input arrays are both assumed to be unique, which can speed up the calculation. Default is False.
Returns Sorted 1D array of values in pda1 that are not in pda2.
Return type pdarray
Raises
• TypeError – Raised if either pda1 or pda2 is not a pdarray
• RuntimeError – Raised if the dtype of either pdarray is not supported
See also: unique, setxor1d


Notes

ak.setdiff1d is not supported for bool or float64 pdarrays

Examples

>>> a= ak.array([1,2,3,2,4,1]) >>> b= ak.array([3,4,5,6]) >>> ak.setdiff1d(a, b) array([1, 2]) arkouda.setxor1d(pda1: arkouda.pdarrayclass.pdarray, pda2: arkouda.pdarrayclass.pdarray, assume_unique: bool = False) → arkouda.pdarrayclass.pdarray Find the set exclusive-or (symmetric difference) of two arrays. Return the sorted, unique values that are in only one (not both) of the input arrays. Parameters • pda1 (pdarray) – Input array. • pda2 (pdarray) – Input array. • assume_unique (bool) – If True, the input arrays are both assumed to be unique, which can speed up the calculation. Default is False. Returns Sorted 1D array of unique values that are in only one of the input arrays. Return type pdarray Raises • TypeError – Raised if either pda1 or pda2 is not a pdarray • RuntimeError – Raised if the dtype of either pdarray is not supported

Notes

ak.setxor1d is not supported for bool or float64 pdarrays

Examples

>>> a= ak.array([1,2,3,2,4]) >>> b= ak.array([2,3,5,7,5]) >>> ak.setxor1d(a,b) array([1, 4, 5, 7])
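
As a compact summary of this section, the four set operations correspond to Python's built-in set operators followed by a sort. The sketch below uses lists and hypothetical stand-in functions to reproduce the documented examples; it does not reflect how Arkouda computes them in parallel.

```python
# Sorted, unique results of the four array set operations, expressed with
# Python's built-in set operators. Lists stand in for int64 pdarrays.

def union1d(a, b):
    return sorted(set(a) | set(b))

def intersect1d(a, b):
    return sorted(set(a) & set(b))

def setdiff1d(a, b):
    return sorted(set(a) - set(b))

def setxor1d(a, b):
    return sorted(set(a) ^ set(b))

print(union1d([-1, 0, 1], [-2, 0, 2]))              # [-2, -1, 0, 1, 2]
print(intersect1d([1, 3, 4, 3], [3, 1, 2, 1]))      # [1, 3]
print(setdiff1d([1, 2, 3, 2, 4, 1], [3, 4, 5, 6]))  # [1, 2]
print(setxor1d([1, 2, 3, 2, 4], [2, 3, 5, 7, 5]))   # [1, 4, 5, 7]
```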


5.10 GroupBy

The groupby-aggregate pattern is the workhorse operation in many data science applications, such as feature extraction and graph construction. It relies on argsort() to group an array of keys and then perform aggregations on other arrays of values.
For example, imagine a dataset with two columns, userID and dayOfWeek. The following groupby-aggregate operation would show how many user IDs were active on each day of the week:
# Note: The GroupBy arg should be the values of the dayOfWeek column
# and must be Arkouda-compatible data, i.e. a `pdarray`
byDayOfWeek = ak.GroupBy(data['dayOfWeek'])
day, numIDs = byDayOfWeek.aggregate(data['userID'], 'nunique')
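
The way an argsort turns into groups can be sketched in plain Python. The helper names and the sample data below are hypothetical; the sketch only illustrates how a sorting permutation, the start index of each group (segments), and the unique keys fit together, and how a reduction such as 'nunique' is then applied per segment.

```python
# Group keys by sorting them, then reduce another column segment by segment.
# Lists stand in for pdarrays; names and data are illustrative only.

def group_by(keys):
    perm = sorted(range(len(keys)), key=keys.__getitem__)  # stable argsort
    unique_keys, segments = [], []
    for pos, idx in enumerate(perm):
        # A new group starts wherever the sorted key changes
        if pos == 0 or keys[idx] != keys[perm[pos - 1]]:
            unique_keys.append(keys[idx])
            segments.append(pos)
    return perm, unique_keys, segments

def aggregate(values, perm, segments, reduce_fn):
    grouped = [values[i] for i in perm]  # values in grouped order
    bounds = segments + [len(perm)]
    return [reduce_fn(grouped[lo:hi]) for lo, hi in zip(bounds, bounds[1:])]

day_of_week = [0, 1, 0, 2, 1, 0]
user_id = [10, 11, 10, 12, 13, 14]
perm, days, segments = group_by(day_of_week)
num_ids = aggregate(user_id, perm, segments, lambda group: len(set(group)))
print(days, num_ids)  # [0, 1, 2] [2, 2, 1]
```

Because the permutation and segments depend only on the keys, they can be computed once and reused for aggregating several value columns.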

class arkouda.GroupBy(keys: Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, Categorical, Sequence[Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, Categorical]]], assume_sorted: bool = False, hash_strings: bool = True)
Group an array or list of arrays by value, usually in preparation for aggregating the within-group values of another array.
Parameters
• keys ((list of) pdarray, int64, Strings, or Categorical) – The array to group by value, or if list, the column arrays to group by row
• assume_sorted (bool) – If True, assume keys is already sorted (Default: False)

nkeys
The number of key arrays (columns)
Type int

size
The length of the input array(s), i.e. number of rows
Type int

permutation
The permutation that sorts the keys array(s) by value (row)
Type pdarray

unique_keys
The unique values of the keys array(s), in grouped order
Type (list of) pdarray, Strings, or Categorical

ngroups
The length of the unique_keys array(s), i.e. number of groups
Type int

segments
The start index of each group in the grouped array(s)
Type pdarray

logger
Used for all logging operations
Type ArkoudaLogger


Raises TypeError – Raised if keys is a pdarray with a dtype other than int64

Notes

Only accepts (list of) pdarrays of int64 dtype, Strings, or Categorical.

AND(values: arkouda.pdarrayclass.pdarray) → Tuple[Union[arkouda.pdarrayclass.pdarray, List[Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings]]], arkouda.pdarrayclass.pdarray]
Bitwise AND of values in each segment. Using the permutation stored in the GroupBy instance, group another array of values and perform a bitwise AND reduction on each group.
Parameters values (pdarray, int64) – The values to group and reduce with AND
Returns
• unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order
• result (pdarray, int64) – Bitwise AND of values in segments corresponding to keys
Raises
• TypeError – Raised if the values array is not a pdarray or if the pdarray dtype is not int64
• ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array
• RuntimeError – Raised if AND is not supported for the values dtype

OR(values: arkouda.pdarrayclass.pdarray) → Tuple[Union[arkouda.pdarrayclass.pdarray, List[Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings]]], arkouda.pdarrayclass.pdarray]
Bitwise OR of values in each segment. Using the permutation stored in the GroupBy instance, group another array of values and perform a bitwise OR reduction on each group.
Parameters values (pdarray, int64) – The values to group and reduce with OR
Returns
• unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order
• result (pdarray, int64) – Bitwise OR of values in segments corresponding to keys
Raises
• TypeError – Raised if the values array is not a pdarray or if the pdarray dtype is not int64
• ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array
• RuntimeError – Raised if OR is not supported for the values dtype

XOR(values: arkouda.pdarrayclass.pdarray) → Tuple[Union[arkouda.pdarrayclass.pdarray, List[Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings]]], arkouda.pdarrayclass.pdarray]
Bitwise XOR of values in each segment. Using the permutation stored in the GroupBy instance, group another array of values and perform a bitwise XOR reduction on each group.
Parameters values (pdarray, int64) – The values to group and reduce with XOR
Returns
• unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order
• result (pdarray, int64) – Bitwise XOR of values in segments corresponding to keys
Raises
• TypeError – Raised if the values array is not a pdarray or if the pdarray dtype is not int64
• ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array
• RuntimeError – Raised if XOR is not supported for the values dtype

aggregate(values: Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, Categorical, Sequence[Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, Categorical]]], operator: str, skipna: bool = True) → Tuple[Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, Categorical, Sequence[Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, Categorical]]], arkouda.pdarrayclass.pdarray]
Using the permutation stored in the GroupBy instance, group another array of values and apply a reduction to each group’s values.
Parameters
• values (pdarray) – The values to group and reduce
• operator (str) – The name of the reduction operator to use
Returns
• unique_keys (groupable) – The unique keys, in grouped order
• aggregates (groupable) – One aggregate value per unique key in the GroupBy instance
Raises
• TypeError – Raised if the values array is not a pdarray
• ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array
• RuntimeError – Raised if the requested operator is not supported for the values dtype

Examples

>>> keys = ak.arange(0, 10)
>>> vals = ak.linspace(-1, 1, 10)
>>> g = ak.GroupBy(keys)
>>> g.aggregate(vals, 'sum')
(array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), array([-1, -0.77777777777777768, -0.55555555555555536, -0.33333333333333348, -0.11111111111111116, 0.11111111111111116, 0.33333333333333348, 0.55555555555555536, 0.77777777777777768, 1]))
>>> g.aggregate(vals, 'min')
(array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), array([-1, -0.77777777777777779, -0.55555555555555558, -0.33333333333333337, -0.11111111111111116, 0.11111111111111116, 0.33333333333333326, 0.55555555555555536, 0.77777777777777768, 1]))

all(values: arkouda.pdarrayclass.pdarray) → Tuple[Union[arkouda.pdarrayclass.pdarray, List[Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings]]], arkouda.pdarrayclass.pdarray] Using the permutation stored in the GroupBy instance, group another array of values and perform an “and” reduction on each group.


Parameters values (pdarray, bool) – The values to group and reduce with “and” Returns • unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order • group_any (pdarray, bool) – One bool per unique key in the GroupBy instance Raises • TypeError – Raised if the values array is not a pdarray or if the pdarray dtype is not bool • ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array • RuntimeError – Raised if all is not supported for the values dtype any(values: arkouda.pdarrayclass.pdarray) → Tuple[Union[arkouda.pdarrayclass.pdarray, List[Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings]]], arkouda.pdarrayclass.pdarray] Using the permutation stored in the GroupBy instance, group another array of values and perform an “or” reduction on each group. Parameters values (pdarray, bool) – The values to group and reduce with “or” Returns • unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order • group_any (pdarray, bool) – One bool per unique key in the GroupBy instance Raises • TypeError – Raised if the values array is not a pdarray or if the pdarray dtype is not bool • ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array argmax(values: arkouda.pdarrayclass.pdarray) → Tuple[Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, Categorical, Sequence[Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, Categorical]]], arkouda.pdarrayclass.pdarray] Using the permutation stored in the GroupBy instance, group another array of values and return the location of the first maximum of each group’s values. 
Parameters values (pdarray) – The values to group and find argmax Returns • unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order • group_argmaxima (pdarray, int64) – One index per unique key in the GroupBy instance Raises • TypeError – Raised if the values array is not a pdarray object or if argmax is not supported for the values dtype • ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array


Notes

The returned indices refer to the original values array as passed in, not the permutation applied by the GroupBy instance.

Examples

>>> a = ak.randint(1,5,10)
>>> a
array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2])
>>> g = ak.GroupBy(a)
>>> g.keys
array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2])
>>> b = ak.randint(1,5,10)
>>> b
array([3, 3, 3, 4, 1, 1, 3, 3, 3, 4])
>>> g.argmax(b)
(array([2, 3, 4]), array([9, 3, 2]))

argmin(values: arkouda.pdarrayclass.pdarray) → Tuple[Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, Categorical, Sequence[Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, Categorical]]], arkouda.pdarrayclass.pdarray]
Using the permutation stored in the GroupBy instance, group another array of values and return the location of the first minimum of each group’s values.
Parameters values (pdarray) – The values to group and find argmin
Returns
• unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order
• group_argminima (pdarray, int64) – One index per unique key in the GroupBy instance
Raises
• TypeError – Raised if the values array is not a pdarray object or if argmin is not supported for the values dtype
• ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array
• RuntimeError – Raised if argmin is not supported for the values dtype

Notes

The returned indices refer to the original values array as passed in, not the permutation applied by the GroupBy instance.


Examples

>>> a = ak.randint(1,5,10)
>>> a
array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2])
>>> g = ak.GroupBy(a)
>>> g.keys
array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2])
>>> b = ak.randint(1,5,10)
>>> b
array([3, 3, 3, 4, 1, 1, 3, 3, 3, 4])
>>> g.argmin(b)
(array([2, 3, 4]), array([5, 4, 2]))
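
The point made in the Notes, that the returned indices refer to positions in the original values array, can be seen in a plain-Python sketch. The helpers are hypothetical stand-ins that scan the data once and keep the first best position per key; they reproduce the documented results for both argmax and argmin.

```python
import operator

# Per-group argmax/argmin that reports positions in the ORIGINAL arrays.
# A strict comparison keeps the first occurrence when values tie.

def group_arg_extreme(keys, values, better):
    best = {}
    for idx, key in enumerate(keys):
        if key not in best or better(values[idx], values[best[key]]):
            best[key] = idx
    unique_keys = sorted(best)
    return unique_keys, [best[k] for k in unique_keys]

def group_argmax(keys, values):
    return group_arg_extreme(keys, values, operator.gt)

def group_argmin(keys, values):
    return group_arg_extreme(keys, values, operator.lt)

a = [3, 3, 4, 3, 3, 2, 3, 2, 4, 2]
b = [3, 3, 3, 4, 1, 1, 3, 3, 3, 4]
print(group_argmax(a, b))  # ([2, 3, 4], [9, 3, 2])
print(group_argmin(a, b))  # ([2, 3, 4], [5, 4, 2])
```

For key 4 both candidate values equal 3, so the first position (index 2) is reported in both cases.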

broadcast(values: arkouda.pdarrayclass.pdarray, permute: bool = True) → arkouda.pdarrayclass.pdarray
    Fill each group’s segment with a constant value.
    Parameters
        • values (pdarray) – The values to put in each group’s segment
        • permute (bool) – If True (default), permute broadcast values back to the ordering of the original array on which GroupBy was called. If False, the broadcast values are grouped by value.
    Returns
        The broadcast values
    Return type
        pdarray
    Raises
        • TypeError – Raised if values is not a pdarray object
        • ValueError – Raised if the values array does not have one value per segment

Notes

This function is a sparse analog of np.broadcast. If a GroupBy object represents a sparse matrix (tensor), then this function takes a (dense) column vector and replicates each value to the non-zero elements in the corresponding row.

Examples

>>> a = ak.array([0,1,0,1,0])
>>> values = ak.array([3,5])
>>> g = ak.GroupBy(a)
# By default, result is in original order
>>> g.broadcast(values)
array([3, 5, 3, 5, 3])

# With permute=False, result is in grouped order
>>> g.broadcast(values, permute=False)
array([3, 3, 3, 5, 5])

>>> a = ak.randint(1,5,10)
>>> a
array([3, 1, 4, 4, 4, 1, 3, 3, 2, 2])
>>> g = ak.GroupBy(a)
>>> keys, counts = g.count()
>>> g.broadcast(counts > 2)
array([True False True True True False True True False False])
>>> g.broadcast(counts == 3)
array([True False True True True False True True False False])
>>> g.broadcast(counts < 4)
array([True True True True True True True True True True])
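The effect of the permute flag can be sketched in plain Python. This is an illustrative model of the documented behavior only; the function name group_broadcast is hypothetical, and it assumes the per-group values are supplied in sorted key order.

```python
def group_broadcast(keys, values, permute=True):
    """Plain-Python model of broadcasting one value per group.

    `values` holds one entry per unique key, in sorted key order (assumption).
    """
    unique_keys = sorted(set(keys))
    if len(values) != len(unique_keys):
        raise ValueError("need exactly one value per group")
    lookup = dict(zip(unique_keys, values))
    if permute:
        # Original order: each element receives its own group's value.
        return [lookup[k] for k in keys]
    # Grouped order: elements of the same group are contiguous.
    return [lookup[k] for k in sorted(keys)]


a = [0, 1, 0, 1, 0]
original_order = group_broadcast(a, [3, 5])
grouped_order = group_broadcast(a, [3, 5], permute=False)
print(original_order)  # [3, 5, 3, 5, 3]
print(grouped_order)   # [3, 3, 3, 5, 5]
```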

count() → Tuple[Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, Categorical, Sequence[Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, Categorical]]], arkouda.pdarrayclass.pdarray] Count the number of elements in each group, i.e. the number of times each key appears. Parameters none – Returns • unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order • counts (pdarray, int64) – The number of times each unique key appears

Examples

>>> a = ak.randint(1,5,10)
>>> a
array([3, 2, 3, 1, 2, 4, 3, 4, 3, 4])
>>> g = ak.GroupBy(a)
>>> keys, counts = g.count()
>>> keys
array([1, 2, 3, 4])
>>> counts
array([1, 2, 4, 3])

max(values: arkouda.pdarrayclass.pdarray, skipna: bool = True) → Tuple[Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, Categorical, Sequence[Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, Categorical]]], arkouda.pdarrayclass.pdarray]
    Using the permutation stored in the GroupBy instance, group another array of values and return the maximum of each group’s values.
    Parameters
        values (pdarray) – The values to group and find maxima
    Returns
        • unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order
        • group_maxima (pdarray) – One maximum per unique key in the GroupBy instance
    Raises
        • TypeError – Raised if the values array is not a pdarray object or if max is not supported for the values dtype
        • ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array
        • RuntimeError – Raised if max is not supported for the values dtype

Examples

>>> a = ak.randint(1,5,10)
>>> a
array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2])
>>> g = ak.GroupBy(a)
>>> g.keys
array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2])
>>> b = ak.randint(1,5,10)
>>> b
array([3, 3, 3, 4, 1, 1, 3, 3, 3, 4])
>>> g.max(b)
(array([2, 3, 4]), array([4, 4, 3]))

mean(values: arkouda.pdarrayclass.pdarray, skipna: bool = True) → Tuple[Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, Categorical, Sequence[Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, Categorical]]], arkouda.pdarrayclass.pdarray] Using the permutation stored in the GroupBy instance, group another array of values and compute the mean of each group’s values. Parameters values (pdarray) – The values to group and average Returns • unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order • group_means (pdarray, float64) – One mean value per unique key in the GroupBy instance Raises • TypeError – Raised if the values array is not a pdarray object • ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array

Notes

The return dtype is always float64.

Examples

>>> a = ak.randint(1,5,10)
>>> a
array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2])
>>> g = ak.GroupBy(a)
>>> g.keys
array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2])
>>> b = ak.randint(1,5,10)
>>> b
array([3, 3, 3, 4, 1, 1, 3, 3, 3, 4])
>>> g.mean(b)
(array([2, 3, 4]), array([2.6666666666666665, 2.7999999999999998, 3]))

min(values: arkouda.pdarrayclass.pdarray, skipna: bool = True) → Tuple[Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, Categorical, Sequence[Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, Categorical]]], arkouda.pdarrayclass.pdarray]
    Using the permutation stored in the GroupBy instance, group another array of values and return the minimum of each group’s values.
    Parameters
        values (pdarray) – The values to group and find minima
    Returns
        • unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order
        • group_minima (pdarray) – One minimum per unique key in the GroupBy instance
    Raises
        • TypeError – Raised if the values array is not a pdarray object or if min is not supported for the values dtype
        • ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array
        • RuntimeError – Raised if min is not supported for the values dtype

Examples

>>> a = ak.randint(1,5,10)
>>> a
array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2])
>>> g = ak.GroupBy(a)
>>> g.keys
array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2])
>>> b = ak.randint(1,5,10)
>>> b
array([3, 3, 3, 4, 1, 1, 3, 3, 3, 4])
>>> g.min(b)
(array([2, 3, 4]), array([1, 1, 3]))

nunique(values: Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, Categorical, Sequence[Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, Categorical]]]) → Tuple[Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, Categorical, Sequence[Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, Categorical]]], arkouda.pdarrayclass.pdarray] Using the permutation stored in the GroupBy instance, group another array of values and return the number of unique values in each group. Parameters values (pdarray, int64) – The values to group and find unique values Returns • unique_keys (groupable) – The unique keys, in grouped order • group_nunique (groupable) – Number of unique values per unique key in the GroupBy instance


Raises • TypeError – Raised if the dtype(s) of values array(s) does/do not support the nunique method • ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array • RuntimeError – Raised if nunique is not supported for the values dtype

Examples

>>> data = ak.array([3,4,3,1,1,4,3,4,1,4])
>>> data
array([3, 4, 3, 1, 1, 4, 3, 4, 1, 4])
>>> labels = ak.array([1,1,1,2,2,2,3,3,3,4])
>>> labels
ak.array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4])
>>> g = ak.GroupBy(labels)
>>> g.keys
ak.array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4])
>>> g.nunique(data)
(array([1, 2, 3, 4]), array([2, 2, 3, 1]))
# Group (1,1,1) has values [3,4,3] -> there are 2 unique values 3&4
# Group (2,2,2) has values [1,1,4] -> 2 unique values 1&4
# Group (3,3,3) has values [3,4,1] -> 3 unique values
# Group (4) has values [4] -> 1 unique value

prod(values: arkouda.pdarrayclass.pdarray, skipna: bool = True) → Tuple[Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, Categorical, Sequence[Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, Categorical]]], arkouda.pdarrayclass.pdarray] Using the permutation stored in the GroupBy instance, group another array of values and compute the product of each group’s values. Parameters values (pdarray) – The values to group and multiply Returns • unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order • group_products (pdarray, float64) – One product per unique key in the GroupBy instance Raises • TypeError – Raised if the values array is not a pdarray object • ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array • RuntimeError – Raised if prod is not supported for the values dtype


Notes

The return dtype is always float64.

Examples

>>> a = ak.randint(1,5,10)
>>> a
array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2])
>>> g = ak.GroupBy(a)
>>> g.keys
array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2])
>>> b = ak.randint(1,5,10)
>>> b
array([3, 3, 3, 4, 1, 1, 3, 3, 3, 4])
>>> g.prod(b)
(array([2, 3, 4]), array([12, 108.00000000000003, 8.9999999999999982]))

sum(values: arkouda.pdarrayclass.pdarray, skipna: bool = True) → Tuple[Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, Categorical, Sequence[Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, Categorical]]], arkouda.pdarrayclass.pdarray] Using the permutation stored in the GroupBy instance, group another array of values and sum each group’s values. Parameters values (pdarray) – The values to group and sum Returns • unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order • group_sums (pdarray) – One sum per unique key in the GroupBy instance Raises • TypeError – Raised if the values array is not a pdarray object • ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array

Notes

The grouped sum of a boolean pdarray returns integers.

Examples

>>> a = ak.randint(1,5,10)
>>> a
array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2])
>>> g = ak.GroupBy(a)
>>> g.keys
array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2])
>>> b = ak.randint(1,5,10)
>>> b
array([3, 3, 3, 4, 1, 1, 3, 3, 3, 4])
>>> g.sum(b)
(array([2, 3, 4]), array([8, 14, 6]))

5.11 Strings in Arkouda

Like NumPy, Arkouda supports arrays of strings, but whereas in NumPy arrays of strings are still ndarray objects, in Arkouda the array of strings is its own class: Strings. In order to efficiently store strings with a wide range of lengths, Arkouda uses a “segmented array” data structure, comprising:
• bytes: A uint8 array containing the concatenated bytes of all the strings, separated by null (0) bytes.
• offsets: An int64 array with the start index of each string
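The layout described above can be sketched in plain Python. This is a minimal illustration, not arkouda's server-side code; the helper names to_segmented and get_string are hypothetical, and the sketch assumes UTF-8 encoding and that every string, including the last, is followed by a null byte.

```python
def to_segmented(strings):
    """Pack a list of str into (bytes, offsets) in a segmented-array layout."""
    data = bytearray()
    offsets = []
    for s in strings:
        offsets.append(len(data))       # start index of this string
        data.extend(s.encode("utf-8"))  # the string's bytes
        data.append(0)                  # null terminator (assumed for every string)
    return bytes(data), offsets


def get_string(data, offsets, i):
    """Recover string i by slicing between consecutive offsets."""
    start = offsets[i]
    # The slice stops just before the null byte that ends string i.
    end = (offsets[i + 1] if i + 1 < len(offsets) else len(data)) - 1
    return data[start:end].decode("utf-8")


data, offsets = to_segmented(["hello", "my", "world"])
print(offsets)                       # [0, 6, 9]
print(get_string(data, offsets, 1))  # my
```

Because string lengths are implied by the gaps between offsets, no per-string padding is needed, which is why this layout handles a wide range of lengths efficiently.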

5.11.1 Performance

Because strings are a variable-width data type, and because of the way Arkouda represents strings, operations on strings are considerably slower than operations on numeric data. Use numeric data whenever possible. For example, if your raw data contains string data that could be represented numerically, consider setting up a processing pipeline that performs the conversion (and stores the result in HDF5 format) on ingest.

5.11.2 I/O

Arrays of strings can be transferred between the Arkouda client and server using the arkouda.array and Strings.to_ndarray functions (see Data I/O). The former converts a Python list or NumPy ndarray of strings to an Arkouda Strings object, whereas the latter converts an Arkouda Strings object to a NumPy ndarray. As with numeric arrays, if the size of the data exceeds the threshold set by arkouda.maxTransferBytes, the client will raise an exception.

Arkouda currently only supports the HDF5 file format for disk-based I/O. In order to read an array of strings from an HDF5 file, the strings must be stored in an HDF5 group containing two datasets: segments (an integer array corresponding to offsets above) and values (a uint8 array corresponding to bytes above). See Data Preprocessing for more information and guidelines.

5.11.3 Iteration

Iterating directly over a Strings with for x in string is not supported, to discourage transferring all of the Strings object’s data from the arkouda server to the Python client; there is almost always a more array-oriented way to express an iterator-based computation. To force this transfer, use the to_ndarray function to return the Strings as a numpy.ndarray. See I/O for more details about using to_ndarray with Strings.

arkouda.Strings.to_ndarray(self ) → numpy.ndarray
    Convert the array to a np.ndarray, transferring array data from the arkouda server to Python. If the array exceeds a built-in size limit, a RuntimeError is raised.
    Returns
        A numpy ndarray with the same strings as this array
    Return type
        np.ndarray

Notes

The number of bytes in the array cannot exceed arkouda.maxTransferBytes, otherwise a RuntimeError will be raised. This is to protect the user from overflowing the memory of the system on which the Python client is running, under the assumption that the server is running on a distributed system with much more memory than the client. The user may override this limit by setting ak.maxTransferBytes to a larger value, but proceed with caution. See also: array

Examples

>>> a = ak.array(["hello","my","world"])
>>> a.to_ndarray()
array(['hello', 'my', 'world'], dtype='<U5')
>>> type(a.to_ndarray())
numpy.ndarray

5.11.4 Operations

Arkouda Strings objects support the following operations:
• Indexing with integer, slice, integer pdarray, and boolean pdarray (see Indexing and Assignment)
• Comparison (== and !=) with string literal or other Strings object of same size
• Array Set Operations, e.g. unique and in1d
• Sorting, via argsort and coargsort
• GroupBy, both alone and in conjunction with numeric arrays
• Type Casting to and from numeric arrays
• Concatenation with other Strings

5.11.5 String-Specific Methods

Substring search

Strings.contains(substr: Union[bytes, str, numpy.str_], regex: bool = False) → arkouda.pdarrayclass.pdarray
    Check whether each element contains the given substring.
    Parameters
        • substr (str_scalars) – The substring, in the form of a string or byte array, to search for
        • regex (bool) – Indicates whether substr is a regular expression. Note: only handles regular expressions supported by re2 (does not support lookaheads/lookbehinds)
    Returns
        True for elements that contain substr, False otherwise
    Return type
        pdarray, bool
    Raises
        • TypeError – Raised if the substr parameter is not bytes or str_scalars
        • ValueError – Raised if substr is not a valid regex
        • RuntimeError – Raised if there is a server-side error thrown
    See also: Strings.startswith, Strings.endswith

Examples

>>> strings = ak.array(['{} string {}'.format(i, i) for i in range(1,6)])
>>> strings
array(['1 string 1', '2 string 2', '3 string 3', '4 string 4', '5 string 5'])
>>> strings.contains('string')
array([True, True, True, True, True])
>>> strings.contains('string \d', regex=True)
array([True, True, True, True, True])

Strings.startswith(substr: Union[bytes, str, numpy.str_], regex: bool = False) → arkouda.pdarrayclass.pdarray
    Check whether each element starts with the given substring.
    Parameters
        • substr (Union[bytes, str_scalars]) – The prefix to search for
        • regex (bool) – Indicates whether substr is a regular expression. Note: only handles regular expressions supported by re2 (does not support lookaheads/lookbehinds)
    Returns
        True for elements that start with substr, False otherwise
    Return type
        pdarray, bool
    Raises
        • TypeError – Raised if the substr parameter is not bytes or str_scalars
        • ValueError – Raised if substr is not a valid regex
        • RuntimeError – Raised if there is a server-side error thrown
    See also: Strings.contains, Strings.endswith

Examples

>>> strings_end = ak.array(['string {}'.format(i) for i in range(1,6)])
>>> strings_end
array(['string 1', 'string 2', 'string 3', 'string 4', 'string 5'])
>>> strings_end.startswith('string')
array([True, True, True, True, True])
>>> strings_start = ak.array(['{} string'.format(i) for i in range(1,6)])
>>> strings_start
array(['1 string', '2 string', '3 string', '4 string', '5 string'])
>>> strings_start.startswith('\d str', regex=True)
array([True, True, True, True, True])

Strings.endswith(substr: Union[bytes, str, numpy.str_], regex: bool = False) → arkouda.pdarrayclass.pdarray
    Check whether each element ends with the given substring.
    Parameters
        • substr (Union[bytes, str_scalars]) – The suffix to search for
        • regex (bool) – Indicates whether substr is a regular expression. Note: only handles regular expressions supported by re2 (does not support lookaheads/lookbehinds)
    Returns
        True for elements that end with substr, False otherwise
    Return type
        pdarray, bool
    Raises
        • TypeError – Raised if the substr parameter is not bytes or str_scalars
        • ValueError – Raised if substr is not a valid regex
        • RuntimeError – Raised if there is a server-side error thrown
    See also: Strings.contains, Strings.startswith

Examples

>>> strings_start = ak.array(['{} string'.format(i) for i in range(1,6)])
>>> strings_start
array(['1 string', '2 string', '3 string', '4 string', '5 string'])
>>> strings_start.endswith('ing')
array([True, True, True, True, True])
>>> strings_end = ak.array(['string {}'.format(i) for i in range(1,6)])
>>> strings_end
array(['string 1', 'string 2', 'string 3', 'string 4', 'string 5'])
>>> strings_end.endswith('ing \d', regex=True)
array([True, True, True, True, True])

Splitting and joining

Strings.peel(delimiter: Union[bytes, str, numpy.str_], times: Union[int, numpy.int64] = 1, includeDelimiter: bool = False, keepPartial: bool = False, fromRight: bool = False, regex: bool = False) → Tuple
    Peel off one or more delimited fields from each string (similar to string.partition), returning two new arrays of strings. Warning: This function is experimental and not guaranteed to work.
    Parameters
        • delimiter (Union[bytes, str_scalars]) – The separator where the split will occur
        • times (Union[int, np.int64]) – The number of times the delimiter is sought, i.e. skip over the first (times-1) delimiters
        • includeDelimiter (bool) – If true, append the delimiter to the end of the first return array. By default, it is prepended to the beginning of the second return array.
        • keepPartial (bool) – If true, a string that does not contain instances of the delimiter will be returned in the first array. By default, such strings are returned in the second array.
        • fromRight (bool) – If true, peel from the right instead of the left (see also rpeel)
        • regex (bool) – Indicates whether delimiter is a regular expression. Note: only handles regular expressions supported by re2 (does not support lookaheads/lookbehinds)
    Returns
        left: Strings – The field(s) peeled from the start of each string (unless fromRight is true)
        right: Strings – The remainder of each string after peeling (unless fromRight is true)
    Return type
        Tuple[Strings, Strings]
    Raises
        • TypeError – Raised if the delimiter parameter is not bytes or str_scalars, if times is not int64, or if includeDelimiter, keepPartial, or fromRight is not bool
        • ValueError – Raised if times is < 1 or if delimiter is not a valid regex
        • RuntimeError – Raised if there is a server-side error thrown
    See also: rpeel, stick, lstick

Examples

>>> s = ak.array(['a.b', 'c.d', 'e.f.g'])
>>> s.peel('.')
(array(['a', 'c', 'e']), array(['b', 'd', 'f.g']))
>>> s.peel('.', includeDelimiter=True)
(array(['a.', 'c.', 'e.']), array(['b', 'd', 'f.g']))
>>> s.peel('.', times=2)
(array(['', '', 'e.f']), array(['a.b', 'c.d', 'g']))
>>> s.peel('.', times=2, keepPartial=True)
(array(['a.b', 'c.d', 'e.f']), array(['', '', 'g']))
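The interaction of times, includeDelimiter, and keepPartial in the examples above can be modeled for a single Python string. This is a sketch inferred from the example outputs, not arkouda's implementation; the function peel_one and its snake_case parameter names are hypothetical, and it covers only left-to-right peeling with a literal (non-regex) delimiter.

```python
def peel_one(s, delimiter, times=1, include_delimiter=False, keep_partial=False):
    """Model of peeling `times` delimited fields from the left of one string."""
    if times < 1:
        raise ValueError("times must be >= 1")
    pos, start = -1, 0
    for _ in range(times):
        pos = s.find(delimiter, start)
        if pos == -1:
            # Fewer than `times` delimiters: the whole string goes to one side.
            return (s, "") if keep_partial else ("", s)
        start = pos + len(delimiter)
    left_end = start if include_delimiter else pos
    return s[:left_end], s[start:]


strings = ["a.b", "c.d", "e.f.g"]
once = [peel_one(s, ".") for s in strings]
twice = [peel_one(s, ".", times=2) for s in strings]
twice_partial = [peel_one(s, ".", times=2, keep_partial=True) for s in strings]
print(once)   # [('a', 'b'), ('c', 'd'), ('e', 'f.g')]
print(twice)  # [('', 'a.b'), ('', 'c.d'), ('e.f', 'g')]
```

Each tuple here is (left, right) for one string, whereas arkouda returns the lefts and rights as two separate Strings arrays.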

Strings.rpeel(delimiter: Union[bytes, str, numpy.str_], times: Union[int, numpy.int64] = 1, includeDelimiter: bool = False, keepPartial: bool = False, regex: bool = False)
    Peel off one or more delimited fields from the end of each string (similar to string.rpartition), returning two new arrays of strings. Warning: This function is experimental and not guaranteed to work.
    Parameters
        • delimiter (Union[bytes, str_scalars]) – The separator where the split will occur
        • times (Union[int, np.int64]) – The number of times the delimiter is sought, i.e. skip over the last (times-1) delimiters
        • includeDelimiter (bool) – If true, prepend the delimiter to the start of the first return array. By default, it is appended to the end of the second return array.
        • keepPartial (bool) – If true, a string that does not contain instances of the delimiter will be returned in the second array. By default, such strings are returned in the first array.
        • regex (bool) – Indicates whether delimiter is a regular expression. Note: only handles regular expressions supported by re2 (does not support lookaheads/lookbehinds)
    Returns
        left: Strings – The remainder of the string after peeling
        right: Strings – The field(s) that were peeled from the right of each string
    Return type
        Tuple[Strings, Strings]
    Raises
        • TypeError – Raised if the delimiter parameter is not bytes or str_scalars or if times is not int64
        • ValueError – Raised if times is < 1 or if delimiter is not a valid regex
        • RuntimeError – Raised if there is a server-side error thrown
    See also: peel, stick, lstick

Examples

>>> s = ak.array(['a.b', 'c.d', 'e.f.g'])
>>> s.rpeel('.')
(array(['a', 'c', 'e.f']), array(['b', 'd', 'g']))
# Compared against peel
>>> s.peel('.')
(array(['a', 'c', 'e']), array(['b', 'd', 'f.g']))

Strings.stick(other: arkouda.strings.Strings, delimiter: Union[bytes, str, numpy.str_] = '', toLeft: bool = False) → arkouda.strings.Strings Join the strings from another array onto one end of the strings of this array, optionally inserting a delimiter. Warning: This function is experimental and not guaranteed to work. Parameters • other (Strings) – The strings to join onto self’s strings • delimiter (str) – String inserted between self and other • toLeft (bool) – If true, join other strings to the left of self. By default, other is joined to the right of self. Returns The array of joined strings Return type Strings Raises • TypeError – Raised if the delimiter parameter is not bytes or str_scalars or if the other parameter is not a Strings instance


• ValueError – Raised if times is < 1 • RuntimeError – Raised if there is a server-side error thrown See also: lstick, peel, rpeel

Examples

>>> s = ak.array(['a', 'c', 'e'])
>>> t = ak.array(['b', 'd', 'f'])
>>> s.stick(t, delimiter='.')
array(['a.b', 'c.d', 'e.f'])

Strings.lstick(other: arkouda.strings.Strings, delimiter: Union[bytes, str, numpy.str_] = '') → arkouda.strings.Strings Join the strings from another array onto the left of the strings of this array, optionally inserting a delimiter. Warning: This function is experimental and not guaranteed to work. Parameters • other (Strings) – The strings to join onto self’s strings • delimiter (Union[bytes,str_scalars]) – String inserted between self and other Returns The array of joined strings, as other + self Return type Strings Raises • TypeError – Raised if the delimiter parameter is neither bytes nor a str or if the other parameter is not a Strings instance • RuntimeError – Raised if there is a server-side error thrown See also: stick, peel, rpeel

Examples

>>> s = ak.array(['a', 'c', 'e'])
>>> t = ak.array(['b', 'd', 'f'])
>>> s.lstick(t, delimiter='.')
array(['b.a', 'd.c', 'f.e'])


Flattening

Given an array of strings where each string encodes a variable-length sequence delimited by a common substring, flattening offers a method for unpacking the sequences into a flat array of individual elements. A mapping between original strings and new array elements can be preserved, if desired. This method can be used in pipe

Strings.flatten(delimiter: str, return_segments: bool = False, regex: bool = False) → Union[arkouda.strings.Strings, Tuple]
    Unpack delimiter-joined substrings into a flat array.
    Parameters
        • delimiter (str) – Characters used to split strings into substrings
        • return_segments (bool) – If True, also return mapping of original strings to first substring in return array.
        • regex (bool) – Indicates whether delimiter is a regular expression. Note: only handles regular expressions supported by re2 (does not support lookaheads/lookbehinds)
    Returns
        • Strings – Flattened substrings with delimiters removed
        • pdarray, int64 (optional) – For each original string, the index of first corresponding substring in the return array
    See also: peel, rpeel

Examples

>>> orig = ak.array(['one|two', 'three|four|five', 'six'])
>>> orig.flatten('|')
array(['one', 'two', 'three', 'four', 'five', 'six'])
>>> flat, map = orig.flatten('|', return_segments=True)
>>> map
array([0, 2, 5])
>>> under = ak.array(['one_two', 'three_____four____five', 'six'])
>>> under_flat, under_map = under.flatten('_+', return_segments=True, regex=True)
>>> under_flat
array(['one', 'two', 'three', 'four', 'five', 'six'])
>>> under_map
array([0, 2, 5])
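The meaning of the segments array returned with return_segments=True can be sketched with Python lists. This is an illustrative model only; the free function flatten below is hypothetical, and it uses Python's re module, whose syntax differs in places from the re2 engine that arkouda documents.

```python
import re


def flatten(strings, delimiter, return_segments=False, regex=False):
    """Model of flattening: split every string and concatenate the pieces.

    segments[i] is the index in the flat list of the first piece of strings[i].
    """
    flat, segments = [], []
    for s in strings:
        segments.append(len(flat))
        parts = re.split(delimiter, s) if regex else s.split(delimiter)
        flat.extend(parts)
    return (flat, segments) if return_segments else flat


orig = ["one|two", "three|four|five", "six"]
flat, segments = flatten(orig, "|", return_segments=True)
print(flat)      # ['one', 'two', 'three', 'four', 'five', 'six']
print(segments)  # [0, 2, 5]

# The pieces of original string i are flat[segments[i]:segments[i + 1]].
second = flat[segments[1]:segments[2]]
print(second)    # ['three', 'four', 'five']
```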


5.12 Categoricals

Categorical arrays are a concept from Pandas that speeds up many operations on strings, especially when an array of strings contains many repeated values. A Categorical object stores the unique strings as category labels and represents the values of the original array as integer indices into this category array.

5.12.1 Construction

The typical way to construct a Categorical is from a Strings object:

class arkouda.Categorical(values, **kwargs)
    Represents an array of values belonging to named categories. Converting a Strings object to Categorical often saves memory and speeds up operations, especially if there are many repeated values, at the cost of some one-time work in initialization.
    Parameters
        values (Strings) – String values to convert to categories
    categories
        The set of category labels (determined automatically)
        Type Strings
    codes
        The category indices of the values or -1 for N/A
        Type pdarray, int64
    permutation
        The permutation that groups the values in the same order as categories
        Type pdarray, int64
    segments
        When values are grouped, the starting offset of each group
        Type pdarray, int64
    size
        The number of items in the array
        Type Union[int,np.int64]
    nlevels
        The number of distinct categories
        Type Union[int,np.int64]
    ndim
        The rank of the array (currently only rank 1 arrays supported)
        Type Union[int,np.int64]
    shape
        The sizes of each dimension of the array
        Type tuple

However, if one already has pre-computed unique categories and integer indices, the following constructor is useful:

classmethod Categorical.from_codes(codes: arkouda.pdarrayclass.pdarray, categories: arkouda.strings.Strings, permutation=None, segments=None) → arkouda.categorical.Categorical
    Make a Categorical from codes and categories arrays. If codes and categories have already been pre-computed, this constructor saves time. If not, please use the normal constructor.
    Parameters
        • codes (pdarray, int64) – Category indices of each value
        • categories (Strings) – Unique category labels
        • permutation (pdarray, int64) – The permutation that groups the values in the same order as categories
        • segments (pdarray, int64) – When values are grouped, the starting offset of each group
    Returns
        The Categorical object created from the input parameters
    Return type
        Categorical
    Raises
        TypeError – Raised if codes is not a pdarray of int64 objects or if categories is not a Strings object
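The relationship between categories and codes can be sketched with plain Python lists. This is a conceptual model, not arkouda's implementation; the helper names encode and decode are hypothetical, and sorting the category labels is an assumption made here for determinism.

```python
def encode(values):
    """Model of categorical encoding: unique labels plus one integer code per value."""
    categories = sorted(set(values))  # label order is an assumption of this sketch
    index = {label: i for i, label in enumerate(categories)}
    codes = [index[v] for v in values]
    return categories, codes


def decode(categories, codes):
    """Recover the original strings by indexing the labels with the codes."""
    return [categories[c] for c in codes]


names = ["Alice", "Bob", "Alice", "Carol", "Bob", "Alice"]
categories, codes = encode(names)
print(categories)  # ['Alice', 'Bob', 'Carol']
print(codes)       # [0, 1, 0, 2, 1, 0]
```

With many repeated values, the codes (fixed-width integers) dominate storage while each distinct string is stored once, which is the memory saving described above.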

5.12.2 Operations

Arkouda Categorical objects support all operations that Strings support, and they will almost always execute faster:
• Indexing with integer, slice, integer pdarray, and boolean pdarray (see Indexing and Assignment)
• Comparison (== and !=) with string literal or other Categorical object of same size
• Substring search

Categorical.contains(substr: str) → arkouda.pdarrayclass.pdarray
    Check whether each element contains the given substring.
    Parameters
        substr (str) – The substring to search for
    Returns
        True for elements that contain substr, False otherwise
    Return type
        pdarray, bool
    Raises
        TypeError – Raised if substr is not a str

Notes

This method can be significantly faster than the corresponding method on Strings objects, because it searches the unique category labels instead of the full array.
See also: Categorical.startswith, Categorical.endswith

Categorical.startswith(substr: str) → arkouda.pdarrayclass.pdarray
    Check whether each element starts with the given substring.
    Parameters
        substr (str) – The substring to search for
    Raises
        TypeError – Raised if substr is not a str
    Returns
        True for elements that start with substr, False otherwise
    Return type
        pdarray, bool

Notes

This method can be significantly faster than the corresponding method on Strings objects, because it searches the unique category labels instead of the full array.
See also: Categorical.contains, Categorical.endswith

Categorical.endswith(substr: str) → arkouda.pdarrayclass.pdarray
    Check whether each element ends with the given substring.
    Parameters
        substr (str) – The substring to search for
    Raises
        TypeError – Raised if substr is not a str
    Returns
        True for elements that end with substr, False otherwise
    Return type
        pdarray, bool

Notes

This method can be significantly faster than the corresponding method on Strings objects, because it searches the unique category labels instead of the full array.
See also: Categorical.startswith, Categorical.contains

• Array Set Operations, e.g. unique and in1d
• Sorting, via argsort and coargsort
• GroupBy, both alone and in conjunction with numeric arrays
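Why substring search on a Categorical can be faster is easy to see in a plain-Python model: the string test runs once per unique label, and the result is then spread to every element through the integer codes. The function categorical_contains is a hypothetical illustration, not arkouda's code.

```python
def categorical_contains(categories, codes, substr):
    """Model: test each unique label once, then gather the result by code."""
    hits = [substr in label for label in categories]  # one string test per category
    return [hits[c] for c in codes]                   # cheap integer lookup per element


categories = ["Alice", "Bob", "Carol"]
codes = [0, 1, 0, 2, 1, 0]
mask = categorical_contains(categories, codes, "o")
print(mask)  # [False, True, False, True, True, False]
```

The number of string comparisons depends on the number of categories rather than the length of the array, so the benefit grows with the amount of repetition.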

5.12.3 Iteration

Iterating directly over a Categorical with for x in categorical is not supported, to discourage transferring all of the Categorical object’s data from the arkouda server to the Python client; there is almost always a more array-oriented way to express an iterator-based computation. To force this transfer, use the to_ndarray function to return the categorical as a numpy.ndarray. This transfer will raise an error if it exceeds the byte limit defined in arkouda.maxTransferBytes.

arkouda.Categorical.to_ndarray(self ) → numpy.ndarray
    Convert the array to a np.ndarray, transferring array data from the arkouda server to Python. This conversion discards category information and produces an ndarray of strings. If the array exceeds a built-in size limit, a RuntimeError is raised.
    Returns
        A numpy ndarray of strings corresponding to the values in this array
    Return type
        np.ndarray

Notes

The number of bytes in the array cannot exceed arkouda.maxTransferBytes, otherwise a RuntimeError will be raised. This is to protect the user from overflowing the memory of the system on which the Python client is running, under the assumption that the server is running on a distributed system with much more memory than the client. The user may override this limit by setting ak.maxTransferBytes to a larger value, but proceed with caution.


CHAPTER SIX

EXAMPLES

6.1 DataFrame-like Patterns

DataFrames (e.g. from pandas) are a useful abstraction for working with tabular data. While arkouda does not yet have an actual DataFrame class, it is possible to do many of the same operations. Here, we will create and use a pseudo-DataFrame: a dict of named pdarray objects, which are analogous to columns of a DataFrame. Let the following represent transactions in which a userID purchased an item on a particular day for a certain amount of money:

>>> userName = ak.array(['Alice', 'Bob', 'Alice', 'Carol', 'Bob', 'Alice'])
>>> userID = ak.array([111, 222, 111, 333, 222, 111])
>>> item = ak.array([0,0,1,1,2,0])
>>> day = ak.array([5,5,6,5,6,6])
>>> amount = ak.array([0.5, 0.6, 1.1, 1.2, 4.3, 0.6])
>>> data = {'userName': userName, 'userID': userID, 'item': item, 'day': day, 'amount': amount}

6.1.1 Selection

The df.loc[condition] syntax is useful for selecting subsets of data by value and can be emulated in arkouda. For example, here we select all transactions involving user 111 of an amount less than 1.0:

>>> condition = (data['userID'] == 111) & (data['amount'] < 1.0)
>>> u1 = {col: a[condition] for col, a in data.items()}
>>> u1
{'userName': array(['Alice', 'Alice']),
 'userID': array([111, 111]),
 'item': array([0, 0]),
 'day': array([5, 6]),
 'amount': array([0.5, 0.59999999999999998])}
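To see what the masking pattern does without a running server, the sketch below reproduces it with plain Python lists. It only illustrates the semantics; `select_rows` is a hypothetical helper, not an arkouda function.

```python
from itertools import compress

# The same transactions, held in plain Python lists instead of pdarrays.
data = {
    "userName": ["Alice", "Bob", "Alice", "Carol", "Bob", "Alice"],
    "userID": [111, 222, 111, 333, 222, 111],
    "item": [0, 0, 1, 1, 2, 0],
    "day": [5, 5, 6, 5, 6, 6],
    "amount": [0.5, 0.6, 1.1, 1.2, 4.3, 0.6],
}


def select_rows(columns, mask):
    """Apply one boolean mask to every column (hypothetical helper)."""
    return {name: list(compress(values, mask)) for name, values in columns.items()}


# Element-wise AND of the two conditions, one boolean per row.
condition = [
    uid == 111 and amt < 1.0
    for uid, amt in zip(data["userID"], data["amount"])
]
u1 = select_rows(data, condition)
```

Because one mask is applied to every column, the selected rows stay aligned across columns, which is what makes the dict of arrays behave like a table.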


6.1.2 Description

>>> ak.value_counts(data['day'])
(array([5, 6]), array([3, 3]))
>>> ak.histogram(data['amount'], 10)
array([3, 2, 0, 0, 0, 0, 0, 0, 0, 1])

6.1.3 Grouping

In Pandas, groupby-aggregate is a very useful pattern that can be computationally intensive. Arkouda supports grouping by key and most aggregations in Pandas. Note that, because arkouda does not yet have a true DataFrame class, the arkouda GroupBy operation does not conform to the pandas API. Here we group the data by item and get the number of unique users who bought the item, and the total revenue generated by the item.

>>> byItem = ak.GroupBy(data['item'])
>>> byItem.nunique(data['userID'])
(array([0, 1, 2]), array([2, 2, 1]))
>>> byItem.sum(data['amount'])
(array([0, 1, 2]), array([1.7000000000000002, 2.2999999999999998, 4.3000000000000007]))
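As a cross-check of the results above, the same two aggregations can be computed in plain Python. This is only a reference sketch of what nunique and sum mean per group; it says nothing about how the arkouda server computes them.

```python
from collections import defaultdict

item = [0, 0, 1, 1, 2, 0]
user_id = [111, 222, 111, 333, 222, 111]
amount = [0.5, 0.6, 1.1, 1.2, 4.3, 0.6]

users_by_item = defaultdict(set)      # distinct buyers per item
revenue_by_item = defaultdict(float)  # summed amount per item
for it, uid, amt in zip(item, user_id, amount):
    users_by_item[it].add(uid)
    revenue_by_item[it] += amt

keys = sorted(users_by_item)
nunique = [len(users_by_item[k]) for k in keys]
totals = [revenue_by_item[k] for k in keys]
```

The floating-point totals can differ from the arkouda output in the last digits, since the order of additions is not the same.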

6.1.4 Integration with Pandas

Often, it is useful to load data in arkouda and bring back a small subset of the data to explore further in Pandas. This can be done as long as each column is less than arkouda.maxTransferBytes in size (default 1 GB).

# Assume some filtering takes place here
>>> subset = data
>>> df = pd.DataFrame({col: a.to_ndarray() for col, a in subset.items()})
>>> df
   amount  day  item  userID userName
0     0.5    5     0     111    Alice
1     0.6    5     0     222      Bob
2     1.1    6     1     111    Alice
3     1.2    5     1     333    Carol
4     4.3    6     2     222      Bob
5     0.6    6     0     111    Alice

6.2 Graphs

Arkouda can be used for constructing and performing basic analysis of graphs. Consider the following arkouda code (from toys/ak_rmat.py), which generates an RMAT graph:

def gen_rmat_edges(lgNv, Ne_per_v, p, perm=False):
    # number of vertices
    Nv = 2**lgNv
    # number of edges
    Ne = Ne_per_v * Nv
    # probabilities
    a = p
    b = (1.0 - a) / 3.0
    c = b
    d = b
    # init edge arrays
    ii = ak.ones(Ne, dtype=ak.int64)
    jj = ak.ones(Ne, dtype=ak.int64)
    # quantities to use in edge generation loop
    ab = a + b
    c_norm = c / (c + d)
    a_norm = a / (a + b)
    # generate edges
    for ib in range(1, lgNv):
        ii_bit = (ak.randint(0, 1, Ne, dtype=ak.float64) > ab)
        jj_bit = (ak.randint(0, 1, Ne, dtype=ak.float64) > (c_norm * ii_bit + a_norm * (~ii_bit)))
        ii = ii + ((2**(ib-1)) * ii_bit)
        jj = jj + ((2**(ib-1)) * jj_bit)
    # sort all based on ii and jj using coargsort
    # all edges should be sorted based on both vertices of the edge
    iv = ak.coargsort((ii, jj))
    # permute into sorted order
    ii = ii[iv] # permute first vertex into sorted order
    jj = jj[iv] # permute second vertex into sorted order
    # to permute/rename vertices
    if perm:
        # generate permutation for new vertex numbers(names)
        ir = ak.argsort(ak.randint(0, 1, Nv, dtype=ak.float64))
        # renumber(rename) vertices
        ii = ir[ii] # rename first vertex
        jj = ir[jj] # rename second vertex
    #
    # maybe: remove edges which are self-loops???
    #
    # return pair of pdarrays
    return (ii, jj)

Here we generate a random-looking edge-list representing one million vertices and about 10 million edges:

>>> src, dst = gen_rmat_edges(20, 10, 0.01, True)
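The bit-by-bit edge generation in gen_rmat_edges can be followed more easily in a scalar version. The sketch below is a pure-Python rendering of the same loop for one edge at a time, under these assumptions: `rmat_edges` and its `seed` parameter are hypothetical, Python's `random` module replaces `ak.randint`, and the optional vertex-renaming step is omitted.

```python
import random


def rmat_edges(lg_nv, ne_per_v, p, seed=0):
    """Generate a sorted RMAT-style edge list, one edge at a time (sketch)."""
    rng = random.Random(seed)
    ne = ne_per_v * 2**lg_nv          # number of edges
    a = p                             # quadrant probabilities
    b = c = d = (1.0 - a) / 3.0
    ab = a + b
    c_norm = c / (c + d)
    a_norm = a / (a + b)
    edges = []
    for _ in range(ne):
        i = j = 1                     # edge arrays start at one, as with ak.ones
        for ib in range(1, lg_nv):
            i_bit = rng.random() > ab
            # The threshold for the second vertex depends on the first bit.
            threshold = c_norm if i_bit else a_norm
            j_bit = rng.random() > threshold
            i += 2 ** (ib - 1) * i_bit
            j += 2 ** (ib - 1) * j_bit
        edges.append((i, j))
    edges.sort()                      # same ordering as coargsort((ii, jj))
    return edges


edges = rmat_edges(4, 2, 0.01)
```

Each pass of the inner loop decides one bit of both vertex numbers, so an edge is placed by repeatedly choosing a quadrant of the adjacency matrix.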

Calculate out degrees using GroupBy:

>>> bySrc = ak.GroupBy(src)
>>> srcID, outDeg = bySrc.count()

Breadth-first search is relatively straightforward to implement using Array Set Operations. This example is from toys/ak_bfs_conn_comp.py.


# src and dst pdarrays hold the edge list
# seeds pdarray with starting vertices/seeds
def bfs(src, dst, seeds, printLayers=False):
    # holds vertices in the current layer of the bfs
    Z = ak.unique(seeds)
    # holds the visited vertices
    V = ak.unique(Z) # holds vertices in Z to start with
    # frontiers
    F = [Z]
    while Z.size != 0:
        if printLayers:
            print("Z.size =", Z.size, "Z=", Z)
        fZv = ak.in1d(src, Z) # find src vertex edges
        W = ak.unique(dst[fZv]) # compress out dst vertices to match and make them unique
        Z = ak.setdiff1d(W, V) # subtract out vertices already visited
        V = ak.union1d(V, Z) # union current frontier into vertices already visited
        F.append(Z)
    return (F, V)

Now we do a breadth-first search from the first vertex:

>>> layers, visited = bfs(src, dst, ak.array([src[0]]))
>>> [l.size for l in layers]
[1, 1, 2056, 42584, 410889, 24146, 2, 0]
>>> visited.size
479679

From this we see the number of new vertices in each frontier, as well as the total number of vertices reachable from the seed.
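The layered search can be traced by hand on a tiny graph. The sketch below mirrors the bfs function with Python sets standing in for the array set operations; `bfs_sets` and the six-edge graph are made up for illustration.

```python
def bfs_sets(src, dst, seeds):
    """Layered BFS over an edge list, using Python sets in place of pdarrays."""
    frontier = set(seeds)       # Z: vertices in the current layer
    visited = set(frontier)     # V: every vertex seen so far
    layers = [frontier]         # F: one entry per layer
    while frontier:
        # Destinations of edges whose source is in the frontier (in1d + unique).
        reached = {d for s, d in zip(src, dst) if s in frontier}
        frontier = reached - visited    # setdiff1d
        visited = visited | frontier    # union1d
        layers.append(frontier)
    return layers, visited


# Edges 0->1, 0->2, 1->3, 2->3, 3->4, plus 5->6, which is unreachable from 0.
src = [0, 0, 1, 2, 3, 5]
dst = [1, 2, 3, 3, 4, 6]
layers, visited = bfs_sets(src, dst, [0])
sizes = [len(layer) for layer in layers]
```

As in the arkouda version, the final layer is empty: the loop appends the frontier before testing whether it has run out of vertices.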

CHAPTER SEVEN

CONTRIBUTING

This section describes how to add new functionality to arkouda.

7.1 Adding Python Functionality

Python functions should follow the API of NumPy or Pandas, where possible. In general, functions should conform to the following:
1. Be defined somewhere in the arkouda subdirectory, such as in arkouda/pdarraysetops.py
2. Have a complete docstring in NumPy format
3. Check argument types and properties, raising exceptions if necessary
4. Send a request message using generic_msg(request)
5. Process the reply message
6. Possibly create one or more pdarray objects
7. Return any results

7.1.1 Example

def foo(pda):
    """
    Return the foo() of the array.

    Parameters
    ----------
    pda : pdarray
        The array to foo

    Returns
    -------
    pdarray
        The foo'd array
    """
    if isinstance(pda, pdarray):
        repMsg = generic_msg("foo {}".format(pda.name))
        return create_pdarray(repMsg)
    else:
        raise TypeError("must be pdarray {}".format(pda))

7.2 Adding Functionality to the Arkouda Server

Your contribution must include all the machinery to process a command from the client, in addition to the logic of the computation. When the client issues a command foo arg1 arg2 ... to the arkouda server, this is what typically happens:
1. The select block in arkouda_server.chpl sees “foo” and calls fooMsg(reqMsg, st), passing the command string and the symbol table.
2. The fooMsg function is found via the MsgProcessing module, which contains use FooMsg and thus gets all symbols from the FooMsg module where fooMsg() is defined.
3. The fooMsg() function (in the FooMsg module) parses and executes the command by
    1. Splitting the command string
    2. Casting any scalar args
    3. Looking up pdarray (GenSymEntry) args in the symbol table with st.lookup(arg) and checking for nil result
    4. Checking dtypes of arrays and branching to corresponding code
    5. Casting GenSymEntry objects to correct types with toSymEntry()
    6. Executing the operation, usually on the array data entry.a
    7. If necessary, creating new SymEntry and adding it to the symbol table with st.addEntry()
    8. Returning an appropriate message string
        1. If the return is an array, “created ”
        2. If the return is multiple arrays, one creation string per array, joined by “+”
        3. If the return is a scalar, “ ”
        4. If any error occurred, then “Error: ” (see ServerErrorStrings.chpl for functions to generate common error strings)
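On the client side, the reply conventions in step 8 suggest a simple dispatch on the message prefix. The sketch below is a hypothetical helper, not code from the arkouda client: `classify_reply` is an invented name, the attribute text after "created" is a made-up placeholder, and it assumes that "+" never appears inside a single creation string.

```python
def classify_reply(rep_msg):
    """Classify a server reply string by its prefix (hypothetical helper)."""
    if rep_msg.startswith("Error:"):
        # Surface server-side errors to the caller as exceptions.
        raise RuntimeError(rep_msg)
    if rep_msg.startswith("created "):
        # One creation string per array, joined by "+".
        return ("arrays", [part.strip() for part in rep_msg.split("+")])
    # Anything else is treated as a scalar reply and returned unparsed.
    return ("scalar", rep_msg)


# Placeholder replies; the text after "created" is illustrative only.
one = classify_reply("created id_1 int64 10")
two = classify_reply("created id_2 int64 10+created id_3 int64 10")

try:
    classify_reply("Error: unknown symbol")
    raised = False
except RuntimeError:
    raised = True
```

A real client function would then turn each creation string into a pdarray, as create_pdarray(repMsg) does in the Python example of section 7.1.1.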

7.2.1 Example

First, in src/arkouda_server.chpl, add a when statement to register the “foo” command:

// parse requests, execute requests, format responses
select cmd
{
    // ...
    when "foo" {repMsg = fooMsg(reqMsg, st);}
    // ...
}

Next, in the MsgProcessing module, add public use FooMsg; in the appropriate location:


module MsgProcessing {
    use ServerConfig;
    use Time only;
    use Math only;
    use MultiTypeSymbolTable;
    use MultiTypeSymEntry;
    use ServerErrorStrings;
    use AryUtil;

    public use OperatorMsg;
    // ...
    public use FooMsg;
    // ...

Then, define your argument parsing and function logic in src/FooMsg.chpl in the following manner:

module FooMsg {
    use ServerConfig;

    use MultiTypeSymEntry;
    use ServerErrorStrings;
    use MultiTypeSymbolTable;

    // do foo on array a
    proc foo(a: [?aD] int): [aD] int {
        //...
        return(ret);
    }

    /*
    Parse, execute, and respond to a foo message
    :arg reqMsg: request containing (cmd,dtype,size)
    :type reqMsg: string
    :arg st: SymTab to act on
    :type st: borrowed SymTab
    :returns: (string) response message
    */
    proc fooMsg(reqMsg: string, st: borrowed SymTab): string throws {
        var repMsg: string; // response message
        // split request into fields
        var (cmd, name) = reqMsg.splitMsgToTuple(2);
        // get next symbol name
        var rname = st.nextName();

        var gEnt: borrowed GenSymEntry = st.lookup(name);
        if (gEnt == nil) {return unknownSymbolError("set",name);}
        // if verbose print action
        if v {try! writeln("%s %s: %s".format(cmd,name,rname)); try! stdout.flush();}
        select (gEnt.dtype) {
            when (DType.Int64) {
                var e = toSymEntry(gEnt, int);
                var ret = foo(e.a);
                st.addEntry(rname, new shared SymEntry(ret));
            }
            otherwise {return notImplementedError("foo",gEnt.dtype);}
        }
        // response message
        return try! "created " + st.attrib(rname);
    }
}

CHAPTER EIGHT

API REFERENCE

This page contains auto-generated API reference documentation [1].

8.1 arkouda

8.1.1 Submodules

arkouda._version

Git implementation of _version.py.

Module Contents

Classes

VersioneerConfig Container for Versioneer configuration parameters.

Functions

get_keywords()
    Get the keywords needed to look up the version information.
get_config()
    Create, populate and return the VersioneerConfig() object.
register_vcs_handler(vcs, method)
    Create decorator to mark a method as the handler of a VCS.
run_command(commands, args, cwd=None, verbose=False, hide_stderr=False, env=None)
    Call the given command(s).
versions_from_parentdir(parentdir_prefix, root, verbose)
    Try to determine the version from the parent directory name.
git_get_keywords(versionfile_abs)
    Extract version information from the given file.
git_versions_from_keywords(keywords, tag_prefix, verbose)
    Get version information from git keywords.
git_pieces_from_vcs(tag_prefix, root, verbose, run_command=run_command)
    Get version from 'git describe' in the root of the source tree.
plus_or_dot(pieces)
    Return a + if we don't already have one, else return a .
render_pep440(pieces)
    Build up version string, with post-release "local version identifier".
render_pep440_pre(pieces)
    TAG[.post0.devDISTANCE] -- No -dirty.
render_pep440_post(pieces)
    TAG[.postDISTANCE[.dev0]+gHEX] .
render_pep440_old(pieces)
    TAG[.postDISTANCE[.dev0]] .
render_git_describe(pieces)
    TAG[-DISTANCE-gHEX][-dirty].
render_git_describe_long(pieces)
    TAG-DISTANCE-gHEX[-dirty].
render(pieces, style)
    Render the given version pieces into the requested style.
get_versions()
    Get version information or return default if unable to do so.

[1] Created with sphinx-autoapi

Attributes

LONG_VERSION_PY
HANDLERS

arkouda._version.get_keywords()
    Get the keywords needed to look up the version information.

class arkouda._version.VersioneerConfig
    Container for Versioneer configuration parameters.

arkouda._version.get_config()
    Create, populate and return the VersioneerConfig() object.

exception arkouda._version.NotThisMethod
    Bases: Exception
    Exception raised if a method is not valid for the current scenario.

arkouda._version.LONG_VERSION_PY

arkouda._version.HANDLERS

arkouda._version.register_vcs_handler(vcs, method)
    Create decorator to mark a method as the handler of a VCS.

arkouda._version.run_command(commands, args, cwd=None, verbose=False, hide_stderr=False, env=None)
    Call the given command(s).

arkouda._version.versions_from_parentdir(parentdir_prefix, root, verbose)
    Try to determine the version from the parent directory name.
    Source tarballs conventionally unpack into a directory that includes both the project name and a version string. We will also support searching up two directory levels for an appropriately named parent directory.

arkouda._version.git_get_keywords(versionfile_abs)
    Extract version information from the given file.

arkouda._version.git_versions_from_keywords(keywords, tag_prefix, verbose)
    Get version information from git keywords.

arkouda._version.git_pieces_from_vcs(tag_prefix, root, verbose, run_command=run_command)
    Get version from ‘git describe’ in the root of the source tree.
    This only gets called if the git-archive ‘subst’ keywords were not expanded, and _version.py hasn’t already been rewritten with a short version string, meaning we’re inside a checked out source tree.

arkouda._version.plus_or_dot(pieces)
    Return a + if we don’t already have one, else return a .

arkouda._version.render_pep440(pieces)
    Build up version string, with post-release “local version identifier”.
    Our goal: TAG[+DISTANCE.gHEX[.dirty]] . Note that if you get a tagged build and then dirty it, you’ll get TAG+0.gHEX.dirty
    Exceptions: 1: no tags. git_describe was just HEX. 0+untagged.DISTANCE.gHEX[.dirty]

arkouda._version.render_pep440_pre(pieces)
    TAG[.post0.devDISTANCE] -- No -dirty.
    Exceptions: 1: no tags. 0.post0.devDISTANCE

arkouda._version.render_pep440_post(pieces)
    TAG[.postDISTANCE[.dev0]+gHEX] .
    The “.dev0” means dirty. Note that .dev0 sorts backwards (a dirty tree will appear “older” than the corresponding clean one), but you shouldn’t be releasing software with -dirty anyways.
    Exceptions: 1: no tags. 0.postDISTANCE[.dev0]

arkouda._version.render_pep440_old(pieces)
    TAG[.postDISTANCE[.dev0]] .
    The “.dev0” means dirty.
    Exceptions: 1: no tags. 0.postDISTANCE[.dev0]

arkouda._version.render_git_describe(pieces)
    TAG[-DISTANCE-gHEX][-dirty].
    Like ‘git describe --tags --dirty --always’.
    Exceptions: 1: no tags. HEX[-dirty] (note: no ‘g’ prefix)

arkouda._version.render_git_describe_long(pieces)
    TAG-DISTANCE-gHEX[-dirty].
    Like ‘git describe --tags --dirty --always -long’. The distance/hash is unconditional.
    Exceptions: 1: no tags. HEX[-dirty] (note: no ‘g’ prefix)

arkouda._version.render(pieces, style)
    Render the given version pieces into the requested style.

arkouda._version.get_versions()
    Get version information or return default if unable to do so.
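The TAG[+DISTANCE.gHEX[.dirty]] pattern given for render_pep440 is easier to read with concrete values. The sketch below follows that description only; the function name and the dictionary keys ("closest-tag", "distance", "short", "dirty") are assumptions made for illustration and are not taken from this reference.

```python
def render_pep440_sketch(pieces):
    """Build TAG[+DISTANCE.gHEX[.dirty]] from a dict of version pieces (sketch)."""
    tag = pieces.get("closest-tag")
    distance = pieces["distance"]   # commits since the tag
    short = pieces["short"]         # abbreviated commit hash
    dirty = pieces["dirty"]         # True if the tree has uncommitted changes
    if tag:
        rendered = tag
        if distance or dirty:
            # Use "." when the tag already contains a "+" separator.
            rendered += "." if "+" in tag else "+"
            rendered += "{}.g{}".format(distance, short)
            if dirty:
                rendered += ".dirty"
    else:
        # Exception 1: no tags at all.
        rendered = "0+untagged.{}.g{}".format(distance, short)
        if dirty:
            rendered += ".dirty"
    return rendered


clean = render_pep440_sketch(
    {"closest-tag": "2020.07.07", "distance": 0, "short": "abc1234", "dirty": False})
dirtied = render_pep440_sketch(
    {"closest-tag": "2020.07.07", "distance": 0, "short": "abc1234", "dirty": True})
ahead = render_pep440_sketch(
    {"closest-tag": "2020.07.07", "distance": 3, "short": "abc1234", "dirty": False})
untagged = render_pep440_sketch(
    {"closest-tag": None, "distance": 5, "short": "abc1234", "dirty": False})
```

The second case is the one called out in the description: a tagged build that is then modified renders as TAG+0.gHEX.dirty.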

arkouda.categorical

Module Contents

Classes

Categorical
    Represents an array of values belonging to named categories.

class arkouda.categorical.Categorical(values, **kwargs)
    Represents an array of values belonging to named categories. Converting a Strings object to Categorical often saves memory and speeds up operations, especially if there are many repeated values, at the cost of some one-time work in initialization.
    Parameters
        values (Strings) – String values to convert to categories

    categories
        The set of category labels (determined automatically)
        Type Strings
    codes
        The category indices of the values or -1 for N/A
        Type pdarray, int64
    permutation
        The permutation that groups the values in the same order as categories
        Type pdarray, int64
    segments
        When values are grouped, the starting offset of each group
        Type pdarray, int64
    size
        The number of items in the array
        Type Union[int,np.int64]
    nlevels
        The number of distinct categories
        Type Union[int,np.int64]
    ndim
        The rank of the array (currently only rank 1 arrays supported)
        Type Union[int,np.int64]
    shape
        The sizes of each dimension of the array
        Type tuple

    BinOps
    RegisterablePieces
    RequiredPieces
    objtype = category
    permutation
    segments

    classmethod from_codes(cls, codes: arkouda.pdarrayclass.pdarray, categories: arkouda.strings.Strings, permutation=None, segments=None) → Categorical
        Make a Categorical from codes and categories arrays. If codes and categories have already been pre-computed, this constructor saves time. If not, please use the normal constructor.
        Parameters
            • codes (pdarray, int64) – Category indices of each value
            • categories (Strings) – Unique category labels
            • permutation (pdarray, int64) – The permutation that groups the values in the same order as categories
            • segments (pdarray, int64) – When values are grouped, the starting offset of each group
        Returns
            The Categorical object created from the input parameters
        Return type
            Categorical
        Raises
            TypeError – Raised if codes is not a pdarray of int64 objects or if categories is not a Strings object

    to_ndarray(self) → numpy.ndarray
        Convert the array to a np.ndarray, transferring array data from the arkouda server to Python. This conversion discards category information and produces an ndarray of strings. If the array exceeds a built-in size limit, a RuntimeError is raised.
        Returns
            A numpy ndarray of strings corresponding to the values in this array
        Return type
            np.ndarray

Notes

The number of bytes in the array cannot exceed arkouda.maxTransferBytes, otherwise a RuntimeError will be raised. This is to protect the user from overflowing the memory of the system on which the Python client is running, under the assumption that the server is running on a distributed system with much more memory than the client. The user may override this limit by setting ak.maxTransferBytes to a larger value, but proceed with caution.

    abstract __iter__(self)

    __len__(self)

    __str__(self)
        Return str(self).

    __repr__(self)
        Return repr(self).

    _binop(self, other: Union[Categorical, arkouda.dtypes.str_scalars], op: arkouda.dtypes.str_scalars) → arkouda.pdarrayclass.pdarray
        Executes the requested binop on this Categorical instance and returns the results within a pdarray object.
        Parameters
            • other (Union[Categorical,str_scalars]) – the other object is a Categorical object or string scalar
            • op (str_scalars) – name of the binary operation to be performed
        Returns
            encapsulating the results of the requested binop
        Return type
            pdarray
        Raises
            • ValueError – Raised if (1) the op is not in the self.BinOps set, or (2) if the sizes of this and the other instance don’t match
            • RuntimeError – Raised if a server-side error is thrown while executing the binary operation

    _r_binop(self, other: Union[Categorical, arkouda.dtypes.str_scalars], op: arkouda.dtypes.str_scalars) → arkouda.pdarrayclass.pdarray
        Executes the requested reverse binop on this Categorical instance and returns the results within a pdarray object.
        Parameters
            • other (Union[Categorical,str_scalars]) – the other object is a Categorical object or string scalar
            • op (str_scalars) – name of the binary operation to be performed
        Returns
            encapsulating the results of the requested binop
        Return type
            pdarray
        Raises
            • ValueError – Raised if (1) the op is not in the self.BinOps set, or (2) if the sizes of this and the other instance don’t match
            • RuntimeError – Raised if a server-side error is thrown while executing the binary operation

    __eq__(self, other)
        Return self==value.

    __ne__(self, other)
        Return self!=value.

    __getitem__(self, key) → Categorical

    reset_categories(self) → Categorical
        Recompute the category labels, discarding any unused labels. This method is often useful after slicing or indexing a Categorical array, when the resulting array only contains a subset of the original categories. In this case, eliminating unused categories can speed up other operations.
        Returns
            A Categorical object generated from the current instance
        Return type
            Categorical

    contains(self, substr: str) → arkouda.pdarrayclass.pdarray
        Check whether each element contains the given substring.
        Parameters
            substr (str) – The substring to search for
        Returns
            True for elements that contain substr, False otherwise
        Return type
            pdarray, bool
        Raises
            TypeError – Raised if substr is not a str


Notes

This method can be significantly faster than the corresponding method on Strings objects, because it searches the unique category labels instead of the full array.
See also: Categorical.startswith, Categorical.endswith

    startswith(self, substr: str) → arkouda.pdarrayclass.pdarray
        Check whether each element starts with the given substring.
        Parameters
            substr (str) – The substring to search for
        Raises
            TypeError – Raised if substr is not a str
        Returns
            True for elements that start with substr, False otherwise
        Return type
            pdarray, bool

Notes

This method can be significantly faster than the corresponding method on Strings objects, because it searches the unique category labels instead of the full array.
See also: Categorical.contains, Categorical.endswith

    endswith(self, substr: str) → arkouda.pdarrayclass.pdarray
        Check whether each element ends with the given substring.
        Parameters
            substr (str) – The substring to search for
        Raises
            TypeError – Raised if substr is not a str
        Returns
            True for elements that end with substr, False otherwise
        Return type
            pdarray, bool

Notes

This method can be significantly faster than the corresponding method on Strings objects, because it searches the unique category labels instead of the full array.
See also: Categorical.startswith, Categorical.contains

    in1d(self, test: Union[arkouda.strings.Strings, Categorical]) → arkouda.pdarrayclass.pdarray
        Test whether each element of the Categorical object is also present in the test Strings or Categorical object.
        Returns a boolean array the same length as self that is True where an element of self is in test and False otherwise.
        Parameters
            test (Union[Strings,Categorical]) – The values against which to test each value of self.
        Returns
            The values self[in1d] are in the test Strings or Categorical object.
        Return type
            pdarray, bool
        Raises
            TypeError – Raised if test is not a Strings or Categorical object


See also: unique, intersect1d, union1d

Notes

in1d can be considered as an element-wise function version of the python keyword in, for 1-D sequences. in1d(a, b) is logically equivalent to ak.array([item in b for item in a]), but is much faster and scales to arbitrarily large a.

Examples

>>> strings = ak.array(['String {}'.format(i) for i in range(0,5)])
>>> cat = ak.Categorical(strings)
>>> ak.in1d(cat, strings)
array([True, True, True, True, True])
>>> strings = ak.array(['String {}'.format(i) for i in range(5,9)])
>>> catTwo = ak.Categorical(strings)
>>> ak.in1d(cat, catTwo)
array([False, False, False, False, False])
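The notes on contains, startswith, and endswith say that a Categorical searches only the unique category labels rather than the full array. The plain-Python sketch below shows why that is enough. The helper names are hypothetical, and sorting the labels is an assumption made here for determinism; the reference does not state how arkouda orders its categories.

```python
def encode(values):
    """Return (categories, codes) for a list of strings."""
    categories = sorted(set(values))
    index = {label: i for i, label in enumerate(categories)}
    codes = [index[v] for v in values]
    return categories, codes


def contains(categories, codes, substr):
    """Test the substring once per category, then look results up by code."""
    label_hits = [substr in label for label in categories]
    return [label_hits[c] for c in codes]


values = ["apple", "banana", "apple", "cherry", "banana", "apple"]
categories, codes = encode(values)
mask = contains(categories, codes, "an")
```

Here the substring test runs three times (once per label) instead of six times (once per element); the saving grows with the number of repeated values.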

    unique(self) → Categorical

    group(self) → arkouda.pdarrayclass.pdarray
        Return the permutation that groups the array, placing equivalent categories together. All instances of the same category are guaranteed to lie in one contiguous block of the permuted array, but the blocks are not necessarily ordered.
        Returns
            The permutation that groups the array by value
        Return type
            pdarray
        See also: GroupBy, unique

Notes

This method is faster than the corresponding Strings method. If the Categorical was created from a Strings object, then this function simply returns the cached permutation. Even if the Categorical was created using from_codes(), this function will be faster than Strings.group() because it sorts dense integer values, rather than 128-bit hash values.

    argsort(self)

    sort(self)

    concatenate(self, others: Sequence[Categorical], ordered: bool = True) → Categorical
        Merge this Categorical with other Categorical objects in the array, concatenating the arrays and synchronizing the categories.
        Parameters
            • others (Sequence[Categorical]) – The Categorical arrays to concatenate and merge with this one
            • ordered (bool) – If True (default), the arrays will be appended in the order given. If False, array data may be interleaved in blocks, which can greatly improve performance but results in non-deterministic ordering of elements.
        Returns
            The merged Categorical object
        Return type
            Categorical
        Raises
            TypeError – Raised if any others array objects are not Categorical objects

Notes

This operation can be expensive – slower than concatenating Strings.

    save(self, prefix_path: str, dataset: str = 'categorical_array', mode: str = 'truncate') → str
        Save the Categorical object to HDF5. The result is a collection of HDF5 files, one file per locale of the arkouda server, where each filename starts with prefix_path and dataset. Each locale saves its chunk of the Strings array to its corresponding file.
        Parameters
            • prefix_path (str) – Directory and filename prefix that all output files share
            • dataset (str) – Name of the dataset to create in HDF5 files (must not already exist)
            • mode (str {'truncate' | 'append'}) – By default, truncate (overwrite) output files, if they exist. If ‘append’, create a new Categorical dataset within existing files.
        Returns
            String message indicating result of save operation
        Return type
            str
        Raises
            • ValueError – Raised if the lengths of columns and values differ, or the mode is neither ‘truncate’ nor ‘append’
            • TypeError – Raised if prefix_path, dataset, or mode is not a str
        See also: pdarrayIO.save, pdarrayIO.load_all

Notes

Important implementation notes: (1) Strings state is saved as two datasets within an hdf5 group: one for the string characters and one for the segments corresponding to the start of each string, (2) the hdf5 group is named via the dataset parameter.

    register(self, user_defined_name: str) → Categorical
        Register this Categorical object and underlying components with the Arkouda server
        Parameters
            user_defined_name (str) – user defined name the Categorical is to be registered under, this will be the root name for underlying components
        Returns
            The same Categorical which is now registered with the arkouda server and has an updated name. This is an in-place modification, the original is returned to support a fluid programming style. Please note you cannot register two different Categoricals with the same name.
        Return type
            Categorical
        Raises
            • TypeError – Raised if user_defined_name is not a str
            • RegistrationError – If the server was unable to register the Categorical with the user_defined_name
        See also: unregister, attach, unregister_categorical_by_name, is_registered

Notes

Objects registered with the server are immune to deletion until they are unregistered.

    unregister(self) → None
        Unregister this Categorical object in the arkouda server which was previously registered using register() and/or attached to using attach()
        Raises
            RegistrationError – If the object is already unregistered or if there is a server error when attempting to unregister
        See also: register, attach, unregister_categorical_by_name, is_registered

Notes

Objects registered with the server are immune to deletion until they are unregistered.

    is_registered(self) → numpy.bool_
        Return True iff the object is contained in the registry
        Returns
            Indicates if the object is contained in the registry
        Return type
            numpy.bool
        Raises
            RegistrationError – Raised if there’s a server-side error or a mis-match of registered components

See also: register, attach, unregister, unregister_categorical_by_name

Notes

Objects registered with the server are immune to deletion until they are unregistered.

    _get_components_dict(self) → Dict
        Internal function that returns a dictionary with all required or non-None components of self
        Required Categorical components (Codes and Categories) are always included in returned components_dict
        Optional Categorical components (Permutation and Segments) are only included if they’ve been set (are not None)
        Returns
            Dictionary of all required or non-None components of self
            Keys: component names (Codes, Categories, Permutation, Segments)
            Values: components of self
        Return type
            Dict

    _list_component_names(self) → List[str]
        Internal function that returns a list of all component names
        Parameters
            None –
        Returns
            List of all component names
        Return type
            List[str]

    info(self) → str
        Returns a JSON formatted string containing information about all components of self
        Parameters
            None –
        Returns
            JSON string containing information about all components of self
        Return type
            str

    pretty_print_info(self) → None
        Prints information about all components of self in a human readable format
        Parameters
            None –
        Return type
            None

    static attach(user_defined_name: str) → Categorical
        Function to return a Categorical object attached to the registered name in the arkouda server which was registered using register()
        Parameters
            user_defined_name (str) – user defined name which Categorical object was registered under
        Returns
            The Categorical object created by re-attaching to the corresponding server components
        Return type
            Categorical
        Raises
            TypeError – if user_defined_name is not a string
        See also: register, is_registered, unregister, unregister_categorical_by_name

    static unregister_categorical_by_name(user_defined_name: str) → None
        Function to unregister Categorical object by name which was registered with the arkouda server via register()
        Parameters
            user_defined_name (str) – Name under which the Categorical object was registered
        Raises
            • TypeError – if user_defined_name is not a string
            • RegistrationError – if there is an issue attempting to unregister any underlying components
        See also: register, unregister, attach, is_registered


    static parse_hdf_categoricals(d: Mapping[str, Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings]]) → Tuple[List[str], Dict[str, Categorical]]
        This function should be used in conjunction with the load_all function which reads hdf5 files and reconstitutes Categorical objects. Categorical objects use a naming convention and HDF5 structure so they can be identified and constructed for the user. In general you should not call this method directly.
        Parameters
            d (Dictionary of String to either Pdarray or Strings object) –
        Returns
            2-Tuple of List of strings containing key names which should be removed and Dictionary of base name to Categorical object
        See also: Categorical.save, load_all

arkouda.client

Module Contents

Functions

connect(server: str = 'localhost', port: int = 5555, timeout: int = 0, access_token: str = None, connect_url=None) → None
    Connect to a running arkouda server.
disconnect() → None
    Disconnects the client from the Arkouda server
shutdown() → None
    Sends a shutdown message to the Arkouda server that does the
get_config() → Mapping[str, Union[str, int, float]]
    Get runtime information about the server.
get_mem_used() → int
    Compute the amount of memory used by objects in the server's symbol table.
ruok() → str
    Simply sends an "ruok" message to the server and, if the return message is

arkouda.client.connect(server: str = 'localhost', port: int = 5555, timeout: int = 0, access_token: str = None, connect_url=None) → None
    Connect to a running arkouda server.
    Parameters
        • server (str, optional) – The hostname of the server (must be visible to the current machine). Defaults to localhost.
        • port (int, optional) – The port of the server. Defaults to 5555.
        • timeout (int, optional) – The timeout in seconds for client send and receive operations. Defaults to 0 seconds, which is interpreted as no timeout.
        • access_token (str, optional) – The token used to connect to an existing socket to enable access to an Arkouda server where authentication is enabled. Defaults to None.
        • connect_url (str, optional) – The complete url in the format of tcp://server:port?token= where the token is optional
    Return type
        None
    Raises
        • ConnectionError – Raised if there’s an error in connecting to the Arkouda server
        • ValueError – Raised if there’s an error in parsing the connect_url parameter
        • RuntimeError – Raised if there is a server-side error

Notes

On success, prints the connected address, as seen by the server. If called with an existing connection, the socket will be re-initialized.

arkouda.client.disconnect() → None
    Disconnects the client from the Arkouda server
    Return type
        None
    Raises
        ConnectionError – Raised if there’s an error disconnecting from the Arkouda server

arkouda.client.shutdown() → None
    Sends a shutdown message to the Arkouda server that does the following:
    1. Delete all objects in the SymTable
    2. Shuts down the Arkouda server
    3. Disconnects the client from the stopped Arkouda Server
    Return type
        None
    Raises
        RuntimeError – Raised if the client is not connected to the Arkouda server or there is an error in disconnecting from the server

arkouda.client.get_config() → Mapping[str, Union[str, int, float]]
    Get runtime information about the server.
    Returns
        serverHostname
        serverPort
        numLocales
        numPUs (number of processor units per locale)
        maxTaskPar (maximum number of tasks per locale)
        physicalMemory
    Return type
        Mapping[str, Union[str, int, float]]
    Raises
        RuntimeError – Raised if the client is not connected to a server

arkouda.client.get_mem_used() → int
    Compute the amount of memory used by objects in the server’s symbol table.
    Returns
        Indicates the amount of memory allocated to symbol table objects.
    Return type
        int
    Raises
        • RuntimeError – Raised if there is a server-side error in getting memory used
        • ValueError – Raised if the returned value is not an int-formatted string

arkouda.client.ruok() → str Sends an “ruok” message to the server. A return message of “imok” means the arkouda_server is up and operating normally; a return message of “imnotok” indicates that an error occurred or the connection timed out. This method is a quick health check that does not require error handling. Returns A string indicating whether the server is operating normally (imok) or, if there is a server-side error or ruok did not return a response, that it is not (imnotok) Return type str

arkouda.dtypes

Module Contents

Functions

check_np_dtype(dt: numpy.dtype) → None
    Assert that numpy dtype dt is one of the dtypes supported
translate_np_dtype(dt: numpy.dtype) → Tuple[str, int]
    Split numpy dtype dt into its kind and byte size, raising
resolve_scalar_dtype(val: object) → str
    Try to infer what dtype arkouda_server should treat val as.
get_byteorder(dt: numpy.dtype) → str
    Get a concrete byteorder (turns '=' into '<' or '>')
get_server_byteorder() → str
    Get the server's byteorder

Attributes

dtype

bool

int64

float64

uint8

str_

bool_scalars

float_scalars

int_scalars

numeric_scalars

numpy_scalars

str_scalars

all_scalars The DType enum defines the supported Arkouda data types in string form. ARKOUDA_SUPPORTED_DTYPES

DTypes

DTypeObjects arkouda.dtypes.dtype arkouda.dtypes.bool arkouda.dtypes.int64 arkouda.dtypes.float64 arkouda.dtypes.uint8 arkouda.dtypes.str_ arkouda.dtypes.bool_scalars arkouda.dtypes.float_scalars arkouda.dtypes.int_scalars arkouda.dtypes.numeric_scalars arkouda.dtypes.numpy_scalars arkouda.dtypes.str_scalars arkouda.dtypes.all_scalars The DType enum defines the supported Arkouda data types in string form. arkouda.dtypes.ARKOUDA_SUPPORTED_DTYPES arkouda.dtypes.DTypes arkouda.dtypes.DTypeObjects arkouda.dtypes.check_np_dtype(dt: numpy.dtype) → None Assert that numpy dtype dt is one of the dtypes supported by arkouda, otherwise raise TypeError. Raises TypeError – Raised if the dtype is not in supported dtypes or if dt is not a np.dtype arkouda.dtypes.translate_np_dtype(dt: numpy.dtype) → Tuple[str, int] Split numpy dtype dt into its kind and byte size, raising TypeError for unsupported dtypes. Raises TypeError – Raised if the dtype is not in supported dtypes or if dt is not a np.dtype arkouda.dtypes.resolve_scalar_dtype(val: object) → str Try to infer what dtype arkouda_server should treat val as. arkouda.dtypes.get_byteorder(dt: numpy.dtype) → str Get a concrete byteorder (turns ‘=’ into ‘<’ or ‘>’) arkouda.dtypes.get_server_byteorder() → str Get the server’s byteorder
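The get_byteorder description (turning '=' into '<' or '>') amounts to resolving the "native" marker against the byte order of the current machine. The sketch below illustrates that idea with the standard library only; concrete_byteorder is a hypothetical name, it takes the byte-order character directly rather than a numpy.dtype, and raising ValueError for other characters is an assumption.

```python
import sys


def concrete_byteorder(order: str) -> str:
    """Resolve a byte-order character to '<' (little) or '>' (big).

    Hypothetical illustration; not the arkouda implementation.
    """
    if order in ("<", ">"):
        return order  # already concrete
    if order == "=":
        # '=' means "native": use the byte order of this interpreter's host.
        return "<" if sys.byteorder == "little" else ">"
    # Assumption: any other marker is rejected in this sketch.
    raise ValueError(f"unsupported byte-order character: {order!r}")
```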

arkouda.groupbyclass

Module Contents

Classes

GroupBy Group an array or list of arrays by value, usually in preparation

Functions

broadcast(segments: arkouda.pdarrayclass.pdarray, values: arkouda.pdarrayclass.pdarray, size: Union[int, numpy.int64] = -1, permutation: Union[arkouda.pdarrayclass.pdarray, None] = None)
    Broadcast a dense column vector to the rows of a sparse matrix or grouped array.

Attributes

GROUPBY_REDUCTION_TYPES

arkouda.groupbyclass.GROUPBY_REDUCTION_TYPES class arkouda.groupbyclass.GroupBy(keys: groupable, assume_sorted: bool = False, hash_strings: bool = True) Group an array or list of arrays by value, usually in preparation for aggregating the within-group values of another array. Parameters • keys ((list of ) pdarray, int64, Strings, or Categorical) – The array to group by value, or if list, the column arrays to group by row • assume_sorted (bool) – If True, assume keys is already sorted (Default: False) nkeys The number of key arrays (columns) Type int size The length of the input array(s), i.e. number of rows Type int permutation The permutation that sorts the keys array(s) by value (row) Type pdarray unique_keys The unique values of the keys array(s), in grouped order Type (list of) pdarray, Strings, or Categorical


ngroups The length of the unique_keys array(s), i.e. number of groups Type int segments The start index of each group in the grouped array(s) Type pdarray logger Used for all logging operations Type ArkoudaLogger

Raises TypeError – Raised if keys is a pdarray with a dtype other than int64
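The relationship between the permutation, unique_keys, and segments attributes can be illustrated with a small pure-Python model for a single integer key array. This is a sketch of the semantics described above, not the server's algorithm; group_structure is a hypothetical name, and the stable sort used here is an assumption (the ordering within a group on a real server may differ).

```python
from typing import List, Tuple


def group_structure(keys: List[int]) -> Tuple[List[int], List[int], List[int]]:
    """Return (permutation, unique_keys, segments) for one key array.

    Hypothetical model for illustration; not the arkouda implementation.
    """
    # Permutation that sorts the keys by value (stable, so ties keep their order).
    permutation = sorted(range(len(keys)), key=keys.__getitem__)
    grouped = [keys[i] for i in permutation]
    unique_keys: List[int] = []
    segments: List[int] = []
    for pos, key in enumerate(grouped):
        # A new group starts wherever the sorted key value changes.
        if pos == 0 or key != grouped[pos - 1]:
            unique_keys.append(key)
            segments.append(pos)
    return permutation, unique_keys, segments
```

With this layout, the size of each group is the difference between consecutive segment offsets (using the array length as the final boundary), which is the quantity a per-group count reports.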

Notes

Only accepts (list of) pdarrays of int64 dtype, Strings, or Categorical. Reductions find_segments(self ) → None count(self ) → Tuple[groupable, arkouda.pdarrayclass.pdarray] Count the number of elements in each group, i.e. the number of times each key appears. Parameters none – Returns • unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order • counts (pdarray, int64) – The number of times each unique key appears

Examples

>>> a = ak.randint(1, 5, 10)
>>> a
array([3, 2, 3, 1, 2, 4, 3, 4, 3, 4])
>>> g = ak.GroupBy(a)
>>> keys, counts = g.count()
>>> keys
array([1, 2, 3, 4])
>>> counts
array([1, 2, 4, 3])

aggregate(self, values: groupable, operator: str, skipna: bool = True) → Tuple[groupable, arkouda.pdarrayclass.pdarray] Using the permutation stored in the GroupBy instance, group another array of values and apply a reduction to each group’s values. Parameters • values (pdarray) – The values to group and reduce • operator (str) – The name of the reduction operator to use Returns


• unique_keys (groupable) – The unique keys, in grouped order • aggregates (groupable) – One aggregate value per unique key in the GroupBy instance Raises • TypeError – Raised if the values array is not a pdarray • ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array • RuntimeError – Raised if the requested operator is not supported for the values dtype

Examples

>>> keys = ak.arange(0, 10)
>>> vals = ak.linspace(-1, 1, 10)
>>> g = ak.GroupBy(keys)
>>> g.aggregate(vals, 'sum')
(array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), array([-1, -0.77777777777777768, -0.55555555555555536, -0.33333333333333348, -0.11111111111111116, 0.11111111111111116, 0.33333333333333348, 0.55555555555555536, 0.77777777777777768, 1]))
>>> g.aggregate(vals, 'min')
(array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), array([-1, -0.77777777777777779, -0.55555555555555558, -0.33333333333333337, -0.11111111111111116, 0.11111111111111116, 0.33333333333333326, 0.55555555555555536, 0.77777777777777768, 1]))
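The contract of aggregate (one reduced value per unique key, ValueError on a size mismatch or an unknown operator) can be modeled in a few lines of plain Python. This is a sketch of the semantics only: aggregate_model and REDUCERS are hypothetical names, and the operator names listed are taken from the reduction methods documented in this class rather than from the actual GroupBy.Reductions contents.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Assumed operator table for this sketch; names mirror the documented methods.
REDUCERS: Dict[str, Callable[[List[float]], float]] = {
    "sum": sum,
    "prod": math.prod,
    "min": min,
    "max": max,
    "mean": lambda vals: sum(vals) / len(vals),
}


def aggregate_model(
    keys: Sequence[int], values: Sequence[float], operator: str
) -> Tuple[List[int], List[float]]:
    """Group values by key and reduce each group (illustrative model only)."""
    if len(keys) != len(values):
        raise ValueError("keys and values must have the same size")
    if operator not in REDUCERS:
        raise ValueError(f"unsupported reduction: {operator!r}")
    groups: Dict[int, List[float]] = {}
    for key, value in zip(keys, values):
        groups.setdefault(key, []).append(value)
    unique_keys = sorted(groups)  # unique keys in grouped (sorted) order
    return unique_keys, [REDUCERS[operator](groups[k]) for k in unique_keys]
```

Applied to the key and value arrays used in the examples of the reduction methods in this class, the model reproduces the documented group sums, minima, and maxima.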

sum(self, values: arkouda.pdarrayclass.pdarray, skipna: bool = True) → Tuple[groupable, arkouda.pdarrayclass.pdarray] Using the permutation stored in the GroupBy instance, group another array of values and sum each group’s values. Parameters values (pdarray) – The values to group and sum Returns • unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order • group_sums (pdarray) – One sum per unique key in the GroupBy instance Raises • TypeError – Raised if the values array is not a pdarray object • ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array


Notes

The grouped sum of a boolean pdarray returns integers.

Examples

>>> a = ak.randint(1, 5, 10)
>>> a
array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2])
>>> g = ak.GroupBy(a)
>>> g.keys
array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2])
>>> b = ak.randint(1, 5, 10)
>>> b
array([3, 3, 3, 4, 1, 1, 3, 3, 3, 4])
>>> g.sum(b)
(array([2, 3, 4]), array([8, 14, 6]))

prod(self, values: arkouda.pdarrayclass.pdarray, skipna: bool = True) → Tuple[groupable, arkouda.pdarrayclass.pdarray] Using the permutation stored in the GroupBy instance, group another array of values and compute the product of each group’s values. Parameters values (pdarray) – The values to group and multiply Returns • unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order • group_products (pdarray, float64) – One product per unique key in the GroupBy instance Raises • TypeError – Raised if the values array is not a pdarray object • ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array • RuntimeError – Raised if prod is not supported for the values dtype

Notes

The return dtype is always float64.

Examples

>>> a = ak.randint(1, 5, 10)
>>> a
array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2])
>>> g = ak.GroupBy(a)
>>> g.keys
array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2])
>>> b = ak.randint(1, 5, 10)
>>> b
array([3, 3, 3, 4, 1, 1, 3, 3, 3, 4])
>>> g.prod(b)
(array([2, 3, 4]), array([12, 108.00000000000003, 8.9999999999999982]))

mean(self, values: arkouda.pdarrayclass.pdarray, skipna: bool = True) → Tuple[groupable, arkouda.pdarrayclass.pdarray] Using the permutation stored in the GroupBy instance, group another array of values and compute the mean of each group’s values. Parameters values (pdarray) – The values to group and average Returns • unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order • group_means (pdarray, float64) – One mean value per unique key in the GroupBy instance Raises • TypeError – Raised if the values array is not a pdarray object • ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array

Notes

The return dtype is always float64.

Examples

>>> a = ak.randint(1, 5, 10)
>>> a
array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2])
>>> g = ak.GroupBy(a)
>>> g.keys
array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2])
>>> b = ak.randint(1, 5, 10)
>>> b
array([3, 3, 3, 4, 1, 1, 3, 3, 3, 4])
>>> g.mean(b)
(array([2, 3, 4]), array([2.6666666666666665, 2.7999999999999998, 3]))

min(self, values: arkouda.pdarrayclass.pdarray, skipna: bool = True) → Tuple[groupable, arkouda.pdarrayclass.pdarray] Using the permutation stored in the GroupBy instance, group another array of values and return the mini- mum of each group’s values. Parameters values (pdarray) – The values to group and find minima Returns • unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order • group_minima (pdarray) – One minimum per unique key in the GroupBy instance Raises


• TypeError – Raised if the values array is not a pdarray object or if min is not supported for the values dtype • ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array • RuntimeError – Raised if min is not supported for the values dtype

Examples

>>> a = ak.randint(1, 5, 10)
>>> a
array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2])
>>> g = ak.GroupBy(a)
>>> g.keys
array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2])
>>> b = ak.randint(1, 5, 10)
>>> b
array([3, 3, 3, 4, 1, 1, 3, 3, 3, 4])
>>> g.min(b)
(array([2, 3, 4]), array([1, 1, 3]))

max(self, values: arkouda.pdarrayclass.pdarray, skipna: bool = True) → Tuple[groupable, arkouda.pdarrayclass.pdarray] Using the permutation stored in the GroupBy instance, group another array of values and return the maxi- mum of each group’s values. Parameters values (pdarray) – The values to group and find maxima Returns • unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order • group_maxima (pdarray) – One maximum per unique key in the GroupBy instance Raises • TypeError – Raised if the values array is not a pdarray object or if max is not supported for the values dtype • ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array • RuntimeError – Raised if max is not supported for the values dtype

Examples

>>> a = ak.randint(1, 5, 10)
>>> a
array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2])
>>> g = ak.GroupBy(a)
>>> g.keys
array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2])
>>> b = ak.randint(1, 5, 10)
>>> b
array([3, 3, 3, 4, 1, 1, 3, 3, 3, 4])
>>> g.max(b)
(array([2, 3, 4]), array([4, 4, 3]))

argmin(self, values: arkouda.pdarrayclass.pdarray) → Tuple[groupable, arkouda.pdarrayclass.pdarray] Using the permutation stored in the GroupBy instance, group another array of values and return the location of the first minimum of each group’s values. Parameters values (pdarray) – The values to group and find argmin Returns • unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order • group_argminima (pdarray, int64) – One index per unique key in the GroupBy instance Raises • TypeError – Raised if the values array is not a pdarray object or if argmin is not supported for the values dtype • ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array • RuntimeError – Raised if argmin is not supported for the values dtype

Notes

The returned indices refer to the original values array as passed in, not the permutation applied by the GroupBy instance.

Examples

>>> a = ak.randint(1, 5, 10)
>>> a
array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2])
>>> g = ak.GroupBy(a)
>>> g.keys
array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2])
>>> b = ak.randint(1, 5, 10)
>>> b
array([3, 3, 3, 4, 1, 1, 3, 3, 3, 4])
>>> g.argmin(b)
(array([2, 3, 4]), array([5, 4, 2]))
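The note that the returned indices refer to the original values array, not to the grouped ordering, is easy to check against a plain-Python model. The sketch below is illustrative only; group_argmin is a hypothetical name and does not reflect how the server computes the result.

```python
from typing import Dict, List, Sequence, Tuple


def group_argmin(
    keys: Sequence[int], values: Sequence[float]
) -> Tuple[List[int], List[int]]:
    """Per-group index of the first minimum, in original array coordinates."""
    if len(keys) != len(values):
        raise ValueError("keys and values must have the same size")
    best: Dict[int, int] = {}
    for index, (key, value) in enumerate(zip(keys, values)):
        # Strict '<' keeps the first occurrence when a minimum is repeated.
        if key not in best or value < values[best[key]]:
            best[key] = index
    unique_keys = sorted(best)
    return unique_keys, [best[k] for k in unique_keys]
```

Because the stored index is the position in the input, it can be used directly to index the original values array, which is the behavior the note describes.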

argmax(self, values: arkouda.pdarrayclass.pdarray) → Tuple[groupable, arkouda.pdarrayclass.pdarray] Using the permutation stored in the GroupBy instance, group another array of values and return the location of the first maximum of each group’s values. Parameters values (pdarray) – The values to group and find argmax Returns • unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order • group_argmaxima (pdarray, int64) – One index per unique key in the GroupBy instance Raises


• TypeError – Raised if the values array is not a pdarray object or if argmax is not supported for the values dtype • ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array

Notes

The returned indices refer to the original values array as passed in, not the permutation applied by the GroupBy instance.

Examples

>>> a = ak.randint(1, 5, 10)
>>> a
array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2])
>>> g = ak.GroupBy(a)
>>> g.keys
array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2])
>>> b = ak.randint(1, 5, 10)
>>> b
array([3, 3, 3, 4, 1, 1, 3, 3, 3, 4])
>>> g.argmax(b)
(array([2, 3, 4]), array([9, 3, 2]))

nunique(self, values: groupable) → Tuple[groupable, arkouda.pdarrayclass.pdarray] Using the permutation stored in the GroupBy instance, group another array of values and return the number of unique values in each group. Parameters values (pdarray, int64) – The values to group and find unique values Returns • unique_keys (groupable) – The unique keys, in grouped order • group_nunique (groupable) – Number of unique values per unique key in the GroupBy instance Raises • TypeError – Raised if the dtype(s) of values array(s) does/do not support the nunique method • ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array • RuntimeError – Raised if nunique is not supported for the values dtype


Examples

>>> data = ak.array([3, 4, 3, 1, 1, 4, 3, 4, 1, 4])
>>> data
array([3, 4, 3, 1, 1, 4, 3, 4, 1, 4])
>>> labels = ak.array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4])
>>> labels
ak.array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4])
>>> g = ak.GroupBy(labels)
>>> g.keys
ak.array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4])
>>> g.nunique(data)
array([1,2,3,4]), array([2, 2, 3, 1])
# Group (1,1,1) has values [3,4,3] -> there are 2 unique values 3&4
# Group (2,2,2) has values [1,1,4] -> 2 unique values 1&4
# Group (3,3,3) has values [3,4,1] -> 3 unique values
# Group (4) has values [4] -> 1 unique value
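The per-group reasoning spelled out in the comments above can be reproduced with a set per key. This is a minimal model of what nunique reports, with the hypothetical name group_nunique; it says nothing about how the server implements the reduction.

```python
from typing import Dict, List, Sequence, Set, Tuple


def group_nunique(
    keys: Sequence[int], values: Sequence[int]
) -> Tuple[List[int], List[int]]:
    """Count distinct values within each group (illustrative model only)."""
    if len(keys) != len(values):
        raise ValueError("keys and values must have the same size")
    seen: Dict[int, Set[int]] = {}
    for key, value in zip(keys, values):
        seen.setdefault(key, set()).add(value)  # a set discards repeats
    unique_keys = sorted(seen)
    return unique_keys, [len(seen[k]) for k in unique_keys]
```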

any(self, values: arkouda.pdarrayclass.pdarray) → Tuple[Union[arkouda.pdarrayclass.pdarray, List[Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings]]], arkouda.pdarrayclass.pdarray] Using the permutation stored in the GroupBy instance, group another array of values and perform an “or” reduction on each group. Parameters values (pdarray, bool) – The values to group and reduce with “or” Returns • unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order • group_any (pdarray, bool) – One bool per unique key in the GroupBy instance Raises • TypeError – Raised if the values array is not a pdarray or if the pdarray dtype is not bool • ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array

all(self, values: arkouda.pdarrayclass.pdarray) → Tuple[Union[arkouda.pdarrayclass.pdarray, List[Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings]]], arkouda.pdarrayclass.pdarray] Using the permutation stored in the GroupBy instance, group another array of values and perform an “and” reduction on each group. Parameters values (pdarray, bool) – The values to group and reduce with “and” Returns • unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order • group_all (pdarray, bool) – One bool per unique key in the GroupBy instance Raises • TypeError – Raised if the values array is not a pdarray or if the pdarray dtype is not bool • ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array • RuntimeError – Raised if all is not supported for the values dtype


OR(self, values: arkouda.pdarrayclass.pdarray) → Tuple[Union[arkouda.pdarrayclass.pdarray, List[Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings]]], arkouda.pdarrayclass.pdarray] Bitwise OR of values in each segment. Using the permutation stored in the GroupBy instance, group another array of values and perform a bitwise OR reduction on each group. Parameters values (pdarray, int64) – The values to group and reduce with OR Returns • unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order • result (pdarray, int64) – Bitwise OR of values in segments corresponding to keys Raises • TypeError – Raised if the values array is not a pdarray or if the pdarray dtype is not int64 • ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array • RuntimeError – Raised if OR is not supported for the values dtype

AND(self, values: arkouda.pdarrayclass.pdarray) → Tuple[Union[arkouda.pdarrayclass.pdarray, List[Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings]]], arkouda.pdarrayclass.pdarray] Bitwise AND of values in each segment. Using the permutation stored in the GroupBy instance, group another array of values and perform a bitwise AND reduction on each group.
Parameters values (pdarray, int64) – The values to group and reduce with AND Returns • unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order • result (pdarray, int64) – Bitwise AND of values in segments corresponding to keys Raises • TypeError – Raised if the values array is not a pdarray or if the pdarray dtype is not int64 • ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array • RuntimeError – Raised if AND is not supported for the values dtype

XOR(self, values: arkouda.pdarrayclass.pdarray) → Tuple[Union[arkouda.pdarrayclass.pdarray, List[Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings]]], arkouda.pdarrayclass.pdarray] Bitwise XOR of values in each segment. Using the permutation stored in the GroupBy instance, group another array of values and perform a bitwise XOR reduction on each group. Parameters values (pdarray, int64) – The values to group and reduce with XOR Returns • unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order • result (pdarray, int64) – Bitwise XOR of values in segments corresponding to keys Raises • TypeError – Raised if the values array is not a pdarray or if the pdarray dtype is not int64


• ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array • RuntimeError – Raised if XOR is not supported for the values dtype

broadcast(self, values: arkouda.pdarrayclass.pdarray, permute: bool = True) → arkouda.pdarrayclass.pdarray Fill each group’s segment with a constant value. Parameters • values (pdarray) – The values to put in each group’s segment • permute (bool) – If True (default), permute broadcast values back to the ordering of the original array on which GroupBy was called. If False, the broadcast values are grouped by value. Returns The broadcast values Return type pdarray Raises • TypeError – Raised if values is not a pdarray object • ValueError – Raised if the values array does not have one value per segment

Notes

This function is a sparse analog of np.broadcast. If a GroupBy object represents a sparse matrix (tensor), then this function takes a (dense) column vector and replicates each value to the non-zero elements in the corresponding row.

Examples

>>> a = ak.array([0, 1, 0, 1, 0])
>>> values = ak.array([3, 5])
>>> g = ak.GroupBy(a)
# By default, result is in original order
>>> g.broadcast(values)
array([3, 5, 3, 5, 3])

# With permute=False, result is in grouped order
>>> g.broadcast(values, permute=False)
array([3, 3, 3, 5, 5])
>>> a = ak.randint(1, 5, 10)
>>> a
array([3, 1, 4, 4, 4, 1, 3, 3, 2, 2])
>>> g = ak.GroupBy(a)
>>> keys, counts = g.count()
>>> g.broadcast(counts > 2)
array([True False True True True False True True False False])
>>> g.broadcast(counts == 3)
array([True False True True True False True True False False])
>>> g.broadcast(counts < 4)
array([True True True True True True True True True True])
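The effect of the permute flag can be modeled without a server: each element receives the value assigned to its group, either in the original element order or in grouped (sorted-by-key) order. The function below is a hypothetical sketch of that behavior for a single integer key array, not the library's implementation.

```python
from typing import Dict, List, Sequence, TypeVar

T = TypeVar("T")


def group_broadcast(
    keys: Sequence[int], values: Sequence[T], permute: bool = True
) -> List[T]:
    """Spread one value per group over that group's elements (model only)."""
    unique_keys = sorted(set(keys))
    if len(values) != len(unique_keys):
        raise ValueError("need exactly one value per group")
    lookup: Dict[int, T] = dict(zip(unique_keys, values))
    # permute=True: original element order; permute=False: grouped order.
    ordered = list(keys) if permute else sorted(keys)
    return [lookup[k] for k in ordered]
```

Broadcasting a boolean mask computed from per-group counts, as in the last examples above, follows the same pattern and yields one flag per original element.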

arkouda.groupbyclass.broadcast(segments: arkouda.pdarrayclass.pdarray, values: arkouda.pdarrayclass.pdarray, size: Union[int, numpy.int64] = -1, permutation: Union[arkouda.pdarrayclass.pdarray, None] = None) Broadcast a dense column vector to the rows of a sparse matrix or grouped array. Parameters • segments (pdarray, int64) – Offsets of the start of each row in the sparse matrix or grouped array. Must be sorted in ascending order. • values (pdarray) – The values to broadcast, one per row (or group) • size (int) – The total number of nonzeros in the matrix. If permutation is given, this argument is ignored and the size is inferred from the permutation array. • permutation (pdarray, int64) – The permutation to go from the original ordering of nonzeros to the ordering grouped by row. To broadcast values back to the original ordering, this permutation will be inverted. If no permutation is supplied, it is assumed that the original nonzeros were already grouped by row. In this case, the size argument must be given. Returns The broadcast values, one per nonzero Return type pdarray Raises ValueError – • If segments and values are different sizes • If segments are empty • If number of nonzeros (either user-specified or inferred from permutation) is less than one
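The interplay of segments, size, and permutation can be sketched in plain Python: each row's value is repeated across its span of nonzeros, and a supplied permutation is inverted to return to the original ordering. broadcast_model is a hypothetical name; the sketch follows the parameter descriptions above and is not the server-side implementation.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def broadcast_model(
    segments: Sequence[int],
    values: Sequence[T],
    size: int = -1,
    permutation: Optional[Sequence[int]] = None,
) -> List[T]:
    """Repeat one value per row across that row's nonzeros (model only)."""
    if len(segments) != len(values):
        raise ValueError("segments and values must be the same size")
    if len(segments) == 0:
        raise ValueError("segments must not be empty")
    if permutation is not None:
        size = len(permutation)  # size is inferred when a permutation is given
    if size < 1:
        raise ValueError("number of nonzeros must be at least one")
    # Row i spans [segments[i], segments[i + 1]); the last row ends at size.
    ends = list(segments[1:]) + [size]
    grouped: List[T] = []
    for start, end, value in zip(segments, ends, values):
        grouped.extend([value] * (end - start))
    if permutation is None:
        return grouped
    # Invert the permutation: grouped position g came from original position p.
    original: List[T] = list(grouped)
    for g, p in enumerate(permutation):
        original[p] = grouped[g]
    return original
```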

Examples

# Define a sparse matrix with 3 rows and 7 nonzeros
>>> row_starts = ak.array([0, 2, 5])
>>> nnz = 7
# Broadcast the row number to each nonzero element
>>> row_number = ak.arange(3)
>>> ak.broadcast(row_starts, row_number, nnz)
array([0 0 1 1 1 2 2])
# If the original nonzeros were in reverse order...
>>> permutation = ak.arange(6, -1, -1)
>>> ak.broadcast(row_starts, row_number, permutation=permutation)
array([2 2 1 1 1 0 0])

arkouda.infoclass

Module Contents

Functions

information(names: Union[List[str], str] = RegisteredSymbols) → str
    Returns JSON formatted string containing information about the objects in names
list_registry() → List[str]
    Return a list containing the names of all registered objects
list_symbol_table() → List[str]
    Return a list containing the names of all objects in the symbol table
pretty_print_information(names: Union[List[str], str] = RegisteredSymbols) → None
    Prints verbose information for each object in names in a human readable format


Attributes

AllSymbols

RegisteredSymbols

arkouda.infoclass.AllSymbols = __AllSymbols__ arkouda.infoclass.RegisteredSymbols = __RegisteredSymbols__ arkouda.infoclass.information(names: Union[List[str], str] = RegisteredSymbols) → str Returns JSON formatted string containing information about the objects in names Parameters names (Union[List[str], str]) – Either the name of an object or a list of names of objects to retrieve info for. If names is ak.AllSymbols, retrieves info for all symbols in the symbol table. If names is ak.RegisteredSymbols, retrieves info for all symbols in the registry. Returns JSON formatted string containing a list of information for each object in names Return type str Raises RuntimeError – Raised if a server-side error is thrown in the process of retrieving information about the objects in names arkouda.infoclass.list_registry() → List[str] Return a list containing the names of all registered objects Parameters None – Returns List of all object names in the registry Return type list Raises RuntimeError – Raised if there’s a server-side error thrown arkouda.infoclass.list_symbol_table() → List[str] Return a list containing the names of all objects in the symbol table Parameters None – Returns List of all object names in the symbol table Return type list Raises RuntimeError – Raised if there’s a server-side error thrown arkouda.infoclass.pretty_print_information(names: Union[List[str], str] = RegisteredSymbols) → None Prints verbose information for each object in names in a human readable format Parameters names (Union[List[str], str]) – Either the name of an object or a list of names of objects to retrieve info for. If names is ak.AllSymbols, retrieves info for all symbols in the symbol table. If names is ak.RegisteredSymbols, retrieves info for all symbols in the registry. Returns Return type None Raises RuntimeError – Raised if a server-side error is thrown in the process of retrieving information about the objects in names


arkouda.io_util

Module Contents

Functions

get_directory(path: str) → pathlib.Path
    Creates the directory if it does not exist and then
write_line_to_file(path: str, line: str) → None
    Writes a line to the requested file. Note: if the file
delimited_file_to_dict(path: str, delimiter: str = ',') → Dict[str, str]
    Returns a dictionary populated by lines from a file where
dict_to_delimited_file(path: str, values: Mapping[Any, Any], delimiter: str = ',') → None
    Writes a dictionary to delimited lines in a file where

arkouda.io_util.get_directory(path: str) → pathlib.Path Creates the directory if it does not exist and then returns the corresponding Path object Parameters path (str) – The path to the directory Returns Path object corresponding to the directory Return type pathlib.Path Raises ValueError – Raised if there’s an error in reading an existing directory or creating a new one arkouda.io_util.write_line_to_file(path: str, line: str) → None Writes a line to the requested file. Note: if the file does not exist, the file is created first and then the specified line is written to it. Parameters • path (str) – Path to the target file • line (str) – Line to be written to the file Returns Return type None Raises UnsupportedOperation – Raised if there’s an error in creating or writing to the file arkouda.io_util.delimited_file_to_dict(path: str, delimiter: str = ',') → Dict[str, str] Returns a dictionary populated by lines from a file where the first delimited element of each line is the key and the second delimited element is the value. Parameters • path (str) – Path to the file • delimiter (str) – Delimiter separating key and value Returns Dictionary containing key,value pairs derived from each line of delimited strings Return type Mapping[str,str] Raises UnsupportedOperation – Raised if there’s an error in reading the file arkouda.io_util.dict_to_delimited_file(path: str, values: Mapping[Any, Any], delimiter: str = ',') → None Writes a dictionary to delimited lines in a file where the first delimited element of each line is the dict key and the second delimited element is the dict value. If the file does not exist, it is created and then written to.


Parameters • path (str) – Path to the file • delimiter (str) – Delimiter separating key and value Returns Return type None Raises • OSError – Raised if there’s an error opening or writing to the specified file • ValueError – Raised if the delimiter is not supported

arkouda.join

Module Contents

Functions

join_on_eq_with_dt(a1: arkouda.pdarrayclass.pdarray, a2: arkouda.pdarrayclass.pdarray, t1: arkouda.pdarrayclass.pdarray, t2: arkouda.pdarrayclass.pdarray, dt: Union[int, numpy.int64], pred: str, result_limit: Union[int, numpy.int64] = 1000) → Tuple[arkouda.pdarrayclass.pdarray, arkouda.pdarrayclass.pdarray]
    Performs an inner-join on equality between two integer arrays where

arkouda.join.join_on_eq_with_dt(a1: arkouda.pdarrayclass.pdarray, a2: arkouda.pdarrayclass.pdarray, t1: arkouda.pdarrayclass.pdarray, t2: arkouda.pdarrayclass.pdarray, dt: Union[int, numpy.int64], pred: str, result_limit: Union[int, numpy.int64] = 1000) → Tuple[arkouda.pdarrayclass.pdarray, arkouda.pdarrayclass.pdarray] Performs an inner-join on equality between two integer arrays where the time-window predicate is also true Parameters • a1 (pdarray, int64) – pdarray to be joined • a2 (pdarray, int64) – pdarray to be joined • t1 (pdarray) – timestamps in milliseconds corresponding to the a1 pdarray • t2 (pdarray) – timestamps in milliseconds corresponding to the a2 pdarray • dt (Union[int,np.int64]) – time delta • pred (str) – time window predicate • result_limit (Union[int,np.int64]) – size limit for returned result Returns • result_array_one (pdarray, int64) – a1 indices where a1 == a2 • result_array_two (pdarray, int64) – a2 indices where a2 == a1 Raises


• TypeError – Raised if a1, a2, t1, or t2 is not a pdarray, or if dt or result_limit is not an int • ValueError – if a1, a2, t1, or t2 dtype is not int64, pred is not ‘true_dt’, ‘abs_dt’, or ‘pos_dt’, or result_limit is < 0 arkouda.logger

Module Contents

Functions

enableVerbose() → None
    Enables verbose logging (DEBUG log level) for all ArkoudaLoggers
disableVerbose(logLevel: LogLevel = LogLevel.INFO) → None
    Disables verbose logging (DEBUG log level) for all ArkoudaLoggers, setting

arkouda.logger.enableVerbose() → None Enables verbose logging (DEBUG log level) for all ArkoudaLoggers arkouda.logger.disableVerbose(logLevel: LogLevel = LogLevel.INFO) → None Disables verbose logging (DEBUG log level) for all ArkoudaLoggers, setting the log level for each to the logLevel parameter Parameters logLevel (LogLevel) – The new log level, defaults to LogLevel.INFO Raises TypeError – Raised if logLevel is not a LogLevel enum

arkouda.message

Module Contents

Classes

MessageFormat Generic enumeration. MessageType Generic enumeration. RequestMessage

ReplyMessage

class arkouda.message.MessageFormat Bases: enum.Enum Generic enumeration. Derive from this class to define new enumerations. STRING = STRING BINARY = BINARY __str__(self ) → str Overridden method returns value, which is useful in outputting a MessageFormat object to JSON.


__repr__(self ) → str Overridden method returns value, which is useful in outputting a MessageFormat object to JSON. class arkouda.message.MessageType Bases: enum.Enum Generic enumeration. Derive from this class to define new enumerations. NORMAL = NORMAL WARNING = WARNING ERROR = ERROR __str__(self ) → str Overridden method returns value, which is useful in outputting a MessageType object to JSON. __repr__(self ) → str Overridden method returns value, which is useful in outputting a MessageType object to JSON. class arkouda.message.RequestMessage(user: str, cmd: str, token: str = None, format: MessageFormat = MessageFormat.STRING, args: str = None)

__slots = ['user', 'token', 'cmd', 'format', 'args']
user :str
token :str
cmd :str
format :MessageFormat
args :str
asdict(self ) → Dict
Overridden asdict implementation sets the values of non-required fields to an empty space (for Chapel JSON processing) and invokes str() on the format instance attribute.
Returns A dict object encapsulating RequestMessage state
Return type Dict
class arkouda.message.ReplyMessage

__slots__ = ['msg', 'msgType', 'user'] msg :str msgType :MessageType user :str static fromdict(values: Dict) → ReplyMessage Generates a ReplyMessage from a dict encapsulating the data and metadata from a reply returned by the Arkouda server. Parameters values (Dict) – The dict object encapsulating the fields required to instantiate a ReplyMessage Returns The ReplyMessage composed of values encapsulated within values dict Return type ReplyMessage


Raises ValueError – Raised if the values Dict is missing fields or contains malformed values

arkouda.numeric

Module Contents

Functions

cast(pda: Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings], dt: Union[numpy.dtype, str]) → Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings]
Cast an array to another dtype.
abs(pda: arkouda.pdarrayclass.pdarray) → arkouda.pdarrayclass.pdarray
Return the element-wise absolute value of the array.
log(pda: arkouda.pdarrayclass.pdarray) → arkouda.pdarrayclass.pdarray
Return the element-wise natural log of the array.
exp(pda: arkouda.pdarrayclass.pdarray) → arkouda.pdarrayclass.pdarray
Return the element-wise exponential of the array.
cumsum(pda: arkouda.pdarrayclass.pdarray) → arkouda.pdarrayclass.pdarray
Return the cumulative sum over the array.
cumprod(pda: arkouda.pdarrayclass.pdarray) → arkouda.pdarrayclass.pdarray
Return the cumulative product over the array.
sin(pda: arkouda.pdarrayclass.pdarray) → arkouda.pdarrayclass.pdarray
Return the element-wise sine of the array.
cos(pda: arkouda.pdarrayclass.pdarray) → arkouda.pdarrayclass.pdarray
Return the element-wise cosine of the array.
hash(pda: arkouda.pdarrayclass.pdarray, full: bool = True) → Union[Tuple[arkouda.pdarrayclass.pdarray, arkouda.pdarrayclass.pdarray], arkouda.pdarrayclass.pdarray]
Return an element-wise hash of the array.
where(condition: arkouda.pdarrayclass.pdarray, A: Union[arkouda.dtypes.numeric_scalars, arkouda.pdarrayclass.pdarray], B: Union[arkouda.dtypes.numeric_scalars, arkouda.pdarrayclass.pdarray]) → arkouda.pdarrayclass.pdarray
Returns an array with elements chosen from A and B based upon a conditioning array.
histogram(pda: arkouda.pdarrayclass.pdarray, bins: arkouda.dtypes.int_scalars = 10) → arkouda.pdarrayclass.pdarray
Compute a histogram of evenly spaced bins over the range of an array.
value_counts(pda: arkouda.pdarrayclass.pdarray) → Union[Categorical, Tuple[Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings], Optional[arkouda.pdarrayclass.pdarray]]]
Count the occurrences of the unique values of an array.
isnan(pda: arkouda.pdarrayclass.pdarray) → arkouda.pdarrayclass.pdarray
Test a pdarray for Not a number / NaN values

arkouda.numeric.cast(pda: Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings], dt: Union[numpy.dtype, str]) → Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings]
Cast an array to another dtype.


Parameters
• pda (pdarray or Strings) – The array of values to cast
• dt (np.dtype or str) – The target dtype to cast values to
Returns Array of values cast to desired dtype
Return type pdarray or Strings

Notes

The cast is performed according to Chapel’s casting rules and is NOT safe from overflows or underflows. The user must ensure that the target dtype has the precision and capacity to hold the desired result.
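Because the cast is unchecked, one option is to verify the value range before casting. The sketch below is plain Python rather than Arkouda code: the helper out_of_range and the INT_RANGES table are illustrative assumptions, not part of the arkouda API. With a pdarray, the same bounds could be compared against pda.min() and pda.max() before calling ak.cast.

```python
# Hypothetical helper (not part of arkouda): flag values that would not
# fit in a target integer dtype, so an unchecked cast can be avoided.
INT_RANGES = {
    "uint8": (0, 2**8 - 1),
    "int64": (-(2**63), 2**63 - 1),
}


def out_of_range(values, dtype_name):
    """Return the values that fall outside the representable range."""
    lo, hi = INT_RANGES[dtype_name]
    return [v for v in values if not lo <= v <= hi]


sample = [0, 255, 256, -1]
bad = out_of_range(sample, "uint8")
print(bad)  # [256, -1]
```

An empty result means every sampled value fits the target dtype; a non-empty one lists the values that an unchecked cast would corrupt.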

Examples

>>> ak.cast(ak.linspace(1.0,5.0,5), dt=ak.int64)
array([1, 2, 3, 4, 5])

>>> ak.cast(ak.arange(0,5), dt=ak.float64).dtype
dtype('float64')

>>> ak.cast(ak.arange(0,5), dt=ak.bool)
array([False, True, True, True, True])

>>> ak.cast(ak.linspace(0,4,5), dt=ak.bool)
array([False, True, True, True, True])

arkouda.numeric.abs(pda: arkouda.pdarrayclass.pdarray) → arkouda.pdarrayclass.pdarray
Return the element-wise absolute value of the array.
Parameters pda (pdarray)
Returns A pdarray containing absolute values of the input array elements
Return type pdarray
Raises TypeError – Raised if the parameter is not a pdarray

Examples

>>> ak.abs(ak.arange(-5,-1))
array([5, 4, 3, 2])

>>> ak.abs(ak.linspace(-5,-1,5))
array([5, 4, 3, 2, 1])

arkouda.numeric.log(pda: arkouda.pdarrayclass.pdarray) → arkouda.pdarrayclass.pdarray
Return the element-wise natural log of the array.
Parameters pda (pdarray)
Returns A pdarray containing natural log values of the input array elements
Return type pdarray
Raises TypeError – Raised if the parameter is not a pdarray

Notes

Logarithms with other bases can be computed as follows:

Examples

>>> A = ak.array([1, 10, 100])
# Natural log
>>> ak.log(A)
array([0, 2.3025850929940459, 4.6051701859880918])
# Log base 10
>>> ak.log(A) / np.log(10)
array([0, 1, 2])
# Log base 2
>>> ak.log(A) / np.log(2)
array([0, 3.3219280948873626, 6.6438561897747253])

arkouda.numeric.exp(pda: arkouda.pdarrayclass.pdarray) → arkouda.pdarrayclass.pdarray
Return the element-wise exponential of the array.
Parameters pda (pdarray)
Returns A pdarray containing exponential values of the input array elements
Return type pdarray
Raises TypeError – Raised if the parameter is not a pdarray

Examples

>>> ak.exp(ak.arange(1,5))
array([2.7182818284590451, 7.3890560989306504, 20.085536923187668, 54.598150033144236])

>>> ak.exp(ak.uniform(5,1.0,5.0))
array([11.84010843172504, 46.454368507659211, 5.5571769623557188, 33.494295836924771, 13.478894913238722])

arkouda.numeric.cumsum(pda: arkouda.pdarrayclass.pdarray) → arkouda.pdarrayclass.pdarray
Return the cumulative sum over the array.
The sum is inclusive, such that the i-th element of the result is the sum of elements up to and including i.
Parameters pda (pdarray)
Returns A pdarray containing cumulative sums for each element of the original pdarray
Return type pdarray
Raises TypeError – Raised if the parameter is not a pdarray
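The inclusive semantics of cumsum can be sketched in plain Python. This is a client-side reference for small lists, not the server implementation; the function name inclusive_cumsum is an assumption made for illustration.

```python
from itertools import accumulate


def inclusive_cumsum(values):
    """Element i of the result is the sum of values[0..i], inclusive."""
    # Starting from 0 and dropping that seed makes bool inputs sum as ints.
    return list(accumulate(values, initial=0))[1:]


ints = inclusive_cumsum([1, 2, 3, 4])
flags = inclusive_cumsum([False, True, False, True, True])
print(ints)   # [1, 3, 6, 10]
print(flags)  # [0, 1, 1, 2, 3]
```

Applied to boolean input, the running sum is a running count of True values.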


Examples

>>> ak.cumsum(ak.arange(1,5))
array([1, 3, 6, 10])

>>> ak.cumsum(ak.uniform(5,1.0,5.0))
array([3.1598310770203937, 5.4110385860243131, 9.1622479306453748, 12.710615785506533, 13.945880905466208])

>>> ak.cumsum(ak.randint(0,1,5, dtype=ak.bool))
array([0, 1, 1, 2, 3])

arkouda.numeric.cumprod(pda: arkouda.pdarrayclass.pdarray) → arkouda.pdarrayclass.pdarray
Return the cumulative product over the array.
The product is inclusive, such that the i-th element of the result is the product of elements up to and including i.
Parameters pda (pdarray)
Returns A pdarray containing cumulative products for each element of the original pdarray
Return type pdarray
Raises TypeError – Raised if the parameter is not a pdarray

Examples

>>> ak.cumprod(ak.arange(1,5))
array([1, 2, 6, 24])

>>> ak.cumprod(ak.uniform(5,1.0,5.0))
array([1.5728783400481925, 7.0472855509390593, 33.78523998586553, 134.05309592737584, 450.21589865655358])

arkouda.numeric.sin(pda: arkouda.pdarrayclass.pdarray) → arkouda.pdarrayclass.pdarray
Return the element-wise sine of the array.
Parameters pda (pdarray)
Returns A pdarray containing sine for each element of the original pdarray
Return type pdarray
Raises TypeError – Raised if the parameter is not a pdarray

arkouda.numeric.cos(pda: arkouda.pdarrayclass.pdarray) → arkouda.pdarrayclass.pdarray
Return the element-wise cosine of the array.
Parameters pda (pdarray)
Returns A pdarray containing cosine for each element of the original pdarray
Return type pdarray
Raises TypeError – Raised if the parameter is not a pdarray

arkouda.numeric.hash(pda: arkouda.pdarrayclass.pdarray, full: bool = True) → Union[Tuple[arkouda.pdarrayclass.pdarray, arkouda.pdarrayclass.pdarray], arkouda.pdarrayclass.pdarray]
Return an element-wise hash of the array.


Parameters
• pda (pdarray)
• full (bool) – By default, a 128-bit hash is computed and returned as two int64 arrays. If full=False, then a 64-bit hash is computed and returned as a single int64 array.
Returns If full=True, a 2-tuple of pdarrays containing the high and low 64 bits of each hash, respectively. If full=False, a single pdarray containing a 64-bit hash
Return type hashes
Raises TypeError – Raised if the parameter is not a pdarray

Notes

This function uses the SIPhash algorithm, which can output either a 64-bit or 128-bit hash. However, the 64-bit hash runs a significant risk of collisions when applied to more than a few million unique values. Unless the number of unique values is known to be small, the 128-bit hash is strongly recommended.
Note that this hash should not be used for security, or for any cryptographic application. Not only is SIPhash not intended for such uses, but this implementation employs a fixed key for the hash, which makes it possible for an adversary with control over input to engineer collisions.

arkouda.numeric.where(condition: arkouda.pdarrayclass.pdarray, A: Union[arkouda.dtypes.numeric_scalars, arkouda.pdarrayclass.pdarray], B: Union[arkouda.dtypes.numeric_scalars, arkouda.pdarrayclass.pdarray]) → arkouda.pdarrayclass.pdarray
Returns an array with elements chosen from A and B based upon a conditioning array. As is the case with numpy.where, the return array consists of values from the first array (A) where the conditioning array elements are True and from the second array (B) where the conditioning array elements are False.
Parameters
• condition (pdarray) – Used to choose values from A or B
• A (Union[numeric_scalars, pdarray]) – Value(s) used when condition is True
• B (Union[numeric_scalars, pdarray]) – Value(s) used when condition is False
Returns Values chosen from A where the condition is True and B where the condition is False
Return type pdarray
Raises
• TypeError – Raised if the condition object is not a pdarray, if A or B is not an int, np.int64, float, np.float64, or pdarray, if pdarray dtypes are not supported or do not match, or if multiple condition clauses (see Notes section) are applied
• ValueError – Raised if the shapes of the condition, A, and B pdarrays are unequal


Examples

>>> a1 = ak.arange(1,10)
>>> a2 = ak.ones(9, dtype=np.int64)
>>> cond = a1 < 5
>>> ak.where(cond,a1,a2)
array([1, 2, 3, 4, 1, 1, 1, 1, 1])

>>> a1 = ak.arange(1,10)
>>> a2 = ak.ones(9, dtype=np.int64)
>>> cond = a1 == 5
>>> ak.where(cond,a1,a2)
array([1, 1, 1, 1, 5, 1, 1, 1, 1])

>>> a1 = ak.arange(1,10)
>>> a2 = 10
>>> cond = a1 < 5
>>> ak.where(cond,a1,a2)
array([1, 2, 3, 4, 10, 10, 10, 10, 10])
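The selection rule illustrated by these examples can be written as a small plain-Python reference. It mirrors the documented behaviour (values from A where the condition is True, from B otherwise, with scalars broadcast) on ordinary lists; the name where_reference is hypothetical and the code does not call the arkouda server.

```python
def where_reference(condition, a, b):
    """Pick a[i] where condition[i] is True, else b[i]; scalars are broadcast."""
    n = len(condition)
    a_vals = a if isinstance(a, list) else [a] * n
    b_vals = b if isinstance(b, list) else [b] * n
    if len(a_vals) != n or len(b_vals) != n:
        # Mirrors the documented ValueError for unequal shapes.
        raise ValueError("condition, a, and b must have the same length")
    return [x if c else y for c, x, y in zip(condition, a_vals, b_vals)]


a1 = list(range(1, 10))
cond = [v < 5 for v in a1]
result = where_reference(cond, a1, 10)
print(result)  # [1, 2, 3, 4, 10, 10, 10, 10, 10]
```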

Notes

A and B must have the same dtype. Only one conditional clause is supported: a compound condition such as n < 5, n > 1, which numpy supports, is not currently supported in Arkouda.

arkouda.numeric.histogram(pda: arkouda.pdarrayclass.pdarray, bins: arkouda.dtypes.int_scalars = 10) → arkouda.pdarrayclass.pdarray
Compute a histogram of evenly spaced bins over the range of an array.
Parameters
• pda (pdarray) – The values to histogram
• bins (int_scalars) – The number of equal-size bins to use (default: 10)
Returns The number of values present in each bin
Return type pdarray, int64 or float64
Raises
• TypeError – Raised if the parameter is not a pdarray or if bins is not an int.
• ValueError – Raised if bins < 1
• NotImplementedError – Raised if pdarray dtype is bool or uint8
See also: value_counts


Notes

The bins are evenly spaced in the interval [pda.min(), pda.max()]. Currently, the user must re-compute the bin edges, e.g. with np.linspace (see below) in order to plot the histogram.
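As a rough illustration of evenly spaced bins over [min, max], the following plain-Python sketch computes both the counts and the bin edges for a small list. It assumes the rightmost bin is closed (so the maximum value lands in the last bin), which agrees with the documented output array([3, 3, 4]) for the values 0 to 9 with three bins; the function name histogram_reference is hypothetical and this is not the server implementation.

```python
def histogram_reference(values, bins=10):
    """Count values in `bins` equal-width bins spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        if width == 0:
            idx = 0  # degenerate case (all values equal), handled arbitrarily
        else:
            # The top edge belongs to the last bin, so clamp the index.
            idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return counts, edges


counts, edges = histogram_reference(list(range(10)), bins=3)
print(counts)  # [3, 3, 4]
print(edges)   # [0.0, 3.0, 6.0, 9.0]
```

The edges list plays the same role as the np.linspace call in the example for this function: it has bins + 1 entries, and the left edges are edges[:-1].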

Examples

>>> import matplotlib.pyplot as plt
>>> A = ak.arange(0, 10, 1)
>>> nbins = 3
>>> h = ak.histogram(A, bins=nbins)
>>> h
array([3, 3, 4])
# Recreate the bin edges in NumPy
>>> binEdges = np.linspace(A.min(), A.max(), nbins+1)
>>> binEdges
array([0., 3., 6., 9.])
# To plot, use only the left edges, and export the histogram to NumPy
>>> plt.plot(binEdges[:-1], h.to_ndarray())

arkouda.numeric.value_counts(pda: arkouda.pdarrayclass.pdarray) → Union[Categorical, Tuple[Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings], Optional[arkouda.pdarrayclass.pdarray]]]
Count the occurrences of the unique values of an array.
Parameters pda (pdarray, int64) – The array of values to count
Returns
• unique_values (pdarray, int64 or Strings) – The unique values, sorted in ascending order
• counts (pdarray, int64) – The number of times the corresponding unique value occurs
Raises TypeError – Raised if the parameter is not a pdarray
See also: unique, histogram

Notes

This function differs from histogram() in that it only returns counts for values that are present, leaving out empty “bins”. This function delegates all logic to the unique() method where the return_counts parameter is set to True.
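The difference from a histogram is easy to see in a plain-Python reference: only values that actually occur are reported, in ascending order, each paired with its count. The sketch uses collections.Counter on a small list; value_counts_reference is a hypothetical name and the code runs entirely client-side.

```python
from collections import Counter


def value_counts_reference(values):
    """Return (sorted unique values, count of each unique value)."""
    tally = Counter(values)
    unique_values = sorted(tally)
    counts = [tally[v] for v in unique_values]
    return unique_values, counts


uniq, counts = value_counts_reference([2, 0, 2, 4, 0, 0])
print(uniq, counts)  # [0, 2, 4] [3, 2, 1]
```

Note that 1 and 3 do not appear in the result at all, whereas a histogram over the same range would report empty bins for them.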

Examples

>>> A = ak.array([2,0,2,4,0,0])
>>> ak.value_counts(A)
(array([0, 2, 4]), array([3, 2, 1]))

arkouda.numeric.isnan(pda: arkouda.pdarrayclass.pdarray) → arkouda.pdarrayclass.pdarray
Test a pdarray for Not a Number (NaN) values. Currently only supports float-value-based arrays.
Parameters pda (pdarray) – The pdarray to test
Returns A pdarray consisting of True / False values; True where NaN, False otherwise
Return type pdarray
Raises
• TypeError – Raised if the parameter is not a pdarray
• RuntimeError – Raised if the underlying pdarray is not float-based

arkouda.pdarrayIO

Module Contents

Functions

ls_hdf(filename: str) → str
This function calls the h5ls utility on a filename visible to the arkouda server.
read_hdf(dsetName: str, filenames: Union[str, List[str]], strictTypes: bool = True, allow_errors: bool = False, calc_string_offsets: bool = False) → Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings]
Read a single dataset from multiple HDF5 files into an Arkouda pdarray or Strings object.
read_all(filenames: Union[str, List[str]], datasets: Optional[Union[str, List[str]]] = None, iterative: bool = False, strictTypes: bool = True, allow_errors: bool = False, calc_string_offsets=False) → Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, Mapping[str, Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings]]]
Read datasets from HDF5 files.
load(path_prefix: str, dataset: str = 'array', calc_string_offsets: bool = False) → Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings]
Load a pdarray previously saved with pdarray.save().
get_datasets(filename: str) → List[str]
Get the names of datasets in an HDF5 file.
load_all(path_prefix: str) → Mapping[str, Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, arkouda.categorical.Categorical]]
Load multiple pdarrays or Strings previously saved with save_all().
save_all(columns: Union[Mapping[str, arkouda.pdarrayclass.pdarray], List[arkouda.pdarrayclass.pdarray]], prefix_path: str, names: List[str] = None, mode: str = 'truncate') → None
Save multiple named pdarrays to HDF5 files.

arkouda.pdarrayIO.ls_hdf(filename: str) → str
This function calls the h5ls utility on a filename visible to the arkouda server.
Parameters filename (str) – The name of the file to pass to h5ls
Returns The string output of h5ls from the server
Return type str
Raises
• TypeError – Raised if filename is not a str
• ValueError – Raised if filename is empty or contains only whitespace
• RuntimeError – Raised if an error occurs in executing ls on an HDF5 file

arkouda.pdarrayIO.read_hdf(dsetName: str, filenames: Union[str, List[str]], strictTypes: bool = True, allow_errors: bool = False, calc_string_offsets: bool = False) → Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings]
Read a single dataset from multiple HDF5 files into an Arkouda pdarray or Strings object.
Parameters
• dsetName (str) – The name of the dataset (must be the same across all files)
• filenames (list or str) – Either a list of filenames or shell expression
• strictTypes (bool) – If True (default), require all dtypes in all files to have the same precision and sign. If False, allow dtypes of different precision and sign across different files. For example, if one file contains a uint32 dataset and another contains an int64 dataset, the contents of both will be read into an int64 pdarray.
• allow_errors (bool) – Default False. If True, files with read errors are skipped instead of causing a failure. A warning will be included in the return containing the total number of files skipped due to failure and up to 10 filenames.
• calc_string_offsets (bool) – Default False. If True, the server calculates the offsets/segments array itself instead of loading it from the HDF5 files. In the future this option may be set to True as the default.
Returns A pdarray or Strings instance pointing to the server-side data
Return type Union[pdarray,Strings]
Raises
• TypeError – Raised if dsetName is not a str or if filenames is neither a string nor a list of strings
• ValueError – Raised if all datasets are not present in all hdf5 files
• RuntimeError – Raised if one or more of the specified files cannot be opened
See also: get_datasets, ls_hdf, read_all, load, save

Notes

If filenames is a string, it is interpreted as a shell expression (a single filename is a valid expression, so it will work) and is expanded with glob to read all matching files. Use get_datasets to show the names of datasets in HDF5 files. If dsetName is not present in all files, a TypeError is raised.

arkouda.pdarrayIO.read_all(filenames: Union[str, List[str]], datasets: Optional[Union[str, List[str]]] = None, iterative: bool = False, strictTypes: bool = True, allow_errors: bool = False, calc_string_offsets=False) → Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, Mapping[str, Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings]]]
Read datasets from HDF5 files.
Parameters
• filenames (list or str) – Either a list of filenames or shell expression
• datasets (list or str or None) – (List of) name(s) of dataset(s) to read (default: all available)
• iterative (bool) – Iterative (True) or Single (False) function call(s) to server
• strictTypes (bool) – If True (default), require all dtypes of a given dataset to have the same precision and sign. If False, allow dtypes of different precision and sign across different files. For example, if one file contains a uint32 dataset and another contains an int64 dataset with the same name, the contents of both will be read into an int64 pdarray.
• allow_errors (bool) – Default False. If True, files with read errors are skipped instead of causing a failure. A warning will be included in the return containing the total number of files skipped due to failure and up to 10 filenames.
• calc_string_offsets (bool) – Default False. If True, the server calculates the offsets/segments array itself instead of loading it from the HDF5 files. In the future this option may be set to True as the default.
Returns For a single dataset, an Arkouda pdarray or Arkouda Strings object; for multiple datasets, a dictionary of {datasetName: pdarray or Strings}
Raises
• ValueError – Raised if all datasets are not present in all hdf5 files or if one or more of the specified files do not exist
• RuntimeError – Raised if one or more of the specified files cannot be opened. If allow_errors is True, this may be raised if no values are returned from the server.
• TypeError – Raised if we receive an unknown arkouda_type returned from the server
See also: read_hdf, get_datasets, ls_hdf

Notes

If filenames is a string, it is interpreted as a shell expression (a single filename is a valid expression, so it will work) and is expanded with glob to read all matching files.
If iterative == True, each dataset name and file name is passed to the server as an independent sequential string, while if iterative == False all dataset names and file names are passed to the server in a single string.
If datasets is None, infer the names of datasets from the first file and read all of them. Use get_datasets to show the names of datasets in HDF5 files.

arkouda.pdarrayIO.load(path_prefix: str, dataset: str = 'array', calc_string_offsets: bool = False) → Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings]
Load a pdarray previously saved with pdarray.save().
Parameters
• path_prefix (str) – Filename prefix used to save the original pdarray
• dataset (str) – Dataset name where the pdarray was saved, defaults to ‘array’
• calc_string_offsets (bool) – If True the server will ignore Segmented Strings ‘offsets’ array and derive it from the null-byte terminators. Defaults to False currently
Returns The pdarray or Strings that was previously saved
Return type Union[pdarray, Strings]
Raises
• TypeError – Raised if either path_prefix or dataset is not a str
• ValueError – Raised if the dataset is not present in all hdf5 files or if the path_prefix does not correspond to files accessible to Arkouda
• RuntimeError – Raised if the hdf5 files are present but there is an error in opening one or more of them
See also: save, load_all, read_hdf, read_all

arkouda.pdarrayIO.get_datasets(filename: str) → List[str]
Get the names of datasets in an HDF5 file.
Parameters filename (str) – Name of an HDF5 file visible to the arkouda server
Returns Names of the datasets in the file
Return type List[str]
Raises
• TypeError – Raised if filename is not a str
• ValueError – Raised if filename is empty or contains only whitespace
• RuntimeError – Raised if an error occurs in executing ls on an HDF5 file
See also: ls_hdf

arkouda.pdarrayIO.load_all(path_prefix: str) → Mapping[str, Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, arkouda.categorical.Categorical]]
Load multiple pdarrays or Strings previously saved with save_all().
Parameters path_prefix (str) – Filename prefix used to save the original pdarray
Returns Dictionary of {datasetName: pdarray} with the previously saved pdarrays
Return type Mapping[str,pdarray]
Raises
• TypeError – Raised if path_prefix is not a str
• ValueError – Raised if all datasets are not present in all hdf5 files or if the path_prefix does not correspond to files accessible to Arkouda
• RuntimeError – Raised if the hdf5 files are present but there is an error in opening one or more of them
See also: save_all, load, read_hdf, read_all

arkouda.pdarrayIO.save_all(columns: Union[Mapping[str, arkouda.pdarrayclass.pdarray], List[arkouda.pdarrayclass.pdarray]], prefix_path: str, names: List[str] = None, mode: str = 'truncate') → None
Save multiple named pdarrays to HDF5 files.
Parameters
• columns (dict or list of pdarrays) – Collection of arrays to save
• prefix_path (str) – Directory and filename prefix for output files
• names (list of str) – Dataset names for the pdarrays
• mode ({'truncate' | 'append'}) – By default, truncate (overwrite) the output files if they exist. If ‘append’, attempt to create new dataset in existing files.
Returns
Return type None
Raises ValueError – Raised if (1) the lengths of columns and values differ or (2) the mode is not ‘truncate’ or ‘append’
See also: save, load_all

Notes

Creates one file per locale containing that locale’s chunk of each pdarray. If columns is a dictionary, the keys are used as the HDF5 dataset names. Otherwise, if no names are supplied, 0-up integers are used. By default, any existing files at path_prefix will be overwritten, unless the user specifies the ‘append’ mode, in which case arkouda will attempt to add the arrays as new datasets to existing files. If the wrong number of files is present or dataset names already exist, a RuntimeError is raised.

arkouda.pdarrayclass

Module Contents

Classes

pdarray
The basic arkouda array class. This class contains only the attributes of the array; the data resides on the arkouda server.

Functions

clear() → None
Send a clear message to clear all unregistered data from the server symbol table
any(pda: pdarray) → numpy.bool_
Return True iff any element of the array evaluates to True.
all(pda: pdarray) → numpy.bool_
Return True iff all elements of the array evaluate to True.
is_sorted(pda: pdarray) → numpy.bool_
Return True iff the array is monotonically non-decreasing.
sum(pda: pdarray) → numpy.float64
Return the sum of all elements in the array.
prod(pda: pdarray) → numpy.float64
Return the product of all elements in the array. Return value is always a np.float64 or np.int64.
min(pda: pdarray) → arkouda.dtypes.numpy_scalars
Return the minimum value of the array.
max(pda: pdarray) → arkouda.dtypes.numpy_scalars
Return the maximum value of the array.
argmin(pda: pdarray) → numpy.int64
Return the index of the first occurrence of the array min value.
argmax(pda: pdarray) → numpy.int64
Return the index of the first occurrence of the array max value.
mean(pda: pdarray) → numpy.float64
Return the mean of the array.
var(pda: pdarray, ddof: arkouda.dtypes.int_scalars = 0) → numpy.float64
Return the variance of values in the array.
std(pda: pdarray, ddof: arkouda.dtypes.int_scalars = 0) → numpy.float64
Return the standard deviation of values in the array.
mink(pda: pdarray, k: arkouda.dtypes.int_scalars) → pdarray
Find the k minimum values of an array.
maxk(pda: pdarray, k: arkouda.dtypes.int_scalars) → pdarray
Find the k maximum values of an array.
argmink(pda: pdarray, k: arkouda.dtypes.int_scalars) → pdarray
Find the indices corresponding to the k minimum values of an array.
argmaxk(pda: pdarray, k: arkouda.dtypes.int_scalars) → pdarray
Find the indices corresponding to the k maximum values of an array.
attach_pdarray(user_defined_name: str) → pdarray
Class method to return a pdarray attached to the registered name in the arkouda server
unregister_pdarray_by_name(user_defined_name: str) → None
Unregister a named pdarray in the arkouda server which was previously registered

class arkouda.pdarrayclass.pdarray(name: str, mydtype: numpy.dtype, size: arkouda.dtypes.int_scalars, ndim: arkouda.dtypes.int_scalars, shape: Sequence[int], itemsize: arkouda.dtypes.int_scalars)
The basic arkouda array class. This class contains only the attributes of the array; the data resides on the arkouda server. When a server operation results in a new array, arkouda will create a pdarray instance that points to the array data on the server. As such, the user should not initialize pdarray instances directly.
name
The server-side identifier for the array
Type str
dtype
The element type of the array
Type dtype
size
The number of elements in the array
Type int_scalars
ndim
The rank of the array (currently only rank 1 arrays supported)
Type int_scalars
shape
A list or tuple containing the sizes of each dimension of the array


Type Sequence[int] itemsize The size in bytes of each element Type int_scalars BinOps OpEqOps objtype = pdarray __array_priority__ = 1000 __del__(self ) __bool__(self ) → bool __len__(self ) __str__(self ) Return str(self). __repr__(self ) Return repr(self). format_other(self, other: object) → numpy.dtype Attempt to cast scalar other to the element dtype of this pdarray, and print the resulting value to a string (e.g. for sending to a server command). The user should not call this function directly. Parameters other (object) – The scalar to be cast to the pdarray.dtype Returns Return type np.dtype corresponding to the other parameter Raises TypeError – Raised if the other parameter cannot be converted to Numpy dtype _binop(self, other: pdarray, op: str) → pdarray Executes binary operation specified by the op string Parameters • other (pdarray) – The pdarray upon which the binop is to be executed • op (str) – The binop to be executed Returns A pdarray encapsulating the binop result Return type pdarray Raises • ValueError – Raised if the op is not within the pdarray.BinOps set, or if the pdarray sizes don’t match • TypeError – Raised if other is not a pdarray or the pdarray.dtype is not a supported dtype _r_binop(self, other: pdarray, op: str) → pdarray Executes reverse binary operation specified by the op string Parameters • other (pdarray) – The pdarray upon which the reverse binop is to be executed • op (str) – The name of the reverse binop to be executed Returns A pdarray encapsulating the reverse binop result


Return type pdarray
Raises
• ValueError – Raised if the op is not within the pdarray.BinOps set
• TypeError – Raised if other is not a pdarray or the pdarray.dtype is not a supported dtype
__add__(self, other) __radd__(self, other) __sub__(self, other) __rsub__(self, other) __mul__(self, other) __rmul__(self, other) __truediv__(self, other) __rtruediv__(self, other) __floordiv__(self, other) __rfloordiv__(self, other) __mod__(self, other) __rmod__(self, other) __lshift__(self, other) __rlshift__(self, other) __rshift__(self, other) __rrshift__(self, other) __and__(self, other) __rand__(self, other) __or__(self, other) __ror__(self, other) __xor__(self, other) __rxor__(self, other) __pow__(self, other) __rpow__(self, other)
__lt__(self, other) Return self<value.
__gt__(self, other) Return self>value.
__le__(self, other) Return self<=value.
__ge__(self, other) Return self>=value.
__eq__(self, other) Return self==value.


__ne__(self, other) Return self!=value. __neg__(self ) __invert__(self ) opeq(self, other, op) __iadd__(self, other) __isub__(self, other) __imul__(self, other) __itruediv__(self, other) __ifloordiv__(self, other) __ilshift__(self, other) __irshift__(self, other) __iand__(self, other) __ior__(self, other) __ixor__(self, other) __ipow__(self, other) abstract __iter__(self ) __getitem__(self, key) __setitem__(self, key, value) fill(self, value: arkouda.dtypes.numeric_scalars) → None Fill the array (in place) with a constant value. Parameters value (numeric_scalars)– Raises TypeError – Raised if value is not an int, int64, float, or float64 any(self ) → numpy.bool_ Return True iff any element of the array evaluates to True. all(self ) → numpy.bool_ Return True iff all elements of the array evaluate to True. is_registered(self ) → numpy.bool_ Return True iff the object is contained in the registry Parameters None – Returns Indicates if the object is contained in the registry Return type bool Raises RuntimeError – Raised if there’s a server-side error thrown _list_component_names(self ) → List[str] Internal Function that returns a list of all component names Parameters None – Returns List of all component names Return type List[str]


info(self ) → str Returns a JSON formatted string containing information about all components of self Parameters None – Returns JSON string containing information about all components of self Return type str pretty_print_info(self ) → None Prints information about all components of self in a human readable format Parameters None – Returns Return type None is_sorted(self ) → numpy.bool_ Return True iff the array is monotonically non-decreasing. Parameters None – Returns Indicates if the array is monotonically non-decreasing Return type bool Raises • TypeError – Raised if pda is not a pdarray instance • RuntimeError – Raised if there’s a server-side error thrown sum(self ) → arkouda.dtypes.numpy_scalars Return the sum of all elements in the array. prod(self ) → numpy.float64 Return the product of all elements in the array. Return value is always a np.float64 or np.int64. min(self ) → arkouda.dtypes.numpy_scalars Return the minimum value of the array. max(self ) → arkouda.dtypes.numpy_scalars Return the maximum value of the array. argmin(self ) → numpy.int64 Return the index of the first occurrence of the array min value argmax(self ) → numpy.int64 Return the index of the first occurrence of the array max value. mean(self ) → numpy.float64 Return the mean of the array. var(self, ddof: arkouda.dtypes.int_scalars = 0) → numpy.float64 Compute the variance. See arkouda.var for details. Parameters ddof (int_scalars) – “Delta Degrees of Freedom” used in calculating var Returns The scalar variance of the array Return type np.float64 Raises • TypeError – Raised if pda is not a pdarray instance • ValueError – Raised if the ddof >= pdarray size


• RuntimeError – Raised if there’s a server-side error thrown

std(self, ddof: arkouda.dtypes.int_scalars = 0) → numpy.float64
Compute the standard deviation. See arkouda.std for details.
Parameters ddof (int_scalars) – “Delta Degrees of Freedom” used in calculating std
Returns The scalar standard deviation of the array
Return type np.float64
Raises
• TypeError – Raised if pda is not a pdarray instance
• RuntimeError – Raised if there’s a server-side error thrown

mink(self, k: arkouda.dtypes.int_scalars) → pdarray
Compute the minimum “k” values.
Parameters k (int_scalars) – The desired count of minimum values to be returned by the output.
Returns The minimum k values from pda
Return type pdarray, int
Raises TypeError – Raised if pda is not a pdarray

maxk(self, k: arkouda.dtypes.int_scalars) → pdarray
Compute the maximum “k” values.
Parameters k (int_scalars) – The desired count of maximum values to be returned by the output.
Returns The maximum k values from pda
Return type pdarray, int
Raises TypeError – Raised if pda is not a pdarray

argmink(self, k: arkouda.dtypes.int_scalars) → pdarray
Finds the indices corresponding to the minimum “k” values.
Parameters k (int_scalars) – The desired count of minimum values to be returned by the output.
Returns Indices corresponding to the minimum k values from pda
Return type pdarray, int
Raises TypeError – Raised if pda is not a pdarray

argmaxk(self, k: arkouda.dtypes.int_scalars) → pdarray
Finds the indices corresponding to the maximum “k” values.
Parameters k (int_scalars) – The desired count of maximum values to be returned by the output.
Returns Indices corresponding to the maximum k values, sorted
Return type pdarray, int
Raises TypeError – Raised if pda is not a pdarray

to_ndarray(self ) → numpy.ndarray
Convert the array to a np.ndarray, transferring array data from the Arkouda server to client-side Python. Note: if the pdarray size exceeds client.maxTransferBytes, a RuntimeError is raised.


Returns A numpy ndarray with the same attributes and data as the pdarray
Return type np.ndarray
Raises RuntimeError – Raised if there is a server-side error thrown, if the pdarray size exceeds the built-in client.maxTransferBytes size limit, or if the number of bytes received does not match the expected number of bytes

Notes

The number of bytes in the array cannot exceed client.maxTransferBytes, otherwise a RuntimeError will be raised. This is to protect the user from overflowing the memory of the system on which the Python client is running, under the assumption that the server is running on a distributed system with much more memory than the client. The user may override this limit by setting client.maxTransferBytes to a larger value, but proceed with caution. See also: array

Examples

>>> a = ak.arange(0, 5, 1)
>>> a.to_ndarray()
array([0, 1, 2, 3, 4])

>>> type(a.to_ndarray())
numpy.ndarray

to_cuda(self )
Convert the array to a Numba DeviceNDArray, transferring array data from the arkouda server to Python via ndarray. If the array exceeds a builtin size limit, a RuntimeError is raised.
Returns A Numba ndarray with the same attributes and data as the pdarray; on GPU
Return type numba.DeviceNDArray
Raises
• ImportError – Raised if CUDA is not available
• ModuleNotFoundError – Raised if Numba is either not installed or not enabled
• RuntimeError – Raised if there is a server-side error thrown in the course of retrieving the pdarray.

Notes

The number of bytes in the array cannot exceed client.maxTransferBytes, otherwise a RuntimeError will be raised. This is to protect the user from overflowing the memory of the system on which the Python client is running, under the assumption that the server is running on a distributed system with much more memory than the client. The user may override this limit by setting client.maxTransferBytes to a larger value, but proceed with caution. See also: array


Examples

>>> a = ak.arange(0, 5, 1)
>>> a.to_cuda()
array([0, 1, 2, 3, 4])

>>> type(a.to_cuda())
numpy.devicendarray

save(self, prefix_path: str, dataset: str = 'array', mode: str = 'truncate') → str
Save the pdarray to HDF5. The result is a collection of HDF5 files, one file per locale of the arkouda server, where each filename starts with prefix_path. Each locale saves its chunk of the array to its corresponding file.
Parameters
• prefix_path (str) – Directory and filename prefix that all output files share
• dataset (str) – Name of the dataset to create in HDF5 files (must not already exist)
• mode (str {'truncate' | 'append'}) – By default, truncate (overwrite) output files, if they exist. If ‘append’, attempt to create new dataset in existing files.
Returns String message indicating the result of the save operation
Return type str
Raises
• RuntimeError – Raised if a server-side error is thrown saving the pdarray
• ValueError – Raised if there is an error in parsing the prefix path pointing to file write location or if the mode parameter is neither truncate nor append
• TypeError – Raised if any one of the prefix_path, dataset, or mode parameters is not a string
See also: save_all, load, read_hdf, read_all

Notes

The prefix_path must be visible to the arkouda server and the user must have write permission. Output files have names of the form <prefix_path>_LOCALE<i>.hdf, where <i> ranges from 0 to numLocales. If any of the output files already exist and the mode is ‘truncate’, they will be overwritten. If the mode is ‘append’ and the number of output files is less than the number of locales or a dataset with the same name already exists, a RuntimeError will result.


Examples

>>> a = ak.arange(0, 100, 1)
>>> a.save('arkouda_range', dataset='array')

The array is saved in numLocales files with names like tmp/arkouda_range_LOCALE0.hdf. It can be read back in as follows:

>>> b = ak.load('arkouda_range', dataset='array')
>>> (a == b).all()
True
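The file naming described in the Notes can be sketched without a running server. The snippet below is plain Python, not arkouda code: `expected_output_files` is a hypothetical helper, and it assumes one file per locale, a zero-based locale index, and the `.hdf` suffix seen in the example path.

```python
def expected_output_files(prefix_path, num_locales):
    """List the per-locale file names a save call is expected to produce.

    Assumptions (not confirmed against the server source): one file per
    locale, locale indices start at 0, and names follow the pattern
    prefix_path + "_LOCALE" + index + ".hdf".
    """
    return [f"{prefix_path}_LOCALE{i}.hdf" for i in range(num_locales)]


# Example: a hypothetical server running on 4 locales
files = expected_output_files("arkouda_range", 4)
for name in files:
    print(name)
```

Listing the expected names this way can help when checking, before an 'append', that every per-locale file is present.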

register(self, user_defined_name: str) → pdarray
Register this pdarray with a user defined name in the arkouda server so it can be attached to later using pdarray.attach(). This is an in-place operation; registering a pdarray more than once will update the name in the registry and remove the previously registered name. A name can only be registered to one pdarray at a time.
Parameters user_defined_name (str) – user defined name array is to be registered under
Returns The same pdarray which is now registered with the arkouda server and has an updated name. This is an in-place modification; the original is returned to support a fluid programming style. Please note you cannot register two different pdarrays with the same name.
Return type pdarray
Raises
• TypeError – Raised if user_defined_name is not a str
• RegistrationError – If the server was unable to register the pdarray with the user_defined_name. If the user is attempting to register more than one pdarray with the same name, the former should be unregistered first to free up the registration name.
See also: attach, unregister, is_registered, list_registry, unregister_pdarray_by_name

Notes

Registered names/pdarrays in the server are immune to deletion until they are unregistered.

Examples

>>> a = ak.zeros(100)
>>> a.register("my_zeros")
>>> # potentially disconnect from server and reconnect to server
>>> b = ak.pdarray.attach("my_zeros")
>>> # ...other work...
>>> b.unregister()

unregister(self ) → None
Unregister a pdarray in the arkouda server which was previously registered using register() and/or attached to using attach()
Return type None
Raises RuntimeError – Raised if the server could not find the internal name/symbol to remove
See also: register, unregister, is_registered, unregister_pdarray_by_name, list_registry

Notes

Registered names/pdarrays in the server are immune to deletion until they are unregistered.

Examples

>>> a = ak.zeros(100)
>>> a.register("my_zeros")
>>> # potentially disconnect from server and reconnect to server
>>> b = ak.pdarray.attach("my_zeros")
>>> # ...other work...
>>> b.unregister()

static attach(user_defined_name: str) → pdarray class method to return a pdarray attached to the registered name in the arkouda server which was registered using register() Parameters user_defined_name (str) – user defined name which array was registered under Returns pdarray which is bound to corresponding server side component that was registered with user_defined_name Return type pdarray Raises TypeError – Raised if user_defined_name is not a str See also: register, unregister, is_registered, unregister_pdarray_by_name, list_registry

Notes

Registered names/pdarrays in the server are immune to deletion until they are unregistered.

Examples

>>> a = ak.zeros(100)
>>> a.register("my_zeros")
>>> # potentially disconnect from server and reconnect to server
>>> b = ak.pdarray.attach("my_zeros")
>>> # ...other work...
>>> b.unregister()

arkouda.pdarrayclass.clear() → None
Send a clear message to clear all unregistered data from the server symbol table


Return type None Raises RuntimeError – Raised if there is a server-side error in executing clear request arkouda.pdarrayclass.any(pda: pdarray) → numpy.bool_ Return True iff any element of the array evaluates to True. Parameters pda (pdarray) – The pdarray instance to be evaluated Returns Indicates if 1..n pdarray elements evaluate to True Return type bool Raises • TypeError – Raised if pda is not a pdarray instance • RuntimeError – Raised if there’s a server-side error thrown arkouda.pdarrayclass.all(pda: pdarray) → numpy.bool_ Return True iff all elements of the array evaluate to True. Parameters pda (pdarray) – The pdarray instance to be evaluated Returns Indicates if all pdarray elements evaluate to True Return type bool Raises • TypeError – Raised if pda is not a pdarray instance • RuntimeError – Raised if there’s a server-side error thrown arkouda.pdarrayclass.is_sorted(pda: pdarray) → numpy.bool_ Return True iff the array is monotonically non-decreasing. Parameters pda (pdarray) – The pdarray instance to be evaluated Returns Indicates if the array is monotonically non-decreasing Return type bool Raises • TypeError – Raised if pda is not a pdarray instance • RuntimeError – Raised if there’s a server-side error thrown arkouda.pdarrayclass.sum(pda: pdarray) → numpy.float64 Return the sum of all elements in the array. Parameters pda (pdarray) – Values for which to calculate the sum Returns The sum of all elements in the array Return type np.float64 Raises • TypeError – Raised if pda is not a pdarray instance • RuntimeError – Raised if there’s a server-side error thrown arkouda.pdarrayclass.prod(pda: pdarray) → numpy.float64 Return the product of all elements in the array. Return value is always a np.float64 or np.int64 Parameters pda (pdarray) – Values for which to calculate the product Returns The product calculated from the pda


Return type numpy_scalars Raises • TypeError – Raised if pda is not a pdarray instance • RuntimeError – Raised if there’s a server-side error thrown arkouda.pdarrayclass.min(pda: pdarray) → arkouda.dtypes.numpy_scalars Return the minimum value of the array. Parameters pda (pdarray) – Values for which to calculate the min Returns The min calculated from the pda Return type numpy_scalars Raises • TypeError – Raised if pda is not a pdarray instance • RuntimeError – Raised if there’s a server-side error thrown arkouda.pdarrayclass.max(pda: pdarray) → arkouda.dtypes.numpy_scalars Return the maximum value of the array. Parameters pda (pdarray) – Values for which to calculate the max Returns The max calculated from the pda Return type numpy_scalars Raises • TypeError – Raised if pda is not a pdarray instance • RuntimeError – Raised if there’s a server-side error thrown arkouda.pdarrayclass.argmin(pda: pdarray) → numpy.int64 Return the index of the first occurrence of the array min value. Parameters pda (pdarray) – Values for which to calculate the argmin Returns The index of the argmin calculated from the pda Return type np.int64 Raises • TypeError – Raised if pda is not a pdarray instance • RuntimeError – Raised if there’s a server-side error thrown arkouda.pdarrayclass.argmax(pda: pdarray) → numpy.int64 Return the index of the first occurrence of the array max value. Parameters pda (pdarray) – Values for which to calculate the argmax Returns The index of the argmax calculated from the pda Return type np.int64 Raises • TypeError – Raised if pda is not a pdarray instance • RuntimeError – Raised if there’s a server-side error thrown arkouda.pdarrayclass.mean(pda: pdarray) → numpy.float64 Return the mean of the array.


Parameters pda (pdarray) – Values for which to calculate the mean Returns The mean calculated from the pda sum and size Return type np.float64 Raises • TypeError – Raised if pda is not a pdarray instance • RuntimeError – Raised if there’s a server-side error thrown arkouda.pdarrayclass.var(pda: pdarray, ddof: arkouda.dtypes.int_scalars = 0) → numpy.float64 Return the variance of values in the array. Parameters • pda (pdarray) – Values for which to calculate the variance • ddof (int_scalars) – “Delta Degrees of Freedom” used in calculating var Returns The scalar variance of the array Return type np.float64 Raises • TypeError – Raised if pda is not a pdarray instance • ValueError – Raised if the ddof >= pdarray size • RuntimeError – Raised if there’s a server-side error thrown See also: mean, std

Notes

The variance is the average of the squared deviations from the mean, i.e., var = mean((x - x.mean())**2). The mean is normally calculated as x.sum() / N, where N = len(x). If, however, ddof is specified, the divisor N - ddof is used instead. In standard statistical practice, ddof=1 provides an unbiased estimator of the variance of a hypothetical infinite population. ddof=0 provides a maximum likelihood estimate of the variance for normally distributed variables. arkouda.pdarrayclass.std(pda: pdarray, ddof: arkouda.dtypes.int_scalars = 0) → numpy.float64 Return the standard deviation of values in the array. The standard deviation is implemented as the square root of the variance. Parameters • pda (pdarray) – values for which to calculate the standard deviation • ddof (int_scalars) – “Delta Degrees of Freedom” used in calculating std Returns The scalar standard deviation of the array Return type np.float64 Raises • TypeError – Raised if pda is not a pdarray instance or ddof is not an integer • ValueError – Raised if ddof is an integer < 0 • RuntimeError – Raised if there’s a server-side error thrown


See also: mean, var
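The effect of ddof on the divisor used by var and std can be illustrated with a short sketch. This is plain Python rather than arkouda code, so it runs without a server; the function names are illustrative only, and the ValueError check is an assumption modelled on the documented "ddof >= pdarray size" condition.

```python
import math


def variance(values, ddof=0):
    """Average squared deviation from the mean, using divisor N - ddof."""
    n = len(values)
    if ddof >= n:
        # Assumption: mirrors the documented ValueError for ddof >= size
        raise ValueError("ddof must be smaller than the number of values")
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return squared_deviations / (n - ddof)


def std(values, ddof=0):
    """Square root of the variance computed with the same ddof."""
    return math.sqrt(variance(values, ddof))


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(variance(data))          # 4.0  (divisor N = 8)
print(std(data))               # 2.0
print(variance(data, ddof=1))  # divisor N - 1 = 7, slightly larger
```

With ddof=0 the sum of squared deviations (32 here) is divided by N; with ddof=1 it is divided by N - 1, which is why the second estimate is larger.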

Notes

The standard deviation is the square root of the average of the squared deviations from the mean, i.e., std = sqrt(mean((x - x.mean())**2)). The average squared deviation is normally calculated as x.sum() / N, where N = len(x). If, however, ddof is specified, the divisor N - ddof is used instead. In standard statistical practice, ddof=1 provides an unbiased estimator of the variance of the infinite population. ddof=0 provides a maximum likelihood estimate of the variance for normally distributed variables. The standard deviation computed in this function is the square root of the estimated variance, so even with ddof=1, it will not be an unbiased estimate of the standard deviation per se. arkouda.pdarrayclass.mink(pda: pdarray, k: arkouda.dtypes.int_scalars) → pdarray Find the k minimum values of an array. Returns the smallest k values of an array, sorted Parameters • pda (pdarray) – Input array. • k (int_scalars) – The desired count of minimum values to be returned by the output. Returns The minimum k values from pda, sorted Return type pdarray Raises • TypeError – Raised if pda is not a pdarray • ValueError – Raised if the pda is empty or k < 1

Notes

This call is equivalent in value to a[ak.argsort(a)[:k]] and generally outperforms that operation. This reduction will see a significant drop in performance as k grows beyond a certain value. This value is system dependent, but performance degradation has generally been observed at a k of about 5 million.
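The equivalence stated above can be checked with a small, server-free sketch. It is plain Python, not the arkouda implementation: `mink_via_argsort` spells out the full-sort formulation, while `mink` uses `heapq.nsmallest`, which only tracks the k smallest candidates. That a selection of this kind is what lets the server beat a full argsort is an assumption, not something the source states.

```python
import heapq


def mink_via_argsort(values, k):
    """Full-sort formulation: values[argsort(values)[:k]]."""
    order = sorted(range(len(values)), key=lambda i: values[i])  # argsort
    return [values[i] for i in order[:k]]


def mink(values, k):
    """Selection formulation: keep only the k smallest values, sorted."""
    if not values or k < 1:
        raise ValueError("values must be non-empty and k must be >= 1")
    return heapq.nsmallest(k, values)


A = [10, 5, 1, 3, 7, 2, 9, 0]
print(mink(A, 3))              # [0, 1, 2]
print(mink_via_argsort(A, 3))  # [0, 1, 2]
print(mink(A, 4))              # [0, 1, 2, 3]
```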

Examples

>>> A = ak.array([10,5,1,3,7,2,9,0])
>>> ak.mink(A, 3)
array([0, 1, 2])
>>> ak.mink(A, 4)
array([0, 1, 2, 3])

arkouda.pdarrayclass.maxk(pda: pdarray, k: arkouda.dtypes.int_scalars) → pdarray Find the k maximum values of an array. Returns the largest k values of an array, sorted


Parameters • pda (pdarray) – Input array. • k (int_scalars) – The desired count of maximum values to be returned by the output. Returns The maximum k values from pda, sorted Return type pdarray, int Raises • TypeError – Raised if pda is not a pdarray or k is not an integer • ValueError – Raised if the pda is empty or k < 1

Notes

This call is equivalent in value to a[ak.argsort(a)[-k:]] and generally outperforms that operation. This reduction will see a significant drop in performance as k grows beyond a certain value. This value is system dependent, but performance degradation has generally been observed at a k of about 5 million.

Examples

>>> A = ak.array([10,5,1,3,7,2,9,0])
>>> ak.maxk(A, 3)
array([7, 9, 10])
>>> ak.maxk(A, 4)
array([5, 7, 9, 10])

arkouda.pdarrayclass.argmink(pda: pdarray, k: arkouda.dtypes.int_scalars) → pdarray
Finds the indices corresponding to the k minimum values of an array.
Parameters
• pda (pdarray) – Input array.
• k (int_scalars) – The desired count of indices corresponding to minimum array values
Returns The indices of the minimum k values from the pda, sorted
Return type pdarray, int
Raises
• TypeError – Raised if pda is not a pdarray or k is not an integer
• ValueError – Raised if the pda is empty or k < 1


Notes

This call is equivalent in value to: ak.argsort(a)[:k] and generally outperforms this operation. This reduction will see a significant drop in performance as k grows beyond a certain value. This value is system dependent, but generally about a k of 5 million is where performance degradation has been observed.

Examples

>>> A = ak.array([10,5,1,3,7,2,9,0])
>>> ak.argmink(A, 3)
array([7, 2, 5])
>>> ak.argmink(A, 4)
array([7, 2, 5, 3])

arkouda.pdarrayclass.argmaxk(pda: pdarray, k: arkouda.dtypes.int_scalars) → pdarray
Find the indices corresponding to the k maximum values of an array. Returns the indices of the largest k values of an array, sorted
Parameters
• pda (pdarray) – Input array.
• k (int_scalars) – The desired count of indices corresponding to maximum array values
Returns The indices of the maximum k values from the pda, sorted
Return type pdarray, int
Raises
• TypeError – Raised if pda is not a pdarray or k is not an integer
• ValueError – Raised if the pda is empty or k < 1

Notes

This call is equivalent in value to ak.argsort(a)[-k:] and generally outperforms that operation. This reduction will see a significant drop in performance as k grows beyond a certain value. This value is system dependent, but performance degradation has generally been observed at a k of about 5 million.
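As a cross-check on the index-returning variants, the sketch below reproduces the documented example outputs in plain Python. It is not arkouda code; `argsort`, `argmink` and `argmaxk` here are local stand-ins, and the k largest values are taken as the last k entries of the argsort (a slice of the form [-k:]).

```python
def argsort(values):
    """Indices that would sort values in ascending order."""
    return sorted(range(len(values)), key=lambda i: values[i])


def argmink(values, k):
    """Indices of the k smallest values, ordered by ascending value."""
    if not values or k < 1:
        raise ValueError("values must be non-empty and k must be >= 1")
    return argsort(values)[:k]


def argmaxk(values, k):
    """Indices of the k largest values, ordered by ascending value."""
    if not values or k < 1:
        raise ValueError("values must be non-empty and k must be >= 1")
    return argsort(values)[-k:]


A = [10, 5, 1, 3, 7, 2, 9, 0]
idx = argmaxk(A, 3)
print(idx)                  # [4, 6, 0]
print([A[i] for i in idx])  # [7, 9, 10], the same values maxk reports
print(argmink(A, 3))        # [7, 2, 5]
```

Gathering the input at the returned indices yields the corresponding mink or maxk values, which is a convenient sanity check.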


Examples

>>> A = ak.array([10,5,1,3,7,2,9,0])
>>> ak.argmaxk(A, 3)
array([4, 6, 0])
>>> ak.argmaxk(A, 4)
array([1, 4, 6, 0])

arkouda.pdarrayclass.attach_pdarray(user_defined_name: str) → pdarray
Return a pdarray attached to the registered name in the arkouda server, which was registered using register()
Parameters user_defined_name (str) – user defined name which array was registered under
Returns pdarray which is bound to corresponding server side component that was registered with user_defined_name
Return type pdarray
Raises TypeError – Raised if user_defined_name is not a str
See also: register, unregister, is_registered, unregister_pdarray_by_name, list_registry

Notes

Registered names/pdarrays in the server are immune to deletion until they are unregistered.

Examples

>>> a = ak.zeros(100)
>>> a.register("my_zeros")
>>> # potentially disconnect from server and reconnect to server
>>> b = ak.attach_pdarray("my_zeros")
>>> # ...other work...
>>> b.unregister()

arkouda.pdarrayclass.unregister_pdarray_by_name(user_defined_name: str) → None
Unregister a named pdarray in the arkouda server which was previously registered using register() and/or attached to using attach_pdarray()
Parameters user_defined_name (str) – user defined name which array was registered under
Return type None
Raises RuntimeError – Raised if the server could not find the internal name/symbol to remove
See also: register, unregister, is_registered, list_registry, attach


Notes

Registered names/pdarrays in the server are immune to deletion until they are unregistered.

Examples

>>> a = ak.zeros(100)
>>> a.register("my_zeros")
>>> # potentially disconnect from server and reconnect to server
>>> b = ak.attach_pdarray("my_zeros")
>>> # ...other work...
>>> ak.unregister_pdarray_by_name("my_zeros")

exception arkouda.pdarrayclass.RegistrationError
Bases: Exception
Error/Exception used when the Arkouda Server cannot register an object

arkouda.pdarraycreation

Module Contents

Functions

from_series(series: pandas.Series, dtype: Optional[Union[type, str]] = None) → Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings]
    Converts a Pandas Series to an Arkouda pdarray or Strings object.

array(a: Union[arkouda.pdarrayclass.pdarray, numpy.ndarray, Iterable]) → Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings]
    Convert a Python or Numpy Iterable to a pdarray or Strings object, sending the corresponding data to the arkouda server.

zeros(size: arkouda.dtypes.int_scalars, dtype: type = np.float64) → arkouda.pdarrayclass.pdarray
    Create a pdarray filled with zeros.

ones(size: arkouda.dtypes.int_scalars, dtype: type = float64) → arkouda.pdarrayclass.pdarray
    Create a pdarray filled with ones.

zeros_like(pda: arkouda.pdarrayclass.pdarray) → arkouda.pdarrayclass.pdarray
    Create a zero-filled pdarray of the same size and dtype as an existing pdarray.

ones_like(pda: arkouda.pdarrayclass.pdarray) → arkouda.pdarrayclass.pdarray
    Create a one-filled pdarray of the same size and dtype as an existing pdarray.

arange(*args) → arkouda.pdarrayclass.pdarray
    arange([start,] stop[, stride])

linspace(start: arkouda.dtypes.numeric_scalars, stop: arkouda.dtypes.numeric_scalars, length: arkouda.dtypes.int_scalars) → arkouda.pdarrayclass.pdarray
    Create a pdarray of linearly-spaced floats in a closed interval.

randint(low: arkouda.dtypes.numeric_scalars, high: arkouda.dtypes.numeric_scalars, size: arkouda.dtypes.int_scalars, dtype=int64, seed: arkouda.dtypes.int_scalars = None) → arkouda.pdarrayclass.pdarray
    Generate a pdarray of randomized int, float, or bool values in a ...

uniform(size: arkouda.dtypes.int_scalars, low: arkouda.dtypes.numeric_scalars = float(0.0), high: arkouda.dtypes.numeric_scalars = 1.0, seed: Union[None, arkouda.dtypes.int_scalars] = None) → arkouda.pdarrayclass.pdarray
    Generate a pdarray with uniformly distributed random float values

standard_normal(size: arkouda.dtypes.int_scalars, seed: Union[None, arkouda.dtypes.int_scalars] = None) → arkouda.pdarrayclass.pdarray
    Draw real numbers from the standard normal distribution.

random_strings_uniform(minlen: arkouda.dtypes.int_scalars, maxlen: arkouda.dtypes.int_scalars, size: arkouda.dtypes.int_scalars, characters: str = 'uppercase', seed: Union[None, arkouda.dtypes.int_scalars] = None) → arkouda.strings.Strings
    Generate random strings with lengths uniformly distributed between ...

random_strings_lognormal(logmean: arkouda.dtypes.numeric_scalars, logstd: arkouda.dtypes.numeric_scalars, size: arkouda.dtypes.int_scalars, characters: str = 'uppercase', seed: Optional[arkouda.dtypes.int_scalars] = None) → arkouda.strings.Strings
    Generate random strings with log-normally distributed lengths and ...

arkouda.pdarraycreation.from_series(series: pandas.Series, dtype: Optional[Union[type, str]] = None) → Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings]
Converts a Pandas Series to an Arkouda pdarray or Strings object. If dtype is None, the dtype is inferred from the Pandas Series. Otherwise, the dtype parameter is set if the dtype of the Pandas Series is to be overridden or is unknown (for example, in situations where the Series dtype is object).
Parameters • series (Pandas Series) – The Pandas Series with a dtype of bool, float64, int64, or string • dtype (Optional[type]) – The valid dtype types are np.bool, np.float64, np.int64, and np.str Returns Return type Union[pdarray,Strings] Raises • TypeError – Raised if series is not a Pandas Series object • ValueError – Raised if the Series dtype is not bool, float64, int64, string, datetime, or timedelta

Examples

>>> ak.from_series(pd.Series(np.random.randint(0,10,5)))
array([9, 0, 4, 7, 9])

>>> ak.from_series(pd.Series(['1', '2', '3', '4', '5']), dtype=np.int64)
array([1, 2, 3, 4, 5])

>>> ak.from_series(pd.Series(np.random.uniform(low=0.0,high=1.0,size=3)))
array([0.57600036956445599, 0.41619265571741659, 0.6615356693784662])

>>> ak.from_series(pd.Series(['0.57600036956445599', '0.41619265571741659', '0.6615356693784662']), dtype=np.float64)
array([0.57600036956445599, 0.41619265571741659, 0.6615356693784662])

>>> ak.from_series(pd.Series(np.random.choice([True, False],size=5)))
array([True, False, True, True, True])

>>> ak.from_series(pd.Series(['True', 'False', 'False', 'True', 'True']), dtype=np.bool)
array([True, True, True, True, True])

>>> ak.from_series(pd.Series(['a', 'b', 'c', 'd', 'e'], dtype="string"))
array(['a', 'b', 'c', 'd', 'e'])

>>> ak.from_series(pd.Series(['a', 'b', 'c', 'd', 'e']), dtype=np.str)
array(['a', 'b', 'c', 'd', 'e'])

>>> ak.from_series(pd.Series(pd.to_datetime(['1/1/2018', np.datetime64('2018-01-01')])))
array([1514764800000000000, 1514764800000000000])
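The int64 value 1514764800000000000 in the datetime example is the number of nanoseconds between the Unix epoch and 2018-01-01 00:00:00. A minimal standard-library sketch of that conversion follows; `to_epoch_nanoseconds` is a hypothetical helper, and interpreting the timestamp as UTC is an assumption made for the illustration.

```python
from datetime import datetime, timezone


def to_epoch_nanoseconds(dt):
    """Convert a timezone-aware datetime to integer nanoseconds since the epoch."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    delta = dt - epoch
    # Use integer arithmetic throughout to avoid float rounding at this scale
    whole_seconds = delta.days * 86_400 + delta.seconds
    return whole_seconds * 10**9 + delta.microseconds * 1_000


ns = to_epoch_nanoseconds(datetime(2018, 1, 1, tzinfo=timezone.utc))
print(ns)  # 1514764800000000000
```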

Notes

The supported datatypes are bool, float64, int64, string, and datetime64[ns]. The data type is either inferred from the Series or is set via the dtype parameter.
Series of datetime or timedelta are converted to Arkouda arrays of dtype int64 (nanoseconds).
A Pandas Series containing strings has a dtype of object. Arkouda assumes the Series contains strings and sets the dtype to str.

arkouda.pdarraycreation.array(a: Union[arkouda.pdarrayclass.pdarray, numpy.ndarray, Iterable]) → Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings]
Convert a Python or Numpy Iterable to a pdarray or Strings object, sending the corresponding data to the arkouda server.
Parameters a (Union[pdarray, np.ndarray]) – Rank-1 array of a supported dtype
Returns A pdarray instance stored on arkouda server or Strings instance, which is composed of two pdarrays stored on arkouda server
Return type pdarray or Strings
Raises
• TypeError – Raised if a is not a pdarray, np.ndarray, or Python Iterable such as a list, array, tuple, or deque
• RuntimeError – Raised if a is not one-dimensional, nbytes > maxTransferBytes, a.dtype is not supported (not in DTypes), or if the product of a size and a.itemsize > maxTransferBytes
• ValueError – Raised if the returned message is malformed or does not contain the fields required to generate the array.
See also: pdarray.to_ndarray


Notes

The number of bytes in the input array cannot exceed arkouda.maxTransferBytes, otherwise a RuntimeError will be raised. This is to protect the user from overwhelming the connection between the Python client and the arkouda server, under the assumption that it is a low-bandwidth connection. The user may override this limit by setting ak.maxTransferBytes to a larger value, but should proceed with caution. If the pdarray or ndarray is of type U, this method is called twice recursively to create the Strings object and the two corresponding pdarrays for string bytes and offsets, respectively.
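The size limit described above amounts to comparing size times itemsize against a byte budget. The sketch below shows that check in plain Python; it is not the arkouda client code, `check_transfer_size` is a hypothetical helper, and the limit value is a made-up placeholder rather than arkouda's actual default.

```python
# Hypothetical limit for illustration only; NOT arkouda's real default
MAX_TRANSFER_BYTES = 2**30

INT64_ITEMSIZE = 8  # bytes per int64 element


def check_transfer_size(size, itemsize, limit=MAX_TRANSFER_BYTES):
    """Return the payload size in bytes, or raise if it exceeds the limit."""
    nbytes = size * itemsize
    if nbytes > limit:
        raise RuntimeError(
            f"array of {nbytes} bytes exceeds the transfer limit of {limit} bytes"
        )
    return nbytes


# Nine int64 values, as in ak.array(range(1, 10))
nbytes = check_transfer_size(9, INT64_ITEMSIZE)
print(nbytes)  # 72
```

Estimating the payload this way before calling array or to_ndarray can indicate whether the limit needs to be raised for a given transfer.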

Examples

>>> ak.array(np.arange(1,10))
array([1, 2, 3, 4, 5, 6, 7, 8, 9])

>>> ak.array(range(1,10))
array([1, 2, 3, 4, 5, 6, 7, 8, 9])

>>> strings = ak.array(['string {}'.format(i) for i in range(0,5)])
>>> type(strings)
<class 'arkouda.strings.Strings'>

arkouda.pdarraycreation.zeros(size: arkouda.dtypes.int_scalars, dtype: type = np.float64) → arkouda.pdarrayclass.pdarray
Create a pdarray filled with zeros.
Parameters
• size (int_scalars) – Size of the array (only rank-1 arrays supported)
• dtype (all_scalars) – Type of resulting array, default float64
Returns Zeros of the requested size and dtype
Return type pdarray
Raises TypeError – Raised if the supplied dtype is not supported or if the size parameter is neither an int nor a str that is parseable to an int.
See also: ones, zeros_like

Examples

>>> ak.zeros(5, dtype=ak.int64)
array([0, 0, 0, 0, 0])

>>> ak.zeros(5, dtype=ak.float64)
array([0, 0, 0, 0, 0])

>>> ak.zeros(5, dtype=ak.bool)
array([False, False, False, False, False])

arkouda.pdarraycreation.ones(size: arkouda.dtypes.int_scalars, dtype: type = float64) → arkouda.pdarrayclass.pdarray Create a pdarray filled with ones. Parameters • size (int_scalars) – Size of the array (only rank-1 arrays supported) • dtype (Union[float64, int64, bool]) – Resulting array type, default float64 Returns Ones of the requested size and dtype Return type pdarray Raises TypeError – Raised if the supplied dtype is not supported or if the size parameter is neither an int nor a str that is parseable to an int. See also: zeros, ones_like

Examples

>>> ak.ones(5, dtype=ak.int64)
array([1, 1, 1, 1, 1])

>>> ak.ones(5, dtype=ak.float64)
array([1, 1, 1, 1, 1])

>>> ak.ones(5, dtype=ak.bool)
array([True, True, True, True, True])

arkouda.pdarraycreation.zeros_like(pda: arkouda.pdarrayclass.pdarray) → arkouda.pdarrayclass.pdarray
Create a zero-filled pdarray of the same size and dtype as an existing pdarray.
Parameters pda (pdarray) – Array to use for size and dtype
Returns Equivalent to ak.zeros(pda.size, pda.dtype)
Return type pdarray
Raises TypeError – Raised if the pda parameter is not a pdarray.
See also: zeros, ones_like

Examples

>>> zeros = ak.zeros(5, dtype=ak.int64)
>>> ak.zeros_like(zeros)
array([0, 0, 0, 0, 0])

>>> zeros = ak.zeros(5, dtype=ak.float64)
>>> ak.zeros_like(zeros)
array([0, 0, 0, 0, 0])


>>> zeros = ak.zeros(5, dtype=ak.bool)
>>> ak.zeros_like(zeros)
array([False, False, False, False, False])

arkouda.pdarraycreation.ones_like(pda: arkouda.pdarrayclass.pdarray) → arkouda.pdarrayclass.pdarray
Create a one-filled pdarray of the same size and dtype as an existing pdarray.
Parameters pda (pdarray) – Array to use for size and dtype
Returns Equivalent to ak.ones(pda.size, pda.dtype)
Return type pdarray
Raises TypeError – Raised if the pda parameter is not a pdarray.
See also: ones, zeros_like

Notes

Logic for generating the pdarray is delegated to the ak.ones method. Accordingly, the supported dtypes match those defined by the ak.ones method.

Examples

>>> ones = ak.ones(5, dtype=ak.int64)
>>> ak.ones_like(ones)
array([1, 1, 1, 1, 1])

>>> ones = ak.ones(5, dtype=ak.float64)
>>> ak.ones_like(ones)
array([1, 1, 1, 1, 1])

>>> ones = ak.ones(5, dtype=ak.bool)
>>> ak.ones_like(ones)
array([True, True, True, True, True])

arkouda.pdarraycreation.arange(*args) → arkouda.pdarrayclass.pdarray
arange([start,] stop[, stride])
Create a pdarray of consecutive integers within the interval [start, stop). If only one arg is given, then arg is the stop parameter. If two args are given, then the first arg is start and the second is stop. If three args are given, then the first arg is start, the second is stop, and the third is stride.
Parameters
• start (int_scalars, optional) – Starting value (inclusive)
• stop (int_scalars) – Stopping value (exclusive)
• stride (int_scalars, optional) – The difference between consecutive elements. The default stride is 1. If stride is specified, then start must also be specified.
Returns Integers from start (inclusive) to stop (exclusive) by stride
Return type pdarray, int64


Raises
• TypeError – Raised if start, stop, or stride is not an int object
• ZeroDivisionError – Raised if stride == 0
See also: linspace, zeros, ones, randint

Notes

Negative strides result in decreasing values. Currently, only int64 pdarrays can be created with this method. For float64 arrays, use the linspace method.

Examples

>>> ak.arange(0, 5, 1)
array([0, 1, 2, 3, 4])

>>> ak.arange(5, 0, -1)
array([5, 4, 3, 2, 1])

>>> ak.arange(0, 10, 2)
array([0, 2, 4, 6, 8])

>>> ak.arange(-5, -10, -1)
array([-5, -6, -7, -8, -9])

arkouda.pdarraycreation.linspace(start: arkouda.dtypes.numeric_scalars, stop: arkouda.dtypes.numeric_scalars, length: arkouda.dtypes.int_scalars) → arkouda.pdarrayclass.pdarray
Create a pdarray of linearly-spaced floats in a closed interval.
Parameters
• start (numeric_scalars) – Start of interval (inclusive)
• stop (numeric_scalars) – End of interval (inclusive)
• length (int_scalars) – Number of points
Returns Array of evenly spaced float values along the interval
Return type pdarray, float64
Raises TypeError – Raised if start or stop is not a float or int or if length is not an int
See also: arange


Notes

If start is greater than stop, the pdarray values are generated in descending order.
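The closed-interval spacing described above can be illustrated with a small pure-Python sketch. It is not arkouda's server-side implementation; the name linspace_sketch and the handling of length 0 and 1 are assumptions made for illustration.

```python
def linspace_sketch(start, stop, length):
    """Return `length` evenly spaced floats from start to stop, both inclusive.

    Hypothetical helper: it mirrors the documented semantics of ak.linspace
    but runs locally on Python lists.
    """
    if length < 0:
        raise ValueError("length must be non-negative")
    if length == 0:
        return []
    if length == 1:
        return [float(start)]
    # A negative step falls out naturally when start > stop (descending order).
    step = (stop - start) / (length - 1)
    # Pin the final point to `stop` so rounding cannot overshoot the interval.
    return [start + i * step for i in range(length - 1)] + [float(stop)]


print(linspace_sketch(0, 1, 5))
print(linspace_sketch(1, 0, 5))
```

Because the step is computed from (stop - start), swapping the endpoints reverses the order without any special casing.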

Examples

>>> ak.linspace(0, 1, 5)
array([0, 0.25, 0.5, 0.75, 1])

>>> ak.linspace(start=1, stop=0, length=5)
array([1, 0.75, 0.5, 0.25, 0])

>>> ak.linspace(start=-5, stop=0, length=5)
array([-5, -3.75, -2.5, -1.25, 0])

arkouda.pdarraycreation.randint(low: arkouda.dtypes.numeric_scalars, high: arkouda.dtypes.numeric_scalars, size: arkouda.dtypes.int_scalars, dtype=int64, seed: arkouda.dtypes.int_scalars = None) → arkouda.pdarrayclass.pdarray
Generate a pdarray of randomized int, float, or bool values in a specified range bounded by the low and high parameters.
Parameters
• low (numeric_scalars) – The low value (inclusive) of the range
• high (numeric_scalars) – The high value (exclusive for int, inclusive for float) of the range
• size (int_scalars) – The length of the returned array
• dtype (Union[int64, float64, bool]) – The dtype of the array
• seed (int_scalars) – Index for where to pull the first returned value
Returns Values drawn uniformly from the specified range having the desired dtype
Return type pdarray
Raises
• TypeError – Raised if dtype.name not in DTypes, size is not an int, low or high is not an int or float, or seed is not an int
• ValueError – Raised if size < 0 or if high < low

Notes

Calling randint with dtype=float64 will result in uniform non-integral floating point values.


Examples

>>> ak.randint(0, 10, 5)
array([5, 7, 4, 8, 3])

>>> ak.randint(0, 1, 3, dtype=ak.float64)
array([0.92176432277231968, 0.083130710959903542, 0.68894208386667544])

>>> ak.randint(0, 1, 5, dtype=ak.bool)
array([True, False, True, True, True])

>>> ak.randint(1, 5, 10, seed=2)
array([4, 3, 1, 3, 4, 4, 2, 4, 3, 2])

>>> ak.randint(1, 5, 3, dtype=ak.float64, seed=2)
array([2.9160772326374946, 4.353429832157099, 4.5392023718621486])

>>> ak.randint(1, 5, 10, dtype=ak.bool, seed=2)
array([False, True, True, True, True, False, True, True, True, True])

arkouda.pdarraycreation.uniform(size: arkouda.dtypes.int_scalars, low: arkouda.dtypes.numeric_scalars = float(0.0), high: arkouda.dtypes.numeric_scalars = 1.0, seed: Union[None, arkouda.dtypes.int_scalars] = None) → arkouda.pdarrayclass.pdarray
Generate a pdarray with uniformly distributed random float values in a specified range.
Parameters
• low (float_scalars) – The low value (inclusive) of the range, defaults to 0.0
• high (float_scalars) – The high value (inclusive) of the range, defaults to 1.0
• size (int_scalars) – The length of the returned array
• seed (int_scalars, optional) – Value used to initialize the random number generator
Returns Values drawn uniformly from the specified range
Return type pdarray, float64
Raises
• TypeError – Raised if dtype.name not in DTypes, size is not an int, or if either low or high is not an int or float
• ValueError – Raised if size < 0 or if high < low

Notes

The logic for uniform is delegated to the ak.randint method, which is invoked with a dtype of float64.


Examples

>>> ak.uniform(3)
array([0.92176432277231968, 0.083130710959903542, 0.68894208386667544])

>>> ak.uniform(size=3, low=0, high=5, seed=0)
array([0.30013431967121934, 0.47383036230759112, 1.0441791878997098])

arkouda.pdarraycreation.standard_normal(size: arkouda.dtypes.int_scalars, seed: Union[None, arkouda.dtypes.int_scalars] = None) → arkouda.pdarrayclass.pdarray
Draw real numbers from the standard normal distribution.
Parameters
• size (int_scalars) – The number of samples to draw (size of the returned array)
• seed (int_scalars) – Value used to initialize the random number generator
Returns The array of random numbers
Return type pdarray, float64
Raises
• TypeError – Raised if size is not an int
• ValueError – Raised if size < 0
See also: randint

Notes

For random samples from $N(\mu, \sigma^2)$, use: (sigma * standard_normal(size)) + mu
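The shift-and-scale rule in this note can be checked locally with the standard library. The sketch below uses random.gauss in place of ak.standard_normal, so it only illustrates the arithmetic; normal_samples is a hypothetical helper name.

```python
import random
import statistics


def normal_samples(size, mu, sigma, seed=None):
    """Draw `size` samples from N(mu, sigma^2) by rescaling standard normals."""
    rng = random.Random(seed)
    # Stand-in for ak.standard_normal(size): zero mean, unit variance.
    standard = [rng.gauss(0.0, 1.0) for _ in range(size)]
    # Same expression as in the note: (sigma * standard_normal(size)) + mu
    return [sigma * z + mu for z in standard]


samples = normal_samples(10_000, mu=5.0, sigma=2.0, seed=1)
print(statistics.mean(samples), statistics.stdev(samples))
```

With enough samples, the printed mean and standard deviation should land close to mu and sigma respectively.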

Examples

>>> ak.standard_normal(3, 1)
array([-0.68586185091150265, 1.1723810583573375, 0.567584107142031])

arkouda.pdarraycreation.random_strings_uniform(minlen: arkouda.dtypes.int_scalars, maxlen: arkouda.dtypes.int_scalars, size: arkouda.dtypes.int_scalars, characters: str = 'uppercase', seed: Union[None, arkouda.dtypes.int_scalars] = None) → arkouda.strings.Strings
Generate random strings with lengths uniformly distributed between minlen and maxlen, and with characters drawn from a specified set.
Parameters
• minlen (int_scalars) – The minimum allowed length of string
• maxlen (int_scalars) – The maximum allowed length of string


• size (int_scalars) – The number of strings to generate
• characters ((uppercase, lowercase, numeric, printable, binary)) – The set of characters to draw from
• seed (Union[None, int_scalars], optional) – Value used to initialize the random number generator
Returns The array of random strings
Return type Strings
Raises ValueError – Raised if minlen < 0, maxlen < minlen, or size < 0
See also: random_strings_lognormal, randint

Examples

>>> ak.random_strings_uniform(minlen=1, maxlen=5, seed=1, size=5)
array(['TVKJ', 'EWAB', 'CO', 'HFMD', 'U'])

>>> ak.random_strings_uniform(minlen=1, maxlen=5, seed=1, size=5,
... characters='printable')
array(['+5"f', '-P]3', '4k', '~HFF', 'F'])

arkouda.pdarraycreation.random_strings_lognormal(logmean: arkouda.dtypes.numeric_scalars, logstd: arkouda.dtypes.numeric_scalars, size: arkouda.dtypes.int_scalars, characters: str = 'uppercase', seed: Optional[arkouda.dtypes.int_scalars] = None) → arkouda.strings.Strings
Generate random strings with log-normally distributed lengths and with characters drawn from a specified set.
Parameters
• logmean (numeric_scalars) – The log-mean of the length distribution
• logstd (numeric_scalars) – The log-standard-deviation of the length distribution
• size (int_scalars) – The number of strings to generate
• characters ((uppercase, lowercase, numeric, printable, binary)) – The set of characters to draw from
• seed (int_scalars, optional) – Value used to initialize the random number generator
Returns The Strings object encapsulating a pdarray of random strings
Return type Strings
Raises
• TypeError – Raised if logmean is neither a float nor an int, logstd is not a float, size is not an int, or if characters is not a str
• ValueError – Raised if logstd <= 0 or size < 0
See also: random_strings_uniform, randint


Notes

The lengths of the generated strings are distributed $\mathrm{Lognormal}(\mu, \sigma^2)$, with $\mu = \mathrm{logmean}$ and $\sigma = \mathrm{logstd}$. Thus, the strings will have an average length of $\exp(\mu + 0.5\,\sigma^2)$, a minimum length of zero, and a heavy tail towards longer strings.
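As a rough sanity check of the average-length formula, the following standard-library sketch compares the closed-form mean with an empirical mean of log-normal draws. It does not generate strings and does not call arkouda; both function names are hypothetical.

```python
import math
import random


def expected_lognormal_length(logmean, logstd):
    """Closed-form mean of Lognormal(mu, sigma^2): exp(mu + 0.5 * sigma^2)."""
    return math.exp(logmean + 0.5 * logstd ** 2)


def empirical_mean_length(logmean, logstd, size, seed=None):
    """Average of `size` log-normal draws, as a local stand-in for string lengths."""
    rng = random.Random(seed)
    draws = [rng.lognormvariate(logmean, logstd) for _ in range(size)]
    return sum(draws) / size


# Parameters taken from the examples for this function (logmean=2, logstd=0.25).
print(expected_lognormal_length(2, 0.25))
print(empirical_mean_length(2, 0.25, 20_000, seed=1))
```

For logmean=2 and logstd=0.25 the closed-form value is a little above 7.6, which is in line with the six- to nine-character strings shown in the examples.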

Examples

>>> ak.random_strings_lognormal(2, 0.25, 5, seed=1)
array(['TVKJTE', 'ABOCORHFM', 'LUDMMGTB', 'KWOQNPHZ', 'VSXRRL'])

>>> ak.random_strings_lognormal(2, 0.25, 5, seed=1, characters='printable')
array(['+5"fp-', ']3Q4kC~HF', '=F=`,IE!', 'DjkBa'9(', '5oZ1)='])

arkouda.pdarraysetops

Module Contents

Functions

unique(pda: Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, Categorical], return_counts: bool = False) → Union[Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, Categorical], Tuple[Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, Categorical], Optional[arkouda.pdarrayclass.pdarray]]]
    Find the unique elements of an array.
in1d(pda1: Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, Categorical], pda2: Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, Categorical], invert: bool = False) → arkouda.pdarrayclass.pdarray
    Test whether each element of a 1-D array is also present in a second array.
concatenate(arrays: Sequence[Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, Categorical]], ordered: bool = True) → Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, Categorical]
    Concatenate a list or tuple of pdarray or Strings objects into one pdarray or Strings object, respectively.
union1d(pda1: arkouda.pdarrayclass.pdarray, pda2: arkouda.pdarrayclass.pdarray) → arkouda.pdarrayclass.pdarray
    Find the union of two arrays.
intersect1d(pda1: arkouda.pdarrayclass.pdarray, pda2: arkouda.pdarrayclass.pdarray, assume_unique: bool = False) → arkouda.pdarrayclass.pdarray
    Find the intersection of two arrays.
setdiff1d(pda1: arkouda.pdarrayclass.pdarray, pda2: arkouda.pdarrayclass.pdarray, assume_unique: bool = False) → arkouda.pdarrayclass.pdarray
    Find the set difference of two arrays.
setxor1d(pda1: arkouda.pdarrayclass.pdarray, pda2: arkouda.pdarrayclass.pdarray, assume_unique: bool = False) → arkouda.pdarrayclass.pdarray
    Find the set exclusive-or (symmetric difference) of two arrays.

arkouda.pdarraysetops.unique(pda: Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, Categorical], return_counts: bool = False) → Union[Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, Categorical], Tuple[Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, Categorical], Optional[arkouda.pdarrayclass.pdarray]]]
Find the unique elements of an array.
Returns the unique elements of an array, sorted if the values are integers. There is an optional output in addition to the unique elements: the number of times each unique value comes up in the input array.
Parameters
• pda (pdarray or Strings or Categorical) – Input array.
• return_counts (bool, optional) – If True, also return the number of times each unique item appears in pda.
Returns
• unique (pdarray or Strings) – The unique values. If input dtype is int64, return values will be sorted.
• unique_counts (pdarray, optional) – The number of times each of the unique values comes up in the original array. Only provided if return_counts is True.
Raises
• TypeError – Raised if pda is not a pdarray or Strings object
• RuntimeError – Raised if the pdarray or Strings dtype is unsupported

Notes

For integer arrays, this function checks to see whether pda is sorted and, if so, whether it is already unique. This step can save considerable computation. Otherwise, this function will sort pda.
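The strategy in this note (skip the sort when the input is already sorted, otherwise sort and collapse runs) can be sketched on plain Python lists. This is an illustration of the idea only, not the distributed server-side algorithm; unique_sketch is a hypothetical name.

```python
from itertools import groupby


def unique_sketch(values, return_counts=False):
    """Return sorted unique values, optionally with the count of each value."""
    values = list(values)
    # Cheap pre-check: an already sorted input does not need to be sorted again.
    already_sorted = all(a <= b for a, b in zip(values, values[1:]))
    ordered = values if already_sorted else sorted(values)
    uniques, counts = [], []
    # After sorting, equal values are adjacent, so each run is one unique value.
    for value, run in groupby(ordered):
        uniques.append(value)
        counts.append(sum(1 for _ in run))
    return (uniques, counts) if return_counts else uniques


print(unique_sketch([3, 2, 1, 1, 2, 3]))
print(unique_sketch([3, 2, 1, 1, 2, 3], return_counts=True))
```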

Examples

>>> A = ak.array([3, 2, 1, 1, 2, 3])
>>> ak.unique(A)
array([1, 2, 3])

arkouda.pdarraysetops.in1d(pda1: Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, Categorical], pda2: Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, Categorical], invert: bool = False) → arkouda.pdarrayclass.pdarray
Test whether each element of a 1-D array is also present in a second array.
Returns a boolean array the same length as pda1 that is True where an element of pda1 is in pda2 and False otherwise.
Parameters
• pda1 (pdarray or Strings or Categorical) – Input array.
• pda2 (pdarray or Strings or Categorical) – The values against which to test each value of pda1. Must be the same type as pda1.


• invert (bool, optional) – If True, the values in the returned array are inverted (that is, False where an element of pda1 is in pda2 and True otherwise). Default is False. ak.in1d(a, b, invert=True) is equivalent to (but is faster than) ~ak.in1d(a, b).
Returns The values pda1[in1d] are in pda2.
Return type pdarray, bool
Raises
• TypeError – Raised if either pda1 or pda2 is not a pdarray, Strings, or Categorical object or if invert is not a bool
• RuntimeError – Raised if the dtype of either array is not supported
See also: unique, intersect1d, union1d

Notes

in1d can be considered as an element-wise function version of the python keyword in, for 1-D sequences. in1d(a, b) is logically equivalent to ak.array([item in b for item in a]), but is much faster and scales to arbitrarily large a. ak.in1d is not supported for bool or float64 pdarrays
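The logical equivalence stated in this note, including the effect of invert, can be written out for plain Python sequences. The sketch below is a local illustration under that assumption; in1d_sketch is a hypothetical helper and says nothing about how the server computes the result.

```python
def in1d_sketch(a, b, invert=False):
    """Element-wise membership test of `a` in `b`, optionally inverted."""
    lookup = set(b)  # set membership keeps each test O(1) on average
    # XOR with `invert` flips every result when invert=True.
    return [(item in lookup) != invert for item in a]


print(in1d_sketch([-1, 0, 1], [-2, 0, 2]))
print(in1d_sketch([-1, 0, 1], [-2, 0, 2], invert=True))
```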

Examples

>>> ak.in1d(ak.array([-1, 0, 1]), ak.array([-2, 0, 2]))
array([False, True, False])

>>> ak.in1d(ak.array(['one', 'two']), ak.array(['two', 'three', 'four', 'five']))
array([False, True])

arkouda.pdarraysetops.concatenate(arrays: Sequence[Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, Categorical]], ordered: bool = True) → Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, Categorical]
Concatenate a list or tuple of pdarray or Strings objects into one pdarray or Strings object, respectively.
Parameters
• arrays (Sequence[Union[pdarray,Strings,Categorical]]) – The arrays to concatenate. Must all have same dtype.
• ordered (bool) – If True (default), the arrays will be appended in the order given. If False, array data may be interleaved in blocks, which can greatly improve performance but results in non-deterministic ordering of elements.
Returns Single pdarray or Strings object containing all values, returned in the original order
Return type Union[pdarray,Strings,Categorical]
Raises
• ValueError – Raised if arrays is empty or if 1..n pdarrays have differing dtypes
• TypeError – Raised if arrays is not a Python Sequence (such as a list or tuple) of pdarray or Strings objects


• RuntimeError – Raised if 1..n array elements are dtypes for which concatenate has not been implemented.

Examples

>>> ak.concatenate([ak.array([1, 2, 3]), ak.array([4, 5, 6])])
array([1, 2, 3, 4, 5, 6])

>>> ak.concatenate([ak.array([True, False, True]), ak.array([False, True, True])])
array([True, False, True, False, True, True])

>>> ak.concatenate([ak.array(['one', 'two']), ak.array(['three', 'four', 'five'])])
array(['one', 'two', 'three', 'four', 'five'])

arkouda.pdarraysetops.union1d(pda1: arkouda.pdarrayclass.pdarray, pda2: arkouda.pdarrayclass.pdarray) → arkouda.pdarrayclass.pdarray
Find the union of two arrays.
Return the unique, sorted array of values that are in either of the two input arrays.
Parameters
• pda1 (pdarray) – Input array
• pda2 (pdarray) – Input array
Returns Unique, sorted union of the input arrays.
Return type pdarray
Raises
• TypeError – Raised if either pda1 or pda2 is not a pdarray
• RuntimeError – Raised if the dtype of either array is not supported
See also: intersect1d, unique

Notes

ak.union1d is not supported for bool or float64 pdarrays

Examples

>>> ak.union1d(ak.array([-1, 0, 1]), ak.array([-2, 0, 2]))
array([-2, -1, 0, 1, 2])

arkouda.pdarraysetops.intersect1d(pda1: arkouda.pdarrayclass.pdarray, pda2: arkouda.pdarrayclass.pdarray, assume_unique: bool = False) → arkouda.pdarrayclass.pdarray
Find the intersection of two arrays.
Return the sorted, unique values that are in both of the input arrays.
Parameters


• pda1 (pdarray) – Input array
• pda2 (pdarray) – Input array
• assume_unique (bool) – If True, the input arrays are both assumed to be unique, which can speed up the calculation. Default is False.
Returns Sorted 1D array of common and unique elements.
Return type pdarray
Raises
• TypeError – Raised if either pda1 or pda2 is not a pdarray
• RuntimeError – Raised if the dtype of either pdarray is not supported
See also: unique, union1d

Notes

ak.intersect1d is not supported for bool or float64 pdarrays

Examples

>>> ak.intersect1d(ak.array([1, 3, 4, 3]), ak.array([3, 1, 2, 1]))
array([1, 3])

arkouda.pdarraysetops.setdiff1d(pda1: arkouda.pdarrayclass.pdarray, pda2: arkouda.pdarrayclass.pdarray, assume_unique: bool = False) → arkouda.pdarrayclass.pdarray
Find the set difference of two arrays.
Return the sorted, unique values in pda1 that are not in pda2.
Parameters
• pda1 (pdarray) – Input array.
• pda2 (pdarray) – Input comparison array.
• assume_unique (bool) – If True, the input arrays are both assumed to be unique, which can speed up the calculation. Default is False.
Returns Sorted 1D array of values in pda1 that are not in pda2.
Return type pdarray
Raises
• TypeError – Raised if either pda1 or pda2 is not a pdarray
• RuntimeError – Raised if the dtype of either pdarray is not supported
See also: unique, setxor1d


Notes

ak.setdiff1d is not supported for bool or float64 pdarrays

Examples

>>> a = ak.array([1, 2, 3, 2, 4, 1])
>>> b = ak.array([3, 4, 5, 6])
>>> ak.setdiff1d(a, b)
array([1, 2])

arkouda.pdarraysetops.setxor1d(pda1: arkouda.pdarrayclass.pdarray, pda2: arkouda.pdarrayclass.pdarray, assume_unique: bool = False) → arkouda.pdarrayclass.pdarray
Find the set exclusive-or (symmetric difference) of two arrays.
Return the sorted, unique values that are in only one (not both) of the input arrays.
Parameters
• pda1 (pdarray) – Input array.
• pda2 (pdarray) – Input array.
• assume_unique (bool) – If True, the input arrays are both assumed to be unique, which can speed up the calculation. Default is False.
Returns Sorted 1D array of unique values that are in only one of the input arrays.
Return type pdarray
Raises
• TypeError – Raised if either pda1 or pda2 is not a pdarray
• RuntimeError – Raised if the dtype of either pdarray is not supported

Notes

ak.setxor1d is not supported for bool or float64 pdarrays

Examples

>>> a = ak.array([1, 2, 3, 2, 4])
>>> b = ak.array([2, 3, 5, 7, 5])
>>> ak.setxor1d(a, b)
array([1, 4, 5, 7])
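The four set operations in this module (union1d, intersect1d, setdiff1d, setxor1d) all return sorted, unique values. A compact way to see what each one computes is to express it with Python's built-in set type, as in the sketch below. The *_sketch names are hypothetical, and the code runs on local lists rather than pdarrays.

```python
def union1d_sketch(a, b):
    """Sorted unique values present in either input."""
    return sorted(set(a) | set(b))


def intersect1d_sketch(a, b):
    """Sorted unique values present in both inputs."""
    return sorted(set(a) & set(b))


def setdiff1d_sketch(a, b):
    """Sorted unique values of `a` that do not appear in `b`."""
    return sorted(set(a) - set(b))


def setxor1d_sketch(a, b):
    """Sorted unique values present in exactly one of the inputs."""
    return sorted(set(a) ^ set(b))


print(union1d_sketch([-1, 0, 1], [-2, 0, 2]))
print(intersect1d_sketch([1, 3, 4, 3], [3, 1, 2, 1]))
print(setdiff1d_sketch([1, 2, 3, 2, 4, 1], [3, 4, 5, 6]))
print(setxor1d_sketch([1, 2, 3, 2, 4], [2, 3, 5, 7, 5]))
```

The inputs are the same values used in the examples for each function, so the printed lists can be compared against the documented outputs.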

arkouda.security

Module Contents

Functions

generate_token(length: int = 32) → str
    Uses the secrets.token_hex() method to generate a hexadecimal token
get_home_directory() → str
    A platform-independent means of finding path to the current user's home directory
get_arkouda_client_directory() → pathlib.Path
    A platform-independent means of finding path to the current user's .arkouda directory
get_username() → str
    A platform-independent means of retrieving the current user's username for the host system.
generate_username_token_json(token: str) → str
    Generates a JSON object encapsulating the user's username and token

Attributes

username_tokenizer
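The entries that follow name the standard-library calls these helpers rely on (secrets.token_hex and os.path.expanduser). A minimal sketch of how such helpers could look is given below. The *_sketch names, the length // 2 conversion, and the JSON key names are assumptions for illustration; the actual arkouda implementations and payload format may differ.

```python
import json
import os
import secrets


def token_sketch(length=32):
    """Return a random hexadecimal string of `length` characters (even length)."""
    # secrets.token_hex(n) yields 2 * n hex characters, hence the halving (assumption).
    return secrets.token_hex(length // 2)


def home_directory_sketch():
    """Return the current user's home directory in a platform-independent way."""
    return os.path.expanduser("~")


def username_token_json_sketch(username, token):
    """Serialize a username/token pair; the key names here are hypothetical."""
    return json.dumps({"username": username, "token": token})


token = token_sketch(32)
print(len(token), home_directory_sketch())
print(username_token_json_sketch("alice", token))
```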

arkouda.security.username_tokenizer

arkouda.security.generate_token(length: int = 32) → str
Uses the secrets.token_hex() method to generate a hexadecimal token
Parameters length (int) – The desired length of token
Returns The hexadecimal string generated by Python
Return type str

Notes

This method uses the Python secrets.token_hex method

arkouda.security.get_home_directory() → str
A platform-independent means of finding path to the current user’s home directory
Returns The user’s home directory path
Return type str

Notes

This method uses the Python os.path.expanduser method to retrieve the user’s home directory

arkouda.security.get_arkouda_client_directory() → pathlib.Path
A platform-independent means of finding path to the current user’s .arkouda directory where artifacts such as server access tokens are stored.
Returns Path corresponding to the user’s .arkouda directory path
Return type Path


Notes

The default implementation is to place the .arkouda directory in the current user’s home directory. The default can be overridden by setting the ARKOUDA_CLIENT_DIRECTORY environment variable. It is important that this is not the same location as the server’s token directory, as the file format is different.

arkouda.security.get_username() → str
A platform-independent means of retrieving the current user’s username for the host system.
Returns The username in the form of string
Return type str
Raises EnvironmentError – Raised if the host OS is unsupported

Notes

The currently supported operating systems are Windows, Linux, and MacOS AKA Darwin

arkouda.security.generate_username_token_json(token: str) → str
Generates a JSON object encapsulating the user’s username and token for connecting to an arkouda server with basic authentication enabled
Parameters token (string) – The token to be used to access arkouda server
Returns The JSON-formatted string encapsulating username and token
Return type str

arkouda.sorting

Module Contents

Functions

argsort(pda: Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, arkouda.categorical.Categorical]) → arkouda.pdarrayclass.pdarray
    Return the permutation that sorts the array.
coargsort(arrays: Sequence[Union[arkouda.strings.Strings, arkouda.pdarrayclass.pdarray, arkouda.categorical.Categorical]]) → arkouda.pdarrayclass.pdarray
    Return the permutation that groups the rows (left-to-right), if the input arrays are treated as columns.
sort(pda: arkouda.pdarrayclass.pdarray) → arkouda.pdarrayclass.pdarray
    Return a sorted copy of the array. Only sorts numeric arrays; for Strings, use argsort.

arkouda.sorting.argsort(pda: Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, arkouda.categorical.Categorical]) → arkouda.pdarrayclass.pdarray
Return the permutation that sorts the array.
Parameters pda (pdarray or Strings or Categorical) – The array to sort (int64 or float64)
Returns The indices such that pda[indices] is sorted
Return type pdarray, int64


Raises TypeError – Raised if the parameter is other than a pdarray or Strings
See also: coargsort

Notes

Uses a least-significant-digit radix sort, which is stable and resilient to non-uniformity in data but communication intensive.
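To make the term concrete, here is a single-process sketch of a least-significant-digit radix argsort for non-negative integers. It shows why the method is stable (each pass preserves the order produced by the previous one) but leaves out everything that makes the real sort distributed. The function name and the 4-bit digit width are arbitrary choices for illustration.

```python
def lsd_radix_argsort(values, bits_per_digit=4):
    """Return a stable permutation that sorts a list of non-negative ints."""
    if any(v < 0 for v in values):
        raise ValueError("this sketch handles non-negative integers only")
    perm = list(range(len(values)))
    radix = 1 << bits_per_digit
    max_value = max(values, default=0)
    shift = 0
    # Process one digit per pass, starting from the least significant digit.
    while (max_value >> shift) > 0:
        buckets = [[] for _ in range(radix)]
        for idx in perm:
            digit = (values[idx] >> shift) & (radix - 1)
            buckets[digit].append(idx)  # appending keeps ties in prior order
        perm = [idx for bucket in buckets for idx in bucket]
        shift += bits_per_digit
    return perm


a = [5, 0, 9, 1, 8, 1, 7, 3, 8, 4]
perm = lsd_radix_argsort(a)
print([a[i] for i in perm])
```

In a distributed setting each pass has to move elements between buckets that may live on different locales, which is one way to read the "communication intensive" remark above.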

Examples

>>> a = ak.randint(0, 10, 10)
>>> perm = ak.argsort(a)
>>> a[perm]
array([0, 1, 1, 3, 4, 5, 7, 8, 8, 9])

arkouda.sorting.coargsort(arrays: Sequence[Union[arkouda.strings.Strings, arkouda.pdarrayclass.pdarray, arkouda.categorical.Categorical]]) → arkouda.pdarrayclass.pdarray
Return the permutation that groups the rows (left-to-right), if the input arrays are treated as columns. The permutation sorts numeric columns, but not strings/Categoricals – strings/Categoricals are grouped, but not ordered.
Parameters arrays (Sequence[Union[Strings, pdarray, Categorical]]) – The columns (int64, float64, Strings, or Categorical) to sort by row
Returns The indices that permute the rows to grouped order
Return type pdarray, int64
Raises ValueError – Raised if the pdarrays are not of the same size or if the parameter is not an Iterable containing pdarrays, Strings, or Categoricals
See also: argsort

Notes

Uses a least-significant-digit radix sort, which is stable and resilient to non-uniformity in data but communication intensive. Starts with the last array and moves forward. This sort operates directly on numeric types, but for Strings, it operates on a hash. Thus, while grouping of equivalent strings is guaranteed, lexicographic ordering of the groups is not. For Categoricals, coargsort sorts based on Categorical.codes which guarantees grouping of equivalent categories but not lexicographic ordering of those groups.
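The "starts with the last array and moves forward" strategy works because every pass is stable: sorting by the least significant column first and the most significant column last yields a lexicographic row order. The sketch below reproduces this with Python's stable built-in sort on local lists; coargsort_sketch is a hypothetical name and only numeric columns are considered.

```python
def coargsort_sketch(columns):
    """Return the permutation that sorts rows lexicographically by column."""
    if not columns:
        raise ValueError("at least one column is required")
    size = len(columns[0])
    if any(len(column) != size for column in columns):
        raise ValueError("all columns must have the same size")
    perm = list(range(size))
    # Last column first; each stable pass preserves the ordering of ties
    # established by the less significant columns already processed.
    for column in reversed(columns):
        perm = sorted(perm, key=lambda i: column[i])
    return perm


a = [0, 1, 0, 1]
b = [1, 1, 0, 0]
perm = coargsort_sketch([a, b])
print(perm)
print([a[i] for i in perm], [b[i] for i in perm])
```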

Examples

>>> a = ak.array([0, 1, 0, 1])
>>> b = ak.array([1, 1, 0, 0])
>>> perm = ak.coargsort([a, b])
>>> perm
array([2, 0, 3, 1])
>>> a[perm]
array([0, 0, 1, 1])
>>> b[perm]
array([0, 1, 0, 1])

arkouda.sorting.sort(pda: arkouda.pdarrayclass.pdarray) → arkouda.pdarrayclass.pdarray
Return a sorted copy of the array. Only sorts numeric arrays; for Strings, use argsort.
Parameters pda (pdarray or Categorical) – The array to sort (int64 or float64)
Returns The sorted copy of pda
Return type pdarray, int64 or float64
Raises
• TypeError – Raised if the parameter is not a pdarray
• ValueError – Raised if sort attempted on a pdarray with an unsupported dtype such as bool
See also: argsort

Notes

Uses a least-significant-digit radix sort, which is stable and resilient to non-uniformity in data but communication intensive.

Examples

>>> a = ak.randint(0, 10, 10)
>>> sorted = ak.sort(a)
>>> sorted
array([0, 1, 1, 3, 4, 5, 7, 8, 8, 9])

arkouda.strings

Module Contents

Classes

Strings
    Represents an array of strings whose data resides on the arkouda server.

class arkouda.strings.Strings(offset_attrib: Union[arkouda.pdarrayclass.pdarray, str], bytes_attrib: Union[arkouda.pdarrayclass.pdarray, str])
Represents an array of strings whose data resides on the arkouda server. The user should not call this class directly; rather its instances are created by other arkouda functions.
offsets
The starting indices for each string
Type pdarray
bytes


The raw bytes of all strings, joined by nulls
Type pdarray
size
The number of strings in the array
Type int_scalars
nbytes
The total number of bytes in all strings
Type int_scalars
ndim
The rank of the array (currently only rank 1 arrays supported)
Type int_scalars
shape
The sizes of each dimension of the array
Type tuple
dtype
The dtype is ak.str
Type dtype
regex_dict
Dictionary storing information on matches (cache of Strings.find_locations(pattern))
Keys - regex patterns
Values - tuples of pdarrays (numMatches, matchStarts, matchLens)
Type Dict[str, Tuple[pdarray, pdarray, pdarray]]
logger
Used for all logging operations
Type ArkoudaLogger
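The offsets and bytes attributes describe a segmented layout: one flat byte buffer plus the start index of each string. The sketch below builds and reads such a layout on the client side with plain Python objects. It assumes UTF-8 encoding and that every string, including the last, is followed by a single null byte; the function names are hypothetical.

```python
def encode_strings(strings):
    """Pack strings into (offsets, raw bytes) with a null byte after each string."""
    offsets, buffer = [], bytearray()
    for s in strings:
        offsets.append(len(buffer))  # start index of this string in the buffer
        buffer += s.encode("utf-8") + b"\x00"
    return offsets, bytes(buffer)


def decode_strings(offsets, raw):
    """Recover the original strings from the segmented layout."""
    ends = offsets[1:] + [len(raw)]
    # end - 1 drops the trailing null terminator of each segment.
    return [raw[start:end - 1].decode("utf-8") for start, end in zip(offsets, ends)]


def string_lengths(offsets, raw):
    """Byte length of each string, computed from the offsets alone."""
    ends = offsets[1:] + [len(raw)]
    return [end - start - 1 for start, end in zip(offsets, ends)]


offsets, raw = encode_strings(["one", "two", "three"])
print(offsets, raw)
print(decode_strings(offsets, raw), string_lengths(offsets, raw))
```

One consequence of this layout is that per-string lengths can be derived from differences of consecutive offsets without touching the byte buffer.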

Notes

Strings is composed of two pdarrays: (1) offsets, which contains the starting indices for each string, and (2) bytes, which contains the raw bytes of all strings, delimited by nulls.

BinOps
objtype = str
abstract __iter__(self)
__len__(self) → int
__str__(self) → str
Return str(self).
__repr__(self) → str
Return repr(self).
_binop(self, other: Union[Strings, arkouda.dtypes.str_scalars], op: str) → arkouda.pdarrayclass.pdarray
Executes the requested binop on this Strings instance and the parameter Strings object and returns the results within a pdarray object.
Parameters


• other (Strings, str_scalars) – the other object is a Strings object
• op (str) – name of the binary operation to be performed
Returns encapsulating the results of the requested binop
Return type pdarray
Raises
• ValueError – Raised if (1) the op is not in the self.BinOps set, or (2) if the sizes of this and the other instance don’t match, or (3) the other object is not a Strings object
• RuntimeError – Raised if a server-side error is thrown while executing the binary operation
__eq__(self, other) → bool
Return self==value.
__ne__(self, other) → bool
Return self!=value.
__getitem__(self, key)
get_lengths(self) → arkouda.pdarrayclass.pdarray
Return the length of each string in the array.
Returns The length of each string
Return type pdarray, int
Raises RuntimeError – Raised if there is a server-side error thrown
cached_regex_patterns(self)
Returns the regex patterns for which Strings.find_locations(pattern) have been cached
find_locations(self, pattern: Union[bytes, arkouda.dtypes.str_scalars]) → Tuple[arkouda.pdarrayclass.pdarray, arkouda.pdarrayclass.pdarray, arkouda.pdarrayclass.pdarray]

Finds pattern matches and returns pdarrays containing the number, start positions, and lengths of matches.
Note: only handles regular expressions supported by re2 (does not support lookaheads/lookbehinds)

Parameters pattern (str_scalars) – The regex pattern used to find matches
Returns
• pdarray, int64 – For each original string, the number of pattern matches
• pdarray, int64 – The start positions of pattern matches
• pdarray, int64 – The lengths of pattern matches
Raises
• TypeError – Raised if the pattern parameter is not bytes or str_scalars
• ValueError – Raised if pattern is not a valid regex
• RuntimeError – Raised if there is a server-side error thrown

See also: Strings.findall, Strings.match


Examples

>>> strings = ak.array(['{} string {}'.format(i, i) for i in range(1, 6)])
>>> strings
array(['1 string 1', '2 string 2', '3 string 3', '4 string 4', '5 string 5'])
>>> num_matches, starts, lens = strings.find_locations('\d')
>>> num_matches
array([2, 2, 2, 2, 2])
>>> starts
array([0, 9, 11, 20, 22, 31, 33, 42, 44, 53])
>>> lens
array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
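The start positions in this example are indices into the flat bytes buffer rather than into each individual string, which is why the second string's first match is at 11 (ten characters plus one null byte after the first string). The sketch below reproduces the three outputs locally with Python's re module. Note that re is not re2 and accepts constructs re2 rejects, and that the one-null-byte-per-string layout is an assumption inferred from the example; find_locations_sketch is a hypothetical name.

```python
import re


def find_locations_sketch(strings, pattern):
    """Return (num_matches, starts, lens) using offsets into a null-joined buffer."""
    regex = re.compile(pattern.encode("utf-8"))
    num_matches, starts, lens = [], [], []
    offset = 0  # start of the current string inside the flat byte buffer
    for s in strings:
        data = s.encode("utf-8")
        count = 0
        for match in regex.finditer(data):
            starts.append(offset + match.start())
            lens.append(match.end() - match.start())
            count += 1
        num_matches.append(count)
        offset += len(data) + 1  # skip the string and its null terminator
    return num_matches, starts, lens


strings = ["{} string {}".format(i, i) for i in range(1, 6)]
print(find_locations_sketch(strings, r"\d"))
```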

findall(self, pattern: Union[bytes, arkouda.dtypes.str_scalars], return_match_origins: bool = False) → Union[Strings, Tuple]

Return all non-overlapping matches of pattern in Strings as a new Strings object Note: only handles regular expressions supported by re2 (does not support lookaheads/lookbehinds)

Parameters
• pattern (str_scalars) – The regex pattern used to find matches
• return_match_origins (bool) – If True, return a pdarray containing the index of the original string each pattern match is from
Returns
• Strings – Strings object containing only pattern matches
• pdarray, int64 (optional) – The index of the original string each pattern match is from
Raises
• TypeError – Raised if the pattern parameter is not bytes or str_scalars
• ValueError – Raised if pattern is not a valid regex
• RuntimeError – Raised if there is a server-side error thrown

See also: Strings.find_locations, Strings.match

Examples

>>> strings = ak.array(['{} string {}'.format(i, i) for i in range(1, 6)])
>>> strings
array(['1 string 1', '2 string 2', '3 string 3', '4 string 4', '5 string 5'])
>>> matches, match_origins = strings.findall('\d', return_match_origins=True)
>>> matches
array(['1', '1', '2', '2', '3', '3', '4', '4', '5', '5'])
>>> match_origins
array([0, 0, 1, 1, 2, 2, 3, 3, 4, 4])
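The relationship between the flattened matches and match_origins can be mimicked on a local list of strings, as in the sketch below. It uses Python's re module, whose syntax is a superset of what re2 accepts, so it only illustrates the shape of the result; findall_sketch is a hypothetical name.

```python
import re


def findall_sketch(strings, pattern, return_match_origins=False):
    """Flatten all non-overlapping matches and track which string each came from."""
    regex = re.compile(pattern)
    matches, origins = [], []
    for index, s in enumerate(strings):
        for match in regex.finditer(s):
            matches.append(match.group(0))
            origins.append(index)  # index of the source string for this match
    return (matches, origins) if return_match_origins else matches


strings = ["{} string {}".format(i, i) for i in range(1, 6)]
print(findall_sketch(strings, r"\d", return_match_origins=True))
```

Since origins has one entry per match, it can be used to group or count matches per original string.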


contains(self, substr: Union[bytes, arkouda.dtypes.str_scalars], regex: bool = False) → arkouda.pdarrayclass.pdarray
Check whether each element contains the given substring.
Parameters
• substr (str_scalars) – The substring in the form of string or byte array to search for
• regex (bool) – Indicates whether substr is a regular expression. Note: only handles regular expressions supported by re2 (does not support lookaheads/lookbehinds)
Returns True for elements that contain substr, False otherwise
Return type pdarray, bool
Raises
• TypeError – Raised if the substr parameter is not bytes or str_scalars
• ValueError – Raised if substr is not a valid regex
• RuntimeError – Raised if there is a server-side error thrown
See also: Strings.startswith, Strings.endswith

Examples

>>> strings = ak.array(['{} string {}'.format(i, i) for i in range(1,6)])
>>> strings
array(['1 string 1', '2 string 2', '3 string 3', '4 string 4', '5 string 5'])
>>> strings.contains('string')
array([True, True, True, True, True])
>>> strings.contains('string \d', regex=True)
array([True, True, True, True, True])

startswith(self, substr: Union[bytes, arkouda.dtypes.str_scalars], regex: bool = False) → arkouda.pdarrayclass.pdarray
Check whether each element starts with the given substring.
Parameters
• substr (Union[bytes, str_scalars]) – The prefix to search for
• regex (bool) – Indicates whether substr is a regular expression. Note: only handles regular expressions supported by re2 (does not support lookaheads/lookbehinds)
Returns True for elements that start with substr, False otherwise
Return type pdarray, bool
Raises
• TypeError – Raised if the substr parameter is not bytes or str_scalars
• ValueError – Raised if substr is not a valid regex
• RuntimeError – Raised if there is a server-side error thrown
See also: Strings.contains, Strings.endswith


Examples

>>> strings_end = ak.array(['string {}'.format(i) for i in range(1,6)])
>>> strings_end
array(['string 1', 'string 2', 'string 3', 'string 4', 'string 5'])
>>> strings_end.startswith('string')
array([True, True, True, True, True])
>>> strings_start = ak.array(['{} string'.format(i) for i in range(1,6)])
>>> strings_start
array(['1 string', '2 string', '3 string', '4 string', '5 string'])
>>> strings_start.startswith('\d str', regex=True)
array([True, True, True, True, True])

endswith(self, substr: Union[bytes, arkouda.dtypes.str_scalars], regex: bool = False) → arkouda.pdarrayclass.pdarray
Check whether each element ends with the given substring.
Parameters
• substr (Union[bytes, str_scalars]) – The suffix to search for
• regex (bool) – Indicates whether substr is a regular expression. Note: only handles regular expressions supported by re2 (does not support lookaheads/lookbehinds)
Returns True for elements that end with substr, False otherwise
Return type pdarray, bool
Raises
• TypeError – Raised if the substr parameter is not bytes or str_scalars
• ValueError – Raised if substr is not a valid regex
• RuntimeError – Raised if there is a server-side error thrown
See also: Strings.contains, Strings.startswith

Examples

>>> strings_start = ak.array(['{} string'.format(i) for i in range(1,6)])
>>> strings_start
array(['1 string', '2 string', '3 string', '4 string', '5 string'])
>>> strings_start.endswith('ing')
array([True, True, True, True, True])
>>> strings_end = ak.array(['string {}'.format(i) for i in range(1,6)])
>>> strings_end
array(['string 1', 'string 2', 'string 3', 'string 4', 'string 5'])
>>> strings_end.endswith('ing \d', regex=True)
array([True, True, True, True, True])

match(self, pattern: Union[bytes, arkouda.dtypes.str_scalars]) → arkouda.pdarrayclass.pdarray
For each element, check whether the entire element matches the given regex, pattern.
Note: only handles regular expressions supported by re2 (does not support lookaheads/lookbehinds)


Parameters pattern (str_scalars) – The regex in the form of string or byte array to search for
Returns True for elements that match pattern, False otherwise
Return type pdarray, bool
Raises
• TypeError – Raised if the pattern parameter is not bytes or str_scalars
• ValueError – Raised if pattern is not a valid regex
• RuntimeError – Raised if there is a server-side error thrown
See also: Strings.contains, Strings.startswith, Strings.endswith

Examples

>>> strings = ak.array(['{} string {}'.format(i, i) for i in range(1,6)])
>>> strings
array(['1 string 1', '2 string 2', '3 string 3', '4 string 4', '5 string 5'])
>>> strings.match('\d string \d')
array([True, True, True, True, True])
>>> strings.match('ing \d')
array([False, False, False, False, False])
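With regex=True, contains, startswith, and endswith differ only in where the pattern is anchored, while match requires the whole element to match. The sketch below models that distinction with Python's re module. It is an illustration of the documented behavior rather than arkouda's implementation (the server uses re2), and the four helper functions are hypothetical stand-ins for the Strings methods of the same names.

```python
import re

def contains(strings, pattern):
    # Pattern may occur anywhere in the element.
    return [re.search(pattern, s) is not None for s in strings]

def startswith(strings, pattern):
    # re.match anchors the pattern at the start of the element.
    return [re.match(pattern, s) is not None for s in strings]

def endswith(strings, pattern):
    # Wrap the pattern in a group and anchor it at the end of the element.
    anchored = re.compile('(?:{})\\Z'.format(pattern))
    return [anchored.search(s) is not None for s in strings]

def match(strings, pattern):
    # The entire element must match the pattern.
    return [re.fullmatch(pattern, s) is not None for s in strings]

strings = ['{} string {}'.format(i, i) for i in range(1, 6)]
print(contains(strings, r'string \d'))
print(startswith(strings, r'\d str'))
print(endswith(strings, r'ing \d'))
print(match(strings, r'\d string \d'))
print(match(strings, r'ing \d'))
```

The last call returns False for every element because 'ing \d' matches only the tail of each string, not the whole string.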

flatten(self, delimiter: str, return_segments: bool = False, regex: bool = False) → Union[Strings, Tuple]
Unpack delimiter-joined substrings into a flat array.
Parameters
• delimiter (str) – Characters used to split strings into substrings
• return_segments (bool) – If True, also return mapping of original strings to first substring in return array.
• regex (bool) – Indicates whether delimiter is a regular expression. Note: only handles regular expressions supported by re2 (does not support lookaheads/lookbehinds)
Returns
• Strings – Flattened substrings with delimiters removed
• pdarray, int64 (optional) – For each original string, the index of first corresponding substring in the return array
See also: peel, rpeel


Examples

>>> orig = ak.array(['one|two', 'three|four|five', 'six'])
>>> orig.flatten('|')
array(['one', 'two', 'three', 'four', 'five', 'six'])
>>> flat, map = orig.flatten('|', return_segments=True)
>>> map
array([0, 2, 5])
>>> under = ak.array(['one_two', 'three_____four____five', 'six'])
>>> under_flat, under_map = under.flatten('_+', return_segments=True, regex=True)
>>> under_flat
array(['one', 'two', 'three', 'four', 'five', 'six'])
>>> under_map
array([0, 2, 5])
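The segments array returned with return_segments=True records, for each original string, where its first substring lands in the flattened output. A minimal client-side model, assuming plain Python lists in place of Strings and pdarray objects (the flatten function here is a hypothetical stand-in, and re is used in place of re2):

```python
import re

def flatten(strings, delimiter, regex=False):
    """Pure-Python model: return (flat substrings, segment start indices)."""
    flat, segments = [], []
    for text in strings:
        # Index in `flat` of the first substring that comes from `text`.
        segments.append(len(flat))
        parts = re.split(delimiter, text) if regex else text.split(delimiter)
        flat.extend(parts)
    return flat, segments

flat, seg = flatten(['one|two', 'three|four|five', 'six'], '|')
under_flat, under_seg = flatten(
    ['one_two', 'three_____four____five', 'six'], '_+', regex=True)
print(flat, seg)
print(under_flat, under_seg)
```

Consecutive segment values delimit the substrings of one original string: here entries 2 through 4 of the flat array all come from the second input string.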

peel(self, delimiter: Union[bytes, arkouda.dtypes.str_scalars], times: arkouda.dtypes.int_scalars = 1, includeDelimiter: bool = False, keepPartial: bool = False, fromRight: bool = False, regex: bool = False) → Tuple
Peel off one or more delimited fields from each string (similar to string.partition), returning two new arrays of strings. Warning: This function is experimental and not guaranteed to work.
Parameters
• delimiter (Union[bytes, str_scalars]) – The separator where the split will occur
• times (Union[int, np.int64]) – The number of times the delimiter is sought, i.e. skip over the first (times-1) delimiters
• includeDelimiter (bool) – If true, append the delimiter to the end of the first return array. By default, it is prepended to the beginning of the second return array.
• keepPartial (bool) – If true, a string that does not contain instances of the delimiter will be returned in the first array. By default, such strings are returned in the second array.
• fromRight (bool) – If true, peel from the right instead of the left (see also rpeel)
• regex (bool) – Indicates whether delimiter is a regular expression. Note: only handles regular expressions supported by re2 (does not support lookaheads/lookbehinds)
Returns
left: Strings The field(s) peeled from the end of each string (unless fromRight is true)
right: Strings The remainder of each string after peeling (unless fromRight is true)
Return type Tuple[Strings, Strings]
Raises
• TypeError – Raised if the delimiter parameter is not bytes or str_scalars, if times is not int64, or if includeDelimiter, keepPartial, or fromRight is not bool
• ValueError – Raised if times is < 1 or if delimiter is not a valid regex
• RuntimeError – Raised if there is a server-side error thrown
See also: rpeel, stick, lstick


Examples

>>> s = ak.array(['a.b', 'c.d', 'e.f.g'])
>>> s.peel('.')
(array(['a', 'c', 'e']), array(['b', 'd', 'f.g']))
>>> s.peel('.', includeDelimiter=True)
(array(['a.', 'c.', 'e.']), array(['b', 'd', 'f.g']))
>>> s.peel('.', times=2)
(array(['', '', 'e.f']), array(['a.b', 'c.d', 'g']))
>>> s.peel('.', times=2, keepPartial=True)
(array(['a.b', 'c.d', 'e.f']), array(['', '', 'g']))
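The interaction of times, includeDelimiter, and keepPartial is easiest to see in a small reference model. The sketch below reproduces the four example results above for a literal delimiter peeled from the left. Its behavior is inferred from those examples only, it does not model fromRight or regex, and peel_one and peel are hypothetical names rather than arkouda functions.

```python
def peel_one(text, delimiter, times=1, include_delimiter=False, keep_partial=False):
    """Model of peel for one string: literal delimiter, peeled from the left."""
    start = 0
    pos = -1
    for _ in range(times):
        pos = text.find(delimiter, start)
        if pos == -1:
            # Fewer than `times` delimiters: the whole string goes to one side.
            return (text, '') if keep_partial else ('', text)
        start = pos + len(delimiter)
    left = text[:start] if include_delimiter else text[:pos]
    return left, text[start:]

def peel(strings, delimiter, **kwargs):
    pairs = [peel_one(s, delimiter, **kwargs) for s in strings]
    return [p[0] for p in pairs], [p[1] for p in pairs]

s = ['a.b', 'c.d', 'e.f.g']
print(peel(s, '.'))
print(peel(s, '.', include_delimiter=True))
print(peel(s, '.', times=2))
print(peel(s, '.', times=2, keep_partial=True))
```

With times=2, only 'e.f.g' contains two delimiters, so the other strings are routed whole to the second array, or to the first when keep_partial is set.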

rpeel(self, delimiter: Union[bytes, arkouda.dtypes.str_scalars], times: arkouda.dtypes.int_scalars = 1, includeDelimiter: bool = False, keepPartial: bool = False, regex: bool = False)
Peel off one or more delimited fields from the end of each string (similar to string.rpartition), returning two new arrays of strings. Warning: This function is experimental and not guaranteed to work.
Parameters
• delimiter (Union[bytes, str_scalars]) – The separator where the split will occur
• times (Union[int, np.int64]) – The number of times the delimiter is sought, i.e. skip over the last (times-1) delimiters
• includeDelimiter (bool) – If true, prepend the delimiter to the start of the first return array. By default, it is appended to the end of the second return array.
• keepPartial (bool) – If true, a string that does not contain instances of the delimiter will be returned in the second array. By default, such strings are returned in the first array.
• regex (bool) – Indicates whether delimiter is a regular expression. Note: only handles regular expressions supported by re2 (does not support lookaheads/lookbehinds)
Returns
left: Strings The remainder of the string after peeling
right: Strings The field(s) that were peeled from the right of each string
Return type Tuple[Strings, Strings]
Raises
• TypeError – Raised if the delimiter parameter is not bytes or str_scalars or if times is not int64
• ValueError – Raised if times is < 1 or if delimiter is not a valid regex
• RuntimeError – Raised if there is a server-side error thrown
See also: peel, stick, lstick


Examples

>>> s = ak.array(['a.b', 'c.d', 'e.f.g'])
>>> s.rpeel('.')
(array(['a', 'c', 'e.f']), array(['b', 'd', 'g']))
# Compared against peel
>>> s.peel('.')
(array(['a', 'c', 'e']), array(['b', 'd', 'f.g']))
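Since peel and rpeel are described as analogs of string.partition and string.rpartition, the default case (times=1) can be compared directly against Python's built-in str methods. This is a sketch for intuition only: split_pairs is a hypothetical helper, and the built-ins differ from arkouda when the delimiter is absent (they return the separator as a third element and place the whole string on a fixed side).

```python
def split_pairs(strings, partition):
    """Apply a partition-style function and collect the two outer pieces."""
    left, right = [], []
    for text in strings:
        head, _sep, tail = partition(text)
        left.append(head)
        right.append(tail)
    return left, right

s = ['a.b', 'c.d', 'e.f.g']
peeled = split_pairs(s, lambda t: t.partition('.'))    # split at first '.'
rpeeled = split_pairs(s, lambda t: t.rpartition('.'))  # split at last '.'
print(peeled)
print(rpeeled)
```

The two results differ only for 'e.f.g', the one string with more than one delimiter.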

stick(self, other: Strings, delimiter: Union[bytes, arkouda.dtypes.str_scalars] = '', toLeft: bool = False) → Strings
Join the strings from another array onto one end of the strings of this array, optionally inserting a delimiter. Warning: This function is experimental and not guaranteed to work.
Parameters
• other (Strings) – The strings to join onto self’s strings
• delimiter (str) – String inserted between self and other
• toLeft (bool) – If true, join other strings to the left of self. By default, other is joined to the right of self.
Returns The array of joined strings
Return type Strings
Raises
• TypeError – Raised if the delimiter parameter is not bytes or str_scalars or if the other parameter is not a Strings instance
• ValueError – Raised if times is < 1
• RuntimeError – Raised if there is a server-side error thrown
See also: lstick, peel, rpeel

Examples

>>> s = ak.array(['a', 'c', 'e'])
>>> t = ak.array(['b', 'd', 'f'])
>>> s.stick(t, delimiter='.')
array(['a.b', 'c.d', 'e.f'])
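Element-wise, stick is a pairwise join, and toLeft only swaps the order of the two operands. A minimal list-based model follows. The stick function below is a hypothetical stand-in, and the length check is an assumption of this sketch, not a documented arkouda error.

```python
def stick(self_strings, other_strings, delimiter='', to_left=False):
    """Pure-Python model of Strings.stick on two equal-length lists."""
    if len(self_strings) != len(other_strings):
        # Assumption of this sketch: both arrays must have the same length.
        raise ValueError('arrays must have the same length')
    if to_left:
        return [o + delimiter + s for s, o in zip(self_strings, other_strings)]
    return [s + delimiter + o for s, o in zip(self_strings, other_strings)]

s = ['a', 'c', 'e']
t = ['b', 'd', 'f']
print(stick(s, t, delimiter='.'))
print(stick(s, t, delimiter='.', to_left=True))
```

The second call joins other to the left of self, which is the ordering that lstick produces.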

__add__(self, other: Strings) → Strings

lstick(self, other: Strings, delimiter: Union[bytes, arkouda.dtypes.str_scalars] = '') → Strings
Join the strings from another array onto the left of the strings of this array, optionally inserting a delimiter. Warning: This function is experimental and not guaranteed to work.
Parameters
• other (Strings) – The strings to join onto self’s strings
• delimiter (Union[bytes,str_scalars]) – String inserted between self and other
Returns The array of joined strings, as other + self
Return type Strings


Raises
• TypeError – Raised if the delimiter parameter is neither bytes nor a str or if the other parameter is not a Strings instance
• RuntimeError – Raised if there is a server-side error thrown
See also: stick, peel, rpeel

Examples

>>> s = ak.array(['a', 'c', 'e'])
>>> t = ak.array(['b', 'd', 'f'])
>>> s.lstick(t, delimiter='.')
array(['b.a', 'd.c', 'f.e'])

__radd__(self, other: Strings) → Strings

hash(self ) → Tuple[arkouda.pdarrayclass.pdarray, arkouda.pdarrayclass.pdarray]
Compute a 128-bit hash of each string.
Returns A tuple of two int64 pdarrays. The ith hash value is the concatenation of the ith values from each array.
Return type Tuple[pdarray,pdarray]

Notes

The implementation uses SipHash128, a fast and balanced hash function (used by Python for dictionaries and sets). For realistic numbers of strings (up to about 10**15), the probability of a collision between two 128-bit hash values is negligible.

group(self ) → arkouda.pdarrayclass.pdarray
Return the permutation that groups the array, placing equivalent strings together. All instances of the same string are guaranteed to lie in one contiguous block of the permuted array, but the blocks are not necessarily ordered.
Returns The permutation that groups the array by value
Return type pdarray
See also: GroupBy, unique

Notes

If the arkouda server is compiled with “-sSegmentedArray.useHash=true”, then arkouda uses 128-bit hash values to group strings, rather than sorting the strings directly. This method is fast, but the resulting permutation merely groups equivalent strings and does not sort them. If the “useHash” parameter is false, then a full sort is performed.
Raises RuntimeError – Raised if there is a server-side error in executing group request or creating the pdarray encapsulating the return message
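The difference between a grouping permutation and a sorting permutation can be illustrated with a small model. The sketch below buckets equal strings with a Python dict, which stands in for the server's 128-bit hashes. The group_permutation helper is hypothetical, and the block order it produces (order of first appearance) is a property of this sketch, not a guarantee made by arkouda.

```python
def group_permutation(strings):
    """Return indices that place equal strings in contiguous blocks."""
    buckets = {}
    for index, text in enumerate(strings):
        buckets.setdefault(text, []).append(index)
    perm = []
    # Blocks come out in order of first appearance, not in sorted order.
    for indices in buckets.values():
        perm.extend(indices)
    return perm

data = ['b', 'a', 'b', 'c', 'a']
perm = group_permutation(data)
grouped = [data[i] for i in perm]
print(perm)
print(grouped)
```

Applying the permutation places every copy of a string in one contiguous block, yet the blocks themselves ('b', 'a', 'c') are not in sorted order.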


to_ndarray(self ) → numpy.ndarray
Convert the array to a np.ndarray, transferring array data from the arkouda server to Python. If the array exceeds a built-in size limit, a RuntimeError is raised.
Returns A numpy ndarray with the same strings as this array
Return type np.ndarray

Notes

The number of bytes in the array cannot exceed arkouda.maxTransferBytes, otherwise a RuntimeError will be raised. This is to protect the user from overflowing the memory of the system on which the Python client is running, under the assumption that the server is running on a distributed system with much more memory than the client. The user may override this limit by setting ak.maxTransferBytes to a larger value, but proceed with caution. See also: array

Examples

>>> a = ak.array(["hello", "my", "world"])
>>> a.to_ndarray()
array(['hello', 'my', 'world'], dtype='<U5')
>>> type(a.to_ndarray())
numpy.ndarray

save(self, prefix_path: str, dataset: str = 'strings_array', mode: str = 'truncate', save_offsets: bool = True) → str
Save the Strings object to HDF5. The result is a collection of HDF5 files, one file per locale of the arkouda server, where each filename starts with prefix_path. Each locale saves its chunk of the Strings array to its corresponding file.
Parameters
• prefix_path (str) – Directory and filename prefix that all output files share
• dataset (str) – The name of the Strings dataset to be written, defaults to strings_array
• mode (str {'truncate' | 'append'}) – By default, truncate (overwrite) output files, if they exist. If ‘append’, create a new Strings dataset within existing files.
• save_offsets (bool) – Defaults to True, which instructs the server to save the offsets array to HDF5. If False, the offsets array will not be saved and will be derived from the string values upon load/read.
Returns
Return type String message indicating result of save operation
Raises
• ValueError – Raised if the lengths of columns and values differ, or the mode is neither ‘truncate’ nor ‘append’
• TypeError – Raised if prefix_path, dataset, or mode is not a str


See also: pdarrayIO.save
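The save_offsets parameter makes sense once the two-array layout is spelled out: one flat byte array holding the characters and one integer array holding the start of each string. The sketch below models that layout and shows how offsets could be rebuilt from the bytes alone. It assumes that each string is stored as UTF-8 followed by a single null terminator, an assumption this section does not state, and to_segmented and derive_segments are hypothetical helpers rather than arkouda functions.

```python
def to_segmented(strings):
    """Pack strings into (values, segments): flat bytes plus start offsets."""
    values = bytearray()
    segments = []
    for text in strings:
        segments.append(len(values))             # start offset of this string
        values += text.encode('utf-8') + b'\x00'  # assumed null terminator
    return bytes(values), segments

def derive_segments(values):
    """Rebuild the start offsets from the null terminators in `values`."""
    segments = [0] if values else []
    # Every terminator except the final one marks the start of the next string.
    for pos, byte in enumerate(values[:-1]):
        if byte == 0:
            segments.append(pos + 1)
    return segments

values, segments = to_segmented(['hello', 'my', 'world'])
print(values)
print(segments)
print(derive_segments(values))
```

Under that assumption the offsets carry no information beyond the bytes, so omitting them trades a scan at load time for a smaller file.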

Notes

Important implementation notes: (1) Strings state is saved as two datasets within an hdf5 group: one for the string characters and one for the segments corresponding to the start of each string, (2) the hdf5 group is named via the dataset parameter.

is_registered(self ) → numpy.bool_
Return True iff the object is contained in the registry
Parameters None –
Returns Indicates if the object is contained in the registry
Return type bool
Raises RuntimeError – Raised if there’s a server-side error thrown

_list_component_names(self ) → List[str]
Internal function that returns a list of all component names
Parameters None –
Returns List of all component names
Return type List[str]

info(self ) → str
Returns a JSON formatted string containing information about all components of self
Parameters None –
Returns JSON string containing information about all components of self
Return type str

pretty_print_info(self ) → None
Prints information about all components of self in a human readable format
Parameters None –
Returns
Return type None

register(self, user_defined_name: str) → Strings
Register this Strings object with a user defined name in the arkouda server so it can be attached to later using Strings.attach(). This is an in-place operation; registering a Strings object more than once will update the name in the registry and remove the previously registered name. A name can only be registered to one object at a time.
Parameters user_defined_name (str) – user defined name which the Strings object is to be registered under
Returns The same Strings object which is now registered with the arkouda server and has an updated name. This is an in-place modification; the original is returned to support a fluid programming style. Please note you cannot register two different objects with the same name.
Return type Strings
Raises


• TypeError – Raised if user_defined_name is not a str
• RegistrationError – If the server was unable to register the Strings object with the user_defined_name. If the user is attempting to register more than one object with the same name, the former should be unregistered first to free up the registration name.
See also: attach, unregister

Notes

Registered names/Strings objects in the server are immune to deletion until they are unregistered.

unregister(self ) → None
Unregister a Strings object in the arkouda server which was previously registered using register() and/or attached to using attach()
Returns
Return type None
Raises RuntimeError – Raised if the server could not find the internal name/symbol to remove
See also: register, attach

Notes

Registered names/Strings objects in the server are immune to deletion until they are unregistered.

static attach(user_defined_name: str) → Strings
class method to return a Strings object attached to the registered name in the arkouda server which was registered using register()
Parameters user_defined_name (str) – user defined name which the Strings object was registered under
Returns the Strings object registered with user_defined_name in the arkouda server
Return type Strings object
Raises TypeError – Raised if user_defined_name is not a str
See also: register, unregister

Notes

Registered names/Strings objects in the server are immune to deletion until they are unregistered.

static unregister_strings_by_name(user_defined_name: str) → None
Unregister a Strings object in the arkouda server previously registered via register()
Parameters user_defined_name (str) – The registered name of the Strings object
See also: register, unregister, attach, is_registered

arkouda.timeclass

Module Contents

Classes

_Timescalar
_AbstractBaseTime    Base class for Datetime and Timedelta; not user-facing.
Datetime             Represents a date and/or time.
Timedelta            Represents a duration, the difference between two dates or times.

Functions

_get_factor(unit: str) → int

_identity(x, **kwargs)

date_range(start=None, end=None, periods=None, freq=None, tz=None, normalize=False, name=None, closed=None, **kwargs)
    Creates a fixed frequency Datetime range. Alias for ak.Datetime(pd.date_range(args)).
timedelta_range(start=None, end=None, periods=None, freq=None, name=None, closed=None, **kwargs)
    Return a fixed frequency TimedeltaIndex, with day as the default frequency.

Attributes

_BASE_UNIT

_unit2normunit

_unit2factor

arkouda.timeclass._BASE_UNIT = ns
arkouda.timeclass._unit2normunit
arkouda.timeclass._unit2factor
arkouda.timeclass._get_factor(unit: str) → int
arkouda.timeclass._identity(x, **kwargs)
class arkouda.timeclass._Timescalar(scalar)
class arkouda.timeclass._AbstractBaseTime(array, unit: str = _BASE_UNIT)
Bases: arkouda.pdarrayclass.pdarray


Base class for Datetime and Timedelta; not user-facing. Arkouda handles time similar to Pandas (albeit with less functionality), in that all absolute and relative times are represented in nanoseconds as int64 behind the scenes. Datetime and Timedelta can be constructed from Arkouda, NumPy, or Pandas arrays; in each case, the input values are normalized to nanoseconds on initialization, so that all resulting operations are transparent.

classmethod _get_callback(cls, other, op)

floor(self, freq)
Round times down to the nearest integer of a given frequency.
Parameters freq (str {'d', 'm', 'h', 's', 'ms', 'us', 'ns'}) – Frequency to round to
Returns Values rounded down to nearest frequency
Return type self.__class__

ceil(self, freq)
Round times up to the nearest integer of a given frequency.
Parameters freq (str {'d', 'm', 'h', 's', 'ms', 'us', 'ns'}) – Frequency to round to
Returns Values rounded up to nearest frequency
Return type self.__class__

round(self, freq)
Round times to the nearest integer of a given frequency. Midpoint values will be rounded to nearest even integer.
Parameters freq (str {'d', 'm', 'h', 's', 'ms', 'us', 'ns'}) – Frequency to round to
Returns Values rounded to nearest frequency
Return type self.__class__

to_ndarray(self )
Convert the array to a np.ndarray, transferring array data from the Arkouda server to client-side Python. Note: if the pdarray size exceeds client.maxTransferBytes, a RuntimeError is raised.
Returns A numpy ndarray with the same attributes and data as the pdarray
Return type np.ndarray
Raises RuntimeError – Raised if there is a server-side error thrown, if the pdarray size exceeds the built-in client.maxTransferBytes size limit, or if the bytes received does not match expected number of bytes

Notes

The number of bytes in the array cannot exceed client.maxTransferBytes, otherwise a RuntimeError will be raised. This is to protect the user from overflowing the memory of the system on which the Python client is running, under the assumption that the server is running on a distributed system with much more memory than the client. The user may override this limit by setting client.maxTransferBytes to a larger value, but proceed with caution. See also: array


Examples

>>> a = ak.arange(0, 5, 1)
>>> a.to_ndarray()
array([0, 1, 2, 3, 4])

>>> type(a.to_ndarray())
numpy.ndarray
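Because all times are held as int64 nanoseconds, the floor, ceil, and round methods of _AbstractBaseTime described earlier reduce to integer arithmetic on a per-frequency factor. The sketch below shows one way to express that arithmetic, including the round-half-to-even rule. The factor table and function names are illustrative assumptions (in particular, 'm' is taken to mean minutes), not arkouda's internals.

```python
# Nanoseconds per frequency unit; 'm' is assumed to mean minutes.
NS_PER = {
    'd': 86_400 * 10**9, 'h': 3_600 * 10**9, 'm': 60 * 10**9,
    's': 10**9, 'ms': 10**6, 'us': 10**3, 'ns': 1,
}

def floor_ns(t, freq):
    f = NS_PER[freq]
    return (t // f) * f

def ceil_ns(t, freq):
    f = NS_PER[freq]
    return -((-t) // f) * f

def round_ns(t, freq):
    f = NS_PER[freq]
    q, r = divmod(t, f)
    # Round up past the midpoint, or at the midpoint when q is odd (half-even).
    if 2 * r > f or (2 * r == f and q % 2 == 1):
        q += 1
    return q * f

hour = NS_PER['h']
t = 90 * NS_PER['m']  # 1.5 hours expressed in nanoseconds
print(floor_ns(t, 'h') // hour, ceil_ns(t, 'h') // hour, round_ns(t, 'h') // hour)
```

The midpoint 1.5 hours rounds to 2 hours and 2.5 hours also rounds to 2 hours, since ties go to the nearest even multiple.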

__str__(self )
Return str(self).

__repr__(self ) → str
Return repr(self).

_binop(self, other, op)
Executes binary operation specified by the op string
Parameters
• other (pdarray) – The pdarray upon which the binop is to be executed
• op (str) – The binop to be executed
Returns A pdarray encapsulating the binop result
Return type pdarray
Raises
• ValueError – Raised if the op is not within the pdarray.BinOps set, or if the pdarray sizes don’t match
• TypeError – Raised if other is not a pdarray or the pdarray.dtype is not a supported dtype

_r_binop(self, other, op)
Executes reverse binary operation specified by the op string
Parameters
• other (pdarray) – The pdarray upon which the reverse binop is to be executed
• op (str) – The name of the reverse binop to be executed
Returns A pdarray encapsulating the reverse binop result
Return type pdarray
Raises
• ValueError – Raised if the op is not within the pdarray.BinOps set
• TypeError – Raised if other is not a pdarray or the pdarray.dtype is not a supported dtype

opeq(self, other, op)
static _is_datetime_scalar(scalar)
static _is_timedelta_scalar(scalar)
_scalar_callback(self, key)
__getitem__(self, key)
__setitem__(self, key, value)


min(self )
Return the minimum value of the array.

max(self )
Return the maximum value of the array.

mink(self, k)
Compute the minimum “k” values.
Parameters k (int_scalars) – The desired count of minimum values to be returned by the output.
Returns The minimum k values from pda
Return type pdarray, int
Raises TypeError – Raised if pda is not a pdarray

maxk(self, k)
Compute the maximum “k” values.
Parameters k (int_scalars) – The desired count of maximum values to be returned by the output.
Returns The maximum k values from pda
Return type pdarray, int
Raises TypeError – Raised if pda is not a pdarray

class arkouda.timeclass.Datetime(array, unit: str = _BASE_UNIT)
Bases: _AbstractBaseTime
Represents a date and/or time. Datetime is the Arkouda analog to pandas DatetimeIndex and other timeseries data types.
Parameters
• array (int64 pdarray, pd.DatetimeIndex, pd.Series, or np.datetime64 array) –
• unit (str, default 'ns') – For int64 pdarray, denotes the unit of the input. Ignored for pandas and numpy arrays, which carry their own unit. Not case-sensitive; prefixes of full names (like 'sec') are accepted. Possible values:
– 'weeks' or 'w'
– 'days' or 'd'
– 'hours' or 'h'
– 'minutes', 'm', or 't'
– 'seconds' or 's'
– 'milliseconds', 'ms', or 'l'
– 'microseconds', 'us', or 'u'
– 'nanoseconds', 'ns', or 'n'
Unlike in pandas, units cannot be combined or mixed with integers


Notes

The ._data attribute is always in nanoseconds with int64 dtype.

supported_with_datetime
supported_with_r_datetime
supported_with_timedelta
supported_with_r_timedelta
supported_opeq
supported_with_pdarray
supported_with_r_pdarray
classmethod _get_callback(cls, otherclass, op)
_scalar_callback(self, scalar)
static _is_supported_scalar(scalar)

to_pandas(self )
Convert array to a pandas DatetimeIndex. Note: if the array size exceeds client.maxTransferBytes, a RuntimeError is raised.
See also: to_ndarray

sum(self )
Return the sum of all elements in the array.

class arkouda.timeclass.Timedelta(array, unit: str = _BASE_UNIT)
Bases: _AbstractBaseTime
Represents a duration, the difference between two dates or times. Timedelta is the Arkouda equivalent of pandas.TimedeltaIndex.
Parameters
• array (int64 pdarray, pd.TimedeltaIndex, pd.Series, or np.timedelta64 array) –
• unit (str, default 'ns') – For int64 pdarray, denotes the unit of the input. Ignored for pandas and numpy arrays, which carry their own unit. Not case-sensitive; prefixes of full names (like 'sec') are accepted. Possible values:
– 'weeks' or 'w'
– 'days' or 'd'
– 'hours' or 'h'
– 'minutes', 'm', or 't'
– 'seconds' or 's'
– 'milliseconds', 'ms', or 'l'
– 'microseconds', 'us', or 'u'
– 'nanoseconds', 'ns', or 'n'


Unlike in pandas, units cannot be combined or mixed with integers
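The unit handling described for Datetime and Timedelta (short aliases, case-insensitivity, and prefixes of full names) can be sketched as a lookup that resolves a unit to a nanosecond factor. The tables and the unit_factor and to_nanoseconds helpers below are hypothetical. How arkouda resolves an ambiguous prefix such as 'mi' is not stated in this section, so the sketch simply rejects it.

```python
_FACTORS = {
    'weeks': 7 * 86_400 * 10**9, 'days': 86_400 * 10**9,
    'hours': 3_600 * 10**9, 'minutes': 60 * 10**9, 'seconds': 10**9,
    'milliseconds': 10**6, 'microseconds': 10**3, 'nanoseconds': 1,
}
_ALIASES = {
    'w': 'weeks', 'd': 'days', 'h': 'hours', 'm': 'minutes', 't': 'minutes',
    's': 'seconds', 'ms': 'milliseconds', 'l': 'milliseconds',
    'us': 'microseconds', 'u': 'microseconds', 'ns': 'nanoseconds',
    'n': 'nanoseconds',
}

def unit_factor(unit):
    """Resolve a unit string to its nanosecond factor (case-insensitive)."""
    key = unit.lower()
    if key in _ALIASES:                      # exact short alias wins
        return _FACTORS[_ALIASES[key]]
    candidates = [name for name in _FACTORS if key and name.startswith(key)]
    if len(candidates) != 1:                 # unknown or ambiguous prefix
        raise ValueError('unrecognized unit: {!r}'.format(unit))
    return _FACTORS[candidates[0]]

def to_nanoseconds(values, unit='ns'):
    factor = unit_factor(unit)
    return [v * factor for v in values]

print(to_nanoseconds([1, 2], 'sec'))
print(unit_factor('W'))
```

Normalizing once at construction is what lets every later operation work on plain int64 nanosecond values.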

Notes

The ._data attribute is always in nanoseconds with int64 dtype.

supported_with_datetime
supported_with_r_datetime
supported_with_timedelta
supported_with_r_timedelta
supported_opeq
supported_with_pdarray
supported_with_r_pdarray
classmethod _get_callback(cls, otherclass, op)
_scalar_callback(self, scalar)
static _is_supported_scalar(scalar)

to_pandas(self )
Convert array to a pandas TimedeltaIndex. Note: if the array size exceeds client.maxTransferBytes, a RuntimeError is raised.
See also: to_ndarray

std(self, ddof: Union[int, numpy.int64] = 0)
Returns the standard deviation as a pd.Timedelta object

sum(self )
Return the sum of all elements in the array.

abs(self )
Absolute value of time interval.

arkouda.timeclass.date_range(start=None, end=None, periods=None, freq=None, tz=None, normalize=False, name=None, closed=None, **kwargs)
Creates a fixed frequency Datetime range. Alias for ak.Datetime(pd.date_range(args)). Subject to size limit imposed by client.maxTransferBytes.
Parameters
• start (str or datetime-like, optional) – Left bound for generating dates.
• end (str or datetime-like, optional) – Right bound for generating dates.
• periods (int, optional) – Number of periods to generate.
• freq (str or DateOffset, default 'D') – Frequency strings can have multiples, e.g. ‘5H’. See timeseries.offset_aliases for a list of frequency aliases.
• tz (str or tzinfo, optional) – Time zone name for returning localized DatetimeIndex, for example ‘Asia/Hong_Kong’. By default, the resulting DatetimeIndex is timezone-naive.
• normalize (bool, default False) – Normalize start/end dates to midnight before generating date range.


• name (str, default None) – Name of the resulting DatetimeIndex.
• closed ({None, 'left', 'right'}, optional) – Make the interval closed with respect to the given frequency to the ‘left’, ‘right’, or both sides (None, the default).
• **kwargs – For compatibility. Has no effect on the result.
Returns rng
Return type DatetimeIndex

Notes

Of the four parameters start, end, periods, and freq, exactly three must be specified. If freq is omitted, the resulting DatetimeIndex will have periods linearly spaced elements between start and end (closed on both sides). To learn more about the frequency strings, please see this link.

arkouda.timeclass.timedelta_range(start=None, end=None, periods=None, freq=None, name=None, closed=None, **kwargs)
Return a fixed frequency TimedeltaIndex, with day as the default frequency. Alias for ak.Timedelta(pd.timedelta_range(args)). Subject to size limit imposed by client.maxTransferBytes.
Parameters
• start (str or timedelta-like, default None) – Left bound for generating timedeltas.
• end (str or timedelta-like, default None) – Right bound for generating timedeltas.
• periods (int, default None) – Number of periods to generate.
• freq (str or DateOffset, default 'D') – Frequency strings can have multiples, e.g. ‘5H’.
• name (str, default None) – Name of the resulting TimedeltaIndex.
• closed (str, default None) – Make the interval closed with respect to the given frequency to the ‘left’, ‘right’, or both sides (None).
Returns rng
Return type TimedeltaIndex

Notes

Of the four parameters start, end, periods, and freq, exactly three must be specified. If freq is omitted, the resulting TimedeltaIndex will have periods linearly spaced elements between start and end (closed on both sides). To learn more about the frequency strings, please see this link.
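The "exactly three of four" rule for date_range and timedelta_range can be illustrated with two of the possible combinations, using only the standard datetime module. Both arkouda functions are documented as aliases that delegate to pandas, so the helpers below (linear_range and fixed_freq_range, both hypothetical) only model the spacing rules and are not how the ranges are actually generated.

```python
from datetime import datetime, timedelta

def linear_range(start, end, periods):
    """start, end, periods given (freq omitted): evenly spaced, closed on both sides."""
    if periods < 2:
        raise ValueError('this sketch needs at least two periods')
    step = (end - start) / (periods - 1)
    return [start + i * step for i in range(periods)]

def fixed_freq_range(start, periods, freq=timedelta(days=1)):
    """start, periods, freq given (end omitted): fixed step from start."""
    return [start + i * freq for i in range(periods)]

dates = linear_range(datetime(2021, 1, 1), datetime(2021, 1, 5), 5)
deltas = fixed_freq_range(timedelta(0), 3, freq=timedelta(hours=5))
print(dates[0], dates[-1])
print(deltas)
```

Both helpers accept either datetime or timedelta start values, mirroring the way the two arkouda functions share the same parameter rules.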


8.1.2 Package Contents

Classes

pdarray              The basic arkouda array class. This class contains only the
GroupBy              Group an array or list of arrays by value, usually in preparation
Strings              Represents an array of strings whose data resides on the
Categorical          Represents an array of values belonging to named categories. Converting a
_Timescalar
_AbstractBaseTime    Base class for Datetime and Timedelta; not user-facing.
Datetime             Represents a date and/or time.
Timedelta            Represents a duration, the difference between two dates or times.

Functions

get_versions()
    Get version information or return default if unable to do so.
check_np_dtype(dt: numpy.dtype) → None
    Assert that numpy dtype dt is one of the dtypes supported
translate_np_dtype(dt: numpy.dtype) → Tuple[str, int]
    Split numpy dtype dt into its kind and byte size, raising
resolve_scalar_dtype(val: object) → str
    Try to infer what dtype arkouda_server should treat val as.
get_byteorder(dt: numpy.dtype) → str
    Get a concrete byteorder (turns '=' into '<' or '>')
get_server_byteorder() → str
    Get the server's byteorder
clear() → None
    Send a clear message to clear all unregistered data from the server symbol table
any(pda: pdarray) → numpy.bool_
    Return True iff any element of the array evaluates to True.
all(pda: pdarray) → numpy.bool_
    Return True iff all elements of the array evaluate to True.
is_sorted(pda: pdarray) → numpy.bool_
    Return True iff the array is monotonically non-decreasing.
sum(pda: pdarray) → numpy.float64
    Return the sum of all elements in the array.
prod(pda: pdarray) → numpy.float64
    Return the product of all elements in the array. Return value is
min(pda: pdarray) → arkouda.dtypes.numpy_scalars
    Return the minimum value of the array.
max(pda: pdarray) → arkouda.dtypes.numpy_scalars
    Return the maximum value of the array.
argmin(pda: pdarray) → numpy.int64
    Return the index of the first occurrence of the array min value.
argmax(pda: pdarray) → numpy.int64
    Return the index of the first occurrence of the array max value.


mean(pda: pdarray) → numpy.float64
    Return the mean of the array.
var(pda: pdarray, ddof: arkouda.dtypes.int_scalars = 0) → numpy.float64
    Return the variance of values in the array.
std(pda: pdarray, ddof: arkouda.dtypes.int_scalars = 0) → numpy.float64
    Return the standard deviation of values in the array. The standard
mink(pda: pdarray, k: arkouda.dtypes.int_scalars) → pdarray
    Find the k minimum values of an array.
maxk(pda: pdarray, k: arkouda.dtypes.int_scalars) → pdarray
    Find the k maximum values of an array.
argmink(pda: pdarray, k: arkouda.dtypes.int_scalars) → pdarray
    Finds the indices corresponding to the k minimum values of an array.
argmaxk(pda: pdarray, k: arkouda.dtypes.int_scalars) → pdarray
    Find the indices corresponding to the k maximum values of an array.
attach_pdarray(user_defined_name: str) → pdarray
    class method to return a pdarray attached to the registered name in the arkouda
unregister_pdarray_by_name(user_defined_name: str) → None
    Unregister a named pdarray in the arkouda server which was previously
argsort(pda: Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, arkouda.categorical.Categorical]) → arkouda.pdarrayclass.pdarray
    Return the permutation that sorts the array.
coargsort(arrays: Sequence[Union[arkouda.strings.Strings, arkouda.pdarrayclass.pdarray, arkouda.categorical.Categorical]]) → arkouda.pdarrayclass.pdarray
    Return the permutation that groups the rows (left-to-right), if the
sort(pda: arkouda.pdarrayclass.pdarray) → arkouda.pdarrayclass.pdarray
    Return a sorted copy of the array. Only sorts numeric arrays;
unique(pda: Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, Categorical], return_counts: bool = False) → Union[Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, Categorical], Tuple[Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, Categorical], Optional[arkouda.pdarrayclass.pdarray]]]
    Find the unique elements of an array.
in1d(pda1: Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, Categorical], pda2: Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, Categorical], invert: bool = False) → arkouda.pdarrayclass.pdarray
    Test whether each element of a 1-D array is also present in a second array.
concatenate(arrays: Sequence[Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, Categorical]], ordered: bool = True) → Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, Categorical]
    Concatenate a list or tuple of pdarray or Strings objects into
union1d(pda1: arkouda.pdarrayclass.pdarray, pda2: arkouda.pdarrayclass.pdarray) → arkouda.pdarrayclass.pdarray
    Find the union of two arrays.
intersect1d(pda1: arkouda.pdarrayclass.pdarray, pda2: arkouda.pdarrayclass.pdarray, assume_unique: bool = False) → arkouda.pdarrayclass.pdarray
    Find the intersection of two arrays.


setdiff1d(pda1: arkouda.pdarrayclass.pdarray, pda2: arkouda.pdarrayclass.pdarray, assume_unique: bool = False) → arkouda.pdarrayclass.pdarray
    Find the set difference of two arrays.
setxor1d(pda1: arkouda.pdarrayclass.pdarray, pda2: arkouda.pdarrayclass.pdarray, assume_unique: bool = False) → arkouda.pdarrayclass.pdarray
    Find the set exclusive-or (symmetric difference) of two arrays.
array(a: Union[arkouda.pdarrayclass.pdarray, numpy.ndarray, Iterable]) → Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings]
    Convert a Python or Numpy Iterable to a pdarray or Strings object, sending
zeros(size: arkouda.dtypes.int_scalars, dtype: type = np.float64) → arkouda.pdarrayclass.pdarray
    Create a pdarray filled with zeros.
ones(size: arkouda.dtypes.int_scalars, dtype: type = float64) → arkouda.pdarrayclass.pdarray
    Create a pdarray filled with ones.
zeros_like(pda: arkouda.pdarrayclass.pdarray) → arkouda.pdarrayclass.pdarray
    Create a zero-filled pdarray of the same size and dtype as an existing
ones_like(pda: arkouda.pdarrayclass.pdarray) → arkouda.pdarrayclass.pdarray
    Create a one-filled pdarray of the same size and dtype as an existing
arange(*args) → arkouda.pdarrayclass.pdarray
    arange([start,] stop[, stride])
linspace(start: arkouda.dtypes.numeric_scalars, stop: arkouda.dtypes.numeric_scalars, length: arkouda.dtypes.int_scalars) → arkouda.pdarrayclass.pdarray
    Create a pdarray of linearly-spaced floats in a closed interval.
randint(low: arkouda.dtypes.numeric_scalars, high: arkouda.dtypes.numeric_scalars, size: arkouda.dtypes.int_scalars, dtype=int64, seed: arkouda.dtypes.int_scalars = None) → arkouda.pdarrayclass.pdarray
    Generate a pdarray of randomized int, float, or bool values in a
uniform(size: arkouda.dtypes.int_scalars, low: arkouda.dtypes.numeric_scalars = float(0.0), high: arkouda.dtypes.numeric_scalars = 1.0, seed: Union[None, arkouda.dtypes.int_scalars] = None) → arkouda.pdarrayclass.pdarray
    Generate a pdarray with uniformly distributed random float values
standard_normal(size: arkouda.dtypes.int_scalars, seed: Union[None, arkouda.dtypes.int_scalars] = None) → arkouda.pdarrayclass.pdarray
    Draw real numbers from the standard normal distribution.
random_strings_uniform(minlen: arkouda.dtypes.int_scalars, maxlen: arkouda.dtypes.int_scalars, size: arkouda.dtypes.int_scalars, characters: str = 'uppercase', seed: Union[None, arkouda.dtypes.int_scalars] = None) → arkouda.strings.Strings
    Generate random strings with lengths uniformly distributed between
random_strings_lognormal(logmean: arkouda.dtypes.numeric_scalars, logstd: arkouda.dtypes.numeric_scalars, size: arkouda.dtypes.int_scalars, characters: str = 'uppercase', seed: Optional[arkouda.dtypes.int_scalars] = None) → arkouda.strings.Strings
    Generate random strings with log-normally distributed lengths and


from_series(series: pandas.Series, dtype: Optional[Union[type, str]] = None) → Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings]
    Converts a Pandas Series to an Arkouda pdarray or Strings object. If
cast(pda: Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings], dt: Union[numpy.dtype, str]) → Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings]
    Cast an array to another dtype.
abs(pda: arkouda.pdarrayclass.pdarray) → arkouda.pdarrayclass.pdarray
    Return the element-wise absolute value of the array.
log(pda: arkouda.pdarrayclass.pdarray) → arkouda.pdarrayclass.pdarray
    Return the element-wise natural log of the array.
exp(pda: arkouda.pdarrayclass.pdarray) → arkouda.pdarrayclass.pdarray
    Return the element-wise exponential of the array.
cumsum(pda: arkouda.pdarrayclass.pdarray) → arkouda.pdarrayclass.pdarray
    Return the cumulative sum over the array.
cumprod(pda: arkouda.pdarrayclass.pdarray) → arkouda.pdarrayclass.pdarray
    Return the cumulative product over the array.
sin(pda: arkouda.pdarrayclass.pdarray) → arkouda.pdarrayclass.pdarray
    Return the element-wise sine of the array.
cos(pda: arkouda.pdarrayclass.pdarray) → arkouda.pdarrayclass.pdarray
    Return the element-wise cosine of the array.
hash(pda: arkouda.pdarrayclass.pdarray, full: bool = True) → Union[Tuple[arkouda.pdarrayclass.pdarray, arkouda.pdarrayclass.pdarray], arkouda.pdarrayclass.pdarray]
    Return an element-wise hash of the array.
where(condition: arkouda.pdarrayclass.pdarray, A: Union[arkouda.dtypes.numeric_scalars, arkouda.pdarrayclass.pdarray], B: Union[arkouda.dtypes.numeric_scalars, arkouda.pdarrayclass.pdarray]) → arkouda.pdarrayclass.pdarray
    Returns an array with elements chosen from A and B based upon a
histogram(pda: arkouda.pdarrayclass.pdarray, bins: arkouda.dtypes.int_scalars = 10) → arkouda.pdarrayclass.pdarray
    Compute a histogram of evenly spaced bins over the range of an array.
value_counts(pda: arkouda.pdarrayclass.pdarray) → Union[Categorical, Tuple[Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings], Optional[arkouda.pdarrayclass.pdarray]]]
    Count the occurrences of the unique values of an array.
isnan(pda: arkouda.pdarrayclass.pdarray) → arkouda.pdarrayclass.pdarray
    Test a pdarray for Not a number / NaN values
ls_hdf(filename: str) → str
    This function calls the h5ls utility on a filename visible to the
read_hdf(dsetName: str, filenames: Union[str, List[str]], strictTypes: bool = True, allow_errors: bool = False, calc_string_offsets: bool = False) → Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings]
    Read a single dataset from multiple HDF5 files into an Arkouda


read_all(filenames: Union[str, List[str]], datasets: Optional[Union[str, List[str]]] = None, iterative: bool = False, strictTypes: bool = True, allow_errors: bool = False, calc_string_offsets=False) → Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, Mapping[str, Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings]]]
    Read datasets from HDF5 files.
load(path_prefix: str, dataset: str = 'array', calc_string_offsets: bool = False) → Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings]
    Load a pdarray previously saved with pdarray.save().
get_datasets(filename: str) → List[str]
    Get the names of datasets in an HDF5 file.
load_all(path_prefix: str) → Mapping[str, Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, arkouda.categorical.Categorical]]
    Load multiple pdarrays or Strings previously saved with save_all().
save_all(columns: Union[Mapping[str, arkouda.pdarrayclass.pdarray], List[arkouda.pdarrayclass.pdarray]], prefix_path: str, names: List[str] = None, mode: str = 'truncate') → None
    Save multiple named pdarrays to HDF5 files.
broadcast(segments: arkouda.pdarrayclass.pdarray, values: arkouda.pdarrayclass.pdarray, size: Union[int, numpy.int64] = -1, permutation: Union[arkouda.pdarrayclass.pdarray, None] = None)
    Broadcast a dense column vector to the rows of a sparse matrix or grouped array.
join_on_eq_with_dt(a1: arkouda.pdarrayclass.pdarray, a2: arkouda.pdarrayclass.pdarray, t1: arkouda.pdarrayclass.pdarray, t2: arkouda.pdarrayclass.pdarray, dt: Union[int, numpy.int64], pred: str, result_limit: Union[int, numpy.int64] = 1000) → Tuple[arkouda.pdarrayclass.pdarray, arkouda.pdarrayclass.pdarray]
    Performs an inner-join on equality between two integer arrays where
enableVerbose() → None
    Enables verbose logging (DEBUG log level) for all ArkoudaLoggers
disableVerbose(logLevel: LogLevel = LogLevel.INFO) → None
    Disables verbose logging (DEBUG log level) for all ArkoudaLoggers, setting
isSupportedInt(num)

from_series(series: pandas.Series, dtype: Optional[Union[type, str]] = None) → Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings]
    Converts a Pandas Series to an Arkouda pdarray or Strings object. If
ak_array(a: Union[arkouda.pdarrayclass.pdarray, numpy.ndarray, Iterable]) → Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings]
    Convert a Python or Numpy Iterable to a pdarray or Strings object, sending
cast(pda: Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings], dt: Union[numpy.dtype, str]) → Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings]
    Cast an array to another dtype.
akabs(pda: arkouda.pdarrayclass.pdarray) → arkouda.pdarrayclass.pdarray
    Return the element-wise absolute value of the array.
_get_factor(unit: str) → int
_identity(x, **kwargs)
date_range(start=None, end=None, periods=None, freq=None, tz=None, normalize=False, name=None, closed=None, **kwargs)
    Creates a fixed frequency Datetime range. Alias for
timedelta_range(start=None, end=None, periods=None, freq=None, name=None, closed=None, **kwargs)
    Return a fixed frequency TimedeltaIndex, with day as the default
information(names: Union[List[str], str] = RegisteredSymbols) → str
    Returns JSON formatted string containing information about the objects in names
list_registry() → List[str]
    Return a list containing the names of all registered objects
list_symbol_table() → List[str]
    Return a list containing the names of all objects in the symbol table
pretty_print_information(names: Union[List[str], str] = RegisteredSymbols) → None
    Prints verbose information for each object in names in a human readable format

Attributes

__version__

DTypes

DTypeObjects

dtype

bool

int64

float64

uint8

str_

ARKOUDA_SUPPORTED_DTYPES

bool_scalars

float_scalars

int_scalars



numeric_scalars

numpy_scalars

str_scalars

all_scalars
    The DType enum defines the supported Arkouda data types in string form.

GROUPBY_REDUCTION_TYPES

int64

_BASE_UNIT

_unit2normunit

_unit2factor

AllSymbols

RegisteredSymbols

arkouda.get_versions()
    Get version information or return default if unable to do so.

arkouda.__version__
arkouda.DTypes
arkouda.DTypeObjects
arkouda.dtype
arkouda.bool
arkouda.int64
arkouda.float64
arkouda.uint8
arkouda.str_

arkouda.check_np_dtype(dt: numpy.dtype) → None
    Assert that numpy dtype dt is one of the dtypes supported by arkouda, otherwise raise TypeError.
    Raises TypeError – Raised if the dtype is not in supported dtypes or if dt is not a np.dtype

arkouda.translate_np_dtype(dt: numpy.dtype) → Tuple[str, int]
    Split numpy dtype dt into its kind and byte size, raising TypeError for unsupported dtypes.
    Raises TypeError – Raised if the dtype is not in supported dtypes or if dt is not a np.dtype

arkouda.resolve_scalar_dtype(val: object) → str
    Try to infer what dtype arkouda_server should treat val as.

arkouda.ARKOUDA_SUPPORTED_DTYPES
arkouda.bool_scalars
arkouda.float_scalars
arkouda.int_scalars
arkouda.numeric_scalars
arkouda.numpy_scalars
arkouda.str_scalars
arkouda.all_scalars
    The DType enum defines the supported Arkouda data types in string form.

arkouda.get_byteorder(dt: numpy.dtype) → str
    Get a concrete byteorder (turns ‘=’ into ‘<’ or ‘>’)

arkouda.get_server_byteorder() → str
    Get the server’s byteorder

class arkouda.pdarray(name: str, mydtype: numpy.dtype, size: arkouda.dtypes.int_scalars, ndim: arkouda.dtypes.int_scalars, shape: Sequence[int], itemsize: arkouda.dtypes.int_scalars)
    The basic arkouda array class. This class contains only the attributes of the array; the data resides on the arkouda server. When a server operation results in a new array, arkouda will create a pdarray instance that points to the array data on the server. As such, the user should not initialize pdarray instances directly.

name
    The server-side identifier for the array
    Type str
dtype
    The element type of the array
    Type dtype
size
    The number of elements in the array
    Type int_scalars
ndim
    The rank of the array (currently only rank 1 arrays supported)
    Type int_scalars
shape
    A list or tuple containing the sizes of each dimension of the array
    Type Sequence[int]
itemsize
    The size in bytes of each element
    Type int_scalars
BinOps
OpEqOps
objtype = pdarray
__array_priority__ = 1000
__del__(self)
__bool__(self) → bool


__len__(self ) __str__(self ) Return str(self). __repr__(self ) Return repr(self). format_other(self, other: object) → numpy.dtype Attempt to cast scalar other to the element dtype of this pdarray, and print the resulting value to a string (e.g. for sending to a server command). The user should not call this function directly. Parameters other (object) – The scalar to be cast to the pdarray.dtype Returns Return type np.dtype corresponding to the other parameter Raises TypeError – Raised if the other parameter cannot be converted to Numpy dtype _binop(self, other: pdarray, op: str) → pdarray Executes binary operation specified by the op string Parameters • other (pdarray) – The pdarray upon which the binop is to be executed • op (str) – The binop to be executed Returns A pdarray encapsulating the binop result Return type pdarray Raises • ValueError – Raised if the op is not within the pdarray.BinOps set, or if the pdarray sizes don’t match • TypeError – Raised if other is not a pdarray or the pdarray.dtype is not a supported dtype _r_binop(self, other: pdarray, op: str) → pdarray Executes reverse binary operation specified by the op string Parameters • other (pdarray) – The pdarray upon which the reverse binop is to be executed • op (str) – The name of the reverse binop to be executed Returns A pdarray encapsulating the reverse binop result Return type pdarray Raises • ValueError – Raised if the op is not within the pdarray.BinOps set • TypeError – Raised if other is not a pdarray or the pdarray.dtype is not a supported dtype __add__(self, other) __radd__(self, other) __sub__(self, other) __rsub__(self, other) __mul__(self, other)


__rmul__(self, other)
__truediv__(self, other)
__rtruediv__(self, other)
__floordiv__(self, other)
__rfloordiv__(self, other)
__mod__(self, other)
__rmod__(self, other)
__lshift__(self, other)
__rlshift__(self, other)
__rshift__(self, other)
__rrshift__(self, other)
__and__(self, other)
__rand__(self, other)
__or__(self, other)
__ror__(self, other)
__xor__(self, other)
__rxor__(self, other)
__pow__(self, other)
__rpow__(self, other)
__lt__(self, other)
    Return self<value.
__gt__(self, other)
    Return self>value.
__le__(self, other)
    Return self<=value.
__ge__(self, other)
    Return self>=value.
__eq__(self, other)
    Return self==value.
__ne__(self, other)
    Return self!=value.
__neg__(self)
__invert__(self)
opeq(self, other, op)
__iadd__(self, other)
__isub__(self, other)
__imul__(self, other)
__itruediv__(self, other)


__ifloordiv__(self, other) __ilshift__(self, other) __irshift__(self, other) __iand__(self, other) __ior__(self, other) __ixor__(self, other) __ipow__(self, other) abstract __iter__(self ) __getitem__(self, key) __setitem__(self, key, value) fill(self, value: arkouda.dtypes.numeric_scalars) → None Fill the array (in place) with a constant value. Parameters value (numeric_scalars)– Raises TypeError – Raised if value is not an int, int64, float, or float64 any(self ) → numpy.bool_ Return True iff any element of the array evaluates to True. all(self ) → numpy.bool_ Return True iff all elements of the array evaluate to True. is_registered(self ) → numpy.bool_ Return True iff the object is contained in the registry Parameters None – Returns Indicates if the object is contained in the registry Return type bool Raises RuntimeError – Raised if there’s a server-side error thrown _list_component_names(self ) → List[str] Internal Function that returns a list of all component names Parameters None – Returns List of all component names Return type List[str] info(self ) → str Returns a JSON formatted string containing information about all components of self Parameters None – Returns JSON string containing information about all components of self Return type str pretty_print_info(self ) → None Prints information about all components of self in a human readable format Parameters None – Returns


Return type None is_sorted(self ) → numpy.bool_ Return True iff the array is monotonically non-decreasing. Parameters None – Returns Indicates if the array is monotonically non-decreasing Return type bool Raises • TypeError – Raised if pda is not a pdarray instance • RuntimeError – Raised if there’s a server-side error thrown sum(self ) → arkouda.dtypes.numpy_scalars Return the sum of all elements in the array. prod(self ) → numpy.float64 Return the product of all elements in the array. Return value is always a np.float64 or np.int64. min(self ) → arkouda.dtypes.numpy_scalars Return the minimum value of the array. max(self ) → arkouda.dtypes.numpy_scalars Return the maximum value of the array. argmin(self ) → numpy.int64 Return the index of the first occurrence of the array min value argmax(self ) → numpy.int64 Return the index of the first occurrence of the array max value. mean(self ) → numpy.float64 Return the mean of the array. var(self, ddof: arkouda.dtypes.int_scalars = 0) → numpy.float64 Compute the variance. See arkouda.var for details. Parameters ddof (int_scalars) – “Delta Degrees of Freedom” used in calculating var Returns The scalar variance of the array Return type np.float64 Raises • TypeError – Raised if pda is not a pdarray instance • ValueError – Raised if the ddof >= pdarray size • RuntimeError – Raised if there’s a server-side error thrown std(self, ddof: arkouda.dtypes.int_scalars = 0) → numpy.float64 Compute the standard deviation. See arkouda.std for details. Parameters ddof (int_scalars) – “Delta Degrees of Freedom” used in calculating std Returns The scalar standard deviation of the array Return type np.float64 Raises • TypeError – Raised if pda is not a pdarray instance


• RuntimeError – Raised if there’s a server-side error thrown

mink(self, k: arkouda.dtypes.int_scalars) → pdarray
    Compute the minimum “k” values.
    Parameters k (int_scalars) – The desired count of minimum values to be returned by the output.
    Returns The minimum k values from pda
    Return type pdarray, int
    Raises TypeError – Raised if pda is not a pdarray

maxk(self, k: arkouda.dtypes.int_scalars) → pdarray
    Compute the maximum “k” values.
    Parameters k (int_scalars) – The desired count of maximum values to be returned by the output.
    Returns The maximum k values from pda
    Return type pdarray, int
    Raises TypeError – Raised if pda is not a pdarray

argmink(self, k: arkouda.dtypes.int_scalars) → pdarray
    Find the indices corresponding to the minimum “k” values.
    Parameters k (int_scalars) – The desired count of minimum values to be returned by the output.
    Returns Indices corresponding to the minimum k values from pda
    Return type pdarray, int
    Raises TypeError – Raised if pda is not a pdarray

argmaxk(self, k: arkouda.dtypes.int_scalars) → pdarray
    Find the indices corresponding to the maximum “k” values.
    Parameters k (int_scalars) – The desired count of maximum values to be returned by the output.
    Returns Indices corresponding to the maximum k values, sorted
    Return type pdarray, int
    Raises TypeError – Raised if pda is not a pdarray

to_ndarray(self) → numpy.ndarray
    Convert the array to a np.ndarray, transferring array data from the Arkouda server to client-side Python. Note: if the pdarray size exceeds client.maxTransferBytes, a RuntimeError is raised.
    Returns A numpy ndarray with the same attributes and data as the pdarray
    Return type np.ndarray
    Raises RuntimeError – Raised if there is a server-side error thrown, if the pdarray size exceeds the built-in client.maxTransferBytes size limit, or if the number of bytes received does not match the expected number of bytes


Notes

The number of bytes in the array cannot exceed client.maxTransferBytes, otherwise a RuntimeError will be raised. This is to protect the user from overflowing the memory of the system on which the Python client is running, under the assumption that the server is running on a distributed system with much more memory than the client. The user may override this limit by setting client.maxTransferBytes to a larger value, but proceed with caution. See also: array

Examples

>>> a = ak.arange(0, 5, 1)
>>> a.to_ndarray()
array([0, 1, 2, 3, 4])

>>> type(a.to_ndarray())
numpy.ndarray
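The size limit described in the Notes for to_ndarray can be approximated on the client before requesting a transfer. The sketch below is only an illustration, not Arkouda's implementation: the helper name fits_transfer_limit is hypothetical, it assumes the payload is size * itemsize bytes (the two pdarray attributes documented above), and the limit is passed in explicitly because this section does not state the default value of client.maxTransferBytes.

```python
def fits_transfer_limit(size: int, itemsize: int, max_transfer_bytes: int) -> bool:
    """Return True if `size` elements of `itemsize` bytes fit within the limit.

    Hypothetical helper; assumes the transfer payload is size * itemsize bytes.
    """
    if size < 0 or itemsize <= 0 or max_transfer_bytes < 0:
        raise ValueError("size and limit must be non-negative, itemsize positive")
    return size * itemsize <= max_transfer_bytes


# Illustrative numbers only: one million 8-byte elements (8,000,000 bytes)
# checked against a 1 MiB limit and a 1 GiB limit.
print(fits_transfer_limit(1_000_000, 8, 2**20))  # False
print(fits_transfer_limit(1_000_000, 8, 2**30))  # True
```

With a connected client, the arguments would presumably come from a.size, a.itemsize, and the client.maxTransferBytes setting named in the Notes.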

to_cuda(self ) Convert the array to a Numba DeviceND array, transferring array data from the arkouda server to Python via ndarray. If the array exceeds a builtin size limit, a RuntimeError is raised. Returns A Numba ndarray with the same attributes and data as the pdarray; on GPU Return type numba.DeviceNDArray Raises • ImportError – Raised if CUDA is not available • ModuleNotFoundError – Raised if Numba is either not installed or not enabled • RuntimeError – Raised if there is a server-side error thrown in the course of retrieving the pdarray.

Notes

The number of bytes in the array cannot exceed client.maxTransferBytes, otherwise a RuntimeError will be raised. This is to protect the user from overflowing the memory of the system on which the Python client is running, under the assumption that the server is running on a distributed system with much more memory than the client. The user may override this limit by setting client.maxTransferBytes to a larger value, but proceed with caution. See also: array


Examples

>>> a = ak.arange(0, 5, 1)
>>> a.to_cuda()
array([0, 1, 2, 3, 4])

>>> type(a.to_cuda())
numpy.devicendarray

save(self, prefix_path: str, dataset: str = 'array', mode: str = 'truncate') → str
    Save the pdarray to HDF5. The result is a collection of HDF5 files, one file per locale of the arkouda server, where each filename starts with prefix_path. Each locale saves its chunk of the array to its corresponding file.
    Parameters
    • prefix_path (str) – Directory and filename prefix that all output files share
    • dataset (str) – Name of the dataset to create in HDF5 files (must not already exist)
    • mode (str {'truncate' | 'append'}) – By default, truncate (overwrite) output files, if they exist. If ‘append’, attempt to create new dataset in existing files.
    Returns A string message indicating the result of the save operation
    Return type str
    Raises
    • RuntimeError – Raised if a server-side error is thrown saving the pdarray
    • ValueError – Raised if there is an error in parsing the prefix path pointing to file write location or if the mode parameter is neither truncate nor append
    • TypeError – Raised if any one of the prefix_path, dataset, or mode parameters is not a string
    See also: save_all, load, read_hdf, read_all

Notes

The prefix_path must be visible to the arkouda server and the user must have write permission. Output files have names of the form <prefix_path>_LOCALE<i>.hdf, where <i> ranges from 0 to numLocales. If any of the output files already exist and the mode is ‘truncate’, they will be overwritten. If the mode is ‘append’ and the number of output files is less than the number of locales or a dataset with the same name already exists, a RuntimeError will result.


Examples

>>> a = ak.arange(0, 100, 1)
>>> a.save('arkouda_range', dataset='array')

Array is saved in numLocales files with names like tmp/arkouda_range_LOCALE0.hdf

The array can be read back in as follows

>>> b = ak.load('arkouda_range', dataset='array')
>>> (a == b).all()
True
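To make the one-file-per-locale naming concrete, the following sketch lists the file names a save would be expected to produce, following the tmp/arkouda_range_LOCALE0.hdf example above. The helper expected_save_files is hypothetical (it is not part of arkouda), and it assumes the locale index is zero-based so that num_locales files are produced in total.

```python
from typing import List


def expected_save_files(prefix_path: str, num_locales: int) -> List[str]:
    """List names of the form <prefix_path>_LOCALE<i>.hdf.

    Hypothetical helper; assumes i runs from 0 to num_locales - 1,
    i.e. exactly one file per locale.
    """
    if num_locales < 1:
        raise ValueError("num_locales must be at least 1")
    return [f"{prefix_path}_LOCALE{i}.hdf" for i in range(num_locales)]


# A two-locale server would be expected to write two files.
print(expected_save_files("tmp/arkouda_range", 2))
# ['tmp/arkouda_range_LOCALE0.hdf', 'tmp/arkouda_range_LOCALE1.hdf']
```

A list like this can be useful for checking, before an 'append', that every expected file is already present.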

register(self, user_defined_name: str) → pdarray
    Register this pdarray with a user defined name in the arkouda server so it can be attached to later using pdarray.attach(). This is an in-place operation; registering a pdarray more than once will update the name in the registry and remove the previously registered name. A name can only be registered to one pdarray at a time.
    Parameters user_defined_name (str) – user defined name array is to be registered under
    Returns The same pdarray which is now registered with the arkouda server and has an updated name. This is an in-place modification; the original is returned to support a fluent programming style. Please note you cannot register two different pdarrays with the same name.
    Return type pdarray
    Raises
    • TypeError – Raised if user_defined_name is not a str
    • RegistrationError – If the server was unable to register the pdarray with the user_defined_name. If the user is attempting to register more than one pdarray with the same name, the former should be unregistered first to free up the registration name.
    See also: attach, unregister, is_registered, list_registry, unregister_pdarray_by_name

Notes

Registered names/pdarrays in the server are immune to deletion until they are unregistered.

Examples

>>> a = ak.zeros(100)
>>> a.register("my_zeros")
>>> # potentially disconnect from server and reconnect to server
>>> b = ak.pdarray.attach("my_zeros")
>>> # ...other work...
>>> b.unregister()
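The registration rules stated for register (re-registering replaces the previous name, and a name belongs to at most one array) can be pictured with a small in-memory model. This is a toy sketch that runs without a server and is not how Arkouda implements its registry: the class ToyRegistry, its string object ids, and the use of ValueError in place of the documented RegistrationError are all assumptions made for illustration.

```python
from typing import Dict, List


class ToyRegistry:
    """Toy model of the documented registration rules (not Arkouda code)."""

    def __init__(self) -> None:
        self._name_to_obj: Dict[str, str] = {}  # registered name -> object id

    def register(self, obj_id: str, name: str) -> None:
        owner = self._name_to_obj.get(name)
        if owner is not None and owner != obj_id:
            # A name can only be registered to one object at a time.
            raise ValueError(f"name {name!r} is already registered")
        # Registering an object again removes its previously registered name.
        for old in [n for n, o in self._name_to_obj.items() if o == obj_id]:
            del self._name_to_obj[old]
        self._name_to_obj[name] = obj_id

    def unregister(self, name: str) -> None:
        if name not in self._name_to_obj:
            raise RuntimeError(f"name {name!r} not found")
        del self._name_to_obj[name]

    def list_registry(self) -> List[str]:
        return sorted(self._name_to_obj)


reg = ToyRegistry()
reg.register("array_1", "my_zeros")
reg.register("array_1", "renamed")  # replaces "my_zeros"
print(reg.list_registry())  # ['renamed']
```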

unregister(self) → None
    Unregister a pdarray in the arkouda server which was previously registered using register() and/or attached to using attach()
    Return type None
    Raises RuntimeError – Raised if the server could not find the internal name/symbol to remove
    See also: register, unregister, is_registered, unregister_pdarray_by_name, list_registry

Notes

Registered names/pdarrays in the server are immune to deletion until they are unregistered.

Examples

>>> a = ak.zeros(100)
>>> a.register("my_zeros")
>>> # potentially disconnect from server and reconnect to server
>>> b = ak.pdarray.attach("my_zeros")
>>> # ...other work...
>>> b.unregister()

static attach(user_defined_name: str) → pdarray class method to return a pdarray attached to the registered name in the arkouda server which was registered using register() Parameters user_defined_name (str) – user defined name which array was registered under Returns pdarray which is bound to corresponding server side component that was registered with user_defined_name Return type pdarray Raises TypeError – Raised if user_defined_name is not a str See also: register, unregister, is_registered, unregister_pdarray_by_name, list_registry

Notes

Registered names/pdarrays in the server are immune to deletion until they are unregistered.

Examples

>>> a = ak.zeros(100)
>>> a.register("my_zeros")
>>> # potentially disconnect from server and reconnect to server
>>> b = ak.pdarray.attach("my_zeros")
>>> # ...other work...
>>> b.unregister()

arkouda.clear() → None
    Send a clear message to clear all unregistered data from the server symbol table
    Returns


Return type None Raises RuntimeError – Raised if there is a server-side error in executing clear request arkouda.any(pda: pdarray) → numpy.bool_ Return True iff any element of the array evaluates to True. Parameters pda (pdarray) – The pdarray instance to be evaluated Returns Indicates if 1..n pdarray elements evaluate to True Return type bool Raises • TypeError – Raised if pda is not a pdarray instance • RuntimeError – Raised if there’s a server-side error thrown arkouda.all(pda: pdarray) → numpy.bool_ Return True iff all elements of the array evaluate to True. Parameters pda (pdarray) – The pdarray instance to be evaluated Returns Indicates if all pdarray elements evaluate to True Return type bool Raises • TypeError – Raised if pda is not a pdarray instance • RuntimeError – Raised if there’s a server-side error thrown arkouda.is_sorted(pda: pdarray) → numpy.bool_ Return True iff the array is monotonically non-decreasing. Parameters pda (pdarray) – The pdarray instance to be evaluated Returns Indicates if the array is monotonically non-decreasing Return type bool Raises • TypeError – Raised if pda is not a pdarray instance • RuntimeError – Raised if there’s a server-side error thrown arkouda.sum(pda: pdarray) → numpy.float64 Return the sum of all elements in the array. Parameters pda (pdarray) – Values for which to calculate the sum Returns The sum of all elements in the array Return type np.float64 Raises • TypeError – Raised if pda is not a pdarray instance • RuntimeError – Raised if there’s a server-side error thrown arkouda.prod(pda: pdarray) → numpy.float64 Return the product of all elements in the array. Return value is always a np.float64 or np.int64 Parameters pda (pdarray) – Values for which to calculate the product Returns The product calculated from the pda


Return type numpy_scalars Raises • TypeError – Raised if pda is not a pdarray instance • RuntimeError – Raised if there’s a server-side error thrown arkouda.min(pda: pdarray) → arkouda.dtypes.numpy_scalars Return the minimum value of the array. Parameters pda (pdarray) – Values for which to calculate the min Returns The min calculated from the pda Return type numpy_scalars Raises • TypeError – Raised if pda is not a pdarray instance • RuntimeError – Raised if there’s a server-side error thrown arkouda.max(pda: pdarray) → arkouda.dtypes.numpy_scalars Return the maximum value of the array. Parameters pda (pdarray) – Values for which to calculate the max Returns The max calculated from the pda Return type numpy_scalars Raises • TypeError – Raised if pda is not a pdarray instance • RuntimeError – Raised if there’s a server-side error thrown arkouda.argmin(pda: pdarray) → numpy.int64 Return the index of the first occurrence of the array min value. Parameters pda (pdarray) – Values for which to calculate the argmin Returns The index of the argmin calculated from the pda Return type np.int64 Raises • TypeError – Raised if pda is not a pdarray instance • RuntimeError – Raised if there’s a server-side error thrown arkouda.argmax(pda: pdarray) → numpy.int64 Return the index of the first occurrence of the array max value. Parameters pda (pdarray) – Values for which to calculate the argmax Returns The index of the argmax calculated from the pda Return type np.int64 Raises • TypeError – Raised if pda is not a pdarray instance • RuntimeError – Raised if there’s a server-side error thrown arkouda.mean(pda: pdarray) → numpy.float64 Return the mean of the array.


Parameters pda (pdarray) – Values for which to calculate the mean Returns The mean calculated from the pda sum and size Return type np.float64 Raises • TypeError – Raised if pda is not a pdarray instance • RuntimeError – Raised if there’s a server-side error thrown arkouda.var(pda: pdarray, ddof: arkouda.dtypes.int_scalars = 0) → numpy.float64 Return the variance of values in the array. Parameters • pda (pdarray) – Values for which to calculate the variance • ddof (int_scalars) – “Delta Degrees of Freedom” used in calculating var Returns The scalar variance of the array Return type np.float64 Raises • TypeError – Raised if pda is not a pdarray instance • ValueError – Raised if the ddof >= pdarray size • RuntimeError – Raised if there’s a server-side error thrown See also: mean, std

Notes

The variance is the average of the squared deviations from the mean, i.e., var = mean((x - x.mean())**2). The mean is normally calculated as x.sum() / N, where N = len(x). If, however, ddof is specified, the divisor N - ddof is used instead. In standard statistical practice, ddof=1 provides an unbiased estimator of the variance of a hypothetical infinite population. ddof=0 provides a maximum likelihood estimate of the variance for normally distributed variables. arkouda.std(pda: pdarray, ddof: arkouda.dtypes.int_scalars = 0) → numpy.float64 Return the standard deviation of values in the array. The standard deviation is implemented as the square root of the variance. Parameters • pda (pdarray) – values for which to calculate the standard deviation • ddof (int_scalars) – “Delta Degrees of Freedom” used in calculating std Returns The scalar standard deviation of the array Return type np.float64 Raises • TypeError – Raised if pda is not a pdarray instance or ddof is not an integer • ValueError – Raised if ddof is an integer < 0 • RuntimeError – Raised if there’s a server-side error thrown


See also: mean, var
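The divisor N - ddof described for var, and the square-root relationship that std relies on, can be illustrated with a short pure-Python sketch. It does not call arkouda; the helper names variance and std_dev are hypothetical and only mirror the documented semantics, including the ValueError that var raises when ddof >= size.

```python
import math

def variance(values, ddof=0):
    """Average squared deviation from the mean, using the divisor N - ddof."""
    n = len(values)
    if ddof >= n:
        raise ValueError("ddof must be smaller than the number of values")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - ddof)

def std_dev(values, ddof=0):
    """Square root of the variance computed with the same ddof."""
    return math.sqrt(variance(values, ddof))

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
population_var = variance(data)        # divisor N
sample_var = variance(data, ddof=1)    # divisor N - 1
population_std = std_dev(data)
```

With ddof=1 the divisor shrinks from 8 to 7, so the estimate is slightly larger than the ddof=0 result.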

Notes

The standard deviation is the square root of the average of the squared deviations from the mean, i.e., std = sqrt(mean((x - x.mean())**2)). The average squared deviation is normally calculated as x.sum() / N, where N = len(x). If, however, ddof is specified, the divisor N - ddof is used instead. In standard statistical practice, ddof=1 provides an unbiased estimator of the variance of the infinite population. ddof=0 provides a maximum likelihood estimate of the variance for normally distributed variables. The standard deviation computed in this function is the square root of the estimated variance, so even with ddof=1, it will not be an unbiased estimate of the standard deviation per se. arkouda.mink(pda: pdarray, k: arkouda.dtypes.int_scalars) → pdarray Find the k minimum values of an array. Returns the smallest k values of an array, sorted Parameters • pda (pdarray) – Input array. • k (int_scalars) – The desired count of minimum values to be returned by the output. Returns The minimum k values from pda, sorted Return type pdarray Raises • TypeError – Raised if pda is not a pdarray • ValueError – Raised if the pda is empty or k < 1

Notes

This call is equivalent in value to: a[ak.argsort(a)[:k]] and generally outperforms this operation. This reduction will see a significant drop in performance as k grows beyond a certain value. This value is system dependent, but performance degradation has generally been observed at a k of about 5 million.
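The equivalence stated above can be checked with a pure-Python sketch. It uses lists in place of pdarrays, so it illustrates only the returned values, not arkouda's distributed implementation or its performance; mink_reference is a hypothetical helper name.

```python
import heapq

def mink_reference(values, k):
    """Reference result: argsort everything, then keep the first k values."""
    if not values or k < 1:
        raise ValueError("values must be non-empty and k must be >= 1")
    order = sorted(range(len(values)), key=lambda i: values[i])  # argsort
    return [values[i] for i in order[:k]]

data = [10, 5, 1, 3, 7, 2, 9, 0]
full_sort_result = mink_reference(data, 3)
# A bounded selection avoids sorting the whole input and gives the same values.
selection_result = heapq.nsmallest(3, data)
```

Both approaches agree on the values; the selection-based one simply avoids a full sort, which is the general reason a dedicated top-k routine can be cheaper for small k.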

Examples

>>> A = ak.array([10,5,1,3,7,2,9,0])
>>> ak.mink(A,3)
array([0, 1, 2])
>>> ak.mink(A,4)
array([0, 1, 2, 3])

arkouda.maxk(pda: pdarray, k: arkouda.dtypes.int_scalars) → pdarray Find the k maximum values of an array. Returns the largest k values of an array, sorted


Parameters • pda (pdarray) – Input array. • k (int_scalars) – The desired count of maximum values to be returned by the output. Returns The maximum k values from pda, sorted Return type pdarray, int Raises • TypeError – Raised if pda is not a pdarray or k is not an integer • ValueError – Raised if the pda is empty or k < 1

Notes

This call is equivalent in value to: a[ak.argsort(a)[-k:]] and generally outperforms this operation. This reduction will see a significant drop in performance as k grows beyond a certain value. This value is system dependent, but performance degradation has generally been observed at a k of about 5 million.

Examples

>>> A = ak.array([10,5,1,3,7,2,9,0])
>>> ak.maxk(A,3)
array([7, 9, 10])
>>> ak.maxk(A,4)
array([5, 7, 9, 10])

arkouda.argmink(pda: pdarray, k: arkouda.dtypes.int_scalars) → pdarray
Finds the indices corresponding to the k minimum values of an array.
Parameters
• pda (pdarray) – Input array.
• k (int_scalars) – The desired count of indices corresponding to minimum array values
Returns The indices of the minimum k values from the pda, sorted
Return type pdarray, int
Raises
• TypeError – Raised if pda is not a pdarray or k is not an integer
• ValueError – Raised if the pda is empty or k < 1


Notes

This call is equivalent in value to: ak.argsort(a)[:k] and generally outperforms this operation. This reduction will see a significant drop in performance as k grows beyond a certain value. This value is system dependent, but generally about a k of 5 million is where performance degradation has been observed.

Examples

>>> A = ak.array([10,5,1,3,7,2,9,0])
>>> ak.argmink(A,3)
array([7, 2, 5])
>>> ak.argmink(A,4)
array([7, 2, 5, 3])

arkouda.argmaxk(pda: pdarray, k: arkouda.dtypes.int_scalars) → pdarray
Find the indices corresponding to the k maximum values of an array. Returns the indices of the largest k values of an array, sorted by value.
Parameters
• pda (pdarray) – Input array.
• k (int_scalars) – The desired count of indices corresponding to maximum array values
Returns The indices of the maximum k values from the pda, sorted
Return type pdarray, int
Raises
• TypeError – Raised if pda is not a pdarray or k is not an integer
• ValueError – Raised if the pda is empty or k < 1

Notes

This call is equivalent in value to: ak.argsort(a)[-k:] and generally outperforms this operation. This reduction will see a significant drop in performance as k grows beyond a certain value. This value is system dependent, but performance degradation has generally been observed at a k of about 5 million.
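How the index-returning variants relate to an ascending argsort can be sketched in pure Python. The helpers below are hypothetical and operate on lists rather than pdarrays; the point is only that the k smallest indices are the first k entries of the permutation, while the k largest are the last k entries.

```python
def argsort(values):
    """Indices that would sort values in ascending order (stable)."""
    return sorted(range(len(values)), key=lambda i: values[i])

def argmink_reference(values, k):
    return argsort(values)[:k]

def argmaxk_reference(values, k):
    # The k largest values sit at the end of the ascending permutation.
    return argsort(values)[-k:]

data = [10, 5, 1, 3, 7, 2, 9, 0]
smallest_idx = argmink_reference(data, 3)
largest_idx = argmaxk_reference(data, 3)
largest_values = [data[i] for i in largest_idx]
```

Indexing the input with largest_idx reproduces the values that maxk reports for the same k.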


Examples

>>> A = ak.array([10,5,1,3,7,2,9,0])
>>> ak.argmaxk(A,3)
array([4, 6, 0])
>>> ak.argmaxk(A,4)
array([1, 4, 6, 0])

arkouda.attach_pdarray(user_defined_name: str) → pdarray
class method to return a pdarray attached to the registered name in the arkouda server which was registered using register()
Parameters user_defined_name (str) – user defined name which array was registered under
Returns pdarray which is bound to corresponding server side component that was registered with user_defined_name
Return type pdarray
Raises TypeError – Raised if user_defined_name is not a str
See also: register, unregister, is_registered, unregister_pdarray_by_name, list_registry

Notes

Registered names/pdarrays in the server are immune to deletion until they are unregistered.

Examples

>>> a = ak.zeros(100)
>>> a.register("my_zeros")
>>> # potentially disconnect from server and reconnect to server
>>> b = ak.attach_pdarray("my_zeros")
>>> # ...other work...
>>> b.unregister()

arkouda.unregister_pdarray_by_name(user_defined_name: str) → None
Unregister a named pdarray in the arkouda server which was previously registered using register() and/or attached to using attach_pdarray()
Parameters user_defined_name (str) – user defined name which array was registered under
Return type None
Raises RuntimeError – Raised if the server could not find the internal name/symbol to remove
See also: register, unregister, is_registered, list_registry, attach


Notes

Registered names/pdarrays in the server are immune to deletion until they are unregistered.

Examples

>>> a = ak.zeros(100)
>>> a.register("my_zeros")
>>> # potentially disconnect from server and reconnect to server
>>> b = ak.attach_pdarray("my_zeros")
>>> # ...other work...
>>> ak.unregister_pdarray_by_name("my_zeros")

exception arkouda.RegistrationError
Bases: Exception
Error/Exception used when the Arkouda Server cannot register an object

arkouda.argsort(pda: Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, arkouda.categorical.Categorical]) → arkouda.pdarrayclass.pdarray
Return the permutation that sorts the array.
Parameters pda (pdarray or Strings or Categorical) – The array to sort (int64 or float64)
Returns The indices such that pda[indices] is sorted
Return type pdarray, int64
Raises TypeError – Raised if the parameter is other than a pdarray or Strings
See also: coargsort

Notes

Uses a least-significant-digit radix sort, which is stable and resilient to non-uniformity in data but communication intensive.

Examples

>>> a = ak.randint(0, 10, 10)
>>> perm = ak.argsort(a)
>>> a[perm]
array([0, 1, 1, 3, 4, 5, 7, 8, 8, 9])

arkouda.coargsort(arrays: Sequence[Union[arkouda.strings.Strings, arkouda.pdarrayclass.pdarray, arkouda.categorical.Categorical]]) → arkouda.pdarrayclass.pdarray
Return the permutation that groups the rows (left-to-right), if the input arrays are treated as columns. The permutation sorts numeric columns, but not Strings or Categoricals; those are grouped, but not ordered.
Parameters arrays (Sequence[Union[Strings, pdarray, Categorical]]) – The columns (int64, float64, Strings, or Categorical) to sort by row
Returns The indices that permute the rows to grouped order
Return type pdarray, int64


Raises ValueError – Raised if the pdarrays are not of the same size or if the parameter is not an Iterable containing pdarrays, Strings, or Categoricals See also: argsort

Notes

Uses a least-significant-digit radix sort, which is stable and resilient to non-uniformity in data but communication intensive. Starts with the last array and moves forward. This sort operates directly on numeric types, but for Strings, it operates on a hash. Thus, while grouping of equivalent strings is guaranteed, lexicographic ordering of the groups is not. For Categoricals, coargsort sorts based on Categorical.codes which guarantees grouping of equivalent categories but not lexicographic ordering of those groups.
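The "start with the last array and move forward" strategy can be sketched in pure Python. This is not the radix sort arkouda uses: it substitutes repeated passes of Python's stable sorted() to show the same least-significant-key-first idea on plain lists. The name coargsort_reference is hypothetical.

```python
def coargsort_reference(columns):
    """Permutation that sorts rows by columns[0], then columns[1], and so on."""
    sizes = {len(col) for col in columns}
    if len(sizes) != 1:
        raise ValueError("all columns must have the same size")
    perm = list(range(sizes.pop()))
    # Least-significant key first: start with the last column and move forward.
    # sorted() is stable, so the order from earlier passes breaks later ties.
    for col in reversed(columns):
        perm = sorted(perm, key=lambda i: col[i])
    return perm

a = [0, 1, 0, 1]
b = [1, 1, 0, 0]
perm = coargsort_reference([a, b])
```

Because each pass is stable, rows that tie on the first column keep the order established by the second column, which is what makes the multi-pass approach correct.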

Examples

>>> a = ak.array([0,1,0,1])
>>> b = ak.array([1,1,0,0])
>>> perm = ak.coargsort([a, b])
>>> perm
array([2, 0, 3, 1])
>>> a[perm]
array([0, 0, 1, 1])
>>> b[perm]
array([0, 1, 0, 1])

arkouda.sort(pda: arkouda.pdarrayclass.pdarray) → arkouda.pdarrayclass.pdarray
Return a sorted copy of the array. Only sorts numeric arrays; for Strings, use argsort.
Parameters pda (pdarray or Categorical) – The array to sort (int64 or float64)
Returns The sorted copy of pda
Return type pdarray, int64 or float64
Raises
• TypeError – Raised if the parameter is not a pdarray
• ValueError – Raised if sort attempted on a pdarray with an unsupported dtype such as bool
See also: argsort

Notes

Uses a least-significant-digit radix sort, which is stable and resilient to non-uniformity in data but communication intensive.


Examples

>>> a = ak.randint(0, 10, 10)
>>> sorted_a = ak.sort(a)
>>> sorted_a
array([0, 1, 1, 3, 4, 5, 7, 8, 8, 9])

arkouda.unique(pda: Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, Categorical], return_counts: bool = False) → Union[Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, Categorical], Tuple[Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, Categorical], Optional[arkouda.pdarrayclass.pdarray]]]
Find the unique elements of an array. Returns the unique elements of an array, sorted if the values are integers. There is an optional output in addition to the unique elements: the number of times each unique value comes up in the input array.
Parameters
• pda (pdarray or Strings or Categorical) – Input array.
• return_counts (bool, optional) – If True, also return the number of times each unique item appears in pda.
Returns
• unique (pdarray or Strings) – The unique values. If input dtype is int64, return values will be sorted.
• unique_counts (pdarray, optional) – The number of times each of the unique values comes up in the original array. Only provided if return_counts is True.
Raises
• TypeError – Raised if pda is not a pdarray or Strings object
• RuntimeError – Raised if the pdarray or Strings dtype is unsupported

Notes

For integer arrays, this function checks to see whether pda is sorted and, if so, whether it is already unique. This step can save considerable computation. Otherwise, this function will sort pda.
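The shortcut described above (check for sortedness and uniqueness first, sort only when needed) can be sketched in pure Python. This is an illustration on lists under the assumption that the checks work as the note describes; unique_reference is a hypothetical helper, not arkouda's server-side code.

```python
def unique_reference(values):
    """Return (unique_values, counts) for a list of integers."""
    pairs = list(zip(values, values[1:]))
    if all(x < y for x, y in pairs):
        # Already sorted and unique: no sort and no counting pass needed.
        return list(values), [1] * len(values)
    is_sorted = all(x <= y for x, y in pairs)
    ordered = values if is_sorted else sorted(values)
    uniques, counts = [], []
    for x in ordered:
        if uniques and uniques[-1] == x:
            counts[-1] += 1  # same value as the previous run
        else:
            uniques.append(x)
            counts.append(1)
    return uniques, counts

uniques, counts = unique_reference([3, 2, 1, 1, 2, 3])
```

Once the data is sorted, equal values are adjacent, so a single pass yields both the unique values and their counts.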

Examples

>>> A = ak.array([3,2,1,1,2,3])
>>> ak.unique(A)
array([1, 2, 3])

arkouda.in1d(pda1: Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, Categorical], pda2: Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, Categorical], invert: bool = False) → arkouda.pdarrayclass.pdarray
Test whether each element of a 1-D array is also present in a second array. Returns a boolean array the same length as pda1 that is True where an element of pda1 is in pda2 and False otherwise.
Parameters


• pda1 (pdarray or Strings or Categorical) – Input array.
• pda2 (pdarray or Strings or Categorical) – The values against which to test each value of pda1. Must be the same type as pda1.
• invert (bool, optional) – If True, the values in the returned array are inverted (that is, False where an element of pda1 is in pda2 and True otherwise). Default is False. ak.in1d(a, b, invert=True) is equivalent to (but is faster than) ~ak.in1d(a, b).
Returns The values pda1[in1d] are in pda2.
Return type pdarray, bool
Raises
• TypeError – Raised if either pda1 or pda2 is not a pdarray, Strings, or Categorical object or if invert is not a bool
• RuntimeError – Raised if the dtype of either array is not supported
See also: unique, intersect1d, union1d

Notes

in1d can be considered as an element-wise function version of the python keyword in, for 1-D sequences. in1d(a, b) is logically equivalent to ak.array([item in b for item in a]), but is much faster and scales to arbitrarily large a. ak.in1d is not supported for bool or float64 pdarrays
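The logical equivalence with a list comprehension, including the effect of invert, can be written out as a small pure-Python sketch. The helper in1d_reference is hypothetical and works on lists; it uses a set for the lookups, which is a choice made for this sketch rather than a statement about arkouda's implementation.

```python
def in1d_reference(a, b, invert=False):
    """Element-wise membership test of a in b, returned as a list of bools."""
    lookup = set(b)  # constant-time membership checks
    # Comparing with `invert` flips every result when invert is True.
    return [(item in lookup) != invert for item in a]

mask = in1d_reference([-1, 0, 1], [-2, 0, 2])
inverted = in1d_reference([-1, 0, 1], [-2, 0, 2], invert=True)
```

The inverted result is the element-wise negation of the default result, matching the documented relationship between invert=True and the ~ operator.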

Examples

>>> ak.in1d(ak.array([-1,0,1]), ak.array([-2,0,2]))
array([False, True, False])

>>> ak.in1d(ak.array(['one','two']),ak.array(['two', 'three','four','five']))
array([False, True])

arkouda.concatenate(arrays: Sequence[Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, Categorical]], ordered: bool = True) → Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, Categorical]
Concatenate a list or tuple of pdarray or Strings objects into one pdarray or Strings object, respectively.
Parameters
• arrays (Sequence[Union[pdarray,Strings,Categorical]]) – The arrays to concatenate. Must all have same dtype.
• ordered (bool) – If True (default), the arrays will be appended in the order given. If False, array data may be interleaved in blocks, which can greatly improve performance but results in non-deterministic ordering of elements.
Returns Single pdarray or Strings object containing all values, returned in the original order
Return type Union[pdarray,Strings,Categorical]
Raises


• ValueError – Raised if arrays is empty or if 1..n pdarrays have differing dtypes
• TypeError – Raised if arrays is not a Python Sequence (such as a list or tuple) of pdarrays or Strings
• RuntimeError – Raised if 1..n array elements are dtypes for which concatenate has not been implemented.

Examples

>>> ak.concatenate([ak.array([1,2,3]), ak.array([4,5,6])])
array([1, 2, 3, 4, 5, 6])

>>> ak.concatenate([ak.array([True,False,True]),ak.array([False,True,True])])
array([True, False, True, False, True, True])

>>> ak.concatenate([ak.array(['one','two']),ak.array(['three','four','five'])])
array(['one', 'two', 'three', 'four', 'five'])

arkouda.union1d(pda1: arkouda.pdarrayclass.pdarray, pda2: arkouda.pdarrayclass.pdarray) → arkouda.pdarrayclass.pdarray
Find the union of two arrays. Return the unique, sorted array of values that are in either of the two input arrays.
Parameters
• pda1 (pdarray) – Input array
• pda2 (pdarray) – Input array
Returns Unique, sorted union of the input arrays.
Return type pdarray
Raises
• TypeError – Raised if either pda1 or pda2 is not a pdarray
• RuntimeError – Raised if the dtype of either array is not supported
See also: intersect1d, unique

Notes

ak.union1d is not supported for bool or float64 pdarrays


Examples

>>> ak.union1d(ak.array([-1,0,1]), ak.array([-2,0,2]))
array([-2, -1, 0, 1, 2])

arkouda.intersect1d(pda1: arkouda.pdarrayclass.pdarray, pda2: arkouda.pdarrayclass.pdarray, assume_unique: bool = False) → arkouda.pdarrayclass.pdarray
Find the intersection of two arrays. Return the sorted, unique values that are in both of the input arrays.
Parameters
• pda1 (pdarray) – Input array
• pda2 (pdarray) – Input array
• assume_unique (bool) – If True, the input arrays are both assumed to be unique, which can speed up the calculation. Default is False.
Returns Sorted 1D array of common and unique elements.
Return type pdarray
Raises
• TypeError – Raised if either pda1 or pda2 is not a pdarray
• RuntimeError – Raised if the dtype of either pdarray is not supported
See also: unique, union1d

Notes

ak.intersect1d is not supported for bool or float64 pdarrays

Examples

>>> ak.intersect1d(ak.array([1,3,4,3]), ak.array([3,1,2,1]))
array([1, 3])

arkouda.setdiff1d(pda1: arkouda.pdarrayclass.pdarray, pda2: arkouda.pdarrayclass.pdarray, assume_unique: bool = False) → arkouda.pdarrayclass.pdarray
Find the set difference of two arrays. Return the sorted, unique values in pda1 that are not in pda2.
Parameters
• pda1 (pdarray) – Input array.
• pda2 (pdarray) – Input comparison array.
• assume_unique (bool) – If True, the input arrays are both assumed to be unique, which can speed up the calculation. Default is False.
Returns Sorted 1D array of values in pda1 that are not in pda2.
Return type pdarray


Raises • TypeError – Raised if either pda1 or pda2 is not a pdarray • RuntimeError – Raised if the dtype of either pdarray is not supported See also: unique, setxor1d

Notes

ak.setdiff1d is not supported for bool or float64 pdarrays

Examples

>>> a = ak.array([1,2,3,2,4,1])
>>> b = ak.array([3,4,5,6])
>>> ak.setdiff1d(a, b)
array([1, 2])

arkouda.setxor1d(pda1: arkouda.pdarrayclass.pdarray, pda2: arkouda.pdarrayclass.pdarray, assume_unique: bool = False) → arkouda.pdarrayclass.pdarray
Find the set exclusive-or (symmetric difference) of two arrays. Return the sorted, unique values that are in only one (not both) of the input arrays.
Parameters
• pda1 (pdarray) – Input array.
• pda2 (pdarray) – Input array.
• assume_unique (bool) – If True, the input arrays are both assumed to be unique, which can speed up the calculation. Default is False.
Returns Sorted 1D array of unique values that are in only one of the input arrays.
Return type pdarray
Raises
• TypeError – Raised if either pda1 or pda2 is not a pdarray
• RuntimeError – Raised if the dtype of either pdarray is not supported

Notes

ak.setxor1d is not supported for bool or float64 pdarrays
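The four set operations in this group (union1d, intersect1d, setdiff1d, setxor1d) are all documented as returning sorted, unique values. Their results can be cross-checked against Python's built-in set type; the *_reference helpers below are hypothetical names and operate on lists, not pdarrays.

```python
def union1d_reference(a, b):
    return sorted(set(a) | set(b))   # values in either input

def intersect1d_reference(a, b):
    return sorted(set(a) & set(b))   # values in both inputs

def setdiff1d_reference(a, b):
    return sorted(set(a) - set(b))   # values in a that are not in b

def setxor1d_reference(a, b):
    return sorted(set(a) ^ set(b))   # values in exactly one input

union = union1d_reference([-1, 0, 1], [-2, 0, 2])
common = intersect1d_reference([1, 3, 4, 3], [3, 1, 2, 1])
difference = setdiff1d_reference([1, 2, 3, 2, 4, 1], [3, 4, 5, 6])
symmetric = setxor1d_reference([1, 2, 3, 2, 4], [2, 3, 5, 7, 5])
```

Duplicates in the inputs do not affect the results, since each input is reduced to its set of distinct values first.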


Examples

>>> a = ak.array([1,2,3,2,4])
>>> b = ak.array([2,3,5,7,5])
>>> ak.setxor1d(a,b)
array([1, 4, 5, 7])

arkouda.array(a: Union[arkouda.pdarrayclass.pdarray, numpy.ndarray, Iterable]) → Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings]
Convert a Python or Numpy Iterable to a pdarray or Strings object, sending the corresponding data to the arkouda server.
Parameters a (Union[pdarray, np.ndarray]) – Rank-1 array of a supported dtype
Returns A pdarray instance stored on arkouda server or Strings instance, which is composed of two pdarrays stored on arkouda server
Return type pdarray or Strings
Raises
• TypeError – Raised if a is not a pdarray, np.ndarray, or Python Iterable such as a list, array, tuple, or deque
• RuntimeError – Raised if a is not one-dimensional, nbytes > maxTransferBytes, a.dtype is not supported (not in DTypes), or if the product of a size and a.itemsize > maxTransferBytes
• ValueError – Raised if the returned message is malformed or does not contain the fields required to generate the array.
See also: pdarray.to_ndarray

Notes

The number of bytes in the input array cannot exceed arkouda.maxTransferBytes, otherwise a RuntimeError will be raised. This is to protect the user from overwhelming the connection between the Python client and the arkouda server, under the assumption that it is a low-bandwidth connection. The user may override this limit by setting ak.maxTransferBytes to a larger value, but should proceed with caution. If the pdarray or ndarray is of type U, this method is called twice recursively to create the Strings object and the two corresponding pdarrays for string bytes and offsets, respectively.
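The size guard described above amounts to comparing size * itemsize against a byte limit before any data is sent. A minimal sketch of that check follows; it is not arkouda's client code, and both the helper name check_transfer_size and the limit value used here are assumptions made for illustration.

```python
MAX_TRANSFER_BYTES = 2**30  # hypothetical limit chosen for this sketch

def check_transfer_size(size, itemsize, limit=MAX_TRANSFER_BYTES):
    """Raise RuntimeError if size * itemsize exceeds the transfer limit."""
    nbytes = size * itemsize
    if nbytes > limit:
        raise RuntimeError(
            f"array of {nbytes} bytes exceeds the limit of {limit} bytes"
        )
    return nbytes

small = check_transfer_size(size=1_000, itemsize=8)  # 8000 bytes, accepted
try:
    check_transfer_size(size=2**28, itemsize=8)      # 2**31 bytes, rejected
    too_large_rejected = False
except RuntimeError:
    too_large_rejected = True
```

Raising the limit trades this protection for the ability to move larger arrays, which is why the note above advises caution.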

Examples

>>> ak.array(np.arange(1,10))
array([1, 2, 3, 4, 5, 6, 7, 8, 9])

>>> ak.array(range(1,10))
array([1, 2, 3, 4, 5, 6, 7, 8, 9])

>>> strings = ak.array(['string {}'.format(i) for i in range(0,5)])
>>> type(strings)
<class 'arkouda.strings.Strings'>

arkouda.zeros(size: arkouda.dtypes.int_scalars, dtype: type = np.float64) → arkouda.pdarrayclass.pdarray
Create a pdarray filled with zeros.
Parameters
• size (int_scalars) – Size of the array (only rank-1 arrays supported)
• dtype (all_scalars) – Type of resulting array, default float64
Returns Zeros of the requested size and dtype
Return type pdarray
Raises TypeError – Raised if the supplied dtype is not supported or if the size parameter is neither an int nor a str that is parseable to an int.
See also: ones, zeros_like

Examples

>>> ak.zeros(5, dtype=ak.int64)
array([0, 0, 0, 0, 0])

>>> ak.zeros(5, dtype=ak.float64)
array([0, 0, 0, 0, 0])

>>> ak.zeros(5, dtype=ak.bool)
array([False, False, False, False, False])

arkouda.ones(size: arkouda.dtypes.int_scalars, dtype: type = float64) → arkouda.pdarrayclass.pdarray
Create a pdarray filled with ones.
Parameters
• size (int_scalars) – Size of the array (only rank-1 arrays supported)
• dtype (Union[float64, int64, bool]) – Resulting array type, default float64
Returns Ones of the requested size and dtype
Return type pdarray
Raises TypeError – Raised if the supplied dtype is not supported or if the size parameter is neither an int nor a str that is parseable to an int.
See also: zeros, ones_like


Examples

>>> ak.ones(5, dtype=ak.int64)
array([1, 1, 1, 1, 1])

>>> ak.ones(5, dtype=ak.float64)
array([1, 1, 1, 1, 1])

>>> ak.ones(5, dtype=ak.bool)
array([True, True, True, True, True])

arkouda.zeros_like(pda: arkouda.pdarrayclass.pdarray) → arkouda.pdarrayclass.pdarray
Create a zero-filled pdarray of the same size and dtype as an existing pdarray.
Parameters pda (pdarray) – Array to use for size and dtype
Returns Equivalent to ak.zeros(pda.size, pda.dtype)
Return type pdarray
Raises TypeError – Raised if the pda parameter is not a pdarray.
See also: zeros, ones_like

Examples

>>> zeros = ak.zeros(5, dtype=ak.int64)
>>> ak.zeros_like(zeros)
array([0, 0, 0, 0, 0])

>>> zeros = ak.zeros(5, dtype=ak.float64)
>>> ak.zeros_like(zeros)
array([0, 0, 0, 0, 0])

>>> zeros = ak.zeros(5, dtype=ak.bool)
>>> ak.zeros_like(zeros)
array([False, False, False, False, False])

arkouda.ones_like(pda: arkouda.pdarrayclass.pdarray) → arkouda.pdarrayclass.pdarray
Create a one-filled pdarray of the same size and dtype as an existing pdarray.
Parameters pda (pdarray) – Array to use for size and dtype
Returns Equivalent to ak.ones(pda.size, pda.dtype)
Return type pdarray
Raises TypeError – Raised if the pda parameter is not a pdarray.
See also: ones, zeros_like


Notes

Logic for generating the pdarray is delegated to the ak.ones method. Accordingly, the supported dtypes are those defined by the ak.ones method.

Examples

>>> ones = ak.ones(5, dtype=ak.int64)
>>> ak.ones_like(ones)
array([1, 1, 1, 1, 1])

>>> ones = ak.ones(5, dtype=ak.float64)
>>> ak.ones_like(ones)
array([1, 1, 1, 1, 1])

>>> ones = ak.ones(5, dtype=ak.bool)
>>> ak.ones_like(ones)
array([True, True, True, True, True])

arkouda.arange(*args) → arkouda.pdarrayclass.pdarray
arange([start,] stop[, stride])
Create a pdarray of consecutive integers within the interval [start, stop). If only one arg is given then arg is the stop parameter. If two args are given, then the first arg is start and second is stop. If three args are given, then the first arg is start, second is stop, third is stride.
Parameters
• start (int_scalars, optional) – Starting value (inclusive)
• stop (int_scalars) – Stopping value (exclusive)
• stride (int_scalars, optional) – The difference between consecutive elements, the default stride is 1, if stride is specified then start must also be specified.
Returns Integers from start (inclusive) to stop (exclusive) by stride
Return type pdarray, int64
Raises
• TypeError – Raised if start, stop, or stride is not an int object
• ZeroDivisionError – Raised if stride == 0
See also: linspace, zeros, ones, randint


Notes

Negative strides result in decreasing values. Currently, only int64 pdarrays can be created with this method. For float64 arrays, use the linspace method.
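The positional-argument handling of arange and the effect of a negative stride can be sketched with Python's built-in range. The helper arange_reference is hypothetical and returns a list rather than a pdarray; the default start of 0 for the one-argument form is an assumption of this sketch.

```python
def arange_reference(*args):
    """Mirror the arange([start,] stop[, stride]) argument handling."""
    if len(args) == 1:
        start, stop, stride = 0, args[0], 1  # assumed default start of 0
    elif len(args) == 2:
        start, stop, stride = args[0], args[1], 1
    elif len(args) == 3:
        start, stop, stride = args
    else:
        raise TypeError("arange takes 1 to 3 arguments")
    if stride == 0:
        raise ZeroDivisionError("stride must be nonzero")
    return list(range(start, stop, stride))

ascending = arange_reference(0, 10, 2)
descending = arange_reference(5, 0, -1)
```

In both directions the start value is included and the stop value is excluded.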

Examples

>>> ak.arange(0,5,1)
array([0, 1, 2, 3, 4])

>>> ak.arange(5,0,-1)
array([5, 4, 3, 2, 1])

>>> ak.arange(0, 10, 2)
array([0, 2, 4, 6, 8])

>>> ak.arange(-5,-10,-1)
array([-5, -6, -7, -8, -9])

arkouda.linspace(start: arkouda.dtypes.numeric_scalars, stop: arkouda.dtypes.numeric_scalars, length: arkouda.dtypes.int_scalars) → arkouda.pdarrayclass.pdarray
Create a pdarray of linearly-spaced floats in a closed interval.
Parameters
• start (numeric_scalars) – Start of interval (inclusive)
• stop (numeric_scalars) – End of interval (inclusive)
• length (int_scalars) – Number of points
Returns Array of evenly spaced float values along the interval
Return type pdarray, float64
Raises TypeError – Raised if start or stop is not a float or int or if length is not an int
See also: arange

Notes

If start is greater than stop, the pdarray values are generated in descending order.
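The closed-interval spacing, including the descending case, follows from a constant step of (stop - start) / (length - 1). The pure-Python sketch below illustrates this on lists; linspace_reference is a hypothetical helper, and its handling of length == 1 is a choice made for the sketch rather than documented arkouda behavior.

```python
def linspace_reference(start, stop, length):
    """Evenly spaced floats from start to stop, both endpoints included."""
    if length == 1:
        return [float(start)]  # sketch-only choice for the degenerate case
    step = (stop - start) / (length - 1)  # negative when start > stop
    return [start + i * step for i in range(length)]

rising = linspace_reference(0, 1, 5)
falling = linspace_reference(1, 0, 5)
```

When start is greater than stop the step is negative, so the same formula produces the values in descending order.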

Examples

>>> ak.linspace(0,1,5)
array([0, 0.25, 0.5, 0.75, 1])

>>> ak.linspace(start=1, stop=0, length=5)
array([1, 0.75, 0.5, 0.25, 0])

>>> ak.linspace(start=-5, stop=0, length=5)
array([-5, -3.75, -2.5, -1.25, 0])

arkouda.randint(low: arkouda.dtypes.numeric_scalars, high: arkouda.dtypes.numeric_scalars, size: arkouda.dtypes.int_scalars, dtype=int64, seed: arkouda.dtypes.int_scalars = None) → arkouda.pdarrayclass.pdarray
Generate a pdarray of randomized int, float, or bool values in a specified range bounded by the low and high parameters.
Parameters
• low (numeric_scalars) – The low value (inclusive) of the range
• high (numeric_scalars) – The high value (exclusive for int, inclusive for float) of the range
• size (int_scalars) – The length of the returned array
• dtype (Union[int64, float64, bool]) – The dtype of the array
• seed (int_scalars) – Index for where to pull the first returned value
Returns Values drawn uniformly from the specified range having the desired dtype
Return type pdarray
Raises
• TypeError – Raised if dtype.name not in DTypes, size is not an int, low or high is not an int or float, or seed is not an int
• ValueError – Raised if size < 0 or if high < low

Notes

Calling randint with dtype=float64 will result in uniform non-integral floating point values.

Examples

>>> ak.randint(0, 10, 5)
array([5, 7, 4, 8, 3])

>>> ak.randint(0, 1, 3, dtype=ak.float64)
array([0.92176432277231968, 0.083130710959903542, 0.68894208386667544])

>>> ak.randint(0, 1, 5, dtype=ak.bool)
array([True, False, True, True, True])

>>> ak.randint(1, 5, 10, seed=2)
array([4, 3, 1, 3, 4, 4, 2, 4, 3, 2])

>>> ak.randint(1, 5, 3, dtype=ak.float64, seed=2)
array([2.9160772326374946, 4.353429832157099, 4.5392023718621486])

>>> ak.randint(1, 5, 10, dtype=ak.bool, seed=2)
array([False, True, True, True, True, False, True, True, True, True])

arkouda.uniform(size: arkouda.dtypes.int_scalars, low: arkouda.dtypes.numeric_scalars = float(0.0), high: arkouda.dtypes.numeric_scalars = 1.0, seed: Union[None, arkouda.dtypes.int_scalars] = None) → arkouda.pdarrayclass.pdarray
Generate a pdarray with uniformly distributed random float values in a specified range.
Parameters
• low (float_scalars) – The low value (inclusive) of the range, defaults to 0.0
• high (float_scalars) – The high value (inclusive) of the range, defaults to 1.0
• size (int_scalars) – The length of the returned array
• seed (int_scalars, optional) – Value used to initialize the random number generator
Returns Values drawn uniformly from the specified range
Return type pdarray, float64
Raises
• TypeError – Raised if dtype.name not in DTypes, size is not an int, or if either low or high is not an int or float
• ValueError – Raised if size < 0 or if high < low

Notes

The logic for uniform is delegated to the ak.randint method which is invoked with a dtype of float64

Examples

>>> ak.uniform(3)
array([0.92176432277231968, 0.083130710959903542, 0.68894208386667544])

>>> ak.uniform(size=3,low=0,high=5,seed=0)
array([0.30013431967121934, 0.47383036230759112, 1.0441791878997098])

arkouda.standard_normal(size: arkouda.dtypes.int_scalars, seed: Union[None, arkouda.dtypes.int_scalars] = None) → arkouda.pdarrayclass.pdarray
Draw real numbers from the standard normal distribution.
Parameters
• size (int_scalars) – The number of samples to draw (size of the returned array)
• seed (int_scalars) – Value used to initialize the random number generator
Returns The array of random numbers
Return type pdarray, float64
Raises
• TypeError – Raised if size is not an int
• ValueError – Raised if size < 0


See also: randint

Notes

For random samples from $N(\mu, \sigma^2)$, use: (sigma * standard_normal(size)) + mu
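The shift-and-scale recipe above can be tried out with Python's standard library. In this sketch random.Random.gauss(0.0, 1.0) stands in for standard_normal, and normal_samples is a hypothetical helper; the random stream it produces is unrelated to the one arkouda generates for a given seed.

```python
import random

def normal_samples(mu, sigma, size, seed=None):
    """Draw from N(mu, sigma^2) by scaling and shifting standard normal draws."""
    rng = random.Random(seed)
    # rng.gauss(0.0, 1.0) plays the role of standard_normal here.
    return [sigma * rng.gauss(0.0, 1.0) + mu for _ in range(size)]

samples = normal_samples(mu=5.0, sigma=2.0, size=20_000, seed=1)
sample_mean = sum(samples) / len(samples)
```

With enough samples the empirical mean settles close to mu, and the spread around it scales with sigma.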

Examples

>>> ak.standard_normal(3,1)
array([-0.68586185091150265, 1.1723810583573375, 0.567584107142031])

arkouda.random_strings_uniform(minlen: arkouda.dtypes.int_scalars, maxlen: arkouda.dtypes.int_scalars, size: arkouda.dtypes.int_scalars, characters: str = 'uppercase', seed: Union[None, arkouda.dtypes.int_scalars] = None) → arkouda.strings.Strings
Generate random strings with lengths uniformly distributed between minlen and maxlen, and with characters drawn from a specified set.
Parameters
• minlen (int_scalars) – The minimum allowed length of string
• maxlen (int_scalars) – The maximum allowed length of string
• size (int_scalars) – The number of strings to generate
• characters ((uppercase, lowercase, numeric, printable, binary)) – The set of characters to draw from
• seed (Union[None, int_scalars], optional) – Value used to initialize the random number generator
Returns The array of random strings
Return type Strings
Raises ValueError – Raised if minlen < 0, maxlen < minlen, or size < 0
See also: random_strings_lognormal, randint

Examples

>>> ak.random_strings_uniform(minlen=1, maxlen=5, seed=1, size=5)
array(['TVKJ', 'EWAB', 'CO', 'HFMD', 'U'])

>>> ak.random_strings_uniform(minlen=1, maxlen=5, seed=1, size=5,
... characters='printable')
array(['+5"f', '-P]3', '4k', '~HFF', 'F'])

arkouda.random_strings_lognormal(logmean: arkouda.dtypes.numeric_scalars, logstd: arkouda.dtypes.numeric_scalars, size: arkouda.dtypes.int_scalars, characters: str = 'uppercase', seed: Optional[arkouda.dtypes.int_scalars] = None) → arkouda.strings.Strings
Generate random strings with log-normally distributed lengths and with characters drawn from a specified set.
Parameters
• logmean (numeric_scalars) – The log-mean of the length distribution
• logstd (numeric_scalars) – The log-standard-deviation of the length distribution
• size (int_scalars) – The number of strings to generate
• characters ((uppercase, lowercase, numeric, printable, binary)) – The set of characters to draw from
• seed (int_scalars, optional) – Value used to initialize the random number generator
Returns The Strings object encapsulating a pdarray of random strings
Return type Strings
Raises
• TypeError – Raised if logmean is neither a float nor an int, logstd is not a float, size is not an int, or if characters is not a str
• ValueError – Raised if logstd <= 0 or size < 0
See also: random_strings_uniform, randint

Notes

The lengths of the generated strings are distributed $\mathrm{Lognormal}(\mu, \sigma^2)$, with $\mu = \mathrm{logmean}$ and $\sigma = \mathrm{logstd}$. Thus, the strings will have an average length of $\exp(\mu + 0.5\,\sigma^2)$, a minimum length of zero, and a heavy tail towards longer strings.
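As a quick sanity check of the mean-length formula, the following standard-library sketch evaluates it for logmean=2 and logstd=0.25 and compares the result with an empirical average. It does not call Arkouda: `random.lognormvariate` stands in for the server-side generator, and the helper names are hypothetical, chosen only for illustration.

```python
import math
import random


def expected_lognormal_mean(logmean: float, logstd: float) -> float:
    """Mean of a lognormal distribution: exp(mu + 0.5 * sigma^2)."""
    return math.exp(logmean + 0.5 * logstd ** 2)


def empirical_lognormal_mean(logmean, logstd, n=100_000, seed=1):
    """Average of n draws from Python's own lognormal sampler (not Arkouda's)."""
    rng = random.Random(seed)
    return sum(rng.lognormvariate(logmean, logstd) for _ in range(n)) / n


theory = expected_lognormal_mean(2, 0.25)
sample = empirical_lognormal_mean(2, 0.25)
print(f"theoretical mean length: {theory:.3f}")
print(f"empirical mean length:   {sample:.3f}")
```

With these parameters the theoretical mean is a little under 8 characters, which is in line with the string lengths shown in the examples.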

Examples

>>> ak.random_strings_lognormal(2, 0.25,5, seed=1) array(['TVKJTE', 'ABOCORHFM', 'LUDMMGTB', 'KWOQNPHZ', 'VSXRRL'])

>>> ak.random_strings_lognormal(2, 0.25,5, seed=1, characters= 'printable') array(['+5"fp-', ']3Q4kC~HF', '=F=`,IE!', 'DjkBa'9(', '5oZ1)=']) arkouda.from_series(series: pandas.Series, dtype: Optional[Union[type, str]] = None) → Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings] Converts a Pandas Series to an Arkouda pdarray or Strings object. If dtype is None, the dtype is inferred from the Pandas Series. Otherwise, the dtype parameter is set if the dtype of the Pandas Series is to be overridden or is unknown (for example, in situations where the Series dtype is object). Parameters • series (Pandas Series) – The Pandas Series with a dtype of bool, float64, int64, or string • dtype (Optional[type]) – The valid dtype types are np.bool, np.float64, np.int64, and np.str


Returns Return type Union[pdarray,Strings] Raises • TypeError – Raised if series is not a Pandas Series object • ValueError – Raised if the Series dtype is not bool, float64, int64, string, datetime, or timedelta

Examples

>>> ak.from_series(pd.Series(np.random.randint(0,10,5))) array([9, 0, 4, 7, 9])

>>> ak.from_series(pd.Series(['1', '2', '3', '4', '5']),dtype=np.int64) array([1, 2, 3, 4, 5])

>>> ak.from_series(pd.Series(np.random.uniform(low=0.0,high=1.0,size=3))) array([0.57600036956445599, 0.41619265571741659, 0.6615356693784662])

>>> ak.from_series(pd.Series(['0.57600036956445599', '0.41619265571741659', '0.6615356693784662']), dtype=np.float64) array([0.57600036956445599, 0.41619265571741659, 0.6615356693784662])

>>> ak.from_series(pd.Series(np.random.choice([True, False],size=5))) array([True, False, True, True, True])

>>> ak.from_series(pd.Series(['True', 'False', 'False', 'True', 'True']), dtype=np.bool) array([True, True, True, True, True])

>>> ak.from_series(pd.Series(['a', 'b', 'c', 'd', 'e'], dtype="string")) array(['a', 'b', 'c', 'd', 'e'])

>>> ak.from_series(pd.Series(['a', 'b', 'c', 'd', 'e']),dtype=np.str) array(['a', 'b', 'c', 'd', 'e'])

>>> ak.from_series(pd.Series(pd.to_datetime(['1/1/2018', np.datetime64('2018-01-01')]))) array([1514764800000000000, 1514764800000000000])


Notes

The supported datatypes are bool, float64, int64, string, and datetime64[ns]. The data type is either inferred from the Series or is set via the dtype parameter. Series of datetime or timedelta are converted to Arkouda arrays of dtype int64 (nanoseconds). A Pandas Series containing strings has a dtype of object. Arkouda assumes the Series contains strings and sets the dtype to str. arkouda.cast(pda: Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings], dt: Union[numpy.dtype, str]) → Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings] Cast an array to another dtype. Parameters • pda (pdarray or Strings) – The array of values to cast • dt (np.dtype or str) – The target dtype to cast values to Returns Array of values cast to desired dtype Return type pdarray or Strings

Notes

The cast is performed according to Chapel’s casting rules and is NOT safe from overflows or underflows. The user must ensure that the target dtype has the precision and capacity to hold the desired result.
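Because the cast is unchecked, it can help to validate values on the client side before requesting it. The sketch below is plain Python and does not call Arkouda; both helper names are hypothetical. It checks two common hazards: floats that fall outside the int64 range, and integers too large to be represented exactly as float64.

```python
INT64_MIN = -(2 ** 63)
INT64_MAX = 2 ** 63 - 1


def fits_int64(values):
    """True if every finite value lies in the int64 range after truncation."""
    return all(INT64_MIN <= int(v) <= INT64_MAX for v in values)


def survives_float64_round_trip(values):
    """True if every integer is exactly representable as a float64."""
    return all(int(float(v)) == v for v in values)


print(fits_int64([1.0, 2.5, 1e18]))                # True
print(fits_int64([1e19]))                          # False
print(survives_float64_round_trip([2 ** 53]))      # True
print(survives_float64_round_trip([2 ** 53 + 1]))  # False
```

The last line illustrates that float64 has a 53-bit significand, so int64 values above 2**53 may change when cast to float64.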

Examples

>>> ak.cast(ak.linspace(1.0,5.0,5), dt=ak.int64) array([1, 2, 3, 4, 5])

>>> ak.cast(ak.arange(0,5), dt=ak.float64).dtype dtype('float64')

>>> ak.cast(ak.arange(0,5), dt=ak.bool) array([False, True, True, True, True])

>>> ak.cast(ak.linspace(0,4,5), dt=ak.bool) array([False, True, True, True, True]) arkouda.abs(pda: arkouda.pdarrayclass.pdarray) → arkouda.pdarrayclass.pdarray Return the element-wise absolute value of the array. Parameters pda (pdarray)– Returns A pdarray containing absolute values of the input array elements Return type pdarray Raises TypeError – Raised if the parameter is not a pdarray


Examples

>>> ak.abs(ak.arange(-5,-1)) array([5, 4, 3, 2])

>>> ak.abs(ak.linspace(-5,-1,5)) array([5, 4, 3, 2, 1]) arkouda.log(pda: arkouda.pdarrayclass.pdarray) → arkouda.pdarrayclass.pdarray Return the element-wise natural log of the array. Parameters pda (pdarray)– Returns A pdarray containing natural log values of the input array elements Return type pdarray Raises TypeError – Raised if the parameter is not a pdarray

Notes

Logarithms with other bases can be computed as follows:

Examples

>>> A= ak.array([1, 10, 100]) # Natural log >>> ak.log(A) array([0, 2.3025850929940459, 4.6051701859880918]) # Log base 10 >>> ak.log(A)/ np.log(10) array([0, 1, 2]) # Log base 2 >>> ak.log(A)/ np.log(2) array([0, 3.3219280948873626, 6.6438561897747253]) arkouda.exp(pda: arkouda.pdarrayclass.pdarray) → arkouda.pdarrayclass.pdarray Return the element-wise exponential of the array. Parameters pda (pdarray)– Returns A pdarray containing exponential values of the input array elements Return type pdarray Raises TypeError – Raised if the parameter is not a pdarray


Examples

>>> ak.exp(ak.arange(1,5)) array([2.7182818284590451, 7.3890560989306504, 20.085536923187668, 54.598150033144236])

>>> ak.exp(ak.uniform(5,1.0,5.0)) array([11.84010843172504, 46.454368507659211, 5.5571769623557188, 33.494295836924771, 13.478894913238722]) arkouda.cumsum(pda: arkouda.pdarrayclass.pdarray) → arkouda.pdarrayclass.pdarray Return the cumulative sum over the array. The sum is inclusive, such that the i th element of the result is the sum of elements up to and including i. Parameters pda (pdarray)– Returns A pdarray containing cumulative sums for each element of the original pdarray Return type pdarray Raises TypeError – Raised if the parameter is not a pdarray

Examples

>>> ak.cumsum(ak.arange(1,5)) array([1, 3, 6, 10])

>>> ak.cumsum(ak.uniform(5,1.0,5.0)) array([3.1598310770203937, 5.4110385860243131, 9.1622479306453748, 12.710615785506533, 13.945880905466208])
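The inclusive semantics can be reproduced on a small list with the standard library. This is only a sketch of the definition (element i of the result is the sum of elements 0 through i), not of the distributed server-side scan; `inclusive_cumsum` is a hypothetical helper name.

```python
from itertools import accumulate


def inclusive_cumsum(values):
    """Running total in which element i includes values[i] itself."""
    # Convert to int first so that booleans are summed as 0/1 counts.
    return list(accumulate(int(v) for v in values))


print(inclusive_cumsum([1, 2, 3, 4]))                       # [1, 3, 6, 10]
print(inclusive_cumsum([False, True, False, True, True]))   # [0, 1, 1, 2, 3]
```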

>>> ak.cumsum(ak.randint(0,1,5, dtype=ak.bool)) array([0, 1, 1, 2, 3]) arkouda.cumprod(pda: arkouda.pdarrayclass.pdarray) → arkouda.pdarrayclass.pdarray Return the cumulative product over the array. The product is inclusive, such that the i th element of the result is the product of elements up to and including i. Parameters pda (pdarray)– Returns A pdarray containing cumulative products for each element of the original pdarray Return type pdarray Raises TypeError – Raised if the parameter is not a pdarray


Examples

>>> ak.cumprod(ak.arange(1,5)) array([1, 2, 6, 24])

>>> ak.cumprod(ak.uniform(5,1.0,5.0)) array([1.5728783400481925, 7.0472855509390593, 33.78523998586553, 134.05309592737584, 450.21589865655358]) arkouda.sin(pda: arkouda.pdarrayclass.pdarray) → arkouda.pdarrayclass.pdarray Return the element-wise sine of the array. Parameters pda (pdarray)– Returns A pdarray containing sin for each element of the original pdarray Return type pdarray Raises TypeError – Raised if the parameter is not a pdarray arkouda.cos(pda: arkouda.pdarrayclass.pdarray) → arkouda.pdarrayclass.pdarray Return the element-wise cosine of the array. Parameters pda (pdarray)– Returns A pdarray containing cosine for each element of the original pdarray Return type pdarray Raises TypeError – Raised if the parameter is not a pdarray arkouda.hash(pda: arkouda.pdarrayclass.pdarray, full: bool = True) → Union[Tuple[arkouda.pdarrayclass.pdarray, arkouda.pdarrayclass.pdarray], arkouda.pdarrayclass.pdarray] Return an element-wise hash of the array. Parameters • pda (pdarray)– • full (bool) – By default, a 128-bit hash is computed and returned as two int64 arrays. If full=False, then a 64-bit hash is computed and returned as a single int64 array. Returns If full=True, a 2-tuple of pdarrays containing the high and low 64 bits of each hash, respectively. If full=False, a single pdarray containing a 64-bit hash Return type hashes Raises TypeError – Raised if the parameter is not a pdarray

Notes

This function uses the SIPhash algorithm, which can output either a 64-bit or 128-bit hash. However, the 64-bit hash runs a significant risk of collisions when applied to more than a few million unique values. Unless the number of unique values is known to be small, the 128-bit hash is strongly recommended. Note that this hash should not be used for security, or for any cryptographic application. Not only is SIPhash not intended for such uses, but this implementation employs a fixed key for the hash, which makes it possible for an adversary with control over input to engineer collisions.

arkouda.where(condition: arkouda.pdarrayclass.pdarray, A: Union[arkouda.dtypes.numeric_scalars, arkouda.pdarrayclass.pdarray], B: Union[arkouda.dtypes.numeric_scalars, arkouda.pdarrayclass.pdarray]) → arkouda.pdarrayclass.pdarray Returns an array with elements chosen from A and B based upon a conditioning array. As is the case with numpy.where, the return array consists of values from the first array (A) where the conditioning array elements are True and from the second array (B) where the conditioning array elements are False. Parameters • condition (pdarray) – Used to choose values from A or B • A (Union[numeric_scalars, pdarray]) – Value(s) used when condition is True • B (Union[numeric_scalars, pdarray]) – Value(s) used when condition is False Returns Values chosen from A where the condition is True and B where the condition is False Return type pdarray Raises • TypeError – Raised if the condition object is not a pdarray, if A or B is not an int, np.int64, float, np.float64, or pdarray, if pdarray dtypes are not supported or do not match, or if multiple condition clauses (see Notes section) are applied • ValueError – Raised if the shapes of the condition, A, and B pdarrays are unequal

Examples

>>> a1= ak.arange(1,10) >>> a2= ak.ones(9, dtype=np.int64) >>> cond= a1<5 >>> ak.where(cond,a1,a2) array([1, 2, 3, 4, 1, 1, 1, 1, 1])

>>> a1= ak.arange(1,10) >>> a2= ak.ones(9, dtype=np.int64) >>> cond= a1 ==5 >>> ak.where(cond,a1,a2) array([1, 1, 1, 1, 5, 1, 1, 1, 1])

>>> a1= ak.arange(1,10) >>> a2= 10 >>> cond= a1<5 >>> ak.where(cond,a1,a2) array([1, 2, 3, 4, 10, 10, 10, 10, 10])
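The selection rule used in these examples can be written out element by element. The following pure-Python sketch (the name `where_sketch` is hypothetical and no Arkouda server is involved) treats a scalar A or B as if it were repeated to the length of the condition, and rejects mismatched lengths.

```python
def where_sketch(condition, a, b):
    """Pick a[i] where condition[i] is True, otherwise b[i]."""
    n = len(condition)

    def as_list(x):
        # A scalar is broadcast to the length of the condition.
        return list(x) if isinstance(x, (list, tuple)) else [x] * n

    a_vals, b_vals = as_list(a), as_list(b)
    if len(a_vals) != n or len(b_vals) != n:
        raise ValueError("condition, A, and B must have the same length")
    return [x if c else y for c, x, y in zip(condition, a_vals, b_vals)]


a1 = list(range(1, 10))
cond = [v < 5 for v in a1]
print(where_sketch(cond, a1, [1] * 9))  # [1, 2, 3, 4, 1, 1, 1, 1, 1]
print(where_sketch(cond, a1, 10))       # [1, 2, 3, 4, 10, 10, 10, 10, 10]
```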


Notes

A and B must have the same dtype, and only one conditional clause is supported. Compound conditions (e.g., n < 5, n > 1), which are supported in NumPy, are not currently supported in Arkouda. arkouda.histogram(pda: arkouda.pdarrayclass.pdarray, bins: arkouda.dtypes.int_scalars = 10) → arkouda.pdarrayclass.pdarray Compute a histogram of evenly spaced bins over the range of an array. Parameters • pda (pdarray) – The values to histogram • bins (int_scalars) – The number of equal-size bins to use (default: 10) Returns The number of values present in each bin Return type pdarray, int64 or float64 Raises • TypeError – Raised if the parameter is not a pdarray or if bins is not an int. • ValueError – Raised if bins < 1 • NotImplementedError – Raised if pdarray dtype is bool or uint8 See also: value_counts

Notes

The bins are evenly spaced in the interval [pda.min(), pda.max()]. Currently, the user must re-compute the bin edges, e.g. with np.linspace (see below) in order to plot the histogram.
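To make the binning rule concrete, here is a small pure-Python sketch of evenly spaced bins over [min, max]. It assumes that the maximum value is counted in the last bin, which is consistent with the counts shown in the example for this function but is not confirmed by the text; the helper names are hypothetical.

```python
def bin_edges(lo, hi, bins):
    """Return the bins + 1 evenly spaced edges covering [lo, hi]."""
    width = (hi - lo) / bins
    return [lo + i * width for i in range(bins + 1)]


def histogram_counts(values, bins):
    """Count values per bin; the maximum is assigned to the last bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # Guard against a zero-width range, then clamp the max into the last bin.
        idx = 0 if width == 0 else min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


data = list(range(10))
print(bin_edges(min(data), max(data), 3))  # [0.0, 3.0, 6.0, 9.0]
print(histogram_counts(data, 3))           # [3, 3, 4]
```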

Examples

>>> import matplotlib.pyplot as plt >>> A= ak.arange(0, 10,1) >>> nbins=3 >>> h= ak.histogram(A, bins=nbins) >>> h array([3, 3, 4]) # Recreate the bin edges in NumPy >>> binEdges= np.linspace(A.min(), A.max(), nbins+1) >>> binEdges array([0., 3., 6., 9.]) # To plot, use only the left edges, and export the histogram to NumPy >>> plt.plot(binEdges[:-1], h.to_ndarray()) arkouda.value_counts(pda: arkouda.pdarrayclass.pdarray) → Union[Categorical, Tuple[Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings], Optional[arkouda.pdarrayclass.pdarray]]] Count the occurrences of the unique values of an array. Parameters pda (pdarray, int64) – The array of values to count Returns


• unique_values (pdarray, int64 or Strings) – The unique values, sorted in ascending order • counts (pdarray, int64) – The number of times the corresponding unique value occurs Raises TypeError – Raised if the parameter is not a pdarray See also: unique, histogram

Notes

This function differs from histogram() in that it only returns counts for values that are present, leaving out empty “bins”. This function delegates all logic to the unique() method where the return_counts parameter is set to True.

Examples

>>> A= ak.array([2,0,2,4,0,0]) >>> ak.value_counts(A) (array([0, 2, 4]), array([3, 2, 1]))
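For comparison, the same result can be obtained on a plain Python list with `collections.Counter`. This sketch only mirrors the return shape described above (sorted unique values plus their counts); `value_counts_sketch` is a hypothetical name and does not reflect how the server computes the result.

```python
from collections import Counter


def value_counts_sketch(values):
    """Return (sorted unique values, count of each unique value)."""
    counts = Counter(values)
    unique_values = sorted(counts)
    return unique_values, [counts[v] for v in unique_values]


print(value_counts_sketch([2, 0, 2, 4, 0, 0]))  # ([0, 2, 4], [3, 2, 1])
```

Unlike a histogram, values that never occur (1 and 3 here) do not appear in the output at all.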

arkouda.isnan(pda: arkouda.pdarrayclass.pdarray) → arkouda.pdarrayclass.pdarray Test a pdarray for Not a number / NaN values Currently only supports float-value-based arrays Parameters pda (pdarray to test)– Returns Return type pdarray consisting of True / False values; True where NaN, False otherwise Raises • TypeError – Raised if the parameter is not a pdarray • RuntimeError – if the underlying pdarray is not float-based arkouda.ls_hdf(filename: str) → str This function calls the h5ls utility on a filename visible to the arkouda server. Parameters filename (str) – The name of the file to pass to h5ls Returns The string output of h5ls from the server Return type str Raises • TypeError – Raised if filename is not a str • ValueError – Raised if filename is empty or contains only whitespace • RuntimeError – Raised if error occurs in executing ls on an HDF5 file arkouda.read_hdf(dsetName: str, filenames: Union[str, List[str]], strictTypes: bool = True, allow_errors: bool = False, calc_string_offsets: bool = False) → Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings] Read a single dataset from multiple HDF5 files into an Arkouda pdarray or Strings object. Parameters • dsetName (str) – The name of the dataset (must be the same across all files)


• filenames (list or str) – Either a list of filenames or shell expression • strictTypes (bool) – If True (default), require all dtypes in all files to have the same precision and sign. If False, allow dtypes of different precision and sign across different files. For example, if one file contains a uint32 dataset and another contains an int64 dataset, the contents of both will be read into an int64 pdarray. • allow_errors (bool) – Default False, if True will allow files with read errors to be skipped instead of failing. A warning will be included in the return containing the total number of files skipped due to failure and up to 10 filenames. • calc_string_offsets (bool) – Default False, if True this will tell the server to calculate the offsets/segments array on the server versus loading them from HDF5 files. In the future this option may be set to True as the default. Returns A pdarray or Strings instance pointing to the server-side data Return type Union[pdarray,Strings] Raises • TypeError – Raised if dsetName is not a str or if filenames is neither a string nor a list of strings • ValueError – Raised if all datasets are not present in all hdf5 files • RuntimeError – Raised if one or more of the specified files cannot be opened See also: get_datasets, ls_hdf , read_all, load, save

Notes

If filenames is a string, it is interpreted as a shell expression (a single filename is a valid expression, so it will work) and is expanded with glob to read all matching files. Use get_datasets to show the names of datasets in HDF5 files. If dsetName is not present in all files, a TypeError is raised. arkouda.read_all(filenames: Union[str, List[str]], datasets: Optional[Union[str, List[str]]] = None, iterative: bool = False, strictTypes: bool = True, allow_errors: bool = False, calc_string_offsets=False) → Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, Mapping[str, Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings]]] Read datasets from HDF5 files. Parameters • filenames (list or str) – Either a list of filenames or shell expression • datasets (list or str or None) – (List of) name(s) of dataset(s) to read (default: all available) • iterative (bool) – Iterative (True) or Single (False) function call(s) to server • strictTypes (bool) – If True (default), require all dtypes of a given dataset to have the same precision and sign. If False, allow dtypes of different precision and sign across different files. For example, if one file contains a uint32 dataset and another contains an int64 dataset with the same name, the contents of both will be read into an int64 pdarray. • allow_errors (bool) – Default False, if True will allow files with read errors to be skipped instead of failing. A warning will be included in the return containing the total number of files skipped due to failure and up to 10 filenames.


• calc_string_offsets (bool) – Default False, if True this will tell the server to calculate the offsets/segments array on the server versus loading them from HDF5 files. In the future this option may be set to True as the default. Returns • For a single dataset returns an Arkouda pdarray or Arkouda Strings object • and for multiple datasets returns a dictionary of Arkouda pdarrays or • Arkouda Strings. – Dictionary of {datasetName: pdarray or String} Raises • ValueError – Raised if all datasets are not present in all hdf5 files or if one or more of the specified files do not exist • RuntimeError – Raised if one or more of the specified files cannot be opened. If allow_errors is true this may be raised if no values are returned from the server. • TypeError – Raised if we receive an unknown arkouda_type returned from the server See also: read_hdf , get_datasets, ls_hdf

Notes

If filenames is a string, it is interpreted as a shell expression (a single filename is a valid expression, so it will work) and is expanded with glob to read all matching files. If iterative == True each dataset name and file names are passed to the server as independent sequential strings while if iterative == False all dataset names and file names are passed to the server in a single string. If datasets is None, infer the names of datasets from the first file and read all of them. Use get_datasets to show the names of datasets in HDF5 files. arkouda.load(path_prefix: str, dataset: str = 'array', calc_string_offsets: bool = False) → Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings] Load a pdarray previously saved with pdarray.save(). Parameters • path_prefix (str) – Filename prefix used to save the original pdarray • dataset (str) – Dataset name where the pdarray was saved, defaults to ‘array’ • calc_string_offsets (bool) – If True the server will ignore Segmented Strings ‘offsets’ array and derive it from the null-byte terminators. Defaults to False currently Returns The pdarray or Strings that was previously saved Return type Union[pdarray, Strings] Raises • TypeError – Raised if either path_prefix or dataset is not a str • ValueError – Raised if the dataset is not present in all hdf5 files or if the path_prefix does not correspond to files accessible to Arkouda • RuntimeError – Raised if the hdf5 files are present but there is an error in opening one or more of them


See also: save, load_all, read_hdf , read_all arkouda.get_datasets(filename: str) → List[str] Get the names of datasets in an HDF5 file. Parameters filename (str) – Name of an HDF5 file visible to the arkouda server Returns Names of the datasets in the file Return type List[str] Raises • TypeError – Raised if filename is not a str • ValueError – Raised if filename is empty or contains only whitespace • RuntimeError – Raised if error occurs in executing ls on an HDF5 file See also: ls_hdf arkouda.load_all(path_prefix: str) → Mapping[str, Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings, arkouda.categorical.Categorical]] Load multiple pdarrays or Strings previously saved with save_all(). Parameters path_prefix (str) – Filename prefix used to save the original pdarray Returns Dictionary of {datasetName: pdarray} with the previously saved pdarrays Return type Mapping[str,pdarray] Raises • TypeError – Raised if path_prefix is not a str • ValueError – Raised if all datasets are not present in all hdf5 files or if the path_prefix does not correspond to files accessible to Arkouda • RuntimeError – Raised if the hdf5 files are present but there is an error in opening one or more of them See also: save_all, load, read_hdf , read_all arkouda.save_all(columns: Union[Mapping[str, arkouda.pdarrayclass.pdarray], List[arkouda.pdarrayclass.pdarray]], prefix_path: str, names: List[str] = None, mode: str = 'truncate') → None Save multiple named pdarrays to HDF5 files. Parameters • columns (dict or list of pdarrays) – Collection of arrays to save • prefix_path (str) – Directory and filename prefix for output files • names (list of str) – Dataset names for the pdarrays • mode ({'truncate' | 'append'}) – By default, truncate (overwrite) the output files if they exist. If ‘append’, attempt to create new dataset in existing files. Returns Return type None


Raises ValueError – Raised if (1) the lengths of columns and values differ or (2) the mode is not ‘truncate’ or ‘append’ See also: save, load_all

Notes

Creates one file per locale containing that locale’s chunk of each pdarray. If columns is a dictionary, the keys are used as the HDF5 dataset names. Otherwise, if no names are supplied, 0-up integers are used. By default, any existing files at prefix_path will be overwritten, unless the user specifies the ‘append’ mode, in which case arkouda will attempt to add the arrays as new datasets to existing files. If the wrong number of files is present or dataset names already exist, a RuntimeError is raised. class arkouda.GroupBy(keys: groupable, assume_sorted: bool = False, hash_strings: bool = True) Group an array or list of arrays by value, usually in preparation for aggregating the within-group values of another array. Parameters • keys ((list of ) pdarray, int64, Strings, or Categorical) – The array to group by value, or if list, the column arrays to group by row • assume_sorted (bool) – If True, assume keys is already sorted (Default: False) nkeys The number of key arrays (columns) Type int size The length of the input array(s), i.e. number of rows Type int permutation The permutation that sorts the keys array(s) by value (row) Type pdarray unique_keys The unique values of the keys array(s), in grouped order Type (list of) pdarray, Strings, or Categorical ngroups The length of the unique_keys array(s), i.e. number of groups Type int segments The start index of each group in the grouped array(s) Type pdarray logger Used for all logging operations Type ArkoudaLogger

Raises TypeError – Raised if keys is a pdarray with a dtype other than int64


Notes

Only accepts (list of) pdarrays of int64 dtype, Strings, or Categorical. Reductions find_segments(self ) → None count(self ) → Tuple[groupable, arkouda.pdarrayclass.pdarray] Count the number of elements in each group, i.e. the number of times each key appears. Parameters none – Returns • unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order • counts (pdarray, int64) – The number of times each unique key appears

Examples

>>> a= ak.randint(1,5,10) >>> a array([3, 2, 3, 1, 2, 4, 3, 4, 3, 4]) >>> g= ak.GroupBy(a) >>> keys,counts=g.count() >>> keys array([1, 2, 3, 4]) >>> counts array([1, 2, 4, 3])
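The permutation, unique_keys, and segments attributes described for this class can be pictured with a short pure-Python sketch. It assumes a stable sort of the keys, which is an illustrative choice rather than a statement about the server's algorithm, and the function names are hypothetical.

```python
def group_structure(keys):
    """Return (permutation, unique_keys, segments) for a list of keys."""
    # Stable argsort: indices that would sort the keys by value.
    permutation = sorted(range(len(keys)), key=lambda i: keys[i])
    sorted_keys = [keys[i] for i in permutation]
    # A segment starts wherever the sorted key changes.
    segments = [i for i, k in enumerate(sorted_keys)
                if i == 0 or k != sorted_keys[i - 1]]
    unique_keys = [sorted_keys[s] for s in segments]
    return permutation, unique_keys, segments


def group_counts(keys):
    """Group sizes are the gaps between consecutive segment starts."""
    _, unique_keys, segments = group_structure(keys)
    ends = segments[1:] + [len(keys)]
    return unique_keys, [e - s for s, e in zip(segments, ends)]


a = [3, 2, 3, 1, 2, 4, 3, 4, 3, 4]
perm, uniq, seg = group_structure(a)
print(perm)             # [3, 1, 4, 0, 2, 6, 8, 5, 7, 9]
print(uniq, seg)        # [1, 2, 3, 4] [0, 1, 3, 7]
print(group_counts(a))  # ([1, 2, 3, 4], [1, 2, 4, 3])
```

Once the permutation and segments are known, any other array of the same length can be reordered with the permutation and reduced segment by segment.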

aggregate(self, values: groupable, operator: str, skipna: bool = True) → Tuple[groupable, arkouda.pdarrayclass.pdarray] Using the permutation stored in the GroupBy instance, group another array of values and apply a reduction to each group’s values. Parameters • values (pdarray) – The values to group and reduce • operator (str) – The name of the reduction operator to use Returns • unique_keys (groupable) – The unique keys, in grouped order • aggregates (groupable) – One aggregate value per unique key in the GroupBy instance Raises • TypeError – Raised if the values array is not a pdarray • ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array • RuntimeError – Raised if the requested operator is not supported for the values dtype


Examples

>>> keys= ak.arange(0, 10) >>> vals= ak.linspace(-1,1, 10) >>> g= ak.GroupBy(keys) >>> g.aggregate(vals, 'sum') (array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), array([-1, -0.77777777777777768, -0.55555555555555536, -0.33333333333333348, -0.11111111111111116, 0.11111111111111116, 0.33333333333333348, 0.55555555555555536, 0.77777777777777768, 1])) >>> g.aggregate(vals, 'min') (array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), array([-1, -0.77777777777777779, -0.55555555555555558, -0.33333333333333337, -0.11111111111111116, 0.11111111111111116, 0.33333333333333326, 0.55555555555555536, 0.77777777777777768, 1]))

sum(self, values: arkouda.pdarrayclass.pdarray, skipna: bool = True) → Tuple[groupable, arkouda.pdarrayclass.pdarray] Using the permutation stored in the GroupBy instance, group another array of values and sum each group’s values. Parameters values (pdarray) – The values to group and sum Returns • unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order • group_sums (pdarray) – One sum per unique key in the GroupBy instance Raises • TypeError – Raised if the values array is not a pdarray object • ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array

Notes

The grouped sum of a boolean pdarray returns integers.

Examples

>>> a= ak.randint(1,5,10) >>> a array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2]) >>> g= ak.GroupBy(a) >>> g.keys array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2]) >>> b= ak.randint(1,5,10) >>> b array([3, 3, 3, 4, 1, 1, 3, 3, 3, 4]) >>> g.sum(b) (array([2, 3, 4]), array([8, 14, 6]))
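The grouped reductions can be cross-checked by hand with a dictionary-based sketch. This is plain Python on the same keys and values as the example, with a hypothetical helper `aggregate_sketch` that accepts any reducer; it illustrates the result, not the permutation-based server implementation.

```python
from collections import defaultdict


def aggregate_sketch(keys, values, reducer):
    """Apply reducer to the values of each group; keys come back sorted."""
    if len(keys) != len(values):
        raise ValueError("keys and values must have the same length")
    groups = defaultdict(list)
    for k, v in zip(keys, values):
        groups[k].append(v)
    unique_keys = sorted(groups)
    return unique_keys, [reducer(groups[k]) for k in unique_keys]


a = [3, 3, 4, 3, 3, 2, 3, 2, 4, 2]
b = [3, 3, 3, 4, 1, 1, 3, 3, 3, 4]
print(aggregate_sketch(a, b, sum))  # ([2, 3, 4], [8, 14, 6])
print(aggregate_sketch(a, b, min))  # ([2, 3, 4], [1, 1, 3])
print(aggregate_sketch(a, b, max))  # ([2, 3, 4], [4, 4, 3])
```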


prod(self, values: arkouda.pdarrayclass.pdarray, skipna: bool = True) → Tuple[groupable, arkouda.pdarrayclass.pdarray] Using the permutation stored in the GroupBy instance, group another array of values and compute the product of each group’s values. Parameters values (pdarray) – The values to group and multiply Returns • unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order • group_products (pdarray, float64) – One product per unique key in the GroupBy instance Raises • TypeError – Raised if the values array is not a pdarray object • ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array • RuntimeError – Raised if prod is not supported for the values dtype

Notes

The return dtype is always float64.

Examples

>>> a= ak.randint(1,5,10) >>> a array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2]) >>> g= ak.GroupBy(a) >>> g.keys array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2]) >>> b= ak.randint(1,5,10) >>> b array([3, 3, 3, 4, 1, 1, 3, 3, 3, 4]) >>> g.prod(b) (array([2, 3, 4]), array([12, 108.00000000000003, 8.9999999999999982]))

mean(self, values: arkouda.pdarrayclass.pdarray, skipna: bool = True) → Tuple[groupable, arkouda.pdarrayclass.pdarray] Using the permutation stored in the GroupBy instance, group another array of values and compute the mean of each group’s values. Parameters values (pdarray) – The values to group and average Returns • unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order • group_means (pdarray, float64) – One mean value per unique key in the GroupBy instance Raises • TypeError – Raised if the values array is not a pdarray object • ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array


Notes

The return dtype is always float64.

Examples

>>> a= ak.randint(1,5,10) >>> a array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2]) >>> g= ak.GroupBy(a) >>> g.keys array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2]) >>> b= ak.randint(1,5,10) >>> b array([3, 3, 3, 4, 1, 1, 3, 3, 3, 4]) >>> g.mean(b) (array([2, 3, 4]), array([2.6666666666666665, 2.7999999999999998, 3]))

min(self, values: arkouda.pdarrayclass.pdarray, skipna: bool = True) → Tuple[groupable, arkouda.pdarrayclass.pdarray] Using the permutation stored in the GroupBy instance, group another array of values and return the minimum of each group’s values. Parameters values (pdarray) – The values to group and find minima Returns • unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order • group_minima (pdarray) – One minimum per unique key in the GroupBy instance Raises • TypeError – Raised if the values array is not a pdarray object or if min is not supported for the values dtype • ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array • RuntimeError – Raised if min is not supported for the values dtype

Examples

>>> a= ak.randint(1,5,10) >>> a array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2]) >>> g= ak.GroupBy(a) >>> g.keys array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2]) >>> b= ak.randint(1,5,10) >>> b array([3, 3, 3, 4, 1, 1, 3, 3, 3, 4]) >>> g.min(b) (array([2, 3, 4]), array([1, 1, 3]))


max(self, values: arkouda.pdarrayclass.pdarray, skipna: bool = True) → Tuple[groupable, arkouda.pdarrayclass.pdarray] Using the permutation stored in the GroupBy instance, group another array of values and return the maximum of each group’s values. Parameters values (pdarray) – The values to group and find maxima Returns • unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order • group_maxima (pdarray) – One maximum per unique key in the GroupBy instance Raises • TypeError – Raised if the values array is not a pdarray object or if max is not supported for the values dtype • ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array • RuntimeError – Raised if max is not supported for the values dtype

Examples

>>> a= ak.randint(1,5,10) >>> a array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2]) >>> g= ak.GroupBy(a) >>> g.keys array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2]) >>> b= ak.randint(1,5,10) >>> b array([3, 3, 3, 4, 1, 1, 3, 3, 3, 4]) >>> g.max(b) (array([2, 3, 4]), array([4, 4, 3]))

argmin(self, values: arkouda.pdarrayclass.pdarray) → Tuple[groupable, arkouda.pdarrayclass.pdarray] Using the permutation stored in the GroupBy instance, group another array of values and return the location of the first minimum of each group’s values. Parameters values (pdarray) – The values to group and find argmin Returns • unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order • group_argminima (pdarray, int64) – One index per unique key in the GroupBy instance Raises • TypeError – Raised if the values array is not a pdarray object or if argmin is not supported for the values dtype • ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array • RuntimeError – Raised if argmin is not supported for the values dtype


Notes

The returned indices refer to the original values array as passed in, not the permutation applied by the GroupBy instance.

Examples

>>> a= ak.randint(1,5,10) >>> a array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2]) >>> g= ak.GroupBy(a) >>> g.keys array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2]) >>> b= ak.randint(1,5,10) >>> b array([3, 3, 3, 4, 1, 1, 3, 3, 3, 4]) >>> g.argmin(b) (array([2, 3, 4]), array([5, 4, 2]))
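To see why the returned indices point into the original values array, consider this pure-Python sketch of a grouped argmin. The helper `group_argmin` is hypothetical; it records, for each key, the position of the first minimum as an index into the unpermuted input.

```python
def group_argmin(keys, values):
    """For each key, return the original index of the group's first minimum."""
    best = {}
    for i, (k, v) in enumerate(zip(keys, values)):
        # Strict '<' keeps the earliest index when the minimum is tied.
        if k not in best or v < values[best[k]]:
            best[k] = i
    unique_keys = sorted(best)
    return unique_keys, [best[k] for k in unique_keys]


a = [3, 3, 4, 3, 3, 2, 3, 2, 4, 2]
b = [3, 3, 3, 4, 1, 1, 3, 3, 3, 4]
keys, idx = group_argmin(a, b)
print(keys, idx)            # [2, 3, 4] [5, 4, 2]
print([b[i] for i in idx])  # [1, 1, 3]
```

Indexing the values with the returned positions recovers the grouped minima, which is a convenient way to fetch other columns from the same rows.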

argmax(self, values: arkouda.pdarrayclass.pdarray) → Tuple[groupable, arkouda.pdarrayclass.pdarray]
    Using the permutation stored in the GroupBy instance, group another array of values and return the location of the first maximum of each group’s values.
    Parameters values (pdarray) – The values to group and find argmax
    Returns
        • unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order
        • group_argmaxima (pdarray, int64) – One index per unique key in the GroupBy instance
    Raises
        • TypeError – Raised if the values array is not a pdarray object or if argmax is not supported for the values dtype
        • ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array

Notes

The returned indices refer to the original values array as passed in, not the permutation applied by the GroupBy instance.

Examples

>>> a = ak.randint(1, 5, 10)
>>> a
array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2])
>>> g = ak.GroupBy(a)
>>> g.keys
array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2])
>>> b = ak.randint(1, 5, 10)
>>> b
array([3, 3, 3, 4, 1, 1, 3, 3, 3, 4])
>>> g.argmax(b)
(array([2, 3, 4]), array([9, 3, 2]))

nunique(self, values: groupable) → Tuple[groupable, arkouda.pdarrayclass.pdarray]
    Using the permutation stored in the GroupBy instance, group another array of values and return the number of unique values in each group.
    Parameters values (pdarray, int64) – The values to group and find unique values
    Returns
        • unique_keys (groupable) – The unique keys, in grouped order
        • group_nunique (groupable) – Number of unique values per unique key in the GroupBy instance
    Raises
        • TypeError – Raised if the dtype(s) of values array(s) does/do not support the nunique method
        • ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array
        • RuntimeError – Raised if nunique is not supported for the values dtype

Examples

>>> data = ak.array([3, 4, 3, 1, 1, 4, 3, 4, 1, 4])
>>> data
array([3, 4, 3, 1, 1, 4, 3, 4, 1, 4])
>>> labels = ak.array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4])
>>> labels
ak.array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4])
>>> g = ak.GroupBy(labels)
>>> g.keys
ak.array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4])
>>> g.nunique(data)
array([1,2,3,4]), array([2, 2, 3, 1])
# Group (1,1,1) has values [3,4,3] -> there are 2 unique values 3&4
# Group (2,2,2) has values [1,1,4] -> 2 unique values 1&4
# Group (3,3,3) has values [3,4,1] -> 3 unique values
# Group (4) has values [4] -> 1 unique value
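The counting in the example above can be reproduced in plain Python. This is a minimal sketch of the semantics only, assuming unique keys are reported in ascending order as in the example; group_nunique is a hypothetical helper, not an arkouda function.

```python
def group_nunique(keys, values):
    """Count distinct values per key; keys are returned in sorted order."""
    if len(keys) != len(values):
        raise ValueError("keys and values must have the same length")
    seen = {}  # key -> set of distinct values observed for that key
    for k, v in zip(keys, values):
        seen.setdefault(k, set()).add(v)
    unique_keys = sorted(seen)
    return unique_keys, [len(seen[k]) for k in unique_keys]


labels = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4]
data = [3, 4, 3, 1, 1, 4, 3, 4, 1, 4]
unique_labels, counts = group_nunique(labels, data)
print(unique_labels, counts)  # [1, 2, 3, 4] [2, 2, 3, 1]
```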

any(self, values: arkouda.pdarrayclass.pdarray) → Tuple[Union[arkouda.pdarrayclass.pdarray, List[Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings]]], arkouda.pdarrayclass.pdarray]
    Using the permutation stored in the GroupBy instance, group another array of values and perform an “or” reduction on each group.
    Parameters values (pdarray, bool) – The values to group and reduce with “or”
    Returns
        • unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order
        • group_any (pdarray, bool) – One bool per unique key in the GroupBy instance
    Raises
        • TypeError – Raised if the values array is not a pdarray or if the pdarray dtype is not bool
        • ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array

all(self, values: arkouda.pdarrayclass.pdarray) → Tuple[Union[arkouda.pdarrayclass.pdarray, List[Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings]]], arkouda.pdarrayclass.pdarray]
    Using the permutation stored in the GroupBy instance, group another array of values and perform an “and” reduction on each group.
    Parameters values (pdarray, bool) – The values to group and reduce with “and”
    Returns
        • unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order
        • group_all (pdarray, bool) – One bool per unique key in the GroupBy instance
    Raises
        • TypeError – Raised if the values array is not a pdarray or if the pdarray dtype is not bool
        • ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array
        • RuntimeError – Raised if all is not supported for the values dtype

OR(self, values: arkouda.pdarrayclass.pdarray) → Tuple[Union[arkouda.pdarrayclass.pdarray, List[Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings]]], arkouda.pdarrayclass.pdarray]
    Bitwise OR of values in each segment. Using the permutation stored in the GroupBy instance, group another array of values and perform a bitwise OR reduction on each group.
    Parameters values (pdarray, int64) – The values to group and reduce with OR
    Returns
        • unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order
        • result (pdarray, int64) – Bitwise OR of values in segments corresponding to keys
    Raises
        • TypeError – Raised if the values array is not a pdarray or if the pdarray dtype is not int64
        • ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array
        • RuntimeError – Raised if OR is not supported for the values dtype

AND(self, values: arkouda.pdarrayclass.pdarray) → Tuple[Union[arkouda.pdarrayclass.pdarray, List[Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings]]], arkouda.pdarrayclass.pdarray]
    Bitwise AND of values in each segment. Using the permutation stored in the GroupBy instance, group another array of values and perform a bitwise AND reduction on each group.
    Parameters values (pdarray, int64) – The values to group and reduce with AND
    Returns
        • unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order
        • result (pdarray, int64) – Bitwise AND of values in segments corresponding to keys
    Raises
        • TypeError – Raised if the values array is not a pdarray or if the pdarray dtype is not int64
        • ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array
        • RuntimeError – Raised if AND is not supported for the values dtype

XOR(self, values: arkouda.pdarrayclass.pdarray) → Tuple[Union[arkouda.pdarrayclass.pdarray, List[Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings]]], arkouda.pdarrayclass.pdarray]
    Bitwise XOR of values in each segment. Using the permutation stored in the GroupBy instance, group another array of values and perform a bitwise XOR reduction on each group.
    Parameters values (pdarray, int64) – The values to group and reduce with XOR
    Returns
        • unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order
        • result (pdarray, int64) – Bitwise XOR of values in segments corresponding to keys
    Raises
        • TypeError – Raised if the values array is not a pdarray or if the pdarray dtype is not int64
        • ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array
        • RuntimeError – Raised if XOR is not supported for the values dtype

broadcast(self, values: arkouda.pdarrayclass.pdarray, permute: bool = True) → arkouda.pdarrayclass.pdarray
    Fill each group’s segment with a constant value.
    Parameters
        • values (pdarray) – The values to put in each group’s segment
        • permute (bool) – If True (default), permute broadcast values back to the ordering of the original array on which GroupBy was called. If False, the broadcast values are grouped by value.
    Returns The broadcast values
    Return type pdarray
    Raises
        • TypeError – Raised if values is not a pdarray object
        • ValueError – Raised if the values array does not have one value per segment


Notes

This function is a sparse analog of np.broadcast. If a GroupBy object represents a sparse matrix (tensor), then this function takes a (dense) column vector and replicates each value to the non-zero elements in the corresponding row.
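To make the effect of the permute flag concrete, here is a small pure-Python sketch of the behaviour described for GroupBy.broadcast. It assumes that the per-group values are supplied in ascending unique-key order; group_broadcast is a hypothetical name, and the real method runs on the arkouda server rather than on Python lists.

```python
def group_broadcast(keys, values, permute=True):
    """Replicate one value per group to every member of that group.

    Assumption: `values` is ordered like the sorted unique keys.
    """
    unique_keys = sorted(set(keys))
    if len(values) != len(unique_keys):
        raise ValueError("need exactly one value per group")
    value_of = dict(zip(unique_keys, values))
    if permute:
        # Original ordering: look up each element's own group value.
        return [value_of[k] for k in keys]
    # Grouped ordering: members of the same group are contiguous.
    return [value_of[k] for k in sorted(keys)]


a = [0, 1, 0, 1, 0]
in_original_order = group_broadcast(a, [3, 5])
in_grouped_order = group_broadcast(a, [3, 5], permute=False)
print(in_original_order)  # [3, 5, 3, 5, 3]
print(in_grouped_order)   # [3, 3, 3, 5, 5]
```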

Examples

>>> a = ak.array([0, 1, 0, 1, 0])
>>> values = ak.array([3, 5])
>>> g = ak.GroupBy(a)
# By default, result is in original order
>>> g.broadcast(values)
array([3, 5, 3, 5, 3])

# With permute=False, result is in grouped order
>>> g.broadcast(values, permute=False)
array([3, 3, 3, 5, 5])
>>> a = ak.randint(1, 5, 10)
>>> a
array([3, 1, 4, 4, 4, 1, 3, 3, 2, 2])
>>> g = ak.GroupBy(a)
>>> keys, counts = g.count()
>>> g.broadcast(counts > 2)
array([True False True True True False True True False False])
>>> g.broadcast(counts == 3)
array([True False True True True False True True False False])
>>> g.broadcast(counts < 4)
array([True True True True True True True True True True])

arkouda.broadcast(segments: arkouda.pdarrayclass.pdarray, values: arkouda.pdarrayclass.pdarray, size: Union[int, numpy.int64] = -1, permutation: Union[arkouda.pdarrayclass.pdarray, None] = None)
    Broadcast a dense column vector to the rows of a sparse matrix or grouped array.
    Parameters
        • segments (pdarray, int64) – Offsets of the start of each row in the sparse matrix or grouped array. Must be sorted in ascending order.
        • values (pdarray) – The values to broadcast, one per row (or group)
        • size (int) – The total number of nonzeros in the matrix. If permutation is given, this argument is ignored and the size is inferred from the permutation array.
        • permutation (pdarray, int64) – The permutation to go from the original ordering of nonzeros to the ordering grouped by row. To broadcast values back to the original ordering, this permutation will be inverted. If no permutation is supplied, it is assumed that the original nonzeros were already grouped by row. In this case, the size argument must be given.
    Returns The broadcast values, one per nonzero
    Return type pdarray
    Raises ValueError –
        • If segments and values are different sizes
        • If segments are empty
        • If number of nonzeros (either user-specified or inferred from permutation) is less than one
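The interplay of segments, size, and permutation can be sketched in plain Python. The version below is an illustration under one stated assumption: position i of the grouped ordering corresponds to position permutation[i] of the original ordering, so inverting the permutation means scattering grouped results to those positions. The function name segment_broadcast is hypothetical.

```python
def segment_broadcast(segments, values, size=-1, permutation=None):
    """Broadcast one value per row to every nonzero of that row."""
    if len(segments) != len(values):
        raise ValueError("segments and values must be the same size")
    if len(segments) == 0:
        raise ValueError("segments must not be empty")
    if permutation is not None:
        size = len(permutation)  # size is inferred, as documented
    if size < 1:
        raise ValueError("number of nonzeros must be at least one")

    # Row r owns grouped positions segments[r] .. segments[r + 1] - 1.
    ends = list(segments[1:]) + [size]
    grouped = []
    for start, end, value in zip(segments, ends, values):
        grouped.extend([value] * (end - start))
    if permutation is None:
        return grouped

    # Invert the permutation: scatter grouped results to original slots.
    original = [None] * size
    for i, p in enumerate(permutation):
        original[p] = grouped[i]
    return original


row_starts = [0, 2, 5]
row_number = [0, 1, 2]
already_grouped = segment_broadcast(row_starts, row_number, size=7)
reversed_order = segment_broadcast(
    row_starts, row_number, permutation=list(range(6, -1, -1))
)
print(already_grouped)  # [0, 0, 1, 1, 1, 2, 2]
print(reversed_order)   # [2, 2, 1, 1, 1, 0, 0]
```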

Examples

# Define a sparse matrix with 3 rows and 7 nonzeros
>>> row_starts = ak.array([0, 2, 5])
>>> nnz = 7
# Broadcast the row number to each nonzero element
>>> row_number = ak.arange(3)
>>> ak.broadcast(row_starts, row_number, nnz)
array([0 0 1 1 1 2 2])
# If the original nonzeros were in reverse order...
>>> permutation = ak.arange(6, -1, -1)
>>> ak.broadcast(row_starts, row_number, permutation=permutation)
array([2 2 1 1 1 0 0])

arkouda.GROUPBY_REDUCTION_TYPES

class arkouda.Strings(offset_attrib: Union[arkouda.pdarrayclass.pdarray, str], bytes_attrib: Union[arkouda.pdarrayclass.pdarray, str])
    Represents an array of strings whose data resides on the arkouda server. The user should not call this class directly; rather its instances are created by other arkouda functions.
    offsets
        The starting indices for each string
        Type pdarray
    bytes
        The raw bytes of all strings, joined by nulls
        Type pdarray
    size
        The number of strings in the array
        Type int_scalars
    nbytes
        The total number of bytes in all strings
        Type int_scalars
    ndim
        The rank of the array (currently only rank 1 arrays supported)
        Type int_scalars
    shape
        The sizes of each dimension of the array
        Type tuple
    dtype
        The dtype is ak.str
        Type dtype
    regex_dict
        Dictionary storing information on matches (cache of Strings.find_locations(pattern)). Keys: regex patterns. Values: tuples of pdarrays (numMatches, matchStarts, matchLens)
        Type Dict[str, Tuple[pdarray, pdarray, pdarray]]
    logger
        Used for all logging operations
        Type ArkoudaLogger

Notes

Strings is composed of two pdarrays: (1) offsets, which contains the starting indices for each string, and (2) bytes, which contains the raw bytes of all strings, delimited by nulls.

BinOps
objtype = str
abstract __iter__(self)
__len__(self) → int
__str__(self) → str
    Return str(self).
__repr__(self) → str
    Return repr(self).
_binop(self, other: Union[Strings, arkouda.dtypes.str_scalars], op: str) → arkouda.pdarrayclass.pdarray
    Executes the requested binop on this Strings instance and the parameter Strings object and returns the results within a pdarray object.
    Parameters
        • other (Strings, str_scalars) – the other object is a Strings object
        • op (str) – name of the binary operation to be performed
    Returns encapsulating the results of the requested binop
    Return type pdarray
    Raises
        • ValueError – Raised if (1) the op is not in the self.BinOps set, or (2) if the sizes of this and the other instance don’t match, or (3) the other object is not a Strings object
        • RuntimeError – Raised if a server-side error is thrown while executing the binary operation
__eq__(self, other) → bool
    Return self==value.
__ne__(self, other) → bool
    Return self!=value.
__getitem__(self, key)
get_lengths(self) → arkouda.pdarrayclass.pdarray
    Return the length of each string in the array.
    Returns The length of each string
    Return type pdarray, int
    Raises RuntimeError – Raised if there is a server-side error thrown
cached_regex_patterns(self)
    Returns the regex patterns for which Strings.find_locations(pattern) have been cached
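The offsets/bytes layout described in the Notes for the Strings class can be illustrated with ordinary Python bytes. This is a sketch of the data layout only: the helper names are hypothetical, UTF-8 encoding is assumed, and the lengths computed here exclude the null terminator, which is an assumption about what get_lengths reports.

```python
def pack(strings):
    """Build (offsets, null-delimited bytes) from a list of str."""
    offsets, buf = [], bytearray()
    for s in strings:
        offsets.append(len(buf))            # start index of this string
        buf += s.encode("utf-8") + b"\x00"  # null byte ends each string
    return offsets, bytes(buf)


def byte_lengths(offsets, buf):
    """Length of each string in bytes, excluding its null terminator."""
    ends = offsets[1:] + [len(buf)]
    return [end - start - 1 for start, end in zip(offsets, ends)]


def unpack(offsets, buf, i):
    """Recover the i-th string from the packed representation."""
    end = offsets[i + 1] if i + 1 < len(offsets) else len(buf)
    return buf[offsets[i]:end - 1].decode("utf-8")


offsets, raw = pack(["hello", "my", "world"])
print(offsets)                     # [0, 6, 9]
print(raw)                         # b'hello\x00my\x00world\x00'
print(byte_lengths(offsets, raw))  # [5, 2, 5]
print(unpack(offsets, raw, 1))     # my
```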


find_locations(self, pattern: Union[bytes, arkouda.dtypes.str_scalars]) → Tuple[arkouda.pdarrayclass.pdarray, arkouda.pdarrayclass.pdarray, arkouda.pdarrayclass.pdarray]
    Finds pattern matches and returns pdarrays containing the number, start positions, and lengths of matches.
    Note: only handles regular expressions supported by re2 (does not support lookaheads/lookbehinds)
    Parameters pattern (str_scalars) – The regex pattern used to find matches
    Returns
        • pdarray, int64 – For each original string, the number of pattern matches
        • pdarray, int64 – The start positions of pattern matches
        • pdarray, int64 – The lengths of pattern matches
    Raises
        • TypeError – Raised if the pattern parameter is not bytes or str_scalars
        • ValueError – Raised if pattern is not a valid regex
        • RuntimeError – Raised if there is a server-side error thrown

See also: Strings.findall, Strings.match

Examples

>>> strings = ak.array(['{} string {}'.format(i, i) for i in range(1, 6)])
>>> strings
array(['1 string 1', '2 string 2', '3 string 3', '4 string 4', '5 string 5'])
>>> num_matches, starts, lens = strings.find_locations('\d')
>>> num_matches
array([2, 2, 2, 2, 2])
>>> starts
array([0, 9, 11, 20, 22, 31, 33, 42, 44, 53])
>>> lens
array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
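The start positions in the example are consistent with offsets into the null-delimited byte buffer rather than into each individual string (each 10-character string plus its terminator occupies 11 bytes). The sketch below reproduces those numbers with Python's re module under that reading. It is an interpretation of the example output, not the server's algorithm, and it assumes ASCII input so that character and byte positions coincide. Python's re also accepts constructs that re2 rejects, so it is only a stand-in.

```python
import re


def find_locations(strings, pattern):
    """Return (matches per string, global start offsets, match lengths)."""
    regex = re.compile(pattern)
    num_matches, starts, lens = [], [], []
    base = 0  # offset of the current string in the packed byte buffer
    for s in strings:
        count = 0
        for m in regex.finditer(s):
            starts.append(base + m.start())
            lens.append(m.end() - m.start())
            count += 1
        num_matches.append(count)
        base += len(s) + 1  # +1 for the null terminator (ASCII assumed)
    return num_matches, starts, lens


strings = ['{} string {}'.format(i, i) for i in range(1, 6)]
num_matches, starts, lens = find_locations(strings, r'\d')
print(num_matches)  # [2, 2, 2, 2, 2]
print(starts)       # [0, 9, 11, 20, 22, 31, 33, 42, 44, 53]
print(lens)         # [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
```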

findall(self, pattern: Union[bytes, arkouda.dtypes.str_scalars], return_match_origins: bool = False) → Union[Strings, Tuple]
    Return all non-overlapping matches of pattern in Strings as a new Strings object.
    Note: only handles regular expressions supported by re2 (does not support lookaheads/lookbehinds)
    Parameters
        • pattern (str_scalars) – The regex pattern used to find matches
        • return_match_origins (bool) – If True, return a pdarray containing the index of the original string each pattern match is from
    Returns
        • Strings – Strings object containing only pattern matches
        • pdarray, int64 (optional) – The index of the original string each pattern match is from
    Raises
        • TypeError – Raised if the pattern parameter is not bytes or str_scalars
        • ValueError – Raised if pattern is not a valid regex
        • RuntimeError – Raised if there is a server-side error thrown

See also: Strings.find_locations, Strings.match

Examples

>>> strings = ak.array(['{} string {}'.format(i, i) for i in range(1, 6)])
>>> strings
array(['1 string 1', '2 string 2', '3 string 3', '4 string 4', '5 string 5'])
>>> matches, match_origins = strings.findall('\d', return_match_origins=True)
>>> matches
array(['1', '1', '2', '2', '3', '3', '4', '4', '5', '5'])
>>> match_origins
array([0, 0, 1, 1, 2, 2, 3, 3, 4, 4])
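The relationship between the flat matches array and match_origins can be sketched client-side with Python's re module. The function below is a hypothetical stand-in that mirrors the documented return shape; Python's regex dialect is broader than re2's.

```python
import re


def findall_with_origins(strings, pattern, return_match_origins=False):
    """Collect every non-overlapping match and, optionally, its source index."""
    regex = re.compile(pattern)
    matches, origins = [], []
    for i, s in enumerate(strings):
        for m in regex.finditer(s):
            matches.append(m.group(0))
            origins.append(i)  # index of the string this match came from
    return (matches, origins) if return_match_origins else matches


strings = ['{} string {}'.format(i, i) for i in range(1, 6)]
matches, match_origins = findall_with_origins(
    strings, r'\d', return_match_origins=True
)
print(matches)        # ['1', '1', '2', '2', '3', '3', '4', '4', '5', '5']
print(match_origins)  # [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
```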

contains(self, substr: Union[bytes, arkouda.dtypes.str_scalars], regex: bool = False) → arkouda.pdarrayclass.pdarray
    Check whether each element contains the given substring.
    Parameters
        • substr (str_scalars) – The substring in the form of string or byte array to search for
        • regex (bool) – Indicates whether substr is a regular expression. Note: only handles regular expressions supported by re2 (does not support lookaheads/lookbehinds)
    Returns True for elements that contain substr, False otherwise
    Return type pdarray, bool
    Raises
        • TypeError – Raised if the substr parameter is not bytes or str_scalars
        • ValueError – Raised if substr is not a valid regex
        • RuntimeError – Raised if there is a server-side error thrown
    See also: Strings.startswith, Strings.endswith


Examples

>>> strings = ak.array(['{} string {}'.format(i, i) for i in range(1, 6)])
>>> strings
array(['1 string 1', '2 string 2', '3 string 3', '4 string 4', '5 string 5'])
>>> strings.contains('string')
array([True, True, True, True, True])
>>> strings.contains('string \d', regex=True)
array([True, True, True, True, True])

startswith(self, substr: Union[bytes, arkouda.dtypes.str_scalars], regex: bool = False) → arkouda.pdarrayclass.pdarray
    Check whether each element starts with the given substring.
    Parameters
        • substr (Union[bytes, str_scalars]) – The prefix to search for
        • regex (bool) – Indicates whether substr is a regular expression. Note: only handles regular expressions supported by re2 (does not support lookaheads/lookbehinds)
    Returns True for elements that start with substr, False otherwise
    Return type pdarray, bool
    Raises
        • TypeError – Raised if the substr parameter is not bytes or str_scalars
        • ValueError – Raised if substr is not a valid regex
        • RuntimeError – Raised if there is a server-side error thrown
    See also: Strings.contains, Strings.endswith

Examples

>>> strings_end = ak.array(['string {}'.format(i) for i in range(1, 6)])
>>> strings_end
array(['string 1', 'string 2', 'string 3', 'string 4', 'string 5'])
>>> strings_end.startswith('string')
array([True, True, True, True, True])
>>> strings_start = ak.array(['{} string'.format(i) for i in range(1, 6)])
>>> strings_start
array(['1 string', '2 string', '3 string', '4 string', '5 string'])
>>> strings_start.startswith('\d str', regex=True)
array([True, True, True, True, True])

endswith(self, substr: Union[bytes, arkouda.dtypes.str_scalars], regex: bool = False) → arkouda.pdarrayclass.pdarray
    Check whether each element ends with the given substring.
    Parameters
        • substr (Union[bytes, str_scalars]) – The suffix to search for
        • regex (bool) – Indicates whether substr is a regular expression. Note: only handles regular expressions supported by re2 (does not support lookaheads/lookbehinds)
    Returns True for elements that end with substr, False otherwise
    Return type pdarray, bool
    Raises
        • TypeError – Raised if the substr parameter is not bytes or str_scalars
        • ValueError – Raised if substr is not a valid regex
        • RuntimeError – Raised if there is a server-side error thrown
    See also: Strings.contains, Strings.startswith

Examples

>>> strings_start = ak.array(['{} string'.format(i) for i in range(1, 6)])
>>> strings_start
array(['1 string', '2 string', '3 string', '4 string', '5 string'])
>>> strings_start.endswith('ing')
array([True, True, True, True, True])
>>> strings_end = ak.array(['string {}'.format(i) for i in range(1, 6)])
>>> strings_end
array(['string 1', 'string 2', 'string 3', 'string 4', 'string 5'])
>>> strings_end.endswith('ing \d', regex=True)
array([True, True, True, True, True])

match(self, pattern: Union[bytes, arkouda.dtypes.str_scalars]) → arkouda.pdarrayclass.pdarray
    For each element, check whether the entire element matches the given regex, pattern.
    Note: only handles regular expressions supported by re2 (does not support lookaheads/lookbehinds)
    Parameters pattern (str_scalars) – The regex in the form of string or byte array to search for
    Returns True for elements that match pattern, False otherwise
    Return type pdarray, bool
    Raises
        • TypeError – Raised if the pattern parameter is not bytes or str_scalars
        • ValueError – Raised if pattern is not a valid regex
        • RuntimeError – Raised if there is a server-side error thrown
    See also: Strings.contains, Strings.startswith, Strings.endswith


Examples

>>> strings = ak.array(['{} string {}'.format(i, i) for i in range(1, 6)])
>>> strings
array(['1 string 1', '2 string 2', '3 string 3', '4 string 4', '5 string 5'])
>>> strings.match('\d string \d')
array([True, True, True, True, True])
>>> strings.match('ing \d')
array([False, False, False, False, False])
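The four regex predicates differ only in where the pattern must be anchored. A rough client-side analogy using Python's re module follows; the helper names are hypothetical, and the mapping to re.search, re.match, and re.fullmatch is an analogy drawn from the descriptions above, not a statement about the server code.

```python
import re

strings = ['{} string {}'.format(i, i) for i in range(1, 6)]


def contains_regex(items, pattern):
    # Pattern may occur anywhere in the element.
    return [re.search(pattern, s) is not None for s in items]


def startswith_regex(items, pattern):
    # Pattern must match at the beginning of the element.
    return [re.match(pattern, s) is not None for s in items]


def endswith_regex(items, pattern):
    # Pattern must match immediately before the end of the element.
    anchored = re.compile('(?:' + pattern + r')\Z')
    return [anchored.search(s) is not None for s in items]


def match_regex(items, pattern):
    # Pattern must cover the entire element.
    return [re.fullmatch(pattern, s) is not None for s in items]


print(contains_regex(strings, r'string \d'))  # all True
print(startswith_regex(strings, r'\d str'))   # all True
print(endswith_regex(strings, r'ing \d'))     # all True
print(match_regex(strings, r'\d string \d'))  # all True
print(match_regex(strings, r'ing \d'))        # all False
```

The last call returns False for every element because 'ing \d' matches only a suffix, not the whole string.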

flatten(self, delimiter: str, return_segments: bool = False, regex: bool = False) → Union[Strings, Tuple]
    Unpack delimiter-joined substrings into a flat array.
    Parameters
        • delimiter (str) – Characters used to split strings into substrings
        • return_segments (bool) – If True, also return mapping of original strings to first substring in return array.
        • regex (bool) – Indicates whether delimiter is a regular expression. Note: only handles regular expressions supported by re2 (does not support lookaheads/lookbehinds)
    Returns
        • Strings – Flattened substrings with delimiters removed
        • pdarray, int64 (optional) – For each original string, the index of first corresponding substring in the return array
    See also: peel, rpeel

Examples

>>> orig = ak.array(['one|two', 'three|four|five', 'six'])
>>> orig.flatten('|')
array(['one', 'two', 'three', 'four', 'five', 'six'])
>>> flat, map = orig.flatten('|', return_segments=True)
>>> map
array([0, 2, 5])
>>> under = ak.array(['one_two', 'three_____four____five', 'six'])
>>> under_flat, under_map = under.flatten('_+', return_segments=True, regex=True)
>>> under_flat
array(['one', 'two', 'three', 'four', 'five', 'six'])
>>> under_map
array([0, 2, 5])
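The segments array returned with return_segments=True records where each original string's pieces begin in the flat result. A small pure-Python sketch of that bookkeeping follows; flatten_strings is a hypothetical helper, and Python's re stands in for re2.

```python
import re


def flatten_strings(strings, delimiter, return_segments=False, regex=False):
    """Split every string on the delimiter and concatenate the pieces."""
    flat, segments = [], []
    for s in strings:
        segments.append(len(flat))  # index of this string's first piece
        pieces = re.split(delimiter, s) if regex else s.split(delimiter)
        flat.extend(pieces)
    return (flat, segments) if return_segments else flat


orig = ['one|two', 'three|four|five', 'six']
flat, seg = flatten_strings(orig, '|', return_segments=True)
print(flat)  # ['one', 'two', 'three', 'four', 'five', 'six']
print(seg)   # [0, 2, 5]

under = ['one_two', 'three_____four____five', 'six']
under_flat, under_seg = flatten_strings(
    under, '_+', return_segments=True, regex=True
)
print(under_flat)  # ['one', 'two', 'three', 'four', 'five', 'six']
print(under_seg)   # [0, 2, 5]
```

With the segments in hand, the pieces of original string i are flat[seg[i]:seg[i + 1]] (or to the end of flat for the last string).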

peel(self, delimiter: Union[bytes, arkouda.dtypes.str_scalars], times: arkouda.dtypes.int_scalars = 1, includeDelimiter: bool = False, keepPartial: bool = False, fromRight: bool = False, regex: bool = False) → Tuple
    Peel off one or more delimited fields from each string (similar to string.partition), returning two new arrays of strings. Warning: This function is experimental and not guaranteed to work.
    Parameters
        • delimiter (Union[bytes, str_scalars]) – The separator where the split will occur
        • times (Union[int, np.int64]) – The number of times the delimiter is sought, i.e. skip over the first (times-1) delimiters
        • includeDelimiter (bool) – If true, append the delimiter to the end of the first return array. By default, it is prepended to the beginning of the second return array.
        • keepPartial (bool) – If true, a string that does not contain instances of the delimiter will be returned in the first array. By default, such strings are returned in the second array.
        • fromRight (bool) – If true, peel from the right instead of the left (see also rpeel)
        • regex (bool) – Indicates whether delimiter is a regular expression. Note: only handles regular expressions supported by re2 (does not support lookaheads/lookbehinds)
    Returns
        left: Strings The field(s) peeled from the end of each string (unless fromRight is true)
        right: Strings The remainder of each string after peeling (unless fromRight is true)
    Return type Tuple[Strings, Strings]
    Raises
        • TypeError – Raised if the delimiter parameter is not bytes or str_scalars, if times is not int64, or if includeDelimiter, keepPartial, or fromRight is not bool
        • ValueError – Raised if times is < 1 or if delimiter is not a valid regex
        • RuntimeError – Raised if there is a server-side error thrown
    See also: rpeel, stick, lstick

Examples

>>> s = ak.array(['a.b', 'c.d', 'e.f.g'])
>>> s.peel('.')
(array(['a', 'c', 'e']), array(['b', 'd', 'f.g']))
>>> s.peel('.', includeDelimiter=True)
(array(['a.', 'c.', 'e.']), array(['b', 'd', 'f.g']))
>>> s.peel('.', times=2)
(array(['', '', 'e.f']), array(['a.b', 'c.d', 'g']))
>>> s.peel('.', times=2, keepPartial=True)
(array(['a.b', 'c.d', 'e.f']), array(['', '', 'g']))
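The interaction of times, includeDelimiter, and keepPartial is easiest to see in code. The sketch below is a pure-Python model written to reproduce the four example outputs above for a literal (non-regex) delimiter peeled from the left; the function names and snake_case parameters are hypothetical and the model makes no claim about the server implementation.

```python
def peel_one(s, delimiter, times=1, include_delimiter=False, keep_partial=False):
    """Split s at the `times`-th occurrence of a literal delimiter."""
    if times < 1:
        raise ValueError("times must be >= 1")
    pos, start = -1, 0
    for _ in range(times):
        pos = s.find(delimiter, start)
        if pos < 0:
            # Fewer than `times` delimiters: the whole string goes to one side.
            return (s, "") if keep_partial else ("", s)
        start = pos + len(delimiter)
    left = s[:pos] + (delimiter if include_delimiter else "")
    return left, s[start:]


def peel(strings, delimiter, **options):
    """Apply peel_one elementwise and return (left list, right list)."""
    pairs = [peel_one(s, delimiter, **options) for s in strings]
    return [left for left, _ in pairs], [right for _, right in pairs]


s = ['a.b', 'c.d', 'e.f.g']
print(peel(s, '.'))                          # (['a', 'c', 'e'], ['b', 'd', 'f.g'])
print(peel(s, '.', include_delimiter=True))  # (['a.', 'c.', 'e.'], ['b', 'd', 'f.g'])
print(peel(s, '.', times=2))                 # (['', '', 'e.f'], ['a.b', 'c.d', 'g'])
print(peel(s, '.', times=2, keep_partial=True))
# (['a.b', 'c.d', 'e.f'], ['', '', 'g'])
```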

rpeel(self, delimiter: Union[bytes, arkouda.dtypes.str_scalars], times: arkouda.dtypes.int_scalars = 1, includeDelimiter: bool = False, keepPartial: bool = False, regex: bool = False)
    Peel off one or more delimited fields from the end of each string (similar to string.rpartition), returning two new arrays of strings. Warning: This function is experimental and not guaranteed to work.
    Parameters
        • delimiter (Union[bytes, str_scalars]) – The separator where the split will occur
        • times (Union[int, np.int64]) – The number of times the delimiter is sought, i.e. skip over the last (times-1) delimiters
        • includeDelimiter (bool) – If true, prepend the delimiter to the start of the first return array. By default, it is appended to the end of the second return array.
        • keepPartial (bool) – If true, a string that does not contain instances of the delimiter will be returned in the second array. By default, such strings are returned in the first array.
        • regex (bool) – Indicates whether delimiter is a regular expression. Note: only handles regular expressions supported by re2 (does not support lookaheads/lookbehinds)
    Returns
        left: Strings The remainder of the string after peeling
        right: Strings The field(s) that were peeled from the right of each string
    Return type Tuple[Strings, Strings]
    Raises
        • TypeError – Raised if the delimiter parameter is not bytes or str_scalars or if times is not int64
        • ValueError – Raised if times is < 1 or if delimiter is not a valid regex
        • RuntimeError – Raised if there is a server-side error thrown
    See also: peel, stick, lstick

Examples

>>> s = ak.array(['a.b', 'c.d', 'e.f.g'])
>>> s.rpeel('.')
(array(['a', 'c', 'e.f']), array(['b', 'd', 'g']))
# Compared against peel
>>> s.peel('.')
(array(['a', 'c', 'e']), array(['b', 'd', 'f.g']))

stick(self, other: Strings, delimiter: Union[bytes, arkouda.dtypes.str_scalars] = '', toLeft: bool = False) → Strings
    Join the strings from another array onto one end of the strings of this array, optionally inserting a delimiter. Warning: This function is experimental and not guaranteed to work.
    Parameters
        • other (Strings) – The strings to join onto self’s strings
        • delimiter (str) – String inserted between self and other
        • toLeft (bool) – If true, join other strings to the left of self. By default, other is joined to the right of self.
    Returns The array of joined strings
    Return type Strings
    Raises
        • TypeError – Raised if the delimiter parameter is not bytes or str_scalars or if the other parameter is not a Strings instance
        • ValueError – Raised if times is < 1
        • RuntimeError – Raised if there is a server-side error thrown
    See also: lstick, peel, rpeel

Examples

>>> s = ak.array(['a', 'c', 'e'])
>>> t = ak.array(['b', 'd', 'f'])
>>> s.stick(t, delimiter='.')
array(['a.b', 'c.d', 'e.f'])

__add__(self, other: Strings) → Strings

lstick(self, other: Strings, delimiter: Union[bytes, arkouda.dtypes.str_scalars] = '') → Strings
    Join the strings from another array onto the left of the strings of this array, optionally inserting a delimiter. Warning: This function is experimental and not guaranteed to work.
    Parameters
        • other (Strings) – The strings to join onto self’s strings
        • delimiter (Union[bytes, str_scalars]) – String inserted between self and other
    Returns The array of joined strings, as other + self
    Return type Strings
    Raises
        • TypeError – Raised if the delimiter parameter is neither bytes nor a str or if the other parameter is not a Strings instance
        • RuntimeError – Raised if there is a server-side error thrown
    See also: stick, peel, rpeel

Examples

>>> s = ak.array(['a', 'c', 'e'])
>>> t = ak.array(['b', 'd', 'f'])
>>> s.lstick(t, delimiter='.')
array(['b.a', 'd.c', 'f.e'])
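The difference between stick and lstick is only the side on which other is attached. A minimal list-based sketch follows; the helper names are hypothetical, and the length check is an assumption added for the sketch rather than a documented error.

```python
def stick(self_strings, other, delimiter="", to_left=False):
    """Join two equal-length lists of strings elementwise."""
    if len(self_strings) != len(other):
        # Assumption for this sketch: mismatched sizes are rejected.
        raise ValueError("both arrays must have the same number of strings")
    if to_left:
        return [o + delimiter + s for s, o in zip(self_strings, other)]
    return [s + delimiter + o for s, o in zip(self_strings, other)]


def lstick(self_strings, other, delimiter=""):
    """Equivalent to stick(..., to_left=True): result is other + self."""
    return stick(self_strings, other, delimiter, to_left=True)


s = ['a', 'c', 'e']
t = ['b', 'd', 'f']
print(stick(s, t, delimiter='.'))   # ['a.b', 'c.d', 'e.f']
print(lstick(s, t, delimiter='.'))  # ['b.a', 'd.c', 'f.e']
```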

__radd__(self, other: Strings) → Strings

hash(self) → Tuple[arkouda.pdarrayclass.pdarray, arkouda.pdarrayclass.pdarray]
    Compute a 128-bit hash of each string.
    Returns A tuple of two int64 pdarrays. The ith hash value is the concatenation of the ith values from each array.
    Return type Tuple[pdarray, pdarray]
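Two small calculations help when working with the pair of int64 arrays returned by hash. The sketch below shows how one might concatenate two signed 64-bit halves into a single 128-bit integer, and gives a rough birthday-bound estimate of the collision probability for 128-bit hashes. Treating the first array as the high half is an assumption made for illustration, and both function names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def combine_halves(high, low):
    """Concatenate two signed 64-bit integers into one unsigned 128-bit value.

    Assumption: `high` comes from the first returned array, `low` from the second.
    """
    return ((high & MASK64) << 64) | (low & MASK64)


def collision_bound(n, bits=128):
    """Birthday upper bound on the chance that any two of n random hashes collide."""
    return n * (n - 1) / 2 / 2 ** bits


print(hex(combine_halves(1, -1)))  # 0x1ffffffffffffffff
print(collision_bound(10 ** 15))   # roughly 1.5e-09
```

Masking with MASK64 reinterprets a negative int64 as its unsigned two's-complement bit pattern before the halves are joined.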


Notes

The implementation uses SipHash128, a fast and balanced hash function (used by Python for dictionaries and sets). For realistic numbers of strings (up to about 10**15), the probability of a collision between two 128-bit hash values is negligible.

group(self) → arkouda.pdarrayclass.pdarray
    Return the permutation that groups the array, placing equivalent strings together. All instances of the same string are guaranteed to lie in one contiguous block of the permuted array, but the blocks are not necessarily ordered.
    Returns The permutation that groups the array by value
    Return type pdarray
    See also: GroupBy, unique

Notes

If the arkouda server is compiled with “-sSegmentedArray.useHash=true”, then arkouda uses 128-bit hash values to group strings, rather than sorting the strings directly. This method is fast, but the resulting permutation merely groups equivalent strings and does not sort them. If the “useHash” parameter is false, then a full sort is performed.
    Raises RuntimeError – Raised if there is a server-side error in executing group request or creating the pdarray encapsulating the return message

to_ndarray(self) → numpy.ndarray
    Convert the array to a np.ndarray, transferring array data from the arkouda server to Python. If the array exceeds a built-in size limit, a RuntimeError is raised.
    Returns A numpy ndarray with the same strings as this array
    Return type np.ndarray

Notes

The number of bytes in the array cannot exceed arkouda.maxTransferBytes, otherwise a RuntimeError will be raised. This is to protect the user from overflowing the memory of the system on which the Python client is running, under the assumption that the server is running on a distributed system with much more memory than the client. The user may override this limit by setting ak.maxTransferBytes to a larger value, but proceed with caution. See also: array


Examples

>>> a = ak.array(["hello", "my", "world"])
>>> a.to_ndarray()
array(['hello', 'my', 'world'], dtype='<U5')
>>> type(a.to_ndarray())
numpy.ndarray

save(self, prefix_path: str, dataset: str = 'strings_array', mode: str = 'truncate', save_offsets: bool = True) → str
    Save the Strings object to HDF5. The result is a collection of HDF5 files, one file per locale of the arkouda server, where each filename starts with prefix_path. Each locale saves its chunk of the Strings array to its corresponding file.
    Parameters
        • prefix_path (str) – Directory and filename prefix that all output files share
        • dataset (str) – The name of the Strings dataset to be written, defaults to strings_array
        • mode (str {'truncate' | 'append'}) – By default, truncate (overwrite) output files, if they exist. If ‘append’, create a new Strings dataset within existing files.
        • save_offsets (bool) – Defaults to True, which instructs the server to save the offsets array to HDF5. If False, the offsets array will not be saved and will be derived from the string values upon load/read.
    Returns String message indicating result of save operation
    Return type str
    Raises
        • ValueError – Raised if the lengths of columns and values differ, or the mode is neither ‘truncate’ nor ‘append’
        • TypeError – Raised if prefix_path, dataset, or mode is not a str
    See also: pdarrayIO.save
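When save_offsets is False, the offsets are described as being derived from the string values on load. Because the bytes are null-delimited, one plausible way to do this is to start a new string after every null except the last. The sketch below illustrates the idea on a Python bytes object; offsets_from_bytes is a hypothetical helper and is not taken from the arkouda source.

```python
def offsets_from_bytes(buf):
    """Recover string start offsets from a null-delimited byte buffer.

    Every string ends with a null byte, so each null except the final one
    is immediately followed by the start of the next string.
    """
    offsets = [0] if buf else []
    for i, byte in enumerate(buf[:-1]):
        if byte == 0:
            offsets.append(i + 1)
    return offsets


raw = b"hello\x00my\x00world\x00"
print(offsets_from_bytes(raw))  # [0, 6, 9]
```

Empty strings are handled as well: a buffer of two consecutive null bytes yields the offsets [0, 1].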

Notes

Important implementation notes: (1) Strings state is saved as two datasets within an hdf5 group: one for the string characters and one for the segments corresponding to the start of each string, (2) the hdf5 group is named via the dataset parameter. is_registered(self ) → numpy.bool_ Return True iff the object is contained in the registry Parameters None – Returns Indicates if the object is contained in the registry Return type bool Raises RuntimeError – Raised if there’s a server-side error thrown _list_component_names(self ) → List[str] Internal Function that returns a list of all component names


Parameters None – Returns List of all component names Return type List[str] info(self ) → str Returns a JSON formatted string containing information about all components of self Parameters None – Returns JSON string containing information about all components of self Return type str pretty_print_info(self ) → None Prints information about all components of self in a human readable format Parameters None – Returns Return type None register(self, user_defined_name: str) → Strings Register this Strings object with a user defined name in the arkouda server so it can be attached to later using Strings.attach(). This is an in-place operation; registering a Strings object more than once will update the name in the registry and remove the previously registered name. A name can only be registered to one object at a time. Parameters user_defined_name (str) – user defined name which the Strings object is to be registered under Returns The same Strings object which is now registered with the arkouda server and has an updated name. This is an in-place modification, the original is returned to support a fluid programming style. Please note you cannot register two different objects with the same name. Return type Strings Raises • TypeError – Raised if user_defined_name is not a str • RegistrationError – If the server was unable to register the Strings object with the user_defined_name. If the user is attempting to register more than one object with the same name, the former should be unregistered first to free up the registration name. See also: attach, unregister

Notes

Registered names/Strings objects in the server are immune to deletion until they are unregistered. unregister(self ) → None Unregister a Strings object in the arkouda server which was previously registered using register() and/or attached to using attach() Returns Return type None Raises RuntimeError – Raised if the server could not find the internal name/symbol to remove


See also: register, attach

Notes

Registered names/Strings objects in the server are immune to deletion until they are unregistered. static attach(user_defined_name: str) → Strings class method to return a Strings object attached to the registered name in the arkouda server which was registered using register() Parameters user_defined_name (str) – user defined name which the Strings object was registered under Returns the Strings object registered with user_defined_name in the arkouda server Return type Strings object Raises TypeError – Raised if user_defined_name is not a str See also: register, unregister

Notes

Registered names/Strings objects in the server are immune to deletion until they are unregistered. static unregister_strings_by_name(user_defined_name: str) → None Unregister a Strings object in the arkouda server previously registered via register() Parameters user_defined_name (str) – The registered name of the Strings object See also: register, unregister, attach, is_registered arkouda.join_on_eq_with_dt(a1: arkouda.pdarrayclass.pdarray, a2: arkouda.pdarrayclass.pdarray, t1: arkouda.pdarrayclass.pdarray, t2: arkouda.pdarrayclass.pdarray, dt: Union[int, numpy.int64], pred: str, result_limit: Union[int, numpy.int64] = 1000) → Tuple[arkouda.pdarrayclass.pdarray, arkouda.pdarrayclass.pdarray] Performs an inner-join on equality between two integer arrays where the time-window predicate is also true Parameters • a1 (pdarray, int64) – pdarray to be joined • a2 (pdarray, int64) – pdarray to be joined • t1 (pdarray) – timestamps in millis corresponding to the a1 pdarray • t2 (pdarray) – timestamps in millis corresponding to the a2 pdarray • dt (Union[int,np.int64]) – time delta • pred (str) – time window predicate • result_limit (Union[int,np.int64]) – size limit for returned result Returns • result_array_one (pdarray, int64) – a1 indices where a1 == a2 • result_array_two (pdarray, int64) – a2 indices where a2 == a1


Raises • TypeError – Raised if a1, a2, t1, or t2 is not a pdarray, or if dt or result_limit is not an int • ValueError – if a1, a2, t1, or t2 dtype is not int64, pred is not ‘true_dt’, ‘abs_dt’, or ‘pos_dt’, or result_limit is < 0 class arkouda.Categorical(values, **kwargs) Represents an array of values belonging to named categories. Converting a Strings object to Categorical often saves memory and speeds up operations, especially if there are many repeated values, at the cost of some one-time work in initialization. Parameters values (Strings) – String values to convert to categories categories The set of category labels (determined automatically) Type Strings codes The category indices of the values or -1 for N/A Type pdarray, int64 permutation The permutation that groups the values in the same order as categories Type pdarray, int64 segments When values are grouped, the starting offset of each group Type pdarray, int64 size The number of items in the array Type Union[int,np.int64] nlevels The number of distinct categories Type Union[int,np.int64] ndim The rank of the array (currently only rank 1 arrays supported) Type Union[int,np.int64] shape The sizes of each dimension of the array Type tuple BinOps RegisterablePieces RequiredPieces objtype = category permutation segments
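The relationship between categories, codes, permutation, and segments can be sketched in plain Python. This is a toy model for illustration only, not arkouda code: the helper name build_categorical is hypothetical, and the choice to sort the category labels is an assumption (the server determines the actual label order).

```python
def build_categorical(values):
    """Toy, pure-Python model of the Categorical attributes (not arkouda code)."""
    # Unique labels; sorting them is an assumption made for this sketch.
    categories = sorted(set(values))
    index = {label: i for i, label in enumerate(categories)}
    # codes[i] is the category index of values[i].
    codes = [index[v] for v in values]
    # Stable argsort of codes: groups equal values in the same order as categories.
    permutation = sorted(range(len(codes)), key=codes.__getitem__)
    # Starting offset of each group within the permuted array.
    segments = []
    for pos, i in enumerate(permutation):
        if pos == 0 or codes[i] != codes[permutation[pos - 1]]:
            segments.append(pos)
    return categories, codes, permutation, segments


values = ["b", "a", "b", "c", "a", "b"]
categories, codes, permutation, segments = build_categorical(values)
print(categories)   # ['a', 'b', 'c']
print(codes)        # [1, 0, 1, 2, 0, 1]
print(permutation)  # [1, 4, 0, 2, 5, 3]
print(segments)     # [0, 2, 5]
```

Indexing categories by codes recovers the original values, which is why repeated strings only need to be stored once.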


classmethod from_codes(cls, codes: arkouda.pdarrayclass.pdarray, categories: arkouda.strings.Strings, permutation=None, segments=None) → Categorical Make a Categorical from codes and categories arrays. If codes and categories have already been precomputed, this constructor saves time. If not, please use the normal constructor. Parameters • codes (pdarray, int64) – Category indices of each value • categories (Strings) – Unique category labels • permutation (pdarray, int64) – The permutation that groups the values in the same order as categories • segments (pdarray, int64) – When values are grouped, the starting offset of each group Returns The Categorical object created from the input parameters Return type Categorical Raises TypeError – Raised if codes is not a pdarray of int64 objects or if categories is not a Strings object to_ndarray(self ) → numpy.ndarray Convert the array to a np.ndarray, transferring array data from the arkouda server to Python. This conversion discards category information and produces an ndarray of strings. If the array exceeds a built-in size limit, a RuntimeError is raised. Returns A numpy ndarray of strings corresponding to the values in this array Return type np.ndarray

Notes

The number of bytes in the array cannot exceed arkouda.maxTransferBytes, otherwise a RuntimeError will be raised. This is to protect the user from overflowing the memory of the system on which the Python client is running, under the assumption that the server is running on a distributed system with much more memory than the client. The user may override this limit by setting ak.maxTransferBytes to a larger value, but proceed with caution. abstract __iter__(self ) __len__(self ) __str__(self ) Return str(self). __repr__(self ) Return repr(self). _binop(self, other: Union[Categorical, arkouda.dtypes.str_scalars], op: arkouda.dtypes.str_scalars) → arkouda.pdarrayclass.pdarray Executes the requested binop on this Categorical instance and returns the results within a pdarray object. Parameters • other (Union[Categorical,str_scalars]) – the other object is a Categorical object or string scalar • op (str_scalars) – name of the binary operation to be performed Returns encapsulating the results of the requested binop


Return type pdarray Raises • ValueError – Raised if (1) the op is not in the self.BinOps set, or (2) if the sizes of this and the other instance don’t match • RuntimeError – Raised if a server-side error is thrown while executing the binary operation _r_binop(self, other: Union[Categorical, arkouda.dtypes.str_scalars], op: arkouda.dtypes.str_scalars) → arkouda.pdarrayclass.pdarray

Executes the requested reverse binop on this Categorical instance and returns the results within a pdarray object. Parameters • other (Union[Categorical,str_scalars]) – the other object is a Categorical object or string scalar • op (str_scalars) – name of the binary operation to be performed Returns encapsulating the results of the requested binop Return type pdarray Raises • ValueError – Raised if (1) the op is not in the self.BinOps set, or (2) if the sizes of this and the other instance don’t match • RuntimeError – Raised if a server-side error is thrown while executing the binary operation

__eq__(self, other) Return self==value. __ne__(self, other) Return self!=value. __getitem__(self, key) → Categorical reset_categories(self ) → Categorical Recompute the category labels, discarding any unused labels. This method is often useful after slicing or indexing a Categorical array, when the resulting array only contains a subset of the original categories. In this case, eliminating unused categories can speed up other operations. Returns A Categorical object generated from the current instance Return type Categorical contains(self, substr: str) → arkouda.pdarrayclass.pdarray Check whether each element contains the given substring. Parameters substr (str) – The substring to search for Returns True for elements that contain substr, False otherwise Return type pdarray, bool Raises TypeError – Raised if substr is not a str


Notes

This method can be significantly faster than the corresponding method on Strings objects, because it searches the unique category labels instead of the full array. See also: Categorical.startswith, Categorical.endswith startswith(self, substr: str) → arkouda.pdarrayclass.pdarray Check whether each element starts with the given substring. Parameters substr (str) – The substring to search for Raises TypeError – Raised if substr is not a str Returns True for elements that start with substr, False otherwise Return type pdarray, bool

Notes

This method can be significantly faster than the corresponding method on Strings objects, because it searches the unique category labels instead of the full array. See also: Categorical.contains, Categorical.endswith endswith(self, substr: str) → arkouda.pdarrayclass.pdarray Check whether each element ends with the given substring. Parameters substr (str) – The substring to search for Raises TypeError – Raised if substr is not a str Returns True for elements that end with substr, False otherwise Return type pdarray, bool

Notes

This method can be significantly faster than the corresponding method on Strings objects, because it searches the unique category labels instead of the full array. See also: Categorical.startswith, Categorical.contains in1d(self, test: Union[arkouda.strings.Strings, Categorical]) → arkouda.pdarrayclass.pdarray Test whether each element of the Categorical object is also present in the test Strings or Categorical object. Returns a boolean array the same length as self that is True where an element of self is in test and False otherwise. Parameters test (Union[Strings,Categorical]) – The values against which to test each value of self. Returns The values self[in1d] are in the test Strings or Categorical object. Return type pdarray, bool Raises TypeError – Raised if test is not a Strings or Categorical object


See also: unique, intersect1d, union1d

Notes

in1d can be considered as an element-wise function version of the python keyword in, for 1-D sequences. in1d(a, b) is logically equivalent to ak.array([item in b for item in a]), but is much faster and scales to arbitrarily large a.

Examples

>>> strings = ak.array(['String {}'.format(i) for i in range(0,5)])
>>> cat = ak.Categorical(strings)
>>> ak.in1d(cat, strings)
array([True, True, True, True, True])
>>> strings = ak.array(['String {}'.format(i) for i in range(5,9)])
>>> catTwo = ak.Categorical(strings)
>>> ak.in1d(cat, catTwo)
array([False, False, False, False, False])
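The element-wise semantics described in the Notes can be illustrated without a server. The sketch below is plain Python, not arkouda code. It tests each unique label once and then gathers the answer through the codes. Whether arkouda's in1d is implemented this way is an assumption; the helper name in1d_categorical is hypothetical.

```python
def in1d_categorical(codes, categories, test_values):
    """Toy model: test each unique label once, then gather through codes."""
    test_set = set(test_values)
    # One membership test per category label, not per element.
    label_hit = [label in test_set for label in categories]
    # Broadcast the per-label answer to every element via its code.
    return [label_hit[c] for c in codes]


categories = ["String 0", "String 1", "String 2"]
codes = [0, 2, 1, 0, 2]
test_values = ["String 2", "String 9"]

result = in1d_categorical(codes, categories, test_values)
print(result)  # [False, True, False, False, True]

# Same answer as the naive element-wise definition from the Notes.
values = [categories[c] for c in codes]
naive = [item in test_values for item in values]
```

The cost of the membership tests scales with the number of distinct labels rather than with the array length, which matters when values repeat often.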

unique(self ) → Categorical group(self ) → arkouda.pdarrayclass.pdarray Return the permutation that groups the array, placing equivalent categories together. All instances of the same category are guaranteed to lie in one contiguous block of the permuted array, but the blocks are not necessarily ordered. Returns The permutation that groups the array by value Return type pdarray See also: GroupBy, unique

Notes

This method is faster than the corresponding Strings method. If the Categorical was created from a Strings object, then this function simply returns the cached permutation. Even if the Categorical was created using from_codes(), this function will be faster than Strings.group() because it sorts dense integer values, rather than 128-bit hash values. argsort(self ) sort(self ) concatenate(self, others: Sequence[Categorical], ordered: bool = True) → Categorical Merge this Categorical with other Categorical objects in the array, concatenating the arrays and synchronizing the categories. Parameters • others (Sequence[Categorical]) – The Categorical arrays to concatenate and merge with this one


• ordered (bool) – If True (default), the arrays will be appended in the order given. If False, array data may be interleaved in blocks, which can greatly improve performance but results in non-deterministic ordering of elements. Returns The merged Categorical object Return type Categorical Raises TypeError – Raised if any others array objects are not Categorical objects
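What "synchronizing the categories" involves can be sketched in plain Python: the label sets are merged, and each input's codes are remapped into the merged label set before the arrays are appended. This is a toy model of the ordered=True case only, not arkouda's implementation; the function name concatenate_categoricals and the sorted label order are assumptions.

```python
def concatenate_categoricals(parts):
    """parts: list of (codes, categories) pairs. Toy model, ordered case only."""
    # Merge all label sets into one; sorting is an assumption of this sketch.
    merged = sorted({label for _, cats in parts for label in cats})
    index = {label: i for i, label in enumerate(merged)}
    new_codes = []
    for codes, cats in parts:
        # remap[old_code] gives the code of the same label in the merged set.
        remap = [index[label] for label in cats]
        new_codes.extend(remap[c] for c in codes)
    return new_codes, merged


first = ([0, 1, 0], ["x", "y"])   # decodes to x, y, x
second = ([1, 0], ["y", "z"])     # decodes to z, y
new_codes, merged = concatenate_categoricals([first, second])
print(merged)     # ['x', 'y', 'z']
print(new_codes)  # [0, 1, 0, 2, 1]
```

The remapping step is extra work that plain string concatenation does not need, which is consistent with the performance remark in the Notes.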

Notes

This operation can be expensive – slower than concatenating Strings. save(self, prefix_path: str, dataset: str = 'categorical_array', mode: str = 'truncate') → str Save the Categorical object to HDF5. The result is a collection of HDF5 files, one file per locale of the arkouda server, where each filename starts with prefix_path and dataset. Each locale saves its chunk of the Strings array to its corresponding file. Parameters • prefix_path (str) – Directory and filename prefix that all output files share • dataset (str) – Name of the dataset to create in HDF5 files (must not already exist) • mode (str {'truncate' | 'append'}) – By default, truncate (overwrite) output files, if they exist. If ‘append’, create a new Categorical dataset within existing files. Returns Return type String message indicating result of save operation Raises • ValueError – Raised if the lengths of columns and values differ, or the mode is neither ‘truncate’ nor ‘append’ • TypeError – Raised if prefix_path, dataset, or mode is not a str See also: pdarrayIO.save, pdarrayIO.load_all

Notes

Important implementation notes: (1) Strings state is saved as two datasets within an hdf5 group: one for the string characters and one for the segments corresponding to the start of each string, (2) the hdf5 group is named via the dataset parameter. register(self, user_defined_name: str) → Categorical Register this Categorical object and underlying components with the Arkouda server Parameters user_defined_name (str) – user defined name the Categorical is to be registered under, this will be the root name for underlying components Returns The same Categorical which is now registered with the arkouda server and has an updated name. This is an in-place modification, the original is returned to support a fluid programming style. Please note you cannot register two different Categoricals with the same name. Return type Categorical Raises


• TypeError – Raised if user_defined_name is not a str • RegistrationError – If the server was unable to register the Categorical with the user_defined_name See also: unregister, attach, unregister_categorical_by_name, is_registered

Notes

Objects registered with the server are immune to deletion until they are unregistered. unregister(self ) → None Unregister this Categorical object in the arkouda server which was previously registered using register() and/or attached to using attach() Raises RegistrationError – If the object is already unregistered or if there is a server error when attempting to unregister See also: register, attach, unregister_categorical_by_name, is_registered

Notes

Objects registered with the server are immune to deletion until they are unregistered. is_registered(self ) → numpy.bool_

Return True iff the object is contained in the registry

Returns Indicates if the object is contained in the registry Return type numpy.bool Raises RegistrationError – Raised if there’s a server-side error or a mis-match of registered components

See also: register, attach, unregister, unregister_categorical_by_name

Notes

Objects registered with the server are immune to deletion until they are unregistered. _get_components_dict(self ) → Dict Internal function that returns a dictionary with all required or non-None components of self Required Categorical components (Codes and Categories) are always included in returned components_dict Optional Categorical components (Permutation and Segments) are only included if they’ve been set (are not None) Returns Dictionary of all required or non-None components of self Keys: component names (Codes, Categories, Permutation, Segments) Values: components of self


Return type Dict _list_component_names(self ) → List[str] Internal function that returns a list of all component names Parameters None – Returns List of all component names Return type List[str] info(self ) → str Returns a JSON formatted string containing information about all components of self Parameters None – Returns JSON string containing information about all components of self Return type str pretty_print_info(self ) → None Prints information about all components of self in a human readable format Parameters None – Returns Return type None static attach(user_defined_name: str) → Categorical

Function to return a Categorical object attached to the registered name in the arkouda server which was registered using register() Parameters user_defined_name (str) – user defined name which the Categorical object was registered under Returns The Categorical object created by re-attaching to the corresponding server components Return type Categorical

Raises TypeError – if user_defined_name is not a string See also: register, is_registered, unregister, unregister_categorical_by_name

static unregister_categorical_by_name(user_defined_name: str) → None Function to unregister Categorical object by name which was registered with the arkouda server via register() Parameters user_defined_name (str) – Name under which the Categorical object was registered Raises • TypeError – if user_defined_name is not a string • RegistrationError – if there is an issue attempting to unregister any underlying components See also: register, unregister, attach, is_registered


static parse_hdf_categoricals(d: Mapping[str, Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings]]) → Tuple[List[str], Dict[str, Categorical]] This function should be used in conjunction with the load_all function which reads hdf5 files and reconstitutes Categorical objects. Categorical objects use a naming convention and HDF5 structure so they can be identified and constructed for the user. In general you should not call this method directly. Parameters d (Dictionary of String to either Pdarray or Strings object) – Returns 2-Tuple of a List of strings containing key names which should be removed and a Dictionary of base name to Categorical object See also: Categorical.save, load_all arkouda.enableVerbose() → None Enables verbose logging (DEBUG log level) for all ArkoudaLoggers arkouda.disableVerbose(logLevel: LogLevel = LogLevel.INFO) → None Disables verbose logging (DEBUG log level) for all ArkoudaLoggers, setting the log level for each to the logLevel parameter Parameters logLevel (LogLevel) – The new log level, defaults to LogLevel.INFO Raises TypeError – Raised if logLevel is not a LogLevel enum class arkouda.pdarray(name: str, mydtype: numpy.dtype, size: arkouda.dtypes.int_scalars, ndim: arkouda.dtypes.int_scalars, shape: Sequence[int], itemsize: arkouda.dtypes.int_scalars) The basic arkouda array class. This class contains only the attributes of the array; the data resides on the arkouda server. When a server operation results in a new array, arkouda will create a pdarray instance that points to the array data on the server. As such, the user should not initialize pdarray instances directly.
name The server-side identifier for the array Type str dtype The element type of the array Type dtype size The number of elements in the array Type int_scalars ndim The rank of the array (currently only rank 1 arrays supported) Type int_scalars shape A list or tuple containing the sizes of each dimension of the array Type Sequence[int] itemsize The size in bytes of each element


Type int_scalars BinOps OpEqOps objtype = pdarray __array_priority__ = 1000 __del__(self ) __bool__(self ) → bool __len__(self ) __str__(self ) Return str(self). __repr__(self ) Return repr(self). format_other(self, other: object) → numpy.dtype Attempt to cast scalar other to the element dtype of this pdarray, and print the resulting value to a string (e.g. for sending to a server command). The user should not call this function directly. Parameters other (object) – The scalar to be cast to the pdarray.dtype Returns Return type np.dtype corresponding to the other parameter Raises TypeError – Raised if the other parameter cannot be converted to Numpy dtype _binop(self, other: pdarray, op: str) → pdarray Executes binary operation specified by the op string Parameters • other (pdarray) – The pdarray upon which the binop is to be executed • op (str) – The binop to be executed Returns A pdarray encapsulating the binop result Return type pdarray Raises • ValueError – Raised if the op is not within the pdarray.BinOps set, or if the pdarray sizes don’t match • TypeError – Raised if other is not a pdarray or the pdarray.dtype is not a supported dtype _r_binop(self, other: pdarray, op: str) → pdarray Executes reverse binary operation specified by the op string Parameters • other (pdarray) – The pdarray upon which the reverse binop is to be executed • op (str) – The name of the reverse binop to be executed Returns A pdarray encapsulating the reverse binop result Return type pdarray Raises


• ValueError – Raised if the op is not within the pdarray.BinOps set • TypeError – Raised if other is not a pdarray or the pdarray.dtype is not a supported dtype __add__(self, other) __radd__(self, other) __sub__(self, other) __rsub__(self, other) __mul__(self, other) __rmul__(self, other) __truediv__(self, other) __rtruediv__(self, other) __floordiv__(self, other) __rfloordiv__(self, other) __mod__(self, other) __rmod__(self, other) __lshift__(self, other) __rlshift__(self, other) __rshift__(self, other) __rrshift__(self, other) __and__(self, other) __rand__(self, other) __or__(self, other) __ror__(self, other) __xor__(self, other) __rxor__(self, other) __pow__(self, other) __rpow__(self, other) __lt__(self, other) Return self<value. __gt__(self, other) Return self>value. __le__(self, other) Return self<=value. __ge__(self, other) Return self>=value. __eq__(self, other) Return self==value. __ne__(self, other) Return self!=value.


__neg__(self ) __invert__(self ) opeq(self, other, op) __iadd__(self, other) __isub__(self, other) __imul__(self, other) __itruediv__(self, other) __ifloordiv__(self, other) __ilshift__(self, other) __irshift__(self, other) __iand__(self, other) __ior__(self, other) __ixor__(self, other) __ipow__(self, other) abstract __iter__(self ) __getitem__(self, key) __setitem__(self, key, value) fill(self, value: arkouda.dtypes.numeric_scalars) → None Fill the array (in place) with a constant value. Parameters value (numeric_scalars)– Raises TypeError – Raised if value is not an int, int64, float, or float64 any(self ) → numpy.bool_ Return True iff any element of the array evaluates to True. all(self ) → numpy.bool_ Return True iff all elements of the array evaluate to True. is_registered(self ) → numpy.bool_ Return True iff the object is contained in the registry Parameters None – Returns Indicates if the object is contained in the registry Return type bool Raises RuntimeError – Raised if there’s a server-side error thrown _list_component_names(self ) → List[str] Internal Function that returns a list of all component names Parameters None – Returns List of all component names Return type List[str] info(self ) → str Returns a JSON formatted string containing information about all components of self


Parameters None – Returns JSON string containing information about all components of self Return type str pretty_print_info(self ) → None Prints information about all components of self in a human readable format Parameters None – Returns Return type None is_sorted(self ) → numpy.bool_ Return True iff the array is monotonically non-decreasing. Parameters None – Returns Indicates if the array is monotonically non-decreasing Return type bool Raises • TypeError – Raised if pda is not a pdarray instance • RuntimeError – Raised if there’s a server-side error thrown sum(self ) → arkouda.dtypes.numpy_scalars Return the sum of all elements in the array. prod(self ) → numpy.float64 Return the product of all elements in the array. Return value is always a np.float64 or np.int64. min(self ) → arkouda.dtypes.numpy_scalars Return the minimum value of the array. max(self ) → arkouda.dtypes.numpy_scalars Return the maximum value of the array. argmin(self ) → numpy.int64 Return the index of the first occurrence of the array min value argmax(self ) → numpy.int64 Return the index of the first occurrence of the array max value. mean(self ) → numpy.float64 Return the mean of the array. var(self, ddof: arkouda.dtypes.int_scalars = 0) → numpy.float64 Compute the variance. See arkouda.var for details. Parameters ddof (int_scalars) – “Delta Degrees of Freedom” used in calculating var Returns The scalar variance of the array Return type np.float64 Raises • TypeError – Raised if pda is not a pdarray instance • ValueError – Raised if the ddof >= pdarray size • RuntimeError – Raised if there’s a server-side error thrown
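The role of ddof can be shown with a small pure-Python sketch. It assumes the usual NumPy convention, in which the sum of squared deviations is divided by N - ddof, and it mirrors the documented ValueError for ddof >= size. The functions below are illustrative stand-ins, not the arkouda implementation.

```python
import math


def var(values, ddof=0):
    """Variance with divisor N - ddof (NumPy convention, assumed here)."""
    n = len(values)
    if ddof >= n:
        raise ValueError("ddof must be less than the number of elements")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - ddof)


def std(values, ddof=0):
    """Standard deviation is the square root of the variance."""
    return math.sqrt(var(values, ddof))


data = [1, 2, 3, 4]
print(var(data))          # 1.25 (population variance, ddof=0)
print(var(data, ddof=1))  # sample variance, divisor N - 1
print(std(data))          # square root of 1.25
```

With ddof=0 the result is the population variance; ddof=1 gives the unbiased sample estimate.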


std(self, ddof: arkouda.dtypes.int_scalars = 0) → numpy.float64 Compute the standard deviation. See arkouda.std for details. Parameters ddof (int_scalars) – “Delta Degrees of Freedom” used in calculating std Returns The scalar standard deviation of the array Return type np.float64 Raises • TypeError – Raised if pda is not a pdarray instance • RuntimeError – Raised if there’s a server-side error thrown mink(self, k: arkouda.dtypes.int_scalars) → pdarray Compute the minimum “k” values. Parameters k (int_scalars) – The desired count of minimum values to be returned by the output. Returns The minimum k values from pda Return type pdarray, int Raises TypeError – Raised if pda is not a pdarray maxk(self, k: arkouda.dtypes.int_scalars) → pdarray Compute the maximum “k” values. Parameters k (int_scalars) – The desired count of maximum values to be returned by the output. Returns The maximum k values from pda Return type pdarray, int Raises TypeError – Raised if pda is not a pdarray argmink(self, k: arkouda.dtypes.int_scalars) → pdarray Finds the indices corresponding to the minimum “k” values. Parameters k (int_scalars) – The desired count of minimum values to be returned by the output. Returns Indices corresponding to the minimum k values from pda Return type pdarray, int Raises TypeError – Raised if pda is not a pdarray argmaxk(self, k: arkouda.dtypes.int_scalars) → pdarray Finds the indices corresponding to the maximum “k” values. Parameters k (int_scalars) – The desired count of maximum values to be returned by the output. Returns Indices corresponding to the maximum k values, sorted Return type pdarray, int Raises TypeError – Raised if pda is not a pdarray to_ndarray(self ) → numpy.ndarray Convert the array to a np.ndarray, transferring array data from the Arkouda server to client-side Python. Note: if the pdarray size exceeds client.maxTransferBytes, a RuntimeError is raised. Returns A numpy ndarray with the same attributes and data as the pdarray


Return type np.ndarray Raises RuntimeError – Raised if there is a server-side error thrown, if the pdarray size exceeds the built-in client.maxTransferBytes size limit, or if the bytes received does not match expected number of bytes

Notes

The number of bytes in the array cannot exceed client.maxTransferBytes, otherwise a RuntimeError will be raised. This is to protect the user from overflowing the memory of the system on which the Python client is running, under the assumption that the server is running on a distributed system with much more memory than the client. The user may override this limit by setting client.maxTransferBytes to a larger value, but proceed with caution. See also: array

Examples

>>> a = ak.arange(0, 5, 1)
>>> a.to_ndarray()
array([0, 1, 2, 3, 4])

>>> type(a.to_ndarray())
numpy.ndarray

to_cuda(self ) Convert the array to a Numba DeviceNDArray, transferring array data from the arkouda server to Python via ndarray. If the array exceeds a built-in size limit, a RuntimeError is raised. Returns A Numba ndarray with the same attributes and data as the pdarray; on GPU Return type numba.DeviceNDArray Raises • ImportError – Raised if CUDA is not available • ModuleNotFoundError – Raised if Numba is either not installed or not enabled • RuntimeError – Raised if there is a server-side error thrown in the course of retrieving the pdarray.

Notes

The number of bytes in the array cannot exceed client.maxTransferBytes, otherwise a RuntimeError will be raised. This is to protect the user from overflowing the memory of the system on which the Python client is running, under the assumption that the server is running on a distributed system with much more memory than the client. The user may override this limit by setting client.maxTransferBytes to a larger value, but proceed with caution. See also: array


Examples

>>> a = ak.arange(0, 5, 1)
>>> a.to_cuda()
array([0, 1, 2, 3, 4])

>>> type(a.to_cuda())
numpy.devicendarray

save(self, prefix_path: str, dataset: str = 'array', mode: str = 'truncate') → str Save the pdarray to HDF5. The result is a collection of HDF5 files, one file per locale of the arkouda server, where each filename starts with prefix_path. Each locale saves its chunk of the array to its corresponding file. Parameters • prefix_path (str) – Directory and filename prefix that all output files share • dataset (str) – Name of the dataset to create in HDF5 files (must not already exist) • mode (str {'truncate' | 'append'}) – By default, truncate (overwrite) output files, if they exist. If ‘append’, attempt to create new dataset in existing files. Returns Return type string message indicating result of save operation Raises • RuntimeError – Raised if a server-side error is thrown saving the pdarray • ValueError – Raised if there is an error in parsing the prefix path pointing to file write location or if the mode parameter is neither truncate nor append • TypeError – Raised if any one of the prefix_path, dataset, or mode parameters is not a string See also: save_all, load, read_hdf, read_all

Notes

The prefix_path must be visible to the arkouda server and the user must have write permission. Output files have names of the form <prefix_path>_LOCALE<i>.hdf, where <i> ranges from 0 to numLocales. If any of the output files already exist and the mode is ‘truncate’, they will be overwritten. If the mode is ‘append’ and the number of output files is less than the number of locales or a dataset with the same name already exists, a RuntimeError will result.


Examples

>>> a = ak.arange(0, 100, 1)
>>> a.save('arkouda_range', dataset='array')

Array is saved in numLocales files with names like tmp/arkouda_range_LOCALE0.hdf

The array can be read back in as follows

>>> b = ak.load('arkouda_range', dataset='array')
>>> (a == b).all()
True

register(self, user_defined_name: str) → pdarray
    Register this pdarray with a user defined name in the arkouda server so it can be attached to later using pdarray.attach(). This is an in-place operation: registering a pdarray more than once will update the name in the registry and remove the previously registered name. A name can only be registered to one pdarray at a time.
    Parameters user_defined_name (str) – user defined name array is to be registered under
    Returns The same pdarray, which is now registered with the arkouda server and has an updated name. This is an in-place modification; the original is returned to support a fluent programming style. Please note you cannot register two different pdarrays with the same name.
    Return type pdarray
    Raises
        • TypeError – Raised if user_defined_name is not a str
        • RegistrationError – If the server was unable to register the pdarray with the user_defined_name. If the user is attempting to register more than one pdarray with the same name, the former should be unregistered first to free up the registration name.
    See also: attach, unregister, is_registered, list_registry, unregister_pdarray_by_name

Notes

Registered names/pdarrays in the server are immune to deletion until they are unregistered.

Examples

>>> a = ak.zeros(100)
>>> a.register("my_zeros")
>>> # potentially disconnect from server and reconnect to server
>>> b = ak.pdarray.attach("my_zeros")
>>> # ...other work...
>>> b.unregister()
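The registration rules described for register() (one name per object, one object per name, re-registering replaces the old name) can be modeled with a small client-side toy. This is a sketch of the documented semantics only; ToyRegistry and its RegistrationError stand-in are hypothetical and do not reflect how the server stores its registry.

```python
class RegistrationError(Exception):
    """Stand-in for the error raised on a name conflict (hypothetical)."""


class ToyRegistry:
    """Toy model: each name maps to one object, each object holds at most one name."""

    def __init__(self) -> None:
        self._by_name = {}

    def register(self, obj, name: str):
        if not isinstance(name, str):
            raise TypeError("user_defined_name must be a str")
        owner = self._by_name.get(name)
        if owner is not None and owner is not obj:
            raise RegistrationError(f"{name!r} is already registered")
        # Re-registering replaces any name the object held before.
        for old in [n for n, o in self._by_name.items() if o is obj]:
            del self._by_name[old]
        self._by_name[name] = obj
        return obj  # returned to allow chained calls

    def attach(self, name: str):
        return self._by_name[name]

    def unregister(self, obj) -> None:
        for old in [n for n, o in self._by_name.items() if o is obj]:
            del self._by_name[old]

    def names(self):
        return sorted(self._by_name)


reg = ToyRegistry()
a, b = [0, 0, 0], [1, 1, 1]
reg.register(a, "my_zeros")
reg.register(a, "renamed")          # replaces "my_zeros"
names_after_rename = reg.names()

try:
    reg.register(b, "renamed")      # name is taken by a different object
    conflict = False
except RegistrationError:
    conflict = True

reg.unregister(a)                   # frees the name
reg.register(b, "renamed")
```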

unregister(self) → None
    Unregister a pdarray in the arkouda server which was previously registered using register() and/or attached to using attach()
    Returns
    Return type None
    Raises RuntimeError – Raised if the server could not find the internal name/symbol to remove
    See also: register, unregister, is_registered, unregister_pdarray_by_name, list_registry

Notes

Registered names/pdarrays in the server are immune to deletion until they are unregistered.

Examples

>>> a = ak.zeros(100)
>>> a.register("my_zeros")
>>> # potentially disconnect from server and reconnect to server
>>> b = ak.pdarray.attach("my_zeros")
>>> # ...other work...
>>> b.unregister()

static attach(user_defined_name: str) → pdarray
    class method to return a pdarray attached to the registered name in the arkouda server which was registered using register()
    Parameters user_defined_name (str) – user defined name which array was registered under
    Returns pdarray which is bound to corresponding server side component that was registered with user_defined_name
    Return type pdarray
    Raises TypeError – Raised if user_defined_name is not a str
    See also: register, unregister, is_registered, unregister_pdarray_by_name, list_registry

Notes

Registered names/pdarrays in the server are immune to deletion until they are unregistered.

Examples

>>> a = ak.zeros(100)
>>> a.register("my_zeros")
>>> # potentially disconnect from server and reconnect to server
>>> b = ak.pdarray.attach("my_zeros")
>>> # ...other work...
>>> b.unregister()

arkouda.int64
arkouda.isSupportedInt(num)

arkouda.from_series(series: pandas.Series, dtype: Optional[Union[type, str]] = None) → Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings]
    Converts a Pandas Series to an Arkouda pdarray or Strings object. If dtype is None, the dtype is inferred from the Pandas Series. Otherwise, the dtype parameter is set if the dtype of the Pandas Series is to be overridden or is unknown (for example, in situations where the Series dtype is object).
    Parameters
        • series (Pandas Series) – The Pandas Series with a dtype of bool, float64, int64, or string
        • dtype (Optional[type]) – The valid dtype types are np.bool, np.float64, np.int64, and np.str
    Returns
    Return type Union[pdarray,Strings]
    Raises
        • TypeError – Raised if series is not a Pandas Series object
        • ValueError – Raised if the Series dtype is not bool, float64, int64, string, datetime, or timedelta

Examples

>>> ak.from_series(pd.Series(np.random.randint(0,10,5)))
array([9, 0, 4, 7, 9])

>>> ak.from_series(pd.Series(['1', '2', '3', '4', '5']),dtype=np.int64)
array([1, 2, 3, 4, 5])

>>> ak.from_series(pd.Series(np.random.uniform(low=0.0,high=1.0,size=3)))
array([0.57600036956445599, 0.41619265571741659, 0.6615356693784662])

>>> ak.from_series(pd.Series(['0.57600036956445599', '0.41619265571741659', '0.6615356693784662']), dtype=np.float64)
array([0.57600036956445599, 0.41619265571741659, 0.6615356693784662])

>>> ak.from_series(pd.Series(np.random.choice([True, False],size=5)))
array([True, False, True, True, True])

>>> ak.from_series(pd.Series(['True', 'False', 'False', 'True', 'True']), dtype=np.bool)
array([True, True, True, True, True])

>>> ak.from_series(pd.Series(['a', 'b', 'c', 'd', 'e'], dtype="string"))
array(['a', 'b', 'c', 'd', 'e'])

>>> ak.from_series(pd.Series(['a', 'b', 'c', 'd', 'e']),dtype=np.str)
array(['a', 'b', 'c', 'd', 'e'])


>>> ak.from_series(pd.Series(pd.to_datetime(['1/1/2018', np.datetime64('2018-01-01')])))
array([1514764800000000000, 1514764800000000000])
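The integer in the datetime example is the number of nanoseconds since the Unix epoch. A minimal sketch using only the standard library reproduces it, under the assumption that the timestamp is interpreted as UTC; the helper name to_epoch_nanoseconds is hypothetical.

```python
from datetime import datetime, timezone


def to_epoch_nanoseconds(dt: datetime) -> int:
    """Nanoseconds since 1970-01-01T00:00:00 UTC for a timezone-aware datetime."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    delta = dt - epoch
    # Integer arithmetic throughout, so no floating-point rounding is introduced.
    return (delta.days * 86_400 + delta.seconds) * 10**9 + delta.microseconds * 1_000


ns = to_epoch_nanoseconds(datetime(2018, 1, 1, tzinfo=timezone.utc))
print(ns)  # 1514764800000000000
```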

Notes

The supported datatypes are bool, float64, int64, string, and datetime64[ns]. The data type is either inferred from the Series or is set via the dtype parameter. Series of datetime or timedelta are converted to Arkouda arrays of dtype int64 (nanoseconds). A Pandas Series containing strings has a dtype of object. Arkouda assumes the Series contains strings and sets the dtype to str.

arkouda.ak_array(a: Union[arkouda.pdarrayclass.pdarray, numpy.ndarray, Iterable]) → Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings]
    Convert a Python or Numpy Iterable to a pdarray or Strings object, sending the corresponding data to the arkouda server.
    Parameters a (Union[pdarray, np.ndarray]) – Rank-1 array of a supported dtype
    Returns A pdarray instance stored on arkouda server or Strings instance, which is composed of two pdarrays stored on arkouda server
    Return type pdarray or Strings
    Raises
        • TypeError – Raised if a is not a pdarray, np.ndarray, or Python Iterable such as a list, array, tuple, or deque
        • RuntimeError – Raised if a is not one-dimensional, nbytes > maxTransferBytes, a.dtype is not supported (not in DTypes), or if the product of a size and a.itemsize > maxTransferBytes
        • ValueError – Raised if the returned message is malformed or does not contain the fields required to generate the array.
    See also: pdarray.to_ndarray

Notes

The number of bytes in the input array cannot exceed arkouda.maxTransferBytes, otherwise a RuntimeError will be raised. This is to protect the user from overwhelming the connection between the Python client and the arkouda server, under the assumption that it is a low-bandwidth connection. The user may override this limit by setting ak.maxTransferBytes to a larger value, but should proceed with caution. If the pdarray or ndarray is of type U, this method is called twice recursively to create the Strings object and the two corresponding pdarrays for string bytes and offsets, respectively.
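To make the "string bytes and offsets" pair concrete, here is a small sketch of one way a list of strings can be flattened into a byte buffer plus start offsets. It assumes UTF-8 encoding with a null byte terminating each string; that layout is an assumption for illustration, and encode_strings and decode_strings are hypothetical helpers, not arkouda functions.

```python
from typing import List, Tuple


def encode_strings(strings: List[str]) -> Tuple[bytes, List[int]]:
    """Flatten strings into one byte buffer and record where each string starts."""
    data = bytearray()
    offsets = []
    for s in strings:
        offsets.append(len(data))
        data += s.encode("utf-8") + b"\x00"  # assumed null terminator
    return bytes(data), offsets


def decode_strings(data: bytes, offsets: List[int]) -> List[str]:
    """Rebuild the strings; each segment ends one byte before the next offset."""
    ends = offsets[1:] + [len(data)]
    return [data[start:end - 1].decode("utf-8") for start, end in zip(offsets, ends)]


values, offsets = encode_strings(["string 0", "string 1", "ab"])
round_trip = decode_strings(values, offsets)
```

Two flat arrays of this kind are easy to distribute across locales, which is one plausible reason for representing strings this way.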


Examples

>>> ak.array(np.arange(1,10))
array([1, 2, 3, 4, 5, 6, 7, 8, 9])

>>> ak.array(range(1,10))
array([1, 2, 3, 4, 5, 6, 7, 8, 9])

>>> strings = ak.array(['string {}'.format(i) for i in range(0,5)])
>>> type(strings)

arkouda.cast(pda: Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings], dt: Union[numpy.dtype, str]) → Union[arkouda.pdarrayclass.pdarray, arkouda.strings.Strings]
    Cast an array to another dtype.
    Parameters
        • pda (pdarray or Strings) – The array of values to cast
        • dt (np.dtype or str) – The target dtype to cast values to
    Returns Array of values cast to desired dtype
    Return type pdarray or Strings

Notes

The cast is performed according to Chapel’s casting rules and is NOT safe from overflows or underflows. The user must ensure that the target dtype has the precision and capacity to hold the desired result.
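Because the cast itself is unchecked, one option is to validate values on the client before requesting it. The sketch below checks whether a Python float can be represented as a signed 64-bit integer; fits_int64 is a hypothetical helper and says nothing about what Chapel does with out-of-range values.

```python
import math


def fits_int64(x: float) -> bool:
    """True if x is finite and lies within the signed 64-bit integer range."""
    if math.isnan(x) or math.isinf(x):
        return False
    # 2.0**63 is exactly representable as a float, so these bounds are exact.
    return -(2.0**63) <= x < 2.0**63


checks = [fits_int64(v) for v in (5.0, -1.5, 1e19, float("nan"), float("inf"))]
```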

Examples

>>> ak.cast(ak.linspace(1.0,5.0,5), dt=ak.int64)
array([1, 2, 3, 4, 5])

>>> ak.cast(ak.arange(0,5), dt=ak.float64).dtype
dtype('float64')

>>> ak.cast(ak.arange(0,5), dt=ak.bool)
array([False, True, True, True, True])

>>> ak.cast(ak.linspace(0,4,5), dt=ak.bool)
array([False, True, True, True, True])

arkouda.akabs(pda: arkouda.pdarrayclass.pdarray) → arkouda.pdarrayclass.pdarray
    Return the element-wise absolute value of the array.
    Parameters pda (pdarray) –
    Returns A pdarray containing absolute values of the input array elements
    Return type pdarray
    Raises TypeError – Raised if the parameter is not a pdarray


Examples

>>> ak.abs(ak.arange(-5,-1))
array([5, 4, 3, 2])

>>> ak.abs(ak.linspace(-5,-1,5))
array([5, 4, 3, 2, 1])

arkouda._BASE_UNIT = ns
arkouda._unit2normunit
arkouda._unit2factor
arkouda._get_factor(unit: str) → int
arkouda._identity(x, **kwargs)
class arkouda._Timescalar(scalar)
class arkouda._AbstractBaseTime(array, unit: str = _BASE_UNIT)
    Bases: arkouda.pdarrayclass.pdarray
    Base class for Datetime and Timedelta; not user-facing. Arkouda handles time similar to Pandas (albeit with less functionality), in that all absolute and relative times are represented in nanoseconds as int64 behind the scenes. Datetime and Timedelta can be constructed from Arkouda, NumPy, or Pandas arrays; in each case, the input values are normalized to nanoseconds on initialization, so that all resulting operations are transparent.
    classmethod _get_callback(cls, other, op)
    floor(self, freq)
        Round times down to the nearest integer of a given frequency.
        Parameters freq (str {'d', 'm', 'h', 's', 'ms', 'us', 'ns'}) – Frequency to round to
        Returns Values rounded down to nearest frequency
        Return type self.__class__
    ceil(self, freq)
        Round times up to the nearest integer of a given frequency.
        Parameters freq (str {'d', 'm', 'h', 's', 'ms', 'us', 'ns'}) – Frequency to round to
        Returns Values rounded up to nearest frequency
        Return type self.__class__
    round(self, freq)
        Round times to the nearest integer of a given frequency. Midpoint values will be rounded to nearest even integer.
        Parameters freq (str {'d', 'm', 'h', 's', 'ms', 'us', 'ns'}) – Frequency to round to
        Returns Values rounded to nearest frequency
        Return type self.__class__
    to_ndarray(self)
        Convert the array to a np.ndarray, transferring array data from the Arkouda server to client-side Python. Note: if the pdarray size exceeds client.maxTransferBytes, a RuntimeError is raised.
        Returns A numpy ndarray with the same attributes and data as the pdarray
        Return type np.ndarray
        Raises RuntimeError – Raised if there is a server-side error thrown, if the pdarray size exceeds the built-in client.maxTransferBytes size limit, or if the bytes received does not match expected number of bytes

Notes

The number of bytes in the array cannot exceed client.maxTransferBytes, otherwise a RuntimeError will be raised. This is to protect the user from overflowing the memory of the system on which the Python client is running, under the assumption that the server is running on a distributed system with much more memory than the client. The user may override this limit by setting client.maxTransferBytes to a larger value, but proceed with caution. See also: array

Examples

>>> a = ak.arange(0, 5, 1)
>>> a.to_ndarray()
array([0, 1, 2, 3, 4])

>>> type(a.to_ndarray())
numpy.ndarray

    __str__(self)
        Return str(self).
    __repr__(self) → str
        Return repr(self).
    _binop(self, other, op)
        Executes binary operation specified by the op string
        Parameters
            • other (pdarray) – The pdarray upon which the binop is to be executed
            • op (str) – The binop to be executed
        Returns A pdarray encapsulating the binop result
        Return type pdarray
        Raises
            • ValueError – Raised if the op is not within the pdarray.BinOps set, or if the pdarray sizes don't match
            • TypeError – Raised if other is not a pdarray or the pdarray.dtype is not a supported dtype
    _r_binop(self, other, op)
        Executes reverse binary operation specified by the op string
        Parameters
            • other (pdarray) – The pdarray upon which the reverse binop is to be executed
            • op (str) – The name of the reverse binop to be executed
        Returns A pdarray encapsulating the reverse binop result
        Return type pdarray
        Raises
            • ValueError – Raised if the op is not within the pdarray.BinOps set
            • TypeError – Raised if other is not a pdarray or the pdarray.dtype is not a supported dtype
    opeq(self, other, op)
    static _is_datetime_scalar(scalar)
    static _is_timedelta_scalar(scalar)
    _scalar_callback(self, key)
    __getitem__(self, key)
    __setitem__(self, key, value)
    min(self)
        Return the minimum value of the array.
    max(self)
        Return the maximum value of the array.
    mink(self, k)
        Compute the minimum "k" values.
        Parameters k (int_scalars) – The desired count of minimum values to be returned by the output.
        Returns The minimum k values from pda
        Return type pdarray, int
        Raises TypeError – Raised if pda is not a pdarray
    maxk(self, k)
        Compute the maximum "k" values.
        Parameters k (int_scalars) – The desired count of maximum values to be returned by the output.
        Returns The maximum k values from pda
        Return type pdarray, int
        Raises TypeError – Raised if pda is not a pdarray

class arkouda.Datetime(array, unit: str = _BASE_UNIT)
    Bases: _AbstractBaseTime
    Represents a date and/or time. Datetime is the Arkouda analog to pandas DatetimeIndex and other timeseries data types.
    Parameters
        • array (int64 pdarray, pd.DatetimeIndex, pd.Series, or np.datetime64 array) –
        • unit (str, default 'ns') – For int64 pdarray, denotes the unit of the input. Ignored for pandas and numpy arrays, which carry their own unit. Not case-sensitive; prefixes of full names (like 'sec') are accepted.


          Possible values:
            – 'weeks' or 'w'
            – 'days' or 'd'
            – 'hours' or 'h'
            – 'minutes', 'm', or 't'
            – 'seconds' or 's'
            – 'milliseconds', 'ms', or 'l'
            – 'microseconds', 'us', or 'u'
            – 'nanoseconds', 'ns', or 'n'
          Unlike in pandas, units cannot be combined or mixed with integers
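The unit handling listed above (case-insensitive names, short aliases, and prefixes of full names, all normalized to nanoseconds) can be sketched as follows. The names unit_factor and to_nanoseconds are hypothetical, and rejecting ambiguous prefixes is a choice made for this example; how arkouda resolves them is not stated here.

```python
from typing import List

_NS_PER_UNIT = {
    "weeks": 7 * 24 * 3600 * 10**9,
    "days": 24 * 3600 * 10**9,
    "hours": 3600 * 10**9,
    "minutes": 60 * 10**9,
    "seconds": 10**9,
    "milliseconds": 10**6,
    "microseconds": 10**3,
    "nanoseconds": 1,
}

_ALIASES = {
    "w": "weeks", "d": "days", "h": "hours",
    "m": "minutes", "t": "minutes", "s": "seconds",
    "ms": "milliseconds", "l": "milliseconds",
    "us": "microseconds", "u": "microseconds",
    "ns": "nanoseconds", "n": "nanoseconds",
}


def unit_factor(unit: str) -> int:
    """Return the number of nanoseconds in one `unit`."""
    key = unit.lower()
    if key in _ALIASES:
        return _NS_PER_UNIT[_ALIASES[key]]
    # Accept an unambiguous prefix of a full unit name, such as 'sec'.
    matches = [name for name in _NS_PER_UNIT if name.startswith(key)]
    if len(matches) != 1:
        raise ValueError(f"Unrecognized or ambiguous unit: {unit!r}")
    return _NS_PER_UNIT[matches[0]]


def to_nanoseconds(values: List[int], unit: str) -> List[int]:
    """Scale integer values in `unit` to integer nanoseconds."""
    factor = unit_factor(unit)
    return [v * factor for v in values]


in_ns = to_nanoseconds([1, 2], "sec")
```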

Notes

The ._data attribute is always in nanoseconds with int64 dtype.

    supported_with_datetime
    supported_with_r_datetime
    supported_with_timedelta
    supported_with_r_timedelta
    supported_opeq
    supported_with_pdarray
    supported_with_r_pdarray
    classmethod _get_callback(cls, otherclass, op)
    _scalar_callback(self, scalar)
    static _is_supported_scalar(scalar)
    to_pandas(self)
        Convert array to a pandas DatetimeIndex. Note: if the array size exceeds client.maxTransferBytes, a RuntimeError is raised.
        See also: to_ndarray
    sum(self)
        Return the sum of all elements in the array.

class arkouda.Timedelta(array, unit: str = _BASE_UNIT)
    Bases: _AbstractBaseTime
    Represents a duration, the difference between two dates or times. Timedelta is the Arkouda equivalent of pandas.TimedeltaIndex.
    Parameters
        • array (int64 pdarray, pd.TimedeltaIndex, pd.Series, or np.timedelta64 array) –
        • unit (str, default 'ns') – For int64 pdarray, denotes the unit of the input. Ignored for pandas and numpy arrays, which carry their own unit. Not case-sensitive; prefixes of full names (like 'sec') are accepted.
          Possible values:
            – 'weeks' or 'w'
            – 'days' or 'd'
            – 'hours' or 'h'
            – 'minutes', 'm', or 't'
            – 'seconds' or 's'
            – 'milliseconds', 'ms', or 'l'
            – 'microseconds', 'us', or 'u'
            – 'nanoseconds', 'ns', or 'n'
          Unlike in pandas, units cannot be combined or mixed with integers

Notes

The ._data attribute is always in nanoseconds with int64 dtype.

    supported_with_datetime
    supported_with_r_datetime
    supported_with_timedelta
    supported_with_r_timedelta
    supported_opeq
    supported_with_pdarray
    supported_with_r_pdarray
    classmethod _get_callback(cls, otherclass, op)
    _scalar_callback(self, scalar)
    static _is_supported_scalar(scalar)
    to_pandas(self)
        Convert array to a pandas TimedeltaIndex. Note: if the array size exceeds client.maxTransferBytes, a RuntimeError is raised.
        See also: to_ndarray
    std(self, ddof: Union[int, numpy.int64] = 0)
        Returns the standard deviation as a pd.Timedelta object
    sum(self)
        Return the sum of all elements in the array.
    abs(self)
        Absolute value of time interval.
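Since both time classes store int64 nanoseconds, the floor, ceil, and round methods of their base class amount to integer arithmetic against the number of nanoseconds in the chosen frequency. The sketch below shows that arithmetic on plain Python integers, including round-half-to-even at midpoints; the function names are hypothetical and this is not arkouda's implementation.

```python
def floor_ns(t: int, freq_ns: int) -> int:
    """Round t down to a multiple of freq_ns."""
    return (t // freq_ns) * freq_ns


def ceil_ns(t: int, freq_ns: int) -> int:
    """Round t up to a multiple of freq_ns."""
    return -((-t) // freq_ns) * freq_ns


def round_ns(t: int, freq_ns: int) -> int:
    """Round t to the nearest multiple of freq_ns, sending midpoints to the even multiple."""
    q, r = divmod(t, freq_ns)
    twice = 2 * r
    if twice > freq_ns or (twice == freq_ns and q % 2 == 1):
        q += 1
    return q * freq_ns


SECOND = 10**9  # nanoseconds per second

floored = floor_ns(1_500_000_000, SECOND)
ceiled = ceil_ns(1_500_000_000, SECOND)
rounded_mid_odd = round_ns(1_500_000_000, SECOND)   # midpoint between 1 s and 2 s
rounded_mid_even = round_ns(2_500_000_000, SECOND)  # midpoint between 2 s and 3 s
```

Working in integers avoids the precision loss that float division would introduce for nanosecond timestamps near the top of the int64 range.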

arkouda.date_range(start=None, end=None, periods=None, freq=None, tz=None, normalize=False, name=None, closed=None, **kwargs)
    Creates a fixed frequency Datetime range. Alias for ak.Datetime(pd.date_range(args)). Subject to size limit imposed by client.maxTransferBytes.
    Parameters
        • start (str or datetime-like, optional) – Left bound for generating dates.
        • end (str or datetime-like, optional) – Right bound for generating dates.
        • periods (int, optional) – Number of periods to generate.
        • freq (str or DateOffset, default 'D') – Frequency strings can have multiples, e.g. '5H'. See timeseries.offset_aliases for a list of frequency aliases.
        • tz (str or tzinfo, optional) – Time zone name for returning localized DatetimeIndex, for example 'Asia/Hong_Kong'. By default, the resulting DatetimeIndex is timezone-naive.
        • normalize (bool, default False) – Normalize start/end dates to midnight before generating date range.
        • name (str, default None) – Name of the resulting DatetimeIndex.
        • closed ({None, 'left', 'right'}, optional) – Make the interval closed with respect to the given frequency to the 'left', 'right', or both sides (None, the default).
        • **kwargs – For compatibility. Has no effect on the result.
    Returns rng
    Return type DatetimeIndex

Notes

Of the four parameters start, end, periods, and freq, exactly three must be specified. If freq is omitted, the resulting DatetimeIndex will have periods linearly spaced elements between start and end (closed on both sides). To learn more about the frequency strings, see the pandas documentation on offset aliases.

arkouda.timedelta_range(start=None, end=None, periods=None, freq=None, name=None, closed=None, **kwargs)
    Return a fixed frequency TimedeltaIndex, with day as the default frequency. Alias for ak.Timedelta(pd.timedelta_range(args)). Subject to size limit imposed by client.maxTransferBytes.
    Parameters
        • start (str or timedelta-like, default None) – Left bound for generating timedeltas.
        • end (str or timedelta-like, default None) – Right bound for generating timedeltas.
        • periods (int, default None) – Number of periods to generate.
        • freq (str or DateOffset, default 'D') – Frequency strings can have multiples, e.g. '5H'.
        • name (str, default None) – Name of the resulting TimedeltaIndex.
        • closed (str, default None) – Make the interval closed with respect to the given frequency to the 'left', 'right', or both sides (None).


Returns rng Return type TimedeltaIndex

Notes

Of the four parameters start, end, periods, and freq, exactly three must be specified. If freq is omitted, the resulting TimedeltaIndex will have periods linearly spaced elements between start and end (closed on both sides). To learn more about the frequency strings, see the pandas documentation on offset aliases.

arkouda.AllSymbols = __AllSymbols__
arkouda.RegisteredSymbols = __RegisteredSymbols__
arkouda.information(names: Union[List[str], str] = RegisteredSymbols) → str
    Returns JSON formatted string containing information about the objects in names
    Parameters names (Union[List[str], str]) – names is either the name of an object or a list of names of objects to retrieve info for. If names is ak.AllSymbols, retrieves info for all symbols in the symbol table; if names is ak.RegisteredSymbols, retrieves info for all symbols in the registry
    Returns JSON formatted string containing a list of information for each object in names
    Return type str
    Raises RuntimeError – Raised if a server-side error is thrown in the process of retrieving information about the objects in names
arkouda.list_registry() → List[str]
    Return a list containing the names of all registered objects
    Parameters None –
    Returns List of all object names in the registry
    Return type list
    Raises RuntimeError – Raised if there's a server-side error thrown
arkouda.list_symbol_table() → List[str]
    Return a list containing the names of all objects in the symbol table
    Parameters None –
    Returns List of all object names in the symbol table
    Return type list
    Raises RuntimeError – Raised if there's a server-side error thrown
arkouda.pretty_print_information(names: Union[List[str], str] = RegisteredSymbols) → None
    Prints verbose information for each object in names in a human readable format
    Parameters names (Union[List[str], str]) – names is either the name of an object or a list of names of objects to retrieve info for. If names is ak.AllSymbols, retrieves info for all symbols in the symbol table; if names is ak.RegisteredSymbols, retrieves info for all symbols in the registry
    Returns
    Return type None
    Raises RuntimeError – Raised if a server-side error is thrown in the process of retrieving information about the objects in names

CHAPTER NINE

CHAPEL API REFERENCE


CHAPTER TEN

INDICES AND TABLES

• genindex • modindex • search


PYTHON MODULE INDEX

arkouda
arkouda._version
arkouda.categorical
arkouda.client
arkouda.dtypes
arkouda.groupbyclass
arkouda.infoclass
arkouda.io_util
arkouda.join
arkouda.logger
arkouda.message
arkouda.numeric
arkouda.pdarrayclass
arkouda.pdarraycreation
arkouda.pdarrayIO
arkouda.pdarraysetops
arkouda.security
arkouda.sorting
arkouda.strings
arkouda.timeclass


302 Index arkouda, Release 2020.07.07 attach_pdarray() (in module arkouda.pdarrayclass), connect() (in module arkouda.client), 98 147 contains() (arkouda.Categorical method), 75, 267 contains() (arkouda.categorical.Categorical method), B 92 BINARY (arkouda.message.MessageFormat attribute), contains() (arkouda.Strings method), 67, 254 117 contains() (arkouda.strings.Strings method), 171 BinOps (arkouda.Categorical attribute), 265 cos() (in module arkouda), 35, 233 BinOps (arkouda.categorical.Categorical attribute), 90 cos() (in module arkouda.numeric), 122 BinOps (arkouda.pdarray attribute), 196, 274 count() (arkouda.GroupBy method), 61, 241 BinOps (arkouda.pdarrayclass.pdarray attribute), 132 count() (arkouda.groupbyclass.GroupBy method), 103 BinOps (arkouda.Strings attribute), 252 cumprod() (in module arkouda), 36, 232 BinOps (arkouda.strings.Strings attribute), 169 cumprod() (in module arkouda.numeric), 122 bool (in module arkouda), 195 cumsum() (in module arkouda), 35, 232 bool (in module arkouda.dtypes), 101 cumsum() (in module arkouda.numeric), 121 bool_scalars (in module arkouda), 195 bool_scalars (in module arkouda.dtypes), 101 D broadcast() (arkouda.GroupBy method), 60, 249 date_range() (in module arkouda), 290 broadcast() (arkouda.groupbyclass.GroupBy method), date_range() (in module arkouda.timeclass), 187 112 Datetime (class in arkouda), 288 broadcast() (in module arkouda), 250 Datetime (class in arkouda.timeclass), 185 broadcast() (in module arkouda.groupbyclass), 112 delimited_file_to_dict() (in module ark- bytes (arkouda.Strings attribute), 251 ouda.io_util), 115 bytes (arkouda.strings.Strings attribute), 168 dict_to_delimited_file() (in module ark- ouda.io_util), 115 C disableVerbose() (in module arkouda), 273 cached_regex_patterns() (arkouda.Strings method), disableVerbose() (in module arkouda.logger), 117 252 disconnect() (in module arkouda.client), 99 cached_regex_patterns() (arkouda.strings.Strings dtype (arkouda.pdarray attribute), 16, 196, 273 
method), 170 dtype (arkouda.pdarrayclass.pdarray attribute), 131 cast() (in module arkouda), 18, 230, 285 dtype (arkouda.Strings attribute), 251 cast() (in module arkouda.numeric), 119 dtype (arkouda.strings.Strings attribute), 169 Categorical (class in arkouda), 74, 265 dtype (in module arkouda), 195 Categorical (class in arkouda.categorical), 90 dtype (in module arkouda.dtypes), 101 categories (arkouda.Categorical attribute), 74, 265 DTypeObjects (in module arkouda), 195 categories (arkouda.categorical.Categorical at- DTypeObjects (in module arkouda.dtypes), 101 tribute), 90 DTypes (in module arkouda), 195 ceil() (arkouda._AbstractBaseTime method), 286 DTypes (in module arkouda.dtypes), 101 ceil() (arkouda.timeclass._AbstractBaseTime method), 183 E check_np_dtype() (in module arkouda), 195 enableVerbose() (in module arkouda), 273 check_np_dtype() (in module arkouda.dtypes), 101 enableVerbose() (in module arkouda.logger), 117 clear() (in module arkouda), 205 endswith() (arkouda.Categorical method), 76, 268 clear() (in module arkouda.pdarrayclass), 140 endswith() (arkouda.categorical.Categorical method), cmd (arkouda.message.RequestMessage attribute), 118 93 coargsort() (in module arkouda), 49, 213 endswith() (arkouda.Strings method), 69, 255 coargsort() (in module arkouda.sorting), 167 endswith() (arkouda.strings.Strings method), 173 codes (arkouda.Categorical attribute), 74, 265 ERROR (arkouda.message.MessageType attribute), 118 codes (arkouda.categorical.Categorical attribute), 90 exp() (in module arkouda), 34, 231 concatenate() (arkouda.Categorical method), 269 exp() (in module arkouda.numeric), 121 concatenate() (arkouda.categorical.Categorical method), 94 F concatenate() (in module arkouda), 24, 216 fill() (arkouda.pdarray method), 199, 276 concatenate() (in module arkouda.pdarraysetops), 161 fill() (arkouda.pdarrayclass.pdarray method), 134 connect() (in module arkouda), 15 find_locations() (arkouda.Strings method), 252


find_locations() (arkouda.strings.Strings method), get_versions() (in module arkouda), 195 170 get_versions() (in module arkouda._version), 89 find_segments() (arkouda.GroupBy method), 241 git_get_keywords() (in module arkouda._version), 88 find_segments() (arkouda.groupbyclass.GroupBy git_pieces_from_vcs() (in module ark- method), 103 ouda._version), 88 findall() (arkouda.Strings method), 253 git_versions_from_keywords() (in module ark- findall() (arkouda.strings.Strings method), 171 ouda._version), 88 flatten() (arkouda.Strings method), 73, 257 group() (arkouda.Categorical method), 269 flatten() (arkouda.strings.Strings method), 174 group() (arkouda.categorical.Categorical method), 94 float64 (in module arkouda), 195 group() (arkouda.Strings method), 261 float64 (in module arkouda.dtypes), 101 group() (arkouda.strings.Strings method), 178 float_scalars (in module arkouda), 195 GroupBy (class in arkouda), 55, 240 float_scalars (in module arkouda.dtypes), 101 GroupBy (class in arkouda.groupbyclass), 102 floor() (arkouda._AbstractBaseTime method), 286 GROUPBY_REDUCTION_TYPES (in module arkouda), 251 floor() (arkouda.timeclass._AbstractBaseTime GROUPBY_REDUCTION_TYPES (in module ark- method), 183 ouda.groupbyclass), 102 format (arkouda.message.RequestMessage attribute), 118 H format_other() (arkouda.pdarray method), 197, 274 HANDLERS (in module arkouda._version), 88 format_other() (arkouda.pdarrayclass.pdarray hash() (arkouda.Strings method), 260 method), 132 hash() (arkouda.strings.Strings method), 178 from_codes() (arkouda.Categorical class method), 74, hash() (in module arkouda), 233 265 hash() (in module arkouda.numeric), 122 from_codes() (arkouda.categorical.Categorical class histogram() (in module arkouda), 47, 235 method), 91 histogram() (in module arkouda.numeric), 124 from_series() (in module arkouda), 228, 282 from_series() (in module arkouda.pdarraycreation), I 149 in1d() (arkouda.Categorical method), 268 fromdict() (arkouda.message.ReplyMessage static 
in1d() (arkouda.categorical.Categorical method), 93 method), 118 in1d() (in module arkouda), 51, 215 in1d() (in module arkouda.pdarraysetops), 160 G info() (arkouda.Categorical method), 272 generate_token() (in module arkouda.security), 165 info() (arkouda.categorical.Categorical method), 97 generate_username_token_json() (in module ark- info() (arkouda.pdarray method), 199, 276 ouda.security), 166 info() (arkouda.pdarrayclass.pdarray method), 134 get_arkouda_client_directory() (in module ark- info() (arkouda.Strings method), 263 ouda.security), 165 info() (arkouda.strings.Strings method), 180 get_byteorder() (in module arkouda), 196 information() (in module arkouda), 292 get_byteorder() (in module arkouda.dtypes), 101 information() (in module arkouda.infoclass), 114 get_config() (in module arkouda._version), 88 int64 (in module arkouda), 195, 282 get_config() (in module arkouda.client), 99 int64 (in module arkouda.dtypes), 101 get_datasets() (in module arkouda), 30, 239 int_scalars (in module arkouda), 196 get_datasets() (in module arkouda.pdarrayIO), 129 int_scalars (in module arkouda.dtypes), 101 get_directory() (in module arkouda.io_util), 115 intersect1d() (in module arkouda), 52, 218 get_home_directory() (in module arkouda.security), intersect1d() (in module arkouda.pdarraysetops), 162 165 is_registered() (arkouda.Categorical method), 271 get_keywords() (in module arkouda._version), 88 is_registered() (arkouda.categorical.Categorical get_lengths() (arkouda.Strings method), 252 method), 96 get_lengths() (arkouda.strings.Strings method), 170 is_registered() (arkouda.pdarray method), 199, 276 get_mem_used() (in module arkouda.client), 99 is_registered() (arkouda.pdarrayclass.pdarray get_server_byteorder() (in module arkouda), 196 method), 134 get_server_byteorder() (in module arkouda.dtypes), is_registered() (arkouda.Strings method), 262 101 is_registered() (arkouda.strings.Strings method), get_username() (in module arkouda.security), 166 180

304 Index arkouda, Release 2020.07.07 is_sorted() (arkouda.pdarray method), 45, 200, 277 maxk() (arkouda.pdarrayclass.pdarray method), 136 is_sorted() (arkouda.pdarrayclass.pdarray method), maxk() (arkouda.timeclass._AbstractBaseTime method), 135 185 is_sorted() (in module arkouda), 36, 206 maxk() (in module arkouda), 40, 209 is_sorted() (in module arkouda.pdarrayclass), 141 maxk() (in module arkouda.pdarrayclass), 144 isnan() (in module arkouda), 236 mean() (arkouda.GroupBy method), 62, 243 isnan() (in module arkouda.numeric), 125 mean() (arkouda.groupbyclass.GroupBy method), 106 isSupportedInt() (in module arkouda), 282 mean() (arkouda.pdarray method), 46, 200, 277 itemsize (arkouda.pdarray attribute), 17, 196, 273 mean() (arkouda.pdarrayclass.pdarray method), 135 itemsize (arkouda.pdarrayclass.pdarray attribute), 132 mean() (in module arkouda), 38, 207 mean() (in module arkouda.pdarrayclass), 142 J MessageFormat (class in arkouda.message), 117 join_on_eq_with_dt() (in module arkouda), 264 MessageType (class in arkouda.message), 118 join_on_eq_with_dt() (in module arkouda.join), 116 min() (arkouda._AbstractBaseTime method), 288 min() (arkouda.GroupBy method), 63, 244 L min() (arkouda.groupbyclass.GroupBy method), 106 min() (arkouda.pdarray method), 46, 200, 277 linspace() (in module arkouda), 22, 224 min() (arkouda.pdarrayclass.pdarray method), 135 linspace() (in module arkouda.pdarraycreation), 154 min() (arkouda.timeclass._AbstractBaseTime method), list_registry() (in module arkouda), 292 184 list_registry() (in module arkouda.infoclass), 114 min() (in module arkouda), 37, 207 list_symbol_table() (in module arkouda), 292 min() (in module arkouda.pdarrayclass), 142 list_symbol_table() (in module arkouda.infoclass), mink() (arkouda._AbstractBaseTime method), 288 114 mink() (arkouda.pdarray method), 46, 201, 278 load() (in module arkouda), 32, 238 mink() (arkouda.pdarrayclass.pdarray method), 136 load() (in module arkouda.pdarrayIO), 128 mink() 
(arkouda.timeclass._AbstractBaseTime method), load_all() (in module arkouda), 32, 239 185 load_all() (in module arkouda.pdarrayIO), 129 mink() (in module arkouda), 39, 209 log() (in module arkouda), 34, 231 mink() (in module arkouda.pdarrayclass), 144 log() (in module arkouda.numeric), 120 module logger (arkouda.GroupBy attribute), 55, 240 arkouda, 87 logger (arkouda.groupbyclass.GroupBy attribute), 103 arkouda._version, 87 logger (arkouda.Strings attribute), 251 arkouda.categorical, 90 logger (arkouda.strings.Strings attribute), 169 arkouda.client, 98 LONG_VERSION_PY (in module arkouda._version), 88 arkouda.dtypes, 100 ls_hdf() (in module arkouda), 30, 236 arkouda.groupbyclass, 102 ls_hdf() (in module arkouda.pdarrayIO), 126 arkouda.infoclass, 113 lstick() (arkouda.Strings method), 72, 260 arkouda.io_util, 115 lstick() (arkouda.strings.Strings method), 177 arkouda.join, 116 M arkouda.logger, 117 arkouda.message, 117 match() (arkouda.Strings method), 256 arkouda.numeric, 119 match() (arkouda.strings.Strings method), 173 arkouda.pdarrayclass, 130 max() (arkouda._AbstractBaseTime method), 288 arkouda.pdarraycreation, 148 max() (arkouda.GroupBy method), 61, 244 arkouda.pdarrayIO, 126 max() (arkouda.groupbyclass.GroupBy method), 107 arkouda.pdarraysetops, 159 max() (arkouda.pdarray method), 46, 200, 277 arkouda.security, 165 max() (arkouda.pdarrayclass.pdarray method), 135 arkouda.sorting, 166 max() (arkouda.timeclass._AbstractBaseTime method), arkouda.strings, 168 185 arkouda.timeclass, 182 max() (in module arkouda), 37, 207 msg (arkouda.message.ReplyMessage attribute), 118 max() (in module arkouda.pdarrayclass), 142 msgType (arkouda.message.ReplyMessage attribute), 118 maxk() (arkouda._AbstractBaseTime method), 288 maxk() (arkouda.pdarray method), 46, 201, 278


N parse_hdf_categoricals() (ark- name (arkouda.pdarray attribute), 16, 196, 273 ouda.categorical.Categorical static method), name (arkouda.pdarrayclass.pdarray attribute), 131 97 nbytes (arkouda.Strings attribute), 251 pdarray (class in arkouda), 16, 196, 273 nbytes (arkouda.strings.Strings attribute), 169 pdarray (class in arkouda.pdarrayclass), 131 ndim (arkouda.Categorical attribute), 74, 265 peel() (arkouda.Strings method), 69, 257 ndim (arkouda.categorical.Categorical attribute), 90 peel() (arkouda.strings.Strings method), 175 ndim (arkouda.pdarray attribute), 16, 196, 273 permutation (arkouda.Categorical attribute), 74, 265 ndim (arkouda.pdarrayclass.pdarray attribute), 131 permutation (arkouda.categorical.Categorical at- ndim (arkouda.Strings attribute), 251 tribute), 90, 91 ndim (arkouda.strings.Strings attribute), 169 permutation (arkouda.GroupBy attribute), 55, 240 ngroups (arkouda.GroupBy attribute), 55, 240 permutation (arkouda.groupbyclass.GroupBy at- ngroups (arkouda.groupbyclass.GroupBy attribute), 102 tribute), 102 nkeys (arkouda.GroupBy attribute), 55, 240 plus_or_dot() (in module arkouda._version), 89 nkeys (arkouda.groupbyclass.GroupBy attribute), 102 pretty_print_info() (arkouda.Categorical method), nlevels (arkouda.Categorical attribute), 74, 265 272 nlevels (arkouda.categorical.Categorical attribute), 90 pretty_print_info() (ark- NORMAL (arkouda.message.MessageType attribute), 118 ouda.categorical.Categorical method), 97 NotThisMethod, 88 pretty_print_info() (arkouda.pdarray method), 199, numeric_scalars (in module arkouda), 196 277 numeric_scalars (in module arkouda.dtypes), 101 pretty_print_info() (arkouda.pdarrayclass.pdarray numpy_scalars (in module arkouda), 196 method), 135 numpy_scalars (in module arkouda.dtypes), 101 pretty_print_info() (arkouda.Strings method), 263 nunique() (arkouda.GroupBy method), 63, 247 pretty_print_info() (arkouda.strings.Strings nunique() (arkouda.groupbyclass.GroupBy method), method), 180 109 
pretty_print_information() (in module arkouda), 292 O pretty_print_information() (in module ark- objtype (arkouda.Categorical attribute), 265 ouda.infoclass), 114 prod() objtype (arkouda.categorical.Categorical attribute), 90 (arkouda.GroupBy method), 64, 242 prod() objtype (arkouda.pdarray attribute), 196, 274 (arkouda.groupbyclass.GroupBy method), 105 prod() objtype (arkouda.pdarrayclass.pdarray attribute), 132 (arkouda.pdarray method), 45, 200, 277 prod() objtype (arkouda.Strings attribute), 252 (arkouda.pdarrayclass.pdarray method), 135 prod() objtype (arkouda.strings.Strings attribute), 169 (in module arkouda), 37, 206 prod() offsets (arkouda.Strings attribute), 251 (in module arkouda.pdarrayclass), 141 offsets (arkouda.strings.Strings attribute), 168 ones() (in module arkouda), 20, 221 R ones() (in module arkouda.pdarraycreation), 151 randint() (in module arkouda), 23, 225 ones_like() (in module arkouda), 21, 222 randint() (in module arkouda.pdarraycreation), 155 ones_like() (in module arkouda.pdarraycreation), 153 random_strings_lognormal() (in module arkouda), opeq() (arkouda._AbstractBaseTime method), 288 227 opeq() (arkouda.pdarray method), 198, 276 random_strings_lognormal() (in module ark- opeq() (arkouda.pdarrayclass.pdarray method), 134 ouda.pdarraycreation), 158 opeq() (arkouda.timeclass._AbstractBaseTime method), random_strings_uniform() (in module arkouda), 227 184 random_strings_uniform() (in module ark- OpEqOps (arkouda.pdarray attribute), 196, 274 ouda.pdarraycreation), 157 OpEqOps (arkouda.pdarrayclass.pdarray attribute), 132 read_all() (in module arkouda), 29, 237 OR() (arkouda.GroupBy method), 56, 248 read_all() (in module arkouda.pdarrayIO), 127 OR() (arkouda.groupbyclass.GroupBy method), 110 read_hdf() (in module arkouda), 28, 236 read_hdf() (in module arkouda.pdarrayIO), 127 P Reductions (arkouda.GroupBy attribute), 241 Reductions parse_hdf_categoricals() (arkouda.Categorical (arkouda.groupbyclass.GroupBy attribute), static method), 272 
103 regex_dict (arkouda.Strings attribute), 251


regex_dict (arkouda.strings.Strings attribute), 169 save() (arkouda.pdarray method), 203, 280 register() (arkouda.Categorical method), 270 save() (arkouda.pdarrayclass.pdarray method), 138 register() (arkouda.categorical.Categorical method), save() (arkouda.Strings method), 262 95 save() (arkouda.strings.Strings method), 179 register() (arkouda.pdarray method), 204, 281 save() (in module arkouda.pdarray), 30 register() (arkouda.pdarrayclass.pdarray method), save_all() (in module arkouda), 31, 239 139 save_all() (in module arkouda.pdarrayIO), 129 register() (arkouda.Strings method), 263 segments (arkouda.Categorical attribute), 74, 265 register() (arkouda.strings.Strings method), 180 segments (arkouda.categorical.Categorical attribute), register_vcs_handler() (in module ark- 90, 91 ouda._version), 88 segments (arkouda.GroupBy attribute), 55, 240 RegisterablePieces (arkouda.Categorical attribute), segments (arkouda.groupbyclass.GroupBy attribute), 265 103 RegisterablePieces (ark- setdiff1d() (in module arkouda), 53, 218 ouda.categorical.Categorical attribute), setdiff1d() (in module arkouda.pdarraysetops), 163 90 setxor1d() (in module arkouda), 54, 219 RegisteredSymbols (in module arkouda), 292 setxor1d() (in module arkouda.pdarraysetops), 164 RegisteredSymbols (in module arkouda.infoclass), 114 shape (arkouda.Categorical attribute), 74, 265 RegistrationError, 148, 213 shape (arkouda.categorical.Categorical attribute), 90 render() (in module arkouda._version), 89 shape (arkouda.pdarray attribute), 16, 196, 273 render_git_describe() (in module ark- shape (arkouda.pdarrayclass.pdarray attribute), 131 ouda._version), 89 shape (arkouda.Strings attribute), 251 render_git_describe_long() (in module ark- shape (arkouda.strings.Strings attribute), 169 ouda._version), 89 shutdown() (in module arkouda.client), 99 render_pep440() (in module arkouda._version), 89 sin() (in module arkouda), 35, 233 render_pep440_old() (in module arkouda._version), sin() (in module 
arkouda.numeric), 122 89 size (arkouda.Categorical attribute), 74, 265 render_pep440_post() (in module arkouda._version), size (arkouda.categorical.Categorical attribute), 90 89 size (arkouda.GroupBy attribute), 55, 240 render_pep440_pre() (in module arkouda._version), size (arkouda.groupbyclass.GroupBy attribute), 102 89 size (arkouda.pdarray attribute), 16, 196, 273 ReplyMessage (class in arkouda.message), 118 size (arkouda.pdarrayclass.pdarray attribute), 131 RequestMessage (class in arkouda.message), 118 size (arkouda.Strings attribute), 251 RequiredPieces (arkouda.Categorical attribute), 265 size (arkouda.strings.Strings attribute), 169 RequiredPieces (arkouda.categorical.Categorical at- sort() (arkouda.Categorical method), 269 tribute), 90 sort() (arkouda.categorical.Categorical method), 94 reset_categories() (arkouda.Categorical method), sort() (in module arkouda), 214 267 sort() (in module arkouda.sorting), 168 reset_categories() (ark- standard_normal() (in module arkouda), 226 ouda.categorical.Categorical method), 92 standard_normal() (in module ark- resolve_scalar_dtype() (in module arkouda), 195 ouda.pdarraycreation), 157 resolve_scalar_dtype() (in module arkouda.dtypes), startswith() (arkouda.Categorical method), 75, 268 101 startswith() (arkouda.categorical.Categorical round() (arkouda._AbstractBaseTime method), 286 method), 93 round() (arkouda.timeclass._AbstractBaseTime startswith() (arkouda.Strings method), 68, 255 method), 183 startswith() (arkouda.strings.Strings method), 172 rpeel() (arkouda.Strings method), 70, 258 std() (arkouda.pdarray method), 46, 200, 277 rpeel() (arkouda.strings.Strings method), 176 std() (arkouda.pdarrayclass.pdarray method), 136 run_command() (in module arkouda._version), 88 std() (arkouda.timeclass.Timedelta method), 187 ruok() (in module arkouda.client), 99 std() (arkouda.Timedelta method), 290 std() (in module arkouda), 39, 208 S std() (in module arkouda.pdarrayclass), 143 save() (arkouda.Categorical method), 270 
stick() (arkouda.Strings method), 71, 259 save() (arkouda.categorical.Categorical method), 95 stick() (arkouda.strings.Strings method), 177

Index 307 arkouda, Release 2020.07.07 str_ (in module arkouda), 195 supported_with_r_pdarray (arkouda.Timedelta at- str_ (in module arkouda.dtypes), 101 tribute), 290 str_scalars (in module arkouda), 196 supported_with_r_timedelta (arkouda.Datetime at- str_scalars (in module arkouda.dtypes), 101 tribute), 289 STRING (arkouda.message.MessageFormat attribute), supported_with_r_timedelta (ark- 117 ouda.timeclass.Datetime attribute), 186 Strings (class in arkouda), 251 supported_with_r_timedelta (ark- Strings (class in arkouda.strings), 168 ouda.timeclass.Timedelta attribute), 187 sum() (arkouda.Datetime method), 289 supported_with_r_timedelta (arkouda.Timedelta at- sum() (arkouda.GroupBy method), 65, 242 tribute), 290 sum() (arkouda.groupbyclass.GroupBy method), 104 supported_with_timedelta (arkouda.Datetime sum() (arkouda.pdarray method), 45, 200, 277 attribute), 289 sum() (arkouda.pdarrayclass.pdarray method), 135 supported_with_timedelta (ark- sum() (arkouda.timeclass.Datetime method), 186 ouda.timeclass.Datetime attribute), 186 sum() (arkouda.timeclass.Timedelta method), 187 supported_with_timedelta (ark- sum() (arkouda.Timedelta method), 290 ouda.timeclass.Timedelta attribute), 187 sum() (in module arkouda), 37, 206 supported_with_timedelta (arkouda.Timedelta at- sum() (in module arkouda.pdarrayclass), 141 tribute), 290 supported_opeq (arkouda.Datetime attribute), 289 supported_opeq (arkouda.timeclass.Datetime at- T tribute), 186 Timedelta (class in arkouda), 289 supported_opeq (arkouda.timeclass.Timedelta at- Timedelta (class in arkouda.timeclass), 186 tribute), 187 timedelta_range() (in module arkouda), 291 supported_opeq (arkouda.Timedelta attribute), 290 timedelta_range() (in module arkouda.timeclass), supported_with_datetime (arkouda.Datetime at- 188 tribute), 289 to_cuda() (arkouda.pdarray method), 202, 279 supported_with_datetime (ark- to_cuda() (arkouda.pdarrayclass.pdarray method), 137 ouda.timeclass.Datetime attribute), 186 to_ndarray() 
(arkouda._AbstractBaseTime method), supported_with_datetime (ark- 286 ouda.timeclass.Timedelta attribute), 187 to_ndarray() (arkouda.Categorical method), 266 supported_with_datetime (arkouda.Timedelta to_ndarray() (arkouda.categorical.Categorical attribute), 290 method), 91 supported_with_pdarray (arkouda.Datetime at- to_ndarray() (arkouda.pdarray method), 201, 278 tribute), 289 to_ndarray() (arkouda.pdarrayclass.pdarray method), supported_with_pdarray (ark- 136 ouda.timeclass.Datetime attribute), 186 to_ndarray() (arkouda.Strings method), 261 supported_with_pdarray (ark- to_ndarray() (arkouda.strings.Strings method), 178 ouda.timeclass.Timedelta attribute), 187 to_ndarray() (arkouda.timeclass._AbstractBaseTime supported_with_pdarray (arkouda.Timedelta at- method), 183 tribute), 290 to_ndarray() (in module arkouda.Categorical), 76 supported_with_r_datetime (arkouda.Datetime at- to_ndarray() (in module arkouda.pdarray), 17, 26 tribute), 289 to_ndarray() (in module arkouda.Strings), 27, 66 supported_with_r_datetime (ark- to_pandas() (arkouda.Datetime method), 289 ouda.timeclass.Datetime attribute), 186 to_pandas() (arkouda.timeclass.Datetime method), 186 supported_with_r_datetime (ark- to_pandas() (arkouda.timeclass.Timedelta method), ouda.timeclass.Timedelta attribute), 187 187 supported_with_r_datetime (arkouda.Timedelta at- to_pandas() (arkouda.Timedelta method), 290 tribute), 290 token (arkouda.message.RequestMessage attribute), 118 supported_with_r_pdarray (arkouda.Datetime translate_np_dtype() (in module arkouda), 195 attribute), 289 translate_np_dtype() (in module arkouda.dtypes), supported_with_r_pdarray (ark- 101 ouda.timeclass.Datetime attribute), 186 supported_with_r_pdarray (ark- U ouda.timeclass.Timedelta attribute), 187 uint8 (in module arkouda), 195


uint8 (in module arkouda.dtypes), 101 write_line_to_file() (in module arkouda.io_util), uniform() (in module arkouda), 226 115 uniform() (in module arkouda.pdarraycreation), 156 union1d() (in module arkouda), 52, 217 X union1d() (in module arkouda.pdarraysetops), 162 XOR() (arkouda.GroupBy method), 56, 249 unique() (arkouda.Categorical method), 269 XOR() (arkouda.groupbyclass.GroupBy method), 111 unique() (arkouda.categorical.Categorical method), 94 unique() (in module arkouda), 50, 215 Z unique() (in module arkouda.pdarraysetops), 159 zeros() (in module arkouda), 19, 220 unique_keys (arkouda.GroupBy attribute), 55, 240 zeros() (in module arkouda.pdarraycreation), 151 unique_keys (arkouda.groupbyclass.GroupBy at- zeros_like() (in module arkouda), 20, 222 tribute), 102 zeros_like() (in module arkouda.pdarraycreation), unregister() (arkouda.Categorical method), 271 152 unregister() (arkouda.categorical.Categorical method), 96 unregister() (arkouda.pdarray method), 204, 281 unregister() (arkouda.pdarrayclass.pdarray method), 139 unregister() (arkouda.Strings method), 263 unregister() (arkouda.strings.Strings method), 181 unregister_categorical_by_name() (ark- ouda.Categorical static method), 272 unregister_categorical_by_name() (ark- ouda.categorical.Categorical static method), 97 unregister_pdarray_by_name() (in module ark- ouda), 212 unregister_pdarray_by_name() (in module ark- ouda.pdarrayclass), 147 unregister_strings_by_name() (arkouda.Strings static method), 264 unregister_strings_by_name() (ark- ouda.strings.Strings static method), 181 user (arkouda.message.ReplyMessage attribute), 118 user (arkouda.message.RequestMessage attribute), 118 username_tokenizer (in module arkouda.security), 165 V value_counts() (in module arkouda), 48, 235 value_counts() (in module arkouda.numeric), 125 var() (arkouda.pdarray method), 46, 200, 277 var() (arkouda.pdarrayclass.pdarray method), 135 var() (in module arkouda), 38, 208 var() (in module arkouda.pdarrayclass), 143 
VersioneerConfig (class in arkouda._version), 88 versions_from_parentdir() (in module ark- ouda._version), 88 W WARNING (arkouda.message.MessageType attribute), 118 where() (in module arkouda), 42, 233 where() (in module arkouda.numeric), 123
