Python

Numpy dispatcher: when Numpy becomes a protocol for an ecosystem

06/07/201919/01/2020 by Christian S. Perone

Introduction

Not a lot of people working with the Python scientific ecosystem are aware of the NEP 18 (dispatch mechanism for NumPy’s high-level array functions). Given the importance of this protocol, I decided to write this short introduction to the new dispatcher that will certainly bring a lot of benefits for the Python scientific ecosystem.

If you used PyTorch, TensorFlow, Dask, etc, you certainly noticed the similarity of their API contracts with Numpy. And it’s not by accident, Numpy’s API is one of the most fundamental and widely-used APIs for scientific computing. Numpy is so pervasive, that it ceased to be only an API and it is becoming more a protocol or an API specification.

There are countless advantages when an an entire ecosystem adopts an API specification. The wide adoption of the Numpy protocols will certainly help create a standardized API that will provide seamless integration across implementations, even for the ones that are quite different than the ndarray.

The main goal of the NEP 18, is to provide a protocol that is materialized on the method named __array_function__, where it will allow developers of frameworks such as PyTorch, TensorFlow, Dask, etc, to provide their own implementations of the Numpy API specification.

The __array_function__ protocol is still experimental and will be enabled by default on the next Numpy v.1.17, however, on the v1.16 you can enable it by setting the environment variable called NUMPY_EXPERIMENTAL_ARRAY_FUNCTION to 1.

PyTorch example

It’s easy to provide an example of how the protocol can be implemented in PyTorch in order to get a feeling of what it will allow you to do.

First, we define the protocol that we will use to monkey-patch PyTorch (but you can change the framework itself, of course, that’s the whole point):

HANDLED_FUNCTIONS = {}
def __array_function__(self, func, types, args, kwargs):
    if func not in HANDLED_FUNCTIONS:
        return NotImplemented
    if not all(issubclass(t, torch.Tensor) for t in types):
        return NotImplemented
    return HANDLED_FUNCTIONS[func](*args, **kwargs)

After that, we create a simple decorator to set the HANDLED_FUNCTIONS dictionary with PyTorch’s implementation of the Numpy API:

def implements(numpy_function):
    def decorator(func):
        HANDLED_FUNCTIONS[numpy_function] = func
        return func
    return decorator

Later, we just have to monkey patch PyTorch (not required if you change PyTorch code itself) and provide a Numpy implementation of the PyTorch operations:

torch.Tensor.__array_function__ = __array_function__

@implements(np.sum)
def npsum(arr, axis=0):
    return torch.sum(arr, dim=axis)

And that’s it ! You can now do things like this:

>>> torch_tensor = torch.ones(10)
>>> type(torch_tensor)
torch.Tensor
>>> result = np.sum(torch_tensor)
>>> result
tensor(10.)
>>> type(result)
torch.Tensor

This is quite amazing, given that you can easily mix APIs together and write a single piece of code that will work in PyTorch, Numpy or other frameworks independently of the multi-dimensional array implementation and diverging to the correct backend implementation.

Dask example

Dask is already implementing the protocol, so you don’t have to monkey-patch it:

>>> import numpy as np
>>> import dask.array as da

>>> dask_array = da.random.random((5000, 1000), chunks=(1000, 1000))
>>> type(dask_array)
dask.array.core.Array

>>> u, s, v = np.linalg.svd(dask_array) # Numpy API !
>>> u

>>> s

>>> v

As you can see, when we used the singular decomposition from Numpy, we were able to define the Dask computational graph using the Numpy API:

>>> v.visualize()

And we can now compute the result using Dask lazy eval:

>>> result = v.compute()
>>> result
array([[ 0.03178809,  0.03147971,  0.0313492 , ...,  0.03201226,
         0.03142089,  0.03180248],
       [ 0.00272678, -0.0182263 ,  0.00839066, ...,  0.01195828,
         0.01681703, -0.05086809],
       [-0.00851417,  0.0411336 ,  0.02738713, ...,  0.03447079,
         0.04614097,  0.02292196],
       ...,

And you can do the same with CuPy as well.

The new dispatch protocol will open the door for seamless integration among frameworks on the Python scientific ecosystem. I hope that more and more frameworks will start to adopt it in near future.

– Christian S. Perone

Cite this article as: Christian S. Perone, "Numpy dispatcher: when Numpy becomes a protocol for an ecosystem," in Terra Incognita, 06/07/2019, https://blog.christianperone.com/2019/07/numpy-dispatcher-when-numpy-becomes-a-protocol-for-an-ecosystem/.

Numpy dispatcher: when Numpy becomes a protocol for an ecosystem

Introduction

PyTorch example

Dask example

Leave a Reply Cancel reply

Tags