| 1 | +.. _data-interchange: |
| 2 | + |
| 3 | +Data interchange mechanisms |
| 4 | +=========================== |
| 5 | + |
| 6 | +This section discusses the mechanism to convert one type of array into another. |
| 7 | +As discussed in the :ref:`assumptions-dependencies <Assumptions>` section, |
| 8 | +*functions* provided by an array library are not expected to operate on |
| 9 | +*array types* implemented by another library. Instead, the array can be |
| 10 | +converted to a "native" array type. |
| 11 | + |
| 12 | +The interchange mechanism must offer the following: |
| 13 | + |
| 14 | +1. Data access via a protocol that describes the memory layout of the array |
| 15 | + in an implementation-independent manner. |
| 16 | + |
| 17 | + *Rationale: any number of libraries must be able to exchange data, and no |
| 18 | + particular package must be needed to do so.* |
| 19 | + |
| 20 | +2. Support for all dtypes in this API standard (see :ref:`data-types`). |
| 21 | + |
| 22 | +3. Device support. It must be possible to determine on what device the array |
| 23 | + that is to be converted lives. |
| 24 | + |
| 25 | + *Rationale: there are CPU-only, GPU-only, and multi-device array types; |
| 26 | + it's best to support these with a single protocol (with separate |
| 27 | + per-device protocols it's hard to figure out unambiguous rules for which |
| 28 | + protocol gets used, and the situation will get more complex over time |
| 29 | + as TPUs and other accelerators become more widely available).* |
| 30 | + |
| 31 | +4. Zero-copy semantics where possible, making a copy only if needed (e.g. |
| 32 | + when data is not contiguous in memory). |
| 33 | + |
| 34 | + *Rationale: performance.* |
| 35 | + |
| 36 | +5. A Python-side and a C-side interface, the latter with a stable C ABI. |
| 37 | + |
| 38 | + *Rationale: all prominent existing array libraries are implemented in |
| 39 | + C/C++, and are released independently from each other. Hence a stable C |
| 40 | + ABI is required for packages to work well together.* |
| 41 | + |
| 42 | +The best candidate for this protocol is |
| 43 | +`DLPack <https://github.com/dmlc/dlpack>`_, and hence that is what this |
| 44 | +standard has chosen as the primary/recommended protocol. Note that the |
| 45 | +``asarray`` function also supports the Python buffer protocol (CPU-only) to |
| 46 | +support libraries that already implement buffer protocol support. |
| 47 | + |
| 48 | +.. note:: |
| 49 | + The main alternatives to DLPack are device-specific methods: |
| 50 | + |
| 51 | + - The `buffer protocol <https://docs.python.org/dev/c-api/buffer.html>`_ on CPU |
| 52 | + - ``__cuda_array_interface__`` for CUDA, specified in the Numba documentation |
| 53 | + `here <https://numba.pydata.org/numba-doc/0.43.0/cuda/cuda_array_interface.html>`_ |
| 54 | + (Python-side only at the moment) |
| 55 | + |
| 56 | + An issue with device-specific protocols is: if two libraries both |
| 57 | + support multiple device types, in which order should the protocols be |
| 58 | + tried? A growth in the number of protocols to support each time a new |
| 59 | + device gets supported by array libraries (e.g. TPUs, AMD GPUs, emerging |
| 60 | + hardware accelerators) also seems undesirable. |
| 61 | + |
| 62 | + In addition to the above argument, it is also clear from adoption |
| 63 | + patterns that DLPack has the widest support. The buffer protocol, despite |
| 64 | + being a lot older and standardized as part of Python itself via PEP 3118, |
| 65 | + hardly has any support from array libraries. CPU interoperability is |
| 66 | + mostly dealt with via the NumPy-specific ``__array__`` (which, when called, |
| 67 | + means the object it is attached to must return a ``numpy.ndarray`` |
| 68 | + containing the data the object holds). |
| 69 | + |
| 70 | + See the `RFC to adopt DLPack <https://github.com/data-apis/consortium-feedback/issues/1>`_ |
| 71 | + for discussion that preceded the adoption of DLPack. |
| 72 | + |
| 73 | + |
| 74 | +DLPack support |
| 75 | +-------------- |
| 76 | + |
| 77 | +.. note:: |
| 78 | + DLPack is a standalone protocol/project and can therefore be used outside of |
| 79 | + this standard. Python libraries that want to implement only DLPack support |
| 80 | + are recommended to do so using the same syntax and semantics as outlined |
| 81 | + below. They are not required to return an array object from ``from_dlpack`` |
| 82 | + which conforms to this standard. |
| 83 | + |
| 84 | + DLPack itself has no documentation currently outside of the inline comments in |
| 85 | + `dlpack.h <https://github.com/dmlc/dlpack/blob/main/include/dlpack/dlpack.h>`_. |
| 86 | + In the future, the below content may be migrated to the (to-be-written) DLPack docs. |
| 87 | + |
| 88 | + |
| 89 | +Syntax for data interchange with DLPack |
| 90 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 91 | + |
| 92 | +The array API will offer the following syntax for data interchange: |
| 93 | + |
| 94 | +1. A ``from_dlpack(x)`` function, which accepts (array) objects with a |
| 95 | + ``__dlpack__`` method and uses that method to construct a new array |
| 96 | + containing the data from ``x``. |
| 97 | +2. ``__dlpack__(self, stream=None)`` and ``__dlpack_device__`` methods on the |
| 98 | + array object, which will be called from within ``from_dlpack``, to query |
| 99 | + what device the array is on (may be needed to pass in the correct |
| 100 | + stream, e.g. in the case of multiple GPUs) and to access the data. |
| 101 | + |
| 102 | + |
| 103 | +Semantics |
| 104 | +~~~~~~~~~ |
| 105 | + |
| 106 | +DLPack describes the memory layout of strided, n-dimensional arrays. |
| 107 | +When a user calls ``y = from_dlpack(x)``, the library implementing ``x`` (the |
| 108 | +"producer") will provide access to the data from ``x`` to the library |
| 109 | +containing ``from_dlpack`` (the "consumer"). If possible, this must be |
| 110 | +zero-copy (i.e. ``y`` will be a *view* on ``x``). If not possible, that library |
| 111 | +may make a copy of the data. In both cases: |
| 112 | + |
| 113 | +- The producer keeps owning the memory. |
| 114 | +- ``y`` may or may not be a view, so the user must keep in mind the recommendation to avoid mutating ``y`` (see :ref:`copyview-mutability`). |
| 115 | +- Both ``x`` and ``y`` may continue to be used just like arrays created in other ways. |
| 116 | + |
| 117 | +If an array that is accessed via the interchange protocol lives on a |
| 118 | +device that the requesting library does not support, it is recommended to |
| 119 | +raise a ``TypeError``. |
| 120 | + |
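The call sequence described above can be sketched in pure Python. This is a toy illustration, not a real implementation: ``ToyProducerArray`` and ``ToyConsumerArray`` are hypothetical classes, ``__dlpack__`` returns the data object directly instead of a ``PyCapsule``, and the ``(device_type, device_id)`` return value of ``__dlpack_device__`` is an assumption made for this sketch.

```python
KDL_CPU = 1   # value of kDLCPU in dlpack.h
KDL_CUDA = 2  # value of the CUDA GPU device type in dlpack.h


class ToyProducerArray:
    """Hypothetical producer-side array exposing the two protocol methods."""

    def __init__(self, data, device=(KDL_CPU, 0)):
        self._data = data
        self._device = device

    def __dlpack_device__(self):
        # Assumed for this sketch: a (device_type, device_id) tuple.
        return self._device

    def __dlpack__(self, stream=None):
        # A real producer returns a PyCapsule wrapping a DLManagedTensor.
        # The toy version hands over the underlying data object itself.
        return self._data


class ToyConsumerArray:
    """Hypothetical consumer-side array that wraps the exchanged data."""

    def __init__(self, data):
        self.data = data


def from_dlpack(x):
    # Query the device first, so that unsupported devices are rejected
    # before any data is accessed.
    device_type, device_id = x.__dlpack_device__()
    if device_type != KDL_CPU:
        raise TypeError(f"unsupported device type: {device_type}")
    # This CPU-only toy consumer has no stream to pass.
    payload = x.__dlpack__(stream=None)
    return ToyConsumerArray(payload)


x = ToyProducerArray([1.0, 2.0, 3.0])
y = from_dlpack(x)  # y wraps the same list object: a "view" on x
```

In this sketch the zero-copy case corresponds to ``y.data`` being the very same object as the producer's data, and an array reporting a non-CPU device makes ``from_dlpack`` raise ``TypeError``, following the recommendation above.
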
| 121 | +Stream handling through the ``stream`` keyword applies to CUDA and ROCm (and perhaps |
| 122 | +to other devices that have a stream concept, although those have not been |
| 123 | +considered in detail). The consumer must pass the stream it will use to the |
| 124 | +producer; the producer must synchronize or wait on the stream when necessary. |
| 125 | +In the common case of the default stream being used, synchronization will be |
| 126 | +unnecessary so asynchronous execution is enabled. |
| 127 | + |
| 128 | + |
| 129 | +Implementation |
| 130 | +~~~~~~~~~~~~~~ |
| 131 | + |
| 132 | +*Note that while this API standard largely tries to avoid discussing |
| 133 | +implementation details, some discussion and requirements are needed |
| 134 | +here because data interchange requires coordination between |
| 135 | +implementers on, e.g., memory management.* |
| 136 | + |
| 137 | +.. image:: /_static/images/DLPack_diagram.png |
| 138 | + :alt: Diagram of DLPack structs |
| 139 | + |
| 140 | +*DLPack diagram. Dark blue are the structs it defines, light blue |
| 141 | +struct members, gray text enum values of supported devices and data |
| 142 | +types.* |
| 143 | + |
| 144 | +The ``__dlpack__`` method will produce a ``PyCapsule`` containing a |
| 145 | +``DLManagedTensor``, which will be consumed immediately within |
| 146 | +``from_dlpack`` - therefore it is consumed exactly once, and it will not be |
| 147 | +visible to users of the Python API. |
| 148 | + |
| 149 | +The producer must set the ``PyCapsule`` name to ``"dltensor"`` so that |
| 150 | +it can be inspected by name, and set a ``PyCapsule_Destructor`` that calls |
| 151 | +the ``deleter`` of the ``DLManagedTensor`` when the ``"dltensor"``-named |
| 152 | +capsule is no longer needed. |
| 153 | + |
| 154 | +The consumer must transfer ownership of the ``DLManagedTensor`` from the |
| 155 | +capsule to its own object. It does so by renaming the capsule to |
| 156 | +``"used_dltensor"``, which ensures that ``PyCapsule_Destructor`` will not call |
| 157 | +the ``deleter`` (this holds if ``PyCapsule_Destructor`` calls ``deleter`` only for |
| 158 | +capsules whose name is ``"dltensor"``). Instead, the ``deleter`` of the |
| 159 | +``DLManagedTensor`` will be called by the destructor of the consumer |
| 160 | +library object created to own the ``DLManagedTensor`` obtained from the |
| 161 | +capsule. |
| 162 | + |
| 163 | +Note: the capsule names ``"dltensor"`` and ``"used_dltensor"`` must be |
| 164 | +statically allocated. |
| 165 | + |
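The renaming step can be tried from Python on CPython by calling the capsule C API through ``ctypes``. The following is only a sketch of the naming handshake: ``payload`` is a dummy buffer standing in for a ``DLManagedTensor``, no destructor or ``deleter`` is installed, and ``produce`` and ``consume`` are hypothetical helper names.

```python
import ctypes

# The capsule API stores the name pointer without copying it, so the names
# are kept alive at module level (mirroring the static-allocation rule).
DLTENSOR = b"dltensor"
USED_DLTENSOR = b"used_dltensor"

api = ctypes.pythonapi
api.PyCapsule_New.restype = ctypes.py_object
api.PyCapsule_New.argtypes = [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_void_p]
api.PyCapsule_IsValid.restype = ctypes.c_int
api.PyCapsule_IsValid.argtypes = [ctypes.py_object, ctypes.c_char_p]
api.PyCapsule_GetPointer.restype = ctypes.c_void_p
api.PyCapsule_GetPointer.argtypes = [ctypes.py_object, ctypes.c_char_p]
api.PyCapsule_SetName.restype = ctypes.c_int
api.PyCapsule_SetName.argtypes = [ctypes.py_object, ctypes.c_char_p]

# Dummy buffer standing in for a DLManagedTensor (assumption of this sketch).
payload = ctypes.create_string_buffer(16)


def produce():
    # Producer side: wrap the pointer in a capsule named "dltensor".
    # No destructor is installed here (None is passed as NULL).
    return api.PyCapsule_New(ctypes.addressof(payload), DLTENSOR, None)


def consume(capsule):
    # Consumer side: accept only a capsule that has not been consumed yet.
    if not api.PyCapsule_IsValid(capsule, DLTENSOR):
        raise ValueError("not an unconsumed DLPack capsule")
    pointer = api.PyCapsule_GetPointer(capsule, DLTENSOR)
    # Renaming marks the capsule as consumed; the consumer now owns the
    # pointer and is responsible for eventually calling the deleter.
    api.PyCapsule_SetName(capsule, USED_DLTENSOR)
    return pointer


capsule = produce()
pointer = consume(capsule)
```

After ``consume`` returns, the capsule is valid only under the name ``"used_dltensor"``, so a second ``consume(capsule)`` call raises ``ValueError``. This is how the rename enforces the "consumed exactly once" rule.
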
| 166 | +When the ``strides`` field in the ``DLTensor`` struct is ``NULL``, it indicates a |
| 167 | +row-major compact array. If the array is of size zero, the data pointer in |
| 168 | +``DLTensor`` should be set to either ``NULL`` or ``0``. |
| 169 | + |
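As an illustration of the ``NULL`` strides convention, a consumer can derive the implied strides from the shape alone. The helper below is a minimal sketch with a hypothetical name, ``implied_strides``. It expresses strides in number of elements rather than bytes, which is how ``dlpack.h`` defines the ``strides`` field.

```python
def implied_strides(shape):
    """Strides (in elements) of a row-major compact array with this shape."""
    strides = []
    running = 1
    # Walk the dimensions from last to first: the last axis is contiguous,
    # and each earlier axis steps over everything that follows it.
    for extent in reversed(shape):
        strides.append(running)
        running *= extent
    return tuple(reversed(strides))


strides = implied_strides((2, 3, 4))  # (12, 4, 1)
```
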
| 170 | +The DLPack version used must satisfy ``0.2 <= DLPACK_VERSION < 1.0``. For further |
| 171 | +details on DLPack design and how to implement support for it, |
| 172 | +refer to `github.com/dmlc/dlpack <https://github.com/dmlc/dlpack>`_. |
| 173 | + |
| 174 | +.. warning:: |
| 175 | + DLPack contains a ``device_id``, which will be the device |
| 176 | + ID (an integer, ``0, 1, ...``) which the producer library uses. In |
| 177 | + practice this will likely be the same numbering as that of the |
| 178 | + consumer, however that is not guaranteed. Depending on the hardware |
| 179 | + type, it may be possible for the consumer library implementation to |
| 180 | + look up the actual device from the pointer to the data - this is |
| 181 | + possible for example for CUDA device pointers. |
| 182 | + |
| 183 | + It is recommended that implementers of this array API consider and document |
| 184 | + whether the ``.device`` attribute of the array returned from ``from_dlpack`` is |
| 185 | + guaranteed to be in a certain order or not. |