
Commit e3bd0a1

Transform data interchange mechanisms md to rst (#378)
1 parent bfbd2a8 commit e3bd0a1

File tree

2 files changed: +185 -181 lines


spec/design_topics/data_interchange.md

This file was deleted (-181 lines). The replacement file (+185 lines) follows:

.. _data-interchange:

Data interchange mechanisms
===========================

This section discusses the mechanism to convert one type of array into another.
As discussed in the :ref:`Assumptions <assumptions-dependencies>` section,
*functions* provided by an array library are not expected to operate on
*array types* implemented by another library. Instead, the array can be
converted to a "native" array type.

The interchange mechanism must offer the following:

1. Data access via a protocol that describes the memory layout of the array
   in an implementation-independent manner.

   *Rationale: any number of libraries must be able to exchange data, and no
   particular package must be needed to do so.*

2. Support for all dtypes in this API standard (see :ref:`data-types`).

3. Device support. It must be possible to determine on what device the array
   that is to be converted lives.

   *Rationale: there are CPU-only, GPU-only, and multi-device array types;
   it's best to support these with a single protocol (with separate
   per-device protocols it's hard to figure out unambiguous rules for which
   protocol gets used, and the situation will get more complex over time
   as TPUs and other accelerators become more widely available).*

4. Zero-copy semantics where possible, making a copy only if needed (e.g.
   when data is not contiguous in memory).

   *Rationale: performance.*

5. A Python-side and a C-side interface, the latter with a stable C ABI.

   *Rationale: all prominent existing array libraries are implemented in
   C/C++, and are released independently from each other. Hence a stable C
   ABI is required for packages to work well together.*

The best candidate for this protocol is
`DLPack <https://github.com/dmlc/dlpack>`_, and hence that is what this
standard has chosen as the primary/recommended protocol. Note that the
``asarray`` function also supports the Python buffer protocol (CPU-only) to
support libraries that already implement it.
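
For example, a minimal sketch of such buffer-protocol ingestion, using NumPy as
a stand-in for a conforming library and the builtin ``array`` module as the CPU
object exposing the buffer protocol (both are assumptions for illustration, not
requirements of this standard):

.. code-block:: python

    # Sketch only: NumPy stands in for a conforming ``asarray`` implementation.
    import array

    import numpy as np

    buf = array.array("d", [0.0, 1.0, 2.0])  # CPU object exposing the buffer protocol
    x = np.asarray(buf)                      # ingested via the buffer protocol (CPU-only)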
47+
48+
.. note::
49+
The main alternatives to DLPack are device-specific methods:
50+
51+
- The `buffer protocol <https://docs.python.org/dev/c-api/buffer.html>`_ on CPU
52+
- ``__cuda_array_interface__`` for CUDA, specified in the Numba documentation
53+
`here <https://numba.pydata.org/numba-doc/0.43.0/cuda/cuda_array_interface.html>`_
54+
(Python-side only at the moment)
55+
56+
An issue with device-specific protocols are: if two libraries both
57+
support multiple device types, in which order should the protocols be
58+
tried? A growth in the number of protocols to support each time a new
59+
device gets supported by array libraries (e.g. TPUs, AMD GPUs, emerging
60+
hardware accelerators) also seems undesirable.
61+
62+
In addition to the above argument, it is also clear from adoption
63+
patterns that DLPack has the widest support. The buffer protocol, despite
64+
being a lot older and standardized as part of Python itself via PEP 3118,
65+
hardly has any support from array libraries. CPU interoperability is
66+
mostly dealt with via the NumPy-specific ``__array__`` (which, when called,
67+
means the object it is attached to must return a ``numpy.ndarray``
68+
containing the data the object holds).
69+
70+
See the `RFC to adopt DLPack <https://github.com/data-apis/consortium-feedback/issues/1>`_
71+
for discussion that preceded the adoption of DLPack.
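
As a purely illustrative sketch of that ``__array__`` mechanism (``MyArray`` is
a hypothetical type, and NumPy is assumed to be installed):

.. code-block:: python

    # Hypothetical array type implementing the NumPy-specific ``__array__`` hook.
    import numpy as np

    class MyArray:
        def __init__(self, data):
            self._data = list(data)

        def __array__(self, dtype=None, copy=None):
            # Must return a numpy.ndarray containing the data this object holds.
            return np.asarray(self._data, dtype=dtype)

    x = np.asarray(MyArray([1, 2, 3]))  # calls MyArray.__array__ under the hood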


DLPack support
--------------

.. note::
    DLPack is a standalone protocol/project and can therefore be used outside of
    this standard. Python libraries that want to implement only DLPack support
    are recommended to do so using the same syntax and semantics as outlined
    below. They are not required to return from ``from_dlpack`` an array object
    that conforms to this standard.

DLPack itself currently has no documentation outside of the inline comments in
`dlpack.h <https://github.com/dmlc/dlpack/blob/main/include/dlpack/dlpack.h>`_.
In the future, the content below may be migrated to the (to-be-written) DLPack docs.


Syntax for data interchange with DLPack
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The array API will offer the following syntax for data interchange:

1. A ``from_dlpack(x)`` function, which accepts (array) objects with a
   ``__dlpack__`` method and uses that method to construct a new array
   containing the data from ``x``.
2. ``__dlpack__(self, stream=None)`` and ``__dlpack_device__`` methods on the
   array object, which will be called from within ``from_dlpack``, to query
   what device the array is on (may be needed to pass in the correct
   stream, e.g. in the case of multiple GPUs) and to access the data.
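
A minimal sketch of this syntax, assuming NumPy >= 1.22 plays the role of both
the underlying storage and the consumer; ``_Wrapper`` is a hypothetical
producer type used only for illustration:

.. code-block:: python

    # Sketch of the interchange syntax; assumes NumPy >= 1.22.
    import numpy as np

    class _Wrapper:
        # Hypothetical array type that stores its data in a numpy.ndarray
        # and delegates the protocol methods to it.
        def __init__(self, data):
            self._data = np.asarray(data)

        def __dlpack__(self, stream=None):
            # Produce a PyCapsule named "dltensor" wrapping a DLManagedTensor.
            return self._data.__dlpack__()

        def __dlpack_device__(self):
            # Return a (device_type, device_id) tuple, e.g. (1, 0) for CPU.
            return self._data.__dlpack_device__()

    x = _Wrapper([1.0, 2.0, 3.0])
    y = np.from_dlpack(x)  # consumer: calls __dlpack_device__, then __dlpack__

Here ``np.from_dlpack`` plays the consumer role; a conforming library would
expose the same ``from_dlpack`` function in its own namespace.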


Semantics
~~~~~~~~~

DLPack describes the memory layout of strided, n-dimensional arrays.
When a user calls ``y = from_dlpack(x)``, the library implementing ``x`` (the
"producer") will provide access to the data from ``x`` to the library
containing ``from_dlpack`` (the "consumer"). If possible, this must be
zero-copy (i.e. ``y`` will be a *view* on ``x``). If not possible, that library
may make a copy of the data. In both cases:

- The producer keeps owning the memory.
- ``y`` may or may not be a view; therefore, the user must keep in mind the
  recommendation to avoid mutating ``y`` (see :ref:`copyview-mutability`).
- Both ``x`` and ``y`` may continue to be used just like arrays created in
  other ways.
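
For instance, assuming NumPy >= 1.22 acts as both producer and consumer, the
zero-copy case can be observed as follows:

.. code-block:: python

    # Zero-copy illustration; assumes NumPy >= 1.22 on both sides.
    import numpy as np

    x = np.arange(5.0)
    y = np.from_dlpack(x)          # zero-copy: y is a view on x's memory
    assert np.shares_memory(x, y)  # hence the recommendation to avoid mutating y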

If an array that is accessed via the interchange protocol lives on a
device that the requesting library does not support, it is recommended to
raise a ``TypeError``.
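
A consumer-side sketch of that recommendation for a hypothetical CPU-only
library (``1`` is ``kDLCPU`` in ``dlpack.h``; NumPy >= 1.22 is assumed only to
provide a test object):

.. code-block:: python

    # Hypothetical device check for a CPU-only consumer; 1 == kDLCPU in dlpack.h.
    import numpy as np

    def _check_device(x):
        device_type, device_id = x.__dlpack_device__()
        if device_type != 1:  # anything other than kDLCPU is unsupported here
            raise TypeError("array lives on a device this library does not support")

    _check_device(np.arange(3))  # NumPy arrays live on the CPU, so this passes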

Stream handling through the ``stream`` keyword applies to CUDA and ROCm (and
perhaps to other devices that have a stream concept as well, though those
haven't been considered in detail). The consumer must pass the stream it will
use to the producer; the producer must synchronize or wait on the stream when
necessary. In the common case of the default stream being used, synchronization
will be unnecessary, so asynchronous execution is enabled.


Implementation
~~~~~~~~~~~~~~

*Note that while this API standard largely tries to avoid discussing
implementation details, some discussion and requirements are needed
here because data interchange requires coordination between
implementers on, e.g., memory management.*

.. image:: /_static/images/DLPack_diagram.png
    :alt: Diagram of DLPack structs

*DLPack diagram. Dark blue are the structs it defines, light blue
struct members, gray text enum values of supported devices and data
types.*

The ``__dlpack__`` method will produce a ``PyCapsule`` containing a
``DLManagedTensor``, which will be consumed immediately within
``from_dlpack`` - therefore it is consumed exactly once, and it will not be
visible to users of the Python API.

The producer must set the ``PyCapsule`` name to ``"dltensor"`` so that
it can be inspected by name, and set a ``PyCapsule_Destructor`` that calls
the ``deleter`` of the ``DLManagedTensor`` when the ``"dltensor"``-named
capsule is no longer needed.

The consumer must transfer ownership of the ``DLManagedTensor`` from the
capsule to its own object. It does so by renaming the capsule to
``"used_dltensor"`` to ensure that ``PyCapsule_Destructor`` will not get
called (ensured if ``PyCapsule_Destructor`` calls ``deleter`` only for
capsules whose name is ``"dltensor"``), but the ``deleter`` of the
``DLManagedTensor`` will be called by the destructor of the consumer
library object created to own the ``DLManagedTensor`` obtained from the
capsule.

Note: the capsule names ``"dltensor"`` and ``"used_dltensor"`` must be
statically allocated.
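
The naming convention can be observed from Python for illustration purposes;
the ``ctypes`` calls below and the use of NumPy >= 1.22 as the producer are
assumptions, and real consumers rename and consume the capsule through the C
API instead:

.. code-block:: python

    # Inspect the name of a freshly produced DLPack capsule (illustration only).
    import ctypes

    import numpy as np

    PyCapsule_GetName = ctypes.pythonapi.PyCapsule_GetName
    PyCapsule_GetName.restype = ctypes.c_char_p
    PyCapsule_GetName.argtypes = [ctypes.py_object]

    capsule = np.arange(3.0).__dlpack__()        # producer names the capsule "dltensor"
    assert PyCapsule_GetName(capsule) == b"dltensor"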

When the ``strides`` field in the ``DLTensor`` struct is ``NULL``, it indicates a
row-major compact array. If the array is of size zero, the data pointer in
``DLTensor`` should be set to either ``NULL`` or ``0``.

The DLPack version used must be ``0.2 <= DLPACK_VERSION < 1.0``. For further
details on DLPack design and how to implement support for it,
refer to `github.com/dmlc/dlpack <https://github.com/dmlc/dlpack>`_.

.. warning::
    DLPack contains a ``device_id``, which will be the device
    ID (an integer, ``0, 1, ...``) which the producer library uses. In
    practice this will likely be the same numbering as that of the
    consumer, however that is not guaranteed. Depending on the hardware
    type, it may be possible for the consumer library implementation to
    look up the actual device from the pointer to the data - this is
    possible for example for CUDA device pointers.

It is recommended that implementers of this array API consider and document
whether the ``.device`` attribute of the array returned from ``from_dlpack`` is
guaranteed to be in a certain order or not.
