Commit 39e0d62

Add section on copy-view behaviour and mutability (#66)
Closes gh-24
1 parent 0e8d675 commit 39e0d62

3 files changed: +77 -2 lines changed

@@ -0,0 +1,76 @@
.. _copyview-mutability:

# Copy-view behaviour and mutability

Strided array implementations (e.g. NumPy, PyTorch, CuPy, MXNet) typically
have the concept of a "view", meaning an array containing data in memory that
belongs to another array (i.e. a different "view" on the original data).
Views are useful for performance reasons - not copying data to a new location
saves memory and is faster than copying - but can also affect the semantics
of code. This happens when views are combined with _mutating_ operations.
This simple example illustrates that:

```python
# `ones` stands for the array-creation function of whichever library is used
x = ones(1)
x += 2
y = x[:]  # `y` *may* be a view on the data of `x`
y -= 1  # if `y` is a view, this modifies `x`
```

Code as simple as the above example will not be portable between array
libraries - for NumPy/PyTorch/CuPy/MXNet `x` will contain the value `2`,
while for TensorFlow/JAX/Dask it will contain the value `3`, because there
`y -= 1` produces a new array instead of writing into memory shared with `x`.
The combination of views and mutability is fundamentally problematic here if
the goal is to be able to write code with unambiguous semantics.
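The difference can be reproduced without any array library. The following is a minimal sketch built on two hypothetical toy classes (`ViewArray` and `CopyArray` are illustrative names, not part of any library or of this standard). One shares storage between an array and its slice and mutates in place. The other is immutable, so Python falls back from `-=` to `-` and rebinds the name.

```python
class ViewArray:
    """Hypothetical strided-style array: a full slice shares storage."""

    def __init__(self, storage):
        self.storage = storage  # list object shared between an array and its views

    def __getitem__(self, key):
        if key != slice(None):
            raise NotImplementedError("this sketch only supports x[:]")
        return ViewArray(self.storage)  # view: same list, no copy

    def __isub__(self, other):
        # In-place subtraction writes into the shared storage.
        for i in range(len(self.storage)):
            self.storage[i] -= other
        return self

    def tolist(self):
        return list(self.storage)


class CopyArray:
    """Hypothetical immutable array: no in-place operators are defined."""

    def __init__(self, values):
        self._values = tuple(values)

    def __getitem__(self, key):
        return CopyArray(self._values[key])

    def __sub__(self, other):
        # Without __isub__, `y -= 1` evaluates `y = y - 1` and rebinds `y`.
        return CopyArray(v - other for v in self._values)

    def tolist(self):
        return list(self._values)


def run_example(array_type):
    x = array_type([3.0])  # stands in for `x = ones(1); x += 2`
    y = x[:]
    y -= 1
    return x.tolist(), y.tolist()


view_result = run_example(ViewArray)
copy_result = run_example(CopyArray)
print(view_result)  # ([2.0], [2.0])
print(copy_result)  # ([3.0], [2.0])
```

The same three statements leave `x` at `2` in one case and at `3` in the other, which is the portability problem described above.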

Views are necessary for getting good performance out of the current strided
array libraries. It is not always clear, however, when a library will return a
view and when it will return a copy. This API standard does not attempt to
specify this - libraries can do either.

There are several types of operations that do in-place mutation of data
contained in arrays. These include:

1. In-place operators (e.g. `*=`)
2. Item assignment (e.g. `x[0] = 1`)
3. Slice assignment (e.g. `x[:2, :] = 3`)
4. The `out=` keyword present in some strided array libraries (e.g. `sin(x, out=y)`)

Libraries like TensorFlow and JAX tend to support in-place operators, provide
alternative syntax for item and slice assignment (e.g. an `update_index`
function or `x.at[idx].set(y)`), and have no need for `out=`.

A potential solution could be to make views read-only, or use copy-on-write
semantics. Both are hard to implement and would present significant issues
for backwards compatibility for current strided array libraries. Read-only
views would also not be a full solution, given that mutating the original
(base) array will also result in ambiguous semantics. Hence this API standard
does not attempt to go down this route.
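The limitation of read-only views can be seen with Python's built-in `memoryview`, used here purely as a stand-in for an array view (an assumption made for illustration; array libraries implement views differently). Writes through the read-only view are rejected, yet the contents observed through it still change when the base buffer is mutated.

```python
base = bytearray([1, 2, 3])
view = memoryview(base).toreadonly()  # read-only view on the same memory (Python 3.8+)

try:
    view[0] = 9
    write_rejected = False
except TypeError:
    write_rejected = True  # the view itself cannot be written to

before = view.tolist()
base[0] = 9  # mutate the base object instead
after = view.tolist()

print(write_rejected)  # True
print(before, after)   # [1, 2, 3] [9, 2, 3]
```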

Both in-place operators and item/slice assignment can be mapped onto
equivalent functional expressions (e.g. `x[idx] = val` maps to
`x.at[idx].set(val)`). Given that both are very widely used in library and
end-user code, this standard chooses to include them.
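As a rough sketch of that mapping, the hypothetical helpers below (`set_at` and `scaled` are illustrative names, not functions from this standard or from any library) express item assignment, slice assignment and an in-place operator as functions that return a new value and leave their input untouched. Tuples stand in for immutable arrays, and scalar broadcasting into a slice is not modelled.

```python
def set_at(values, index, new_value):
    """Functional counterpart of `values[index] = new_value`."""
    items = list(values)      # copy, so the input is never modified
    items[index] = new_value  # `index` may be an int or a slice
    return tuple(items)


def scaled(values, factor):
    """Functional counterpart of `values *= factor`."""
    return tuple(v * factor for v in values)


x = (1, 2, 3, 4)
item_updated = set_at(x, 0, 10)                 # like x[0] = 10
slice_updated = set_at(x, slice(0, 2), (7, 8))  # like x[:2] = (7, 8)
doubled = scaled(x, 2)                          # like x *= 2

print(x)              # (1, 2, 3, 4)
print(item_updated)   # (10, 2, 3, 4)
print(slice_updated)  # (7, 8, 3, 4)
print(doubled)        # (2, 4, 6, 8)
```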

The situation with `out=` is slightly different - it's less heavily used, and
easier to avoid. It's also not an optimal API, because it mixes an
"efficiency of implementation" consideration ("you're allowed to do this
in-place") with the semantics of a function ("the output _must_ be placed into
this array"). There are libraries that do some form of tracing or abstract
interpretation over a language that does not support mutation (to make
analysis easier); in those cases implementing `out=` with correct handling of
views may even be impossible. There are alternatives, for example the
donated arguments in JAX or working buffers in LAPACK, that allow the user to
express "you _may_ overwrite this data, do whatever is fastest". Given that
those alternatives aren't widely used in array libraries today, this API
standard chooses to (a) leave out `out=`, and (b) not specify another method
of reusing arrays that are no longer needed as buffers.
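One reason `out=` is hard to get right with views is that the output may overlap the input. The sketch below uses a hypothetical element-wise kernel (`add_one`, an illustrative name) and `memoryview` slices of a `bytearray` as stand-ins for array views. With a separate output buffer the result is as expected. With an overlapping output, each write changes an element that is read in the next step.

```python
def add_one(src, out):
    """Naive kernel: write src[i] + 1 into out[i], front to back."""
    for i in range(len(src)):
        out[i] = src[i] + 1
    return out


# Case 1: the output buffer does not share memory with the input.
data = bytearray([1, 1, 1, 1])
separate = bytearray(3)
add_one(memoryview(data)[:-1], separate)
separate_result = list(separate)

# Case 2: the output is a view that overlaps the input, shifted by one element.
data = bytearray([1, 1, 1, 1])
buf = memoryview(data)
add_one(buf[:-1], buf[1:])
overlapping_result = list(data[1:])

print(separate_result)     # [2, 2, 2]
print(overlapping_result)  # [2, 3, 4]
```

An implementation that wants the first result for every possible `out=` argument has to detect the overlap and work on a temporary copy, which gives up part of the memory saving that `out=` was meant to provide.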

This leaves the problem of the initial example - with this API standard it
remains possible to write code that will not work the same for all array
libraries. This is something that the user must be careful about.

.. note::

    It is recommended that users avoid any mutating operations when a view may be involved.
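One way to follow this recommendation is to write functions that never modify their arguments. The sketch below is illustrative only: `decremented` is a hypothetical helper, and the standard-library `array.array` with a `memoryview` on it stands in for an array and a possible view.

```python
from array import array


def decremented(values):
    """Return a new array holding `values - 1`; the argument is not modified."""
    return array("d", (v - 1 for v in values))


x = array("d", [3.0])
maybe_view = memoryview(x)  # shares memory with `x`

y = decremented(maybe_view)  # out-of-place: safe whether or not the input is a view

print(x.tolist())  # [3.0]
print(y.tolist())  # [2.0]
```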

spec/design_topics/eager_lazy_eval.md

-1
This file was deleted.

spec/design_topics/index.rst

+1 -1
@@ -5,7 +5,7 @@ Design topics & constraints
 :caption: Design topics & constraints
 :maxdepth: 1

-eager_lazy_eval
+copies_views_and_mutation
 parallelism
 static_typing
 data_interchange
