|
| 1 | +.. _copyview-mutability: |
| 2 | + |
| 3 | +# Copy-view behaviour and mutability |
| 4 | + |
| 5 | +Strided array implementations (e.g. NumPy, PyTorch, CuPy, MXNet) typically |
| 6 | +have the concept of a "view", meaning an array containing data in memory that |
| 7 | +belongs to another array (i.e. a different "view" on the original data). |
| 8 | +Views are useful for performance reasons - not copying data to a new location |
| 9 | +saves memory and is faster than copying - but can also affect the semantics |
| 10 | +of code. This happens when views are combined with _mutating_ operations. |
| 11 | +This simple example illustrates that: |
| 12 | + |
| 13 | +```python |
| 14 | +x = ones(1) |
| 15 | +x += 2 |
| 16 | +y = x # `y` *may* be a view |
| 17 | +y -= 1 # if `y` is a view, this modifies `x` |
| 18 | +``` |
| 19 | + |
| 20 | +Code as simple as the above example will not be portable between array |
| 21 | +libraries - for NumPy/PyTorch/CuPy/MXNet `x` will contain the value `2`, |
| 22 | +while for TensorFlow/JAX/Dask it will contain the value `3`. The combination |
| 23 | +of views and mutability is fundamentally problematic here if the goal is to |
| 24 | +be able to write code with unambiguous semantics. |
| 25 | + |
| 26 | +Views are necessary for getting good performance out of the current strided |
| 27 | +array libraries. It is not always clear, however, when a library will return a |
| 28 | +view, and when it will return a copy. This API standard does not attempt to |
| 29 | +specify this - libraries can do either. |
| 30 | + |
| 31 | +There are several types of operations that do in-place mutation of data |
| 32 | +contained in arrays. These include: |
| 33 | + |
| 34 | +1. Inplace operators (e.g. `*=`) |
| 35 | +2. Item assignment (e.g. `x[0] = 1`) |
| 36 | +3. Slice assignment (e.g. `x[:2, :] = 3`) |
| 37 | +4. The `out=` keyword present in some strided array libraries (e.g. `sin(x, out=y)`) |
| 38 | + |
| 39 | +Libraries like TensorFlow and JAX tend to support inplace operators, provide |
| 40 | +alternative syntax for item and slice assignment (e.g. an `update_index` |
| 41 | +function or `x.at[idx].set(y)`), and have no need for `out=`. |
| 42 | + |
| 43 | +A potential solution could be to make views read-only, or use copy-on-write |
| 44 | +semantics. Both are hard to implement and would present significant issues |
| 45 | +for backwards compatibility for current strided array libraries. Read-only |
| 46 | +views would also not be a full solution, given that mutating the original |
| 47 | +(base) array will also result in ambiguous semantics. Hence this API standard |
| 48 | +does not attempt to go down this route. |
| 49 | + |
| 50 | +Both inplace operators and item/slice assignment can be mapped onto |
| 51 | +equivalent functional expressions (e.g. `x[idx] = val` maps to |
| 52 | +`x.at[idx].set(val)`), and given that both inplace operators and item/slice |
| 53 | +assignment are very widely used in both library and end user code, this |
| 54 | +standard chooses to include them. |
| 55 | + |
| 56 | +The situation with `out=` is slightly different - it's less heavily used, and |
| 57 | +easier to avoid. It's also not an optimal API, because it mixes an |
| 58 | +"efficiency of implementation" consideration ("you're allowed to do this |
| 59 | +inplace") with the semantics of a function ("the output _must_ be placed into |
| 60 | +this array"). There are alternatives, for example the donated arguments in JAX |
| 61 | +or working buffers in LAPACK, that allow the user to express "you _may_ |
| 62 | +overwrite this data, do whatever is fastest". Given that those alternatives |
| 63 | +aren't widely used in array libraries today, this API standard chooses to (a) |
| 64 | +leave out `out=`, and (b) not specify another method of reusing arrays that |
| 65 | +are no longer needed as buffers. |
| 66 | + |
| 67 | +This leaves the problem of the initial example - with this API standard it |
| 68 | +remains possible to write code that will not work the same for all array |
| 69 | +libraries. This is something that the user must be careful about. |
| 70 | + |
| 71 | +.. note:: |
| 72 | + |
| 73 | + It is recommended that users avoid any mutating operations when a view may be involved. |
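The ambiguity in the initial example can be reproduced without any array library. The following is a minimal sketch using only Python's standard library, not code from this API standard. It assumes that a `memoryview` over an `array.array` is a fair stand-in for a `y` that shares memory with `x`, and that constructing a new `array` is a fair stand-in for a `y` that does not. The helper `run_example` and its `make_y` parameter are hypothetical names introduced for illustration.

```python
from array import array


def run_example(make_y):
    """Run the initial example; `make_y` decides whether `y` shares memory with `x`."""
    x = array("d", [1.0])  # stands in for ones(1)
    x[0] += 2              # x now holds 3.0
    y = make_y(x)          # `y` may or may not share memory with `x`
    y[0] -= 1              # mutates `x` only if memory is shared
    return x[0]


# `memoryview` exposes the same buffer, so the mutation is visible through `x`.
shared = run_example(lambda a: memoryview(a))

# Building a new array copies the data, so `x` is left untouched.
copied = run_example(lambda a: array("d", a))

print(shared, copied)
```

The same four statements leave a different value in `x` depending only on whether `y` shares memory with it, which is why mutating operations are best avoided whenever a view may be involved.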