Skip to content

block mutation of read-only array in series #14359

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 3 additions & 1 deletion doc/source/whatsnew/v0.19.1.txt
Original file line number Diff line number Diff line change
Expand Up @@ -55,7 +55,9 @@ Bug Fixes
- Bug in ``pd.concat`` with dataframes heterogeneous in length and tuple ``keys`` (:issue:`14438`)
- Bug in ``MultiIndex.set_levels`` where illegal level values were still set after raising an error (:issue:`13754`)
- Bug in ``DataFrame.to_json`` where ``lines=True`` and a value contained a ``}`` character (:issue:`14391`)
- Bug in ``df.groupby`` causing an ``AttributeError`` when grouping a single index frame by a column and the index level (:issue`14327`)
- Bug in ``df.groupby`` causing an ``AttributeError`` when grouping a single
- index frame by a column and the index level (:issue`14327`)

- Bug in ``pd.pivot_table`` may raise ``TypeError`` or ``ValueError`` when ``index`` or ``columns``
is not scalar and ``values`` is not specified (:issue:`14380`)
- Bug in ``Series.__setitem__` which allowed mutating read-only arrays (:issue:`14359`).
12 changes: 9 additions & 3 deletions pandas/lib.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -975,7 +975,9 @@ def astype_intsafe(ndarray[object] arr, new_dtype):
if is_datelike and checknull(v):
result[i] = NPY_NAT
else:
util.set_value_at(result, i, v)
# we can use the unsafe version because we know `result` is mutable
# since it was created from `np.empty`
util.set_value_at_unsafe(result, i, v)

return result

Expand All @@ -986,7 +988,9 @@ cpdef ndarray[object] astype_unicode(ndarray arr):
ndarray[object] result = np.empty(n, dtype=object)

for i in range(n):
util.set_value_at(result, i, unicode(arr[i]))
# we can use the unsafe version because we know `result` is mutable
# since it was created from `np.empty`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here (and anywhere there is a range)

util.set_value_at_unsafe(result, i, unicode(arr[i]))

return result

Expand All @@ -997,7 +1001,9 @@ cpdef ndarray[object] astype_str(ndarray arr):
ndarray[object] result = np.empty(n, dtype=object)

for i in range(n):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should simply be checked once for the array (and don't call the unsafe function)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need an explicit check because we created this array with np.empty so it is mutable. The unsafe version of the function is the one that doesn't do the writeable check so this does what you describe unless I am missing something.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, I read them backwards.....ok

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so perf basically the same then for non-readonly arrays?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On this line, it is always readonly. The only places where we call this in a loop are with known writeable arrays created by empty. In those cases we are going through the exact same code path as before, the function was just renamed to set_value_at_unsafe. In the other case, we were only doing scalar assignments so there was no looping. Those pay a very slight penalty to check the flags but in almost all cases it will return true. Also the main cost of that is dereferencing the array object but we do that in the expected case anyways.

util.set_value_at(result, i, str(arr[i]))
# we can use the unsafe version because we know `result` is mutable
# since it was created from `np.empty`
util.set_value_at_unsafe(result, i, str(arr[i]))

return result

Expand Down
15 changes: 14 additions & 1 deletion pandas/src/util.pxd
Original file line number Diff line number Diff line change
Expand Up @@ -70,7 +70,12 @@ cdef inline object get_value_at(ndarray arr, object loc):

return get_value_1d(arr, i)

cdef inline set_value_at(ndarray arr, object loc, object value):
cdef inline set_value_at_unsafe(ndarray arr, object loc, object value):
"""Sets a value into the array without checking the writeable flag.

This should be used when setting values in a loop, check the writeable
flag above the loop and then eschew the check on each iteration.
"""
cdef:
Py_ssize_t i, sz
if is_float_object(loc):
Expand All @@ -87,6 +92,14 @@ cdef inline set_value_at(ndarray arr, object loc, object value):

assign_value_1d(arr, i, value)

cdef inline set_value_at(ndarray arr, object loc, object value):
"""Sets a value into the array after checking that the array is mutable.
"""
if not cnp.PyArray_ISWRITEABLE(arr):
raise ValueError('assignment destination is read-only')

set_value_at_unsafe(arr, loc, value)

cdef inline int is_contiguous(ndarray arr):
return cnp.PyArray_CHKFLAGS(arr, cnp.NPY_C_CONTIGUOUS)

Expand Down
34 changes: 34 additions & 0 deletions pandas/tests/series/test_indexing.py
Original file line number Diff line number Diff line change
Expand Up @@ -1947,6 +1947,40 @@ def test_multilevel_preserve_name(self):
self.assertEqual(result.name, s.name)
self.assertEqual(result2.name, s.name)

def test_setitem_scalar_into_readonly_backing_data(self):
# GH14359: test that you cannot mutate a read only buffer

array = np.zeros(5)
array.flags.writeable = False # make the array immutable
series = Series(array)

for n in range(len(series)):
with self.assertRaises(ValueError):
series[n] = 1

self.assertEqual(
array[n],
0,
msg='even though the ValueError was raised, the underlying'
' array was still mutated!',
)

def test_setitem_slice_into_readonly_backing_data(self):
# GH14359: test that you cannot mutate a read only buffer

array = np.zeros(5)
array.flags.writeable = False # make the array immutable
series = Series(array)

with self.assertRaises(ValueError):
series[1:3] = 1

self.assertTrue(
not array.any(),
msg='even though the ValueError was raised, the underlying'
' array was still mutated!',
)

if __name__ == '__main__':
import nose
nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'],
Expand Down
3 changes: 2 additions & 1 deletion setup.py
Original file line number Diff line number Diff line change
Expand Up @@ -476,7 +476,8 @@ def pxd(name):
'pandas/src/period_helper.c']},
index={'pyxfile': 'index',
'sources': ['pandas/src/datetime/np_datetime.c',
'pandas/src/datetime/np_datetime_strings.c']},
'pandas/src/datetime/np_datetime_strings.c'],
'pxdfiles': ['src/util']},
algos={'pyxfile': 'algos',
'pxdfiles': ['src/util'],
'depends': _pxi_dep['algos']},
Expand Down