Skip to content

API/BUG: PandasArray __setitem__ can change underlying buffer #28150

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jbrockmendel opened this issue Aug 26, 2019 · 8 comments · Fixed by #28119
Closed

API/BUG: PandasArray __setitem__ can change underlying buffer #28150

jbrockmendel opened this issue Aug 26, 2019 · 8 comments · Fixed by #28119
Labels
ExtensionArray Extending pandas with custom dtypes or arrays.

Comments

@jbrockmendel
Copy link
Member

Discussed briefly in #28119, the ndarray backing a PandasArray can be swapped out by setitem, which can have surprising behavior for views

arr = pd.array([1, 2, 3])
view1 = arr.view()
view2 = arr[:]
view3 = np.asarray(arr)

arr[0] = 9
assert view1[0] == 9
assert view2[0] == 9 
assert view3[0] == 9

arr[1] = 2.5
view1[-1] = 5
assert arr[-1] == 5  # FAIL
@jreback jreback added this to the 1.0 milestone Sep 10, 2019
@jreback jreback added the ExtensionArray Extending pandas with custom dtypes or arrays. label Sep 10, 2019
@jbrockmendel
Copy link
Member Author

@TomAugspurger sure this is closed by #28119? My understanding was that specifically punted on this topic.

@jbrockmendel jbrockmendel reopened this Sep 17, 2019
@TomAugspurger
Copy link
Contributor

TomAugspurger commented Sep 17, 2019 via email

@TomAugspurger
Copy link
Contributor

So the way NumPy handles this is by not allowing __setitem__ to change the dtype of the array.

In [18]: a = np.array([1, 2])

In [19]: a[0] = 1.5

In [20]: a
Out[20]: array([1, 2])

On master, we now match that behavior. The original code snippet passes.

I think the remaining question is: do we want this behavior? Silently truncating the input? Or would we rather raise when the input cannot be faithfully inserted into an array with that dtype (can we even determine that)?

@jreback
Copy link
Contributor

jreback commented Sep 23, 2019

maybe we should start warning on this? I think we are going to eventually be able to 'fix' this, but would be a biggish change.

@jbrockmendel
Copy link
Member Author

maybe we should start warning on this? I think we are going to eventually be able to 'fix' this, but would be a biggish change.

Is anyone actually using PandasArray? I'm hoping they'll eventually back Block, but until then, it isn't clear to me that this is really user-facing.

I think the remaining question is: do we want this behavior? Silently truncating the input? Or would we rather raise when the input cannot be faithfully inserted into an array with that dtype (can we even determine that)?

Truncating is basically the same as ndarray behavior, which I'm fine with. Raising is basically the same as DTA/TDA/PA/Categorical (more general EA is less well-defined AFAICT) behavior, which I'm also fine with. If we make it raise, I think that means we can get rid of Block._can_hold_element (although we would be replacing it with a try/except, which I'm wary of).

@TomAugspurger
Copy link
Contributor

Is anyone actually using PandasArray? I'm hoping they'll eventually back Block, but until then, it isn't clear to me that this is really user-facing.

The main way would be via Series[...].array for an ndarray-backed Series. I think we can afford to change this without a warning.


Are we likely to do this for 1.0?

@TomAugspurger
Copy link
Contributor

I don't think this is a blocker for 1.0.

@TomAugspurger TomAugspurger modified the milestones: 1.0, Contributions Welcome Dec 30, 2019
@jbrockmendel
Copy link
Member Author

closed by #30523.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ExtensionArray Extending pandas with custom dtypes or arrays.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants