-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
REGR: fix regression in scalar setitem with setting a length-1 array-like #48057
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
REGR: fix regression in scalar setitem with setting a length-1 array-like #48057
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments
|
||
# TODO For object dtype this happens as well, but should we rather preserve | ||
# the nested data and set as such? | ||
df = DataFrame({"A": [1, 2, 3], "B": np.array([1, "a", "b"], dtype=object)}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this sound good
tm.assert_frame_equal(df, expected) | ||
|
||
# but for object dtype we preserve the nested data | ||
df = DataFrame({"A": [1, 2, 3], "B": np.array([1, "a", "b"], dtype=object)}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we do the above, this is correct I think
def test_scalar_setitem_series_with_nested_value_length1(value, indexer_sli): | ||
# For numeric data, assigning length-1 array to scalar position gets unpacked | ||
# TODO this only happens in case of ndarray, should we make this consistent | ||
# for all list-likes? (as happens for DataFrame.(i)loc, see test above) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think this should unpack for everything. The DataFrame case already does
@phofl just to be clear: those TODO comments I added are interesting points (and I should open new issues for those), but fixing those are out of scope of this focused regression fix for 1.4.4 (your feedback on those questions was very welcome! just to set expectations that I don't plan to "fix" those todos here ;)). See also the longer explanation in the top post. |
Yep understood this, no worries. Just wanted to coment in code to avoid misunderstandings with the references |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is good to merge
This comment was marked as resolved.
This comment was marked as resolved.
Thanks @jorisvandenbossche (failures unrelated) |
…with setting a length-1 array-like
…alar setitem with setting a length-1 array-like) (#48161) Backport PR #48057: REGR: fix regression in scalar setitem with setting a length-1 array-like Co-authored-by: Joris Van den Bossche <[email protected]>
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.While testing the various related cases, I noticed that we don't have very consistent behaviour always, see the TODO notes in the tests.
In pandas 1.3, we always unpacked an array-like when setting, also for object dtype. This means that for object dtype, it unpacked it for len-1, and raised an error for longer arrays:
With the change in #42780, in pandas 1.4 we started to preserve the nested object in case of len-1, but still raising for longer arrays (with object dtype column):
This is a bit inconsistent I would say, and I think we should probably long term allow setting nested data of any length in case of object dtype column (the first TODO note).
Because of the above change (no longer unpacking for len-1 array), this started to error for numeric columns (i.e. the actual regression reported in #46268).
What I did in this PR is limiting the change of #42780 to only object dtype, so we keep the new behaviour of preserving the nested object when setting in case object dtype, while fixing the regression for numeric data.
(strictly speaking, we could also say that it is a regression for the object dtype case, but there I think it was actually an improvement in behaviour, and since this was already changed in 1.3.5, and nobody reported it, it seems better to keep that now)
In addition, while testing the same for Series, I noticed that for Series setitem, we only do this unpacking of len-1 array-like values for actual numpy arrays, and not for setting array-likes in general (the second TODO note).
TODO: open new issues to keep track of those TODOs and update those comments