REGR: fix regression in scalar setitem with setting a length-1 array-like #48057

jorisvandenbossche · 2022-08-12T13:22:48Z

closes REGR: ValueError: setting an array element with a sequence. when assigning an array.array using .loc #46268
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

While testing the various related cases, I noticed that our behaviour is not always consistent; see the TODO notes in the tests.
In pandas 1.3, we always unpacked an array-like when setting a scalar position, including for object dtype. For object dtype, this meant that a length-1 array was unpacked, while longer arrays raised an error:

In [31]: df = pd.DataFrame({"A": [1, 2, 3], "B": np.array([1, 'a', 'b'], dtype=object)})

In [32]: df.loc[0, "B"] = np.array([0])

In [33]: df
Out[33]: 
   A  B
0  1  0
1  2  a
2  3  b

In [34]: df.loc[0, "B"] = np.array([0, 1])
...
ValueError: Must have equal len keys and value when setting with an iterable

With the change in #42780, pandas 1.4 started to preserve the nested object in the length-1 case, while still raising for longer arrays (with an object dtype column):

In [24]: df = pd.DataFrame({"A": [1, 2, 3], "B": np.array([1, 'a', 'b'], dtype=object)})

In [25]: df.loc[0, "B"] = np.array([0])

In [26]: df
Out[26]: 
   A    B
0  1  [0]   # <-----
1  2    a
2  3    b

In [27]: df.loc[1, "B"] = np.array([0, 1])
...
ValueError: Must have equal len keys and value when setting with an iterable

I would say this is a bit inconsistent, and I think that long term we should probably allow setting nested data of any length in an object dtype column (the first TODO note).

Because of the above change (no longer unpacking for len-1 array), this started to error for numeric columns (i.e. the actual regression reported in #46268).
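To make the numeric case concrete, here is a minimal sketch of the kind of assignment involved. The frame layout (a single integer column "A", with the value set into a new column "B") and the `outcome` variable are assumptions chosen for illustration, not a verbatim copy of the report in #46268. The `try`/`except` is there because the result depends on the installed pandas version: affected 1.4.x releases raise, while versions with this fix should unpack the value.

```python
import array

import pandas as pd

# Assumed setup for illustration: one numeric column, scalar position (0, "B").
df = pd.DataFrame({"A": [1, 2, 3]})

try:
    # Length-1 array-like assigned to a single cell through .loc
    df.loc[0, "B"] = array.array("d", [0.0])
except ValueError as err:
    # Releases affected by the regression end up here
    outcome = "raised"
    print(f"ValueError: {err}")
else:
    # With the fix, the single element is unpacked and stored as a scalar
    outcome = "unpacked"
    print(df)
```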

In this PR, I limited the change of #42780 to object dtype only, so we keep the new behaviour of preserving the nested object when setting into an object dtype column, while fixing the regression for numeric data.
(Strictly speaking, we could also call the object dtype case a regression, but there I think it was actually an improvement in behaviour. Since this was already changed in 1.3.5 and nobody reported it, it seems better to keep it now.)
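The resulting rule for DataFrame scalar setitem can be summarised in a small pure-Python model. This is only a sketch of the behaviour described above, not the pandas implementation: the function name `resolve_scalar_setitem_value` and its `column_is_object` flag are hypothetical, and treating every list-like whose length is not 1 as an error is a simplifying assumption.

```python
import array
from collections.abc import Sized


def resolve_scalar_setitem_value(value, column_is_object):
    """Hypothetical model of the rule in this PR; not pandas internals."""
    # Strings and bytes have a length but are treated as scalars here.
    is_list_like = isinstance(value, Sized) and not isinstance(value, (str, bytes))
    if not is_list_like:
        return value  # plain scalar: stored as-is
    if len(value) != 1:
        # Longer list-likes still raise, for object dtype as well
        # (assumption: this model treats every non-length-1 list-like alike).
        raise ValueError(
            "Must have equal len keys and value when setting with an iterable"
        )
    if column_is_object:
        return value  # object dtype: keep the nested length-1 value
    return next(iter(value))  # other dtypes: unpack the single element


# Numeric column: the single element is unpacked.
print(resolve_scalar_setitem_value(array.array("d", [0.0]), column_is_object=False))
# Object column: the nested value is preserved.
print(resolve_scalar_setitem_value([0], column_is_object=True))
```

Keeping the dtype check as the last step mirrors the intent of the fix: the only thing that differs between object and non-object columns is what happens to a length-1 value.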

In addition, while testing the same for Series, I noticed that for Series setitem, we only do this unpacking of len-1 array-like values for actual numpy arrays, and not for setting array-likes in general (the second TODO note).
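One way to check this on a given installation is a small probe that tries several length-1 list-likes against a numeric Series and records what happens. The `candidates` and `results` names are illustrative only. According to the description above, only the numpy array is expected to be unpacked on the versions discussed here, but the probe deliberately makes no assumption about the outcome, since it may differ between pandas and numpy versions.

```python
import array

import numpy as np
import pandas as pd

# Length-1 list-likes of different types (illustrative selection).
candidates = {
    "ndarray": np.array([0.0]),
    "list": [0.0],
    "tuple": (0.0,),
    "array.array": array.array("d", [0.0]),
}

results = {}
for name, value in candidates.items():
    ser = pd.Series([1.0, 2.0, 3.0])  # fresh numeric Series for each attempt
    try:
        ser.loc[0] = value
    except Exception as err:  # broad on purpose: this is only a probe
        results[name] = f"raised {type(err).__name__}"
    else:
        results[name] = f"set to {ser.loc[0]!r}"

for name, outcome in results.items():
    print(f"{name:12} -> {outcome}")
```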

TODO: open new issues to keep track of those TODOs and update those comments

phofl

Some comments

phofl · 2022-08-12T15:56:48Z

pandas/tests/indexing/test_indexing.py

+
+    # TODO For object dtype this happens as well, but should we rather preserve
+    # the nested data and set as such?
+    df = DataFrame({"A": [1, 2, 3], "B": np.array([1, "a", "b"], dtype=object)})


I think this sounds good

phofl · 2022-08-12T15:57:10Z

pandas/tests/indexing/test_indexing.py

+    tm.assert_frame_equal(df, expected)
+
+    # but for object dtype we preserve the nested data
+    df = DataFrame({"A": [1, 2, 3], "B": np.array([1, "a", "b"], dtype=object)})


If we do the above, this is correct I think

phofl · 2022-08-12T15:59:35Z

pandas/tests/indexing/test_indexing.py

+def test_scalar_setitem_series_with_nested_value_length1(value, indexer_sli):
+    # For numeric data, assigning length-1 array to scalar position gets unpacked
+    # TODO this only happens in case of ndarray, should we make this consistent
+    # for all list-likes? (as happens for DataFrame.(i)loc, see test above)


Yes, I think this should unpack for everything. The DataFrame case already does so.

jorisvandenbossche · 2022-08-12T16:08:57Z

@phofl just to be clear: those TODO comments I added are interesting points (and I should open new issues for those), but fixing them is out of scope for this focused regression fix for 1.4.4. Your feedback on those questions was very welcome; I just want to set expectations that I don't plan to "fix" those TODOs here ;). See also the longer explanation in the top post.

phofl · 2022-08-12T16:12:35Z

Yep, understood, no worries. I just wanted to comment in the code to avoid misunderstandings with the references.

phofl

I think this is good to merge

mroeschke · 2022-08-19T17:26:30Z

Thanks @jorisvandenbossche (failures unrelated)

REGR: fix regression in scalar setitem with setting a length-1 array-like

67c4e63

jorisvandenbossche added the Indexing, Regression, and Nested Data labels Aug 12, 2022

jorisvandenbossche added this to the 1.4.4 milestone Aug 12, 2022

jorisvandenbossche mentioned this pull request Aug 12, 2022

REGR: ValueError: setting an array element with a sequence. when assigning an array.array using .loc #46268

Closed

3 tasks

phofl reviewed Aug 12, 2022

phofl approved these changes Aug 18, 2022

Merge remote-tracking branch 'upstream/main' into regr-setitem-scalar-with-nested

1046fa9

mroeschke approved these changes Aug 19, 2022

mroeschke merged commit 73e0b5e into pandas-dev:main Aug 19, 2022

lumberbot-app bot added the Still Needs Manual Backport label Aug 19, 2022

mroeschke pushed a commit to mroeschke/pandas that referenced this pull request Aug 19, 2022

Backport PR pandas-dev#48057: REGR: fix regression in scalar setitem with setting a length-1 array-like

67be0da

mroeschke mentioned this pull request Aug 19, 2022

Manual Backport PR #48057 on branch 1.4.x (REGR: fix regression in scalar setitem with setting a length-1 array-like) #48161

Merged

3 tasks

jorisvandenbossche deleted the regr-setitem-scalar-with-nested branch August 20, 2022 06:51

simonjayhawkins removed the Still Needs Manual Backport label Aug 22, 2022

noatamir pushed a commit to noatamir/pandas that referenced this pull request Nov 9, 2022

REGR: fix regression in scalar setitem with setting a length-1 array-like (pandas-dev#48057)

db1a172
