BUG: loc.setitem raising when expanding empty frame with array value #50065

Closed
wants to merge 2 commits into from
1 change: 1 addition & 0 deletions doc/source/whatsnew/v2.0.0.rst
@@ -752,6 +752,7 @@ Indexing
- Bug in :meth:`DataFrame.loc` coercing dtypes when setting values with a list indexer (:issue:`49159`)
- Bug in :meth:`DataFrame.loc` raising ``ValueError`` with ``bool`` indexer and :class:`MultiIndex` (:issue:`47687`)
- Bug in :meth:`DataFrame.__setitem__` raising ``ValueError`` when right hand side is :class:`DataFrame` with :class:`MultiIndex` columns (:issue:`49121`)
- Bug in :meth:`DataFrame.loc` when expanding an empty DataFrame and setting an array-like value (:issue:`49972`)
- Bug in :meth:`DataFrame.reindex` casting dtype to ``object`` when :class:`DataFrame` has single extension array column when re-indexing ``columns`` and ``index`` (:issue:`48190`)
- Bug in :func:`~DataFrame.describe` when formatting percentiles in the resulting index showed more decimals than needed (:issue:`46362`)
- Bug in :meth:`DataFrame.compare` does not recognize differences when comparing ``NA`` with value in nullable dtypes (:issue:`48939`)
4 changes: 3 additions & 1 deletion pandas/core/indexing.py
@@ -1741,7 +1741,9 @@ def _setitem_with_indexer(self, indexer, value, name: str = "iloc"):
arr = extract_array(value, extract_numpy=True)
taker = -1 * np.ones(len(self.obj), dtype=np.intp)
empty_value = algos.take_nd(arr, taker)
- if not isinstance(value, ABCSeries):
+ if not isinstance(value, ABCSeries) and not isinstance(
+     indexer[0], dict
+ ):
# if not Series (in which case we need to align),
# we can short-circuit
empty_value[indexer[0]] = arr
7 changes: 7 additions & 0 deletions pandas/tests/frame/indexing/test_indexing.py
@@ -1737,6 +1737,13 @@ def test_getitem_preserve_object_index_with_dates(self, indexer):

assert ser.index.dtype == object

def test_loc_setitem_empty_frame_expansion(self):
# GH#49972
result = DataFrame()
result.loc[0, 0] = np.asarray([0])
expected = DataFrame({0: [0.0]})
Member

So expected.iloc[0] will return a list and not necessarily the np.array? i.e. the ndarray is coerced to a list and then stored?

Member

I think expected.iloc[0, 0] here is a scalar, which is probably 1) what the user in 49972 wanted and 2) technically incorrect. I'd expect a 1-D ndarray containing a zero.

Member Author

Hm, in principle I agree, but this is now consistent with

df = DataFrame({0: [0]})
df.loc[1, 0] = np.asarray([0])

and

df = DataFrame({0: [0]})
df.loc[0, 1] = np.asarray([0])

which are similar expanding cases. This points to the bug being somewhere else, in how scalars vs. arrays are handled.
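
The three expanding cases compared in this comment can be probed side by side. This is a minimal sketch and not part of the PR: `probe` is a hypothetical helper that performs the assignment and reports either the type of the stored cell or the exception raised. No output is stated because the outcome depends on the installed pandas version.

```python
import numpy as np
import pandas as pd


def probe(df, row, col):
    """Assign a length-1 ndarray via .loc and describe what happened.

    Hypothetical helper for illustration; not part of pandas or the PR.
    """
    try:
        df.loc[row, col] = np.asarray([0])
        cell = df.loc[row, col]
    except Exception as exc:  # behaviour depends on the pandas version
        return f"raised {type(exc).__name__}"
    return f"stored {type(cell).__name__}, frame shape {df.shape}"


results = {
    "empty frame (GH#49972)": probe(pd.DataFrame(), 0, 0),
    "new row": probe(pd.DataFrame({0: [0]}), 1, 0),
    "new column": probe(pd.DataFrame({0: [0]}), 0, 1),
}
for label, outcome in results.items():
    print(f"{label}: {outcome}")
```

If all three lines report the same stored type, the empty-frame case behaves like the other two expansions, which is the consistency argument made here.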

Member

That's reasonable. Maybe let's see what it would take to fix the two cases you mention here, and whether those are isolated-ish sketchy cases, before adding another one?

If moving forward with this, we should add a comment in the test/code about living with technically wrong behavior.

Member Author

@phofl Dec 9, 2022

There is a general inconsistency that might be expected? Not sure.

If you have a column with a numeric dtype

df = DataFrame({0: [0]})

then

df.loc[1, 0] = np.asarray([0])

unpacks and sets a scalar. If the column has object dtype,

df = DataFrame({0: ["a"]})

we set the array. Is this expected?
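
A minimal sketch for checking the dtype dependence described in this comment. `cell_after_expansion` is a hypothetical helper, not something from pandas or the PR, and the object column is requested explicitly with `dtype=object` so that string-dtype inference does not change the setup. No output is stated because the behaviour may differ between pandas versions.

```python
import numpy as np
import pandas as pd


def cell_after_expansion(values, dtype):
    """Add a row to a one-row frame by assigning a length-1 ndarray.

    Hypothetical helper for illustration. Returns the newly set cell, or
    the exception if this pandas version rejects the assignment.
    """
    df = pd.DataFrame({0: values}, dtype=dtype)
    try:
        df.loc[1, 0] = np.asarray([0])
        return df.loc[1, 0]
    except Exception as exc:
        return exc


outcomes = {
    "numeric column": cell_after_expansion([0], "int64"),
    "object column": cell_after_expansion(["a"], object),
}
for label, cell in outcomes.items():
    print(f"{label}: {type(cell).__name__} -> {cell!r}")
```

A scalar in the first line and an `ndarray` in the second would reproduce the inconsistency raised here.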

Member

> Is this expected?

Uhh, maybe? With object dtype it is more plausible that the user is intentionally trying to set an array as a value.

tm.assert_frame_equal(result, expected)

def test_loc_on_multiindex_one_level(self):
# GH#45779
df = DataFrame(