BUG: loc casting to float for scalar with MultiIndex df #41374

Merged 7 commits on May 25, 2021
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.3.0.rst
@@ -872,6 +872,7 @@ Indexing
 - Bug in :meth:`DataFrame.__setitem__` and :meth:`DataFrame.iloc.__setitem__` raising ``ValueError`` when trying to index with a row-slice and setting a list as values (:issue:`40440`)
 - Bug in :meth:`DataFrame.loc` not raising ``KeyError`` when key was not found in :class:`MultiIndex` when levels contain more values than used (:issue:`41170`)
 - Bug in :meth:`DataFrame.loc.__setitem__` when setting-with-expansion incorrectly raising when the index in the expanding axis contains duplicates (:issue:`40096`)
+- Bug in :meth:`DataFrame.loc.__getitem__` with :class:`MultiIndex` casting to float when at least one column is from has float dtype and we retrieve a scalar (:issue:`41369`)
Member

"is from has" typo?

Member Author

Yes thanks, opened #41808

 - Bug in :meth:`DataFrame.loc` incorrectly matching non-boolean index elements (:issue:`20432`)
 - Bug in :meth:`Series.__delitem__` with ``ExtensionDtype`` incorrectly casting to ``ndarray`` (:issue:`40386`)
 - Bug in :meth:`DataFrame.__setitem__` raising ``TypeError`` when using a str subclass as the column name with a :class:`DatetimeIndex` (:issue:`37366`)
16 changes: 6 additions & 10 deletions pandas/core/indexing.py
@@ -886,26 +886,22 @@ def _getitem_nested_tuple(self, tup: tuple):
         # handle the multi-axis by taking sections and reducing
         # this is iterative
         obj = self.obj
-        axis = 0
-        for key in tup:
+        # GH#41369 Loop in reverse order ensures indexing along columns before rows
+        # which selects only necessary blocks which avoids dtype conversion if possible
+        axis = len(tup) - 1
+        for key in tup[::-1]:

             if com.is_null_slice(key):
-                axis += 1
+                axis -= 1
                 continue

-            current_ndim = obj.ndim
             obj = getattr(obj, self.name)._getitem_axis(key, axis=axis)
-            axis += 1
+            axis -= 1

             # if we have a scalar, we are done
             if is_scalar(obj) or not hasattr(obj, "ndim"):
                 break
-
-            # has the dim of the obj changed?
-            # GH 7199
-            if obj.ndim < current_ndim:
-                axis -= 1
Member

I take it this is unreachable?

Member Author

No, in theory this is reachable, but it no longer makes sense.

We now count backwards from the highest axis, so even when indexing reduces the dimension, the axis counter has already been decremented to the new highest axis.

DataFrame example:
axis is 1 -> indexing reduces the frame to a Series -> we decrement axis by one -> axis is already 0, so this case is no longer needed
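
The axis bookkeeping described in this reply can be sketched with a toy model. This is not the pandas implementation: `getitem_nested_tuple` and `table` are hypothetical names, and a plain list of rows stands in for a DataFrame. It only illustrates why counting down from the last axis needs no extra correction when a selection drops a dimension.

```python
def getitem_nested_tuple(table, tup):
    """Toy model: index a list of rows with (row_key, col_key), last axis first."""
    obj = table
    axis = len(tup) - 1  # start at the highest axis
    visited = []  # axes that were actually indexed, kept for illustration
    for key in tup[::-1]:
        if isinstance(key, slice) and key == slice(None):
            axis -= 1  # null slice: nothing to select on this axis
            continue
        visited.append(axis)
        if axis == 1:
            obj = [row[key] for row in obj]  # pick a column: 2-D -> 1-D
        else:
            obj = obj[key]  # pick a row (or an element) on axis 0
        axis -= 1  # the next key always belongs to the next lower axis
    return obj, visited


table = [[1.0, 2], [3.0, 4]]
value, visited = getitem_nested_tuple(table, (1, 1))
print(value, visited)
```

After the column is selected on axis 1 the object is one-dimensional, and the counter is already at 0, which is the only axis left.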


         return obj

     def _convert_to_indexer(self, key, axis: int, is_setter: bool = False):
13 changes: 13 additions & 0 deletions pandas/tests/indexing/multiindex/test_loc.py
@@ -831,3 +831,16 @@ def test_mi_add_cell_missing_row_non_unique():
         columns=MultiIndex.from_product([[1, 2], ["A", "B"]]),
     )
     tm.assert_frame_equal(result, expected)
+
+
+def test_loc_get_scalar_casting_to_float():
Contributor

Can you add the .iloc example as well (to assert that it's also an int, as on master)?

Member Author

Done

+    # GH#41369
+    df = DataFrame(
+        {"a": 1.0, "b": 2}, index=MultiIndex.from_arrays([[3], [4]], names=["c", "d"])
+    )
+    result = df.loc[(3, 4), "b"]
+    assert result == 2
Member

This test passes on master: 2.0 == 2 is True.

Member Author

Added an isinstance check

+    assert isinstance(result, np.int64)
+    result = df.loc[[(3, 4)], "b"].iloc[0]
+    assert result == 2
+    assert isinstance(result, np.int64)
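
The reason the order of selection matters for the dtype can be shown with public pandas calls only. The sketch below reuses the frame from the test above; `row_first` and `col_first` are illustrative names. Each selection is done in two explicit steps, so it should not depend on how `.loc` orders the axes internally.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"a": 1.0, "b": 2},
    index=pd.MultiIndex.from_arrays([[3], [4]], names=["c", "d"]),
)

# Rows first: the selected row spans a float64 and an int64 column, so the
# intermediate Series is upcast to float64 and the integer type is lost.
row_first = df.loc[(3, 4), :]["b"]

# Columns first: only the int64 column is touched, so its dtype is preserved.
col_first = df["b"].loc[(3, 4)]

print(type(row_first).__name__, type(col_first).__name__)
```

Both values compare equal to 2, which is why an equality assertion alone cannot detect the cast and a type check is needed.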