-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
BUG: loc casting to float for scalar with MultiIndex df #41374
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
# has the dim of the obj changed? | ||
# GH 7199 | ||
if obj.ndim < current_ndim: | ||
axis -= 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i take it this is unreachable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, in theory this is reachable, but does not make sense anymore.
We are counting from the maximum number of dimensions backwards, so even if we reduce the dimension we have already reduced our axis to the new maximum number.
DataFrame example:
axis is 1 -> we are reducing to a series here -> we reduce our axis with one -> we are already at 0 and don't need this case anymore
pandas/core/indexing.py
Outdated
axis = 0 | ||
for key in tup: | ||
# GH#41369 Loop in reverse order to avoid dtype conversion when converting df | ||
# row to a series |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC the reason this works is bc this ensures we index along columns before rows, so that in multi-block cases we select only necessary blocks, avoiding the fast_xs/interleave/whatever call that is doing the casting ATM?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes exactly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you flesh out the comment to that effect?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pending that, LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
can you rebase |
� Conflicts: � pandas/tests/indexing/multiindex/test_loc.py
Done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test request. lgtm otherwise, ping on green.
@@ -788,3 +788,12 @@ def test_mi_columns_loc_list_label_order(): | |||
columns=MultiIndex.from_tuples([("B", 1), ("B", 2), ("A", 1), ("A", 2)]), | |||
) | |||
tm.assert_frame_equal(result, expected) | |||
|
|||
|
|||
def test_loc_get_scalar_casting_to_float(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add the .iloc example as well (to assert that its also an int as on master).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@jreback greenish, failure unrelated |
@phofl can you merge master to resolve conflicts |
yeah this looks good (just resolved conflicts) |
� Conflicts: � pandas/tests/indexing/multiindex/test_loc.py
merged master |
� Conflicts: � pandas/tests/indexing/multiindex/test_loc.py
{"a": 1.0, "b": 2}, index=MultiIndex.from_arrays([[3], [4]], names=["c", "d"]) | ||
) | ||
result = df.loc[(3, 4), "b"] | ||
assert result == 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this test passes on master. 2.0 == 2 is True.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added an isinstance check
Thanks @phofl |
@@ -872,6 +872,7 @@ Indexing | |||
- Bug in :meth:`DataFrame.__setitem__` and :meth:`DataFrame.iloc.__setitem__` raising ``ValueError`` when trying to index with a row-slice and setting a list as values (:issue:`40440`) | |||
- Bug in :meth:`DataFrame.loc` not raising ``KeyError`` when key was not found in :class:`MultiIndex` when levels contain more values than used (:issue:`41170`) | |||
- Bug in :meth:`DataFrame.loc.__setitem__` when setting-with-expansion incorrectly raising when the index in the expanding axis contains duplicates (:issue:`40096`) | |||
- Bug in :meth:`DataFrame.loc.__getitem__` with :class:`MultiIndex` casting to float when at least one column is from has float dtype and we retrieve a scalar (:issue:`41369`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"is from has" typo?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes thanks, opened #41808
In theory this would solve the issue, but I am not sure if this is desirable. We cast the row to a series, which forces the dtype conversion. If we loop in reverse we retrieve a column as a series which would avoid the conversion. Would you mind having a look @jbrockmendel ?