Skip to content

BUG: loc casting to float for scalar with MultiIndex df #41374

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
May 25, 2021

Conversation

phofl
Copy link
Member

@phofl phofl commented May 7, 2021

In theory this would solve the issue, but I am not sure if this is desirable. We cast the row to a series, which forces the dtype conversion. If we loop in reverse we retrieve a column as a series which would avoid the conversion. Would you mind having a look @jbrockmendel ?

@phofl phofl added Indexing Related to indexing on series/frames, not to indexes themselves MultiIndex labels May 7, 2021
# has the dim of the obj changed?
# GH 7199
if obj.ndim < current_ndim:
axis -= 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i take it this is unreachable?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, in theory this is reachable, but does not make sense anymore.

We are counting from the maximum number of dimensions backwards, so even if we reduce the dimension we have already reduced our axis to the new maximum number.

DataFrame example:
axis is 1 -> we are reducing to a series here -> we reduce our axis with one -> we are already at 0 and don't need this case anymore

axis = 0
for key in tup:
# GH#41369 Loop in reverse order to avoid dtype conversion when converting df
# row to a series
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC the reason this works is bc this ensures we index along columns before rows, so that in multi-block cases we select only necessary blocks, avoiding the fast_xs/interleave/whatever call that is doing the casting ATM?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes exactly

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you flesh out the comment to that effect?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pending that, LGTM

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@jreback
Copy link
Contributor

jreback commented May 10, 2021

can you rebase

� Conflicts:
�	pandas/tests/indexing/multiindex/test_loc.py
@phofl
Copy link
Member Author

phofl commented May 11, 2021

Done

@jreback jreback added this to the 1.3 milestone May 12, 2021
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test request. lgtm otherwise, ping on green.

@@ -788,3 +788,12 @@ def test_mi_columns_loc_list_label_order():
columns=MultiIndex.from_tuples([("B", 1), ("B", 2), ("A", 1), ("A", 2)]),
)
tm.assert_frame_equal(result, expected)


def test_loc_get_scalar_casting_to_float():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add the .iloc example as well (to assert that its also an int as on master).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@phofl
Copy link
Member Author

phofl commented May 12, 2021

@jreback greenish, failure unrelated

@simonjayhawkins
Copy link
Member

@phofl can you merge master to resolve conflicts

@jreback
Copy link
Contributor

jreback commented May 24, 2021

yeah this looks good (just resolved conflicts)

� Conflicts:
�	pandas/tests/indexing/multiindex/test_loc.py
@phofl
Copy link
Member Author

phofl commented May 24, 2021

merged master

� Conflicts:
�	pandas/tests/indexing/multiindex/test_loc.py
{"a": 1.0, "b": 2}, index=MultiIndex.from_arrays([[3], [4]], names=["c", "d"])
)
result = df.loc[(3, 4), "b"]
assert result == 2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test passes on master. 2.0 == 2 is True.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added an isinstance check

@simonjayhawkins simonjayhawkins merged commit 4f7da2c into pandas-dev:master May 25, 2021
@simonjayhawkins
Copy link
Member

Thanks @phofl

@phofl phofl deleted the 41369 branch May 25, 2021 14:43
TLouf pushed a commit to TLouf/pandas that referenced this pull request Jun 1, 2021
@@ -872,6 +872,7 @@ Indexing
- Bug in :meth:`DataFrame.__setitem__` and :meth:`DataFrame.iloc.__setitem__` raising ``ValueError`` when trying to index with a row-slice and setting a list as values (:issue:`40440`)
- Bug in :meth:`DataFrame.loc` not raising ``KeyError`` when key was not found in :class:`MultiIndex` when levels contain more values than used (:issue:`41170`)
- Bug in :meth:`DataFrame.loc.__setitem__` when setting-with-expansion incorrectly raising when the index in the expanding axis contains duplicates (:issue:`40096`)
- Bug in :meth:`DataFrame.loc.__getitem__` with :class:`MultiIndex` casting to float when at least one column is from has float dtype and we retrieve a scalar (:issue:`41369`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"is from has" typo?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes thanks, opened #41808

@phofl phofl mentioned this pull request Jun 3, 2021
1 task
JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Jul 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Indexing Related to indexing on series/frames, not to indexes themselves MultiIndex
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: Accessing a cell in a pandas DataFrame with MultiIndex loses type information
4 participants