BUG: Ensure series/frame mode() keeps int index #38732
if data.empty:
    data.index = Index([], dtype="int64")

return data
The empty-result case doesn't hit algorithms.mode, so those changes aren't enough to ensure the correct dtype
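A minimal sketch of the case described above, for anyone following along. It assumes a pandas version that already includes this fix, and the variable names are illustrative only.

```python
import pandas as pd

# An empty frame: the per-column mode goes through DataFrame.apply, which
# yields an empty result, so the index dtype is fixed up in DataFrame.mode.
df = pd.DataFrame([], columns=["a", "b"])
result = df.mode()

# With the fix in place, the (empty) index should be integer-typed.
print(type(result.index).__name__, result.index.dtype)
```

The printed index class may differ between pandas versions, so only the integer dtype is worth checking here.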
df = DataFrame([], columns=["a", "b"])
result = df.mode()
expected = DataFrame([], columns=["a", "b"], index=Index([], dtype="int64"))
tm.assert_frame_equal(result, expected)
Couldn't find any DataFrame.mode testing; guessing it relies on correctness from Series.mode() testing (in pandas/tests/test_algos.py::TestMode). However, the empty-result case doesn't hit Series.mode, so some DataFrame testing seems necessary. Is there a better place for this test?
We keep them in pandas/tests/frame/test_reductions.py
Ah thanks! Sorry for missing that
pandas/core/frame.py
Outdated
data = data.apply(f, axis=axis)
# Ensure index is type stable (should always use int index)
if data.empty:
    data.index = Index([], dtype="int64")
is this right for 32-bit machines?
Changed to int, thanks for catching that
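For context on the 32-bit question, a small sketch of the difference between the two dtype spellings at the NumPy level. How pandas maps these onto its index types is not shown here and may vary by version.

```python
import numpy as np

# "int64" always names a fixed-width 64-bit integer type.
fixed = np.dtype("int64")

# The builtin `int` is resolved by NumPy to a platform-dependent integer
# type, so its width may be 4 or 8 bytes depending on the build.
platform = np.dtype(int)

print(fixed.itemsize, platform.itemsize)
```

On a typical 64-bit Linux build both are 8 bytes, which is why the difference is easy to miss locally.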
pandas/core/algorithms.py
Outdated
@@ -954,7 +954,9 @@ def mode(values, dropna: bool = True) -> Series:
        warn(f"Unable to sort modes: {err}")

    result = _reconstruct_data(result, original.dtype, original)
    return Series(result)
    # Ensure index is type stable (should always use int index)
    index = None if len(result) else Index([], dtype=int)
these should use ibase.default_index instead of manual construction
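A rough sketch of the difference between the two approaches, using only public API. The helper `make_default_index` below is hypothetical and stands in for the internal function. It assumes the internal helper returns a range-based index, which is not confirmed in this thread.

```python
import pandas as pd


def make_default_index(n: int) -> pd.RangeIndex:
    # Hypothetical stand-in for illustration; NOT the pandas-internal helper.
    # Assumption: the default index for n rows is a range 0..n-1.
    return pd.RangeIndex(n)


manual = pd.Index([], dtype="int64")  # manual construction, as in the diff
default = make_default_index(0)       # range-based default index

# Both are empty and integer-typed; only the index class differs.
print(type(manual).__name__, type(default).__name__)
print(len(default), default.dtype)
```

Routing through a single helper keeps the empty and non-empty cases consistent, since a freshly constructed Series or DataFrame also gets a range-based index by default.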
Done
lgtm ping on green
green
thanks @mzeitlin11
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff