-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Fixed segfaults and incorrect results in GroupBy.quantile with NA Values in Grouping #29173
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
cd9d872
ff6fe6a
50e7091
69ca2b4
b9a89cd
c8b6675
57889db
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1373,6 +1373,29 @@ def test_quantile_out_of_bounds_q_raises(): | |
g.quantile(-1) | ||
|
||
|
||
def test_quantile_missing_group_values_no_segfaults(): | ||
# GH 28662 | ||
data = np.array([1.0, np.nan, 1.0]) | ||
df = pd.DataFrame(dict(key=data, val=range(3))) | ||
|
||
# Random segfaults; would have been guaranteed in loop | ||
grp = df.groupby("key") | ||
for _ in range(100): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. i take it this is doesn't reliably segfault on the first try? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ... or i could just read the comment three lines up. never mind |
||
grp.quantile() | ||
|
||
|
||
def test_quantile_missing_group_values_correct_results(): | ||
# GH 28662 | ||
data = np.array([1.0, np.nan, 3.0, np.nan]) | ||
df = pd.DataFrame(dict(key=data, val=range(4))) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. any reason not to re-use the setup from the other test? could just assert the result after the loop? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Different frame shapes. I think segfaults were occurring with oddly sized frames and bad results with evenly sized. Might be some other conflating factor |
||
|
||
result = df.groupby("key").quantile() | ||
expected = pd.DataFrame( | ||
[1.0, 3.0], index=pd.Index([1.0, 3.0], name="key"), columns=["val"] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. the expected should be [0, 2], but also on 0.24.2 the columns index name was the q value. not sure if this is important.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Keep in mind that this is a groupby and each element belongs to its own group. [1, 3] is essentially the identity of the non-NA groupings and should be correct w.r.t. the column index name that probably goes back to https://github.com/pandas-dev/pandas/pull/20405/files#r208366338 which was an inconsistency in 0.24.2 |
||
) | ||
tm.assert_frame_equal(result, expected) | ||
|
||
|
||
# pipe | ||
# -------------------------------- | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should have removed this before merging; will clean up after backport