BUG: groupby.shift returns different columns when fill_value is specified #41858
…path to be used with fill_value!=None. The existing test was extracting only the values column and ignoring that the index columns were also returned, which masked the bug reported in the issue
Thanks for the PR. I think that dispatching to … We likely need to patch elsewhere in the groupby code to ensure that the correct structure is returned.
@WillAyd, I think there is a misunderstanding here. I have removed the dispatch to `self.apply`. Also, the structure of the result no longer depends on whether you specify `fill_value`. I've updated the original PR description to make the changes clearer.
Yes, it does, @simonjayhawkins. Using the example from #26615, they now run at the same speed even when
Thanks for clarifying. This lgtm
@WillAyd Is there anything else we need to have this merged in?
needs tests that reproduce the original issues
release note needs to be updated to 1.4
needs review
This PR includes a modification to an existing test which ensures the test recreates the original issue. The modified test fails on master because of the original issue (see below). The test passes with the code changes in this PR. Previously, the test extracted the value column (
Done.
Ready for review, @jreback
@jreback can you take another look at this? I think it is ready to merge.
looks good. can you make sure we have an asv for this case (or add one) and show the results. ping on green.
I think testing is ok, though if you can look in the original issue and make sure we have a replicated test.
@jreback please have another look, I think this is good to go. With this PR, the testing now replicates the original issue. I've looked into it. asv is added. Below, you can see that on master, using master
PR
lgtm, minor request for the whatsnew
@smithto1 - friendly ping for resolving conflict and whatsnew request.
That is addressed. I think this is ready to merge.
lgtm
lgtm. test coverage question. ping on green.
@@ -55,7 +55,7 @@ def test_group_shift_with_fill_value():
         columns=["Z"],
         index=None,
     )
-    result = g.shift(-1, fill_value=0)[["Z"]]
+    result = g.shift(-1, fill_value=0)
do we have sufficient coverage of the tests from the OP?
Yes, @jreback. I have investigated it.
The original problem was that the grouping columns were improperly returned. The old form of this test would extract the value column (`[["Z"]]`), so the test did not detect that the grouping columns were returned. I have modified this test so it no longer extracts the value column; if the grouping columns are returned (the OP) this test will now fail.
kk great
if u can resolve conflicts and merge master
@jreback conflicts resolved. I think it can be merged.
thanks @smithto1 for the patch and the patience! keep em coming!
* TST: Fix doctests for pandas.io.formats.style * Modified: pandas/io/formats/style.py * Added some expected results * Skipped some tests * TST: Add link to redirect to Table Visualization user guide * Modified style.py * Updated the doctest of the apply() * Updated the doctest of the applymap() * Updated the doctest of the set_table_styles() * Updated the doctest of the set_properties() * TST: Add image to pipe function result * Modified style.py * Updated the doctest of the pipe() * TST: Remove unnecessary outputs * Modified pandas/io/formats/style.py * Updated the doctests of the set_tooltips() * Updated the doctests of the to_latex() * Updated the doctests of the set_td_classes() * Updated the doctests of the set_table_attributes() * TST: Add the output to the Styler.format doctest in to_latex() * REG: DataFrame.agg where func returns lists and axis=1 (#42762) * Fix typing issues for CI (#42770) * BUG: groupby.shift returns different columns when fill_value is specified (#41858) * PERF: extract_array earlier in DataFrame construction (#42774) * ENH: `sparse_columns` and `sparse_index` added to `Styler.to_html` (#41946) * TYP: Fix typing for searchsorted (#42788) * DOC GH42756 Update documentation for pandas.DataFrame.drop to clarify tuples. 
(#42789) * CI: Fix doctests (#42790) * REGR: nanosecond timestamp comparisons to OOB datetimes (#42796) * COMPAT: MPL 3.4.0 (#42803) * Delete duplicates and unused code from reshape tests (#42802) * REGR: ValueError raised when both prefix and names are set to None (#42690) * REGR: ValueError raised when both prefix and names are set to None * Update readers.py * whitespace * Update v1.3.1.rst * Update v1.3.2.rst * Update readers.py * Update readers.py Co-authored-by: Jeff Reback <[email protected]> * TST: Add style.py to the doctest check * TST: fixed eng_formatter doctest for #42671 (#42705) * TST: Revert x and y position in some doctests * Updated the doctest of the hide_columns() Co-authored-by: Richard Shadrach <[email protected]> Co-authored-by: Irv Lustig <[email protected]> Co-authored-by: Thomas Smith <[email protected]> Co-authored-by: jbrockmendel <[email protected]> Co-authored-by: attack68 <[email protected]> Co-authored-by: Mike Phung <[email protected]> Co-authored-by: Matthew Zeitlin <[email protected]> Co-authored-by: Thomas Li <[email protected]> Co-authored-by: Patrick Hoefler <[email protected]> Co-authored-by: Jeff Reback <[email protected]> Co-authored-by: Krishna Chivukula <[email protected]>
… when `fill_value` is specified #41556

This fixes the minimal reproducing examples from the original bug report #41556.

With master, specifying `fill_value` causes the index columns to be returned and to include the `fill_value` in the grouping columns.

With this PR, only the value columns are returned in both cases, and the fill is applied correctly.
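As a rough illustration of the intended behaviour (this is not the example from the issue; the frame and the `key` column name are made up for the sketch), the following should hold on a pandas version that includes this fix:

```python
import pandas as pd

# Hypothetical data: one grouping column ("key") and one value column ("Z").
df = pd.DataFrame(
    {"key": ["a", "a", "b", "b", "b"], "Z": [1.0, 2.0, 3.0, 4.0, 5.0]}
)
g = df.groupby("key")

without_fill = g.shift(1)             # gaps left as NaN
with_fill = g.shift(1, fill_value=0)  # gaps filled with 0

# Both results should contain only the value column, not the grouping column.
print(list(without_fill.columns))
print(list(with_fill.columns))

# The fill is applied within each group, at the position the shift vacated.
print(with_fill["Z"].tolist())
```

If the grouping column showed up in `with_fill.columns`, that would be the behaviour reported in #41556.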
On master, if `fill_value=None` then `_get_cythonized_result` was used. But if `fill_value` was specified, then `self.apply` was used, because `_get_cythonized_result` couldn't take the `fill_value`. I've updated `_get_cythonized_result` so it can handle the `fill_value` itself. This means `Groupby.shift` follows the same code path and returns the same structure of output, whether or not `fill_value` is specified.

Test
The bug reported in the original issue should have been caught by an existing test, except that the existing test extracts the values columns (`[['Z']]`) before doing its comparison to the expected output. This was hiding the fact that the structure of the result depended on `fill_value` being specified.

https://github.com/pandas-dev/pandas/blob/1.2.x/pandas/tests/groupby/test_groupby_shift_diff.py#L52

Removing the extraction of the data column (`[['Z']]`) means we now have a test that catches the bug reported in the original issue. (The test fails on master, but passes on this PR.)
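A minimal sketch of the idea behind the modified test, using a smaller made-up frame rather than the one in `test_groupby_shift_diff.py` (the function name and the `key` column are assumptions for illustration only). The point is that the whole result frame is compared, so an unexpected grouping column would make the comparison fail:

```python
import pandas as pd
import pandas.testing as tm


def check_group_shift_with_fill_value():
    # Hypothetical data: "key" is the grouping column, "Z" the value column.
    df = pd.DataFrame(
        {"key": ["a", "a", "b", "b", "b"], "Z": [1.0, 2.0, 3.0, 4.0, 5.0]}
    )

    # No [["Z"]] selection here: the full result is compared, so any extra
    # column in the result would cause assert_frame_equal to raise.
    result = df.groupby("key").shift(-1, fill_value=0)

    # Shifting by -1 within each group leaves the last row of each group
    # empty, which is then filled with 0.
    expected = pd.DataFrame({"Z": [2.0, 0.0, 4.0, 5.0, 0.0]})
    tm.assert_frame_equal(result, expected)
    return result


result = check_group_shift_with_fill_value()
```

Selecting `[["Z"]]` before the comparison would make this check pass even when the grouping column is returned, which is how the original test missed the bug.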