-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: Groupby.nth includes group key inconsistently #12839 #13316
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Current coverage is 84.34%@@ master #13316 diff @@
==========================================
Files 138 138
Lines 51126 51133 +7
Methods 0 0
Messages 0 0
Branches 0 0
==========================================
+ Hits 43122 43130 +8
+ Misses 8004 8003 -1
Partials 0 0
|
df = DataFrame([[1, 2], [1, 4], [5, 6]], columns=['A', 'B']) | ||
g = df.groupby('A') | ||
g.head() | ||
assert_frame_equal(g.nth(0), df.iloc[[0, 2]].set_index('A')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put blank lines between testing new things
g.head() | ||
result = g.head(n=2) | ||
assert_frame_equal(result, df) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what you meant with doing .head()
twice in a row, is this what you are looking for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no I mean
g = df.groupby('A')
result = g.head()
assert_frame_.....
result = g.head()
assert_frame....
e.g. it was 2 of the same calls in a row that was triggering yes? want to have an exact replica of the original failure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
compare each result as well (e.g. don't just g.nth(0)
compare it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, the issue was switching between two methods that either use a group selection (nth
) or don't (head
and tail
). The bug is that once one of these methods are called, _selected_obj
is placed into the cache either with or without the group selection (self.obj[self._group_selection]
or self.obj
), and further retrievals from the cache will retrieve the same object without paying attention to whether the group selection has changed or not. The proposed pull request clears the cache for _selected_obj
whenever the group selection changes. This happens in _set_selection_from_grouper
in nth
, and it also needs to happen at the beginning of head
and tail
(after making sure _group_selection
is None
)
@@ -457,6 +457,11 @@ def _selected_obj(self): | |||
else: | |||
return self.obj[self._selection] | |||
|
|||
def _reset_group_selection(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a comment here (and to _set_selection
) to what these are doing
@adneu ok just some minor comments. ping when updated / green. |
@adneu Do you have some time to update this? |
@adneu ok only minor changes. can you update |
Sorry for the delay. I added some docstrings and tests based on your comments. Pls check out the latest commit and let me know if ok. |
Small changes Added tests
@@ -457,8 +457,21 @@ def _selected_obj(self): | |||
else: | |||
return self.obj[self._selection] | |||
|
|||
def _set_selection_from_grouper(self): | |||
""" we may need create a selection if we have non-level groupers """ | |||
def _clear_group_selection(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
call this _reset_group_selection
. sorry to be a pain about the names, but naming is very important for consistency.
naming change but otherwise lgtm. ping when green. |
@adneu Looks good! |
@jreback good to go? |
thanks @adneu |
closes pandas-dev#12839 Author: adneu <[email protected]> Closes pandas-dev#13316 from adneu/12839 and squashes the following commits: 16f5cd3 [adneu] Name change ac1851a [adneu] Added docstrings/comments, and new tests. 4d73cbf [adneu] Updated tests 9b75df4 [adneu] BUG: Groupby.nth includes group key inconsistently pandas-dev#12839
git diff upstream/master | flake8 --diff
When the group selection changes, the cache for
_selected_obj
needs to be reset so thatnth
,head
, andtail
can return consistent results.