-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: additional keys in groupby indices when NAs are present #38861
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…oup_indices � Conflicts: � doc/source/whatsnew/v1.3.0.rst
pandas/_libs/lib.pyx
Outdated
if labels[j] != -1: | ||
break | ||
else: | ||
raise ValueError("cannot handle all null values") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, just return result no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do any tests hit this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call, returning result is much better than raising. Currently no tests can possibly hit this because of short-circuit logic in get_indexer_dict
(I changed the comment around it, see the diff), I merely added it to be defensive. Doesn't seem advantageous to remove the short-circuit logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The return of an empty dictionary is hit by several tests however.
@jreback thanks for the review; responses above. |
start = 0 | ||
cur = labels[0] | ||
for i in range(1, n): | ||
# Start at the first non-null entry |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess L891 is redundant now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but its fine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is, thanks. Removed.
thanks @rhshadrach |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
lib.indices_fast
included logic to skip over null values (-1's), but did not include the case where the first index is null. Currently this function is only used insorting.get_indexer_dict
where the case of all null group_index is handled separately.