Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
ENH/BUG: Use Kleene logic for groupby any/all #40819
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH/BUG: Use Kleene logic for groupby any/all #40819
Changes from 22 commits
088ca14
2554921
9a8f9c9
5ca9c4b
68fd995
6530491
26146c2
20f475d
924b38e
423f43f
b1408ac
47ef037
4415060
bb04c1c
9c90886
ef3fbe2
f4c8a8a
1c3cb7d
7cbf85b
809b8a4
58fd33a
80a65bb
c9b9d5f
7514568
740ad7b
a116bed
8a428d4
8e3c5be
b627618
23b3b64
a30496c
3051a99
4cd2833
98cd401
a92c637
7c5c8e6
c66d1fd
0950234
c81c1a5
d2b8ad0
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm is this what we do elsewhere to encode Kleene logic? (e.g. would rather use -1 on nulls)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other algo is a bit different since it can be vectorized and computes both data and mask (so doesn't need to store something signifying NULL). Agree that -1 is more intuitive, have changed to use -1 as the mask signal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At some point, it might be interesting to look into directly filling in the mask as well instead of using the temporary -1 (but not for this PR to be clear ;)).
Personally, I would maybe leave it at using 2, so you don't need to change the uint8 type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep I don't think that would be too bad, that's how I had approached it originally, but changes were slightly more invasive, so stuck with this.
For the -1 vs 2 question, I think I slightly prefer -1 because it seems more immediately apparent what it represents. Although it forces changing the dtype, still the same number of bytes, so memory layout is unaffected and constructing the view looks equivalent speed-wise:
But not a strong opinion, fine with using 2 as well :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If speed-wise the int8 vs uint8 doesn't matter, then indeed the -1 is fine as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can import both from
from pandas.core.arrays import ..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this set to always be False?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had originally used
masked
being included in kwargs as a signal for whether to passmasked
into the groupby func. But the original value here wasn't relevant since it gets set before being used.This is definitely confusing and a poor solution, changed to instead use a positional arg
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, the update looks clearer to follow now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you maybe also write out what that results in practice? So [skipna=False/any, skipna=False / all], [skipna=True / any, skipna=True / all], if I understand correctly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep have changed, that is a much clearer way to describe it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this expected is really hard to parse can you do in multiple steps / make simpler
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressing your comment above makes this simpler, let me know if you think any piece is still confusing