Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
ENH: A new GroupBy method to slice rows preserving index and order #42947
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: A new GroupBy method to slice rows preserving index and order #42947
Changes from 46 commits
72fd66d
d0ebbeb
33d7992
78e9ced
4d098cd
f84c365
d937757
e206912
1788f1b
f6977fa
bca4fdd
66536b1
d075c67
df1a767
f2e9f79
a9f9848
e42c86d
bab88c9
a74bd33
c77de1d
0d750bb
e952c25
2a6aafc
b7f8bfe
86e0c2e
6f75502
8de5ff2
f51fa88
3063f3a
4228251
41b1c73
ce36210
70dcdb5
c024e41
8abcac3
add5727
bcd1dd9
25459f7
fa6b86c
fefbacf
fa9f7e3
b589420
89deee3
e28cdfb
424ab14
258530d
d49e48f
1dd6258
f84f5c0
c068162
536298e
0e73278
4cfde7b
6343c9f
33a2225
6ca80c2
e94d4a8
0d91dca
acc3993
df52694
f42ae41
898fad4
ffaaf25
02ec03c
0691f99
7cad2c0
44120e1
88b8ac5
0ee53cd
945a482
9412e3e
138b791
6b29c82
ea45bc6
179912e
19edf00
94f6e99
ae21059
5b8142b
c8e0950
6ce90c4
4f6cbe1
7d92c79
19b21bb
1a055e4
ca164cf
337b15c
69d8956
cecc674
98a9460
95eb548
4c8644b
d0e9aa0
10cca16
9ccebf1
a3db969
4c4ba92
13ff29f
acf67b1
a3db6d1
ee8a86b
f4b24b0
82360f5
ba836dc
8abcad7
86c8e20
ee33df0
4a1aac9
d9671a6
a6dbc61
f65093c
534ea54
511c8fd
97c3ac0
90a4cb8
b58b235
f5ed6bf
21b3637
88613a9
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how are you importing this? (when this file itself is imported by group)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yah this looks like a circular import that is going to be really fragile
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only did this as a mixin because I was copying the style of df.iloc. Which does not attempt to type itself.
In fact, I do need access to GroupBy attributes, so I had to use a cast to get it past the tests.
The simple solution would be to abandon trying to statically type this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok this is just for typing, so if you put it in a TYPE_CHECKING block will be ok (e.g .as the groupby.GroupBy should be separated out to avoid deps)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will wait until my
group_selection_context
change is tested and reviewed before sorting out this import and the type checking.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It tested OK, so I have added the conditional type checking.;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its ok and preferred to use capitalcase, e.g. RowsGroupByIndexer)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Unfortunately, I need groupby because I need
groupby.group_selection_context(self.groupby_object)
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Reback @jbrockmendel
OK, I have tested making
group_selection_context
a method of the GroupBy class, rather than a module scoped method. As far as I can see everything works OK. But it is a bit drastic. My little project seems to be creeping in scope...Unless you say otherwise, I will make that change. Of course, it effects the entire system, but I suppose we could always back it out...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've committed it. Fingers crossed...
Now it has passed all the tests.
My only reservation is that
groupby.group_selection_context()
is sort-of public, if undocumented. So this would break anyone's application code that used it. Do we care? You tell me!There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no we don't care, this is an internal routine (its not on a groupby rather a grouper which is private)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can the name here be more obvious? i had to double-check that this didnt correspond to
grouped.obj
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really struggled with that! I have renamed it groupByObject, which hopefully sums it up...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now it is renamed
groupby_object
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this is the reason you need
groupby
then we have to move that somewhere else or the imports are convoluted.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also needed it in line 35
return _rowsGroupByIndexer(cast(groupby.GroupBy, self))
The class is a mixin but I had to cast it to the "mixed-in" class since I needed access to GroupBy attributes and methods.
I copied the code style from DataFrame.iloc, but since that does not use types, so does not have the problem:
class _iLocIndexer(_LocationIndexer):
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for the
cast
bits you can put the import inside aif TYPE_CHECKING:
blockThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a weird extra line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want me to get rid of all the blank lines in if-elif-else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does _selected_obj vs. _obj_with_exclusions make a difference here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know. This is the method used by head, tail and nth.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no
_obj_with_exclusions
on groupby objects._obj_with_exclusions
and_selected_obj
are defined on frames and series, but just_selected_obj
for groupby.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
produces
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right. It comes in from SelectionMixin and seems to lose the grouped column.
But I think that _selected_obj gives the best behaviour. Especially to emulate head and tail, that essentially give subsets of the original df.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extra line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't understand. All my if-then-elses have blank lines. Do you want them all removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have removed most of the blank lines before elif and else