-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
FIX use selected_obj rather the obj throughout groupby #6570
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Ah, something subtle is that df.groupby(as_index=False)[col] is a DataFrameGroupby, hence the need for |
This should pass now, will add test iterating over those in 5264 and see what happens. |
# TODO check groupby with > 1 col ? | ||
|
||
methods = ['count', 'corr', 'cummax', 'cummin', | ||
'cumprod', 'describe', 'rank'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixes for these (at least)
Passes travis, and I think this probably fixes for resample and maybe a few others too. Need to add some tests later. Can't get vbench working on this machine, perf change shouldn't be too bad... |
@jreback time's look ok... (tests not that consistent), what do you think?
|
these r fine |
I've included the other tests, as I realised that quite a few other functions use groupyby under the hood, quite a bit of fluctuation here...
|
maybe repeat and see if last 3 show up |
If you would, that'd be great!
|
Here's a patch to fix the perf issue; its that When I run this (even w/o this patch), their are some failing tests....?
|
Thanks, will apply this, and see what stuff is failing tonight... (weird travis wouldn't be picking them up :s ). Will also add some more basic tests for some of the other methods in the checklist. |
I had to rebase off master |
@jreback @TomAugspurger Do you mind taking a look at this PR, I see you put some serious time in before trying to sort out this behaviour so would be great to hear what I'm missing (sorry if you were part way there / w duplicated effort here, I'd missed your previous discussion in #5264 until just now). (I had example here with corr, but think it's not an issue) |
travis is being flaky. |
rebase and I think this is good to go, pls squash a bit too I assume release notes you will do all at once (ok..) |
TST dont ignore subselection in groupby
reordered commits and squished (and passes travis). Will add to release when we close main issue. |
FIX use selected_obj rather the obj throughout groupby
…#44821) Fixes issue pandas-dev#44821. When trying to iterate on a subset of columns in a GroupBy object, it returned all columns, instead of the selected subset. GroupBy.__iter__ used self.obj instead of self._selected_obj (see PR pandas-dev#6570).
fixes many parts of for #5264... Changes lots of
self.obj
toself._selected_obj
.probably should vbench before merging?
cc @jreback @TomAugspurger