
groupby(group_keys=True) ignored when apply returns unsliced data #8467


Closed
kay1793 opened this issue Oct 5, 2014 · 5 comments · Fixed by #8484

@kay1793

kay1793 commented Oct 5, 2014

I ran into this unexplained behaviour with groupby when using group_keys=True (the default):
it's not clear why returning x.key instead of x[:].key causes the group_keys argument to be ignored.

In [86]: df = DataFrame({'key': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    ...:                 'value': range(9)})
    ...: df
Out[86]: 
   key  value
0    1      0
1    1      1
2    1      2
3    2      3
4    2      4
5    2      5
6    3      6
7    3      7
8    3      8

In [87]: df.groupby('key', group_keys=True).apply(lambda x: x[:].key)
Out[87]: 
key   
1    0    1
     1    1
     2    1
2    3    2
     4    2
     5    2
3    6    3
     7    3
     8    3
Name: key, dtype: int64

In [88]: df.groupby('key', group_keys=True).apply(lambda x: x.key)
Out[88]: 
0    1
1    1
2    1
3    2
4    2
5    2
6    3
7    3
8    3
Name: key, dtype: int64
@jreback
Contributor

jreback commented Oct 5, 2014

This has nothing to do with group_keys (which is a very odd option anyhow).

It has to do with whether or not you are returning something that is exactly identical.

x[:].key is NOT identical to x.key. They are equal, but the 2nd is the exact object, while the first is a copy. These aggregate differently because that's how groupby determines mutation (i.e. even though you didn't actually mutate anything, it looks like you did).
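The identity-versus-equality distinction can be illustrated outside pandas. The sketch below is a toy under stated assumptions: it models a group as a dict of Python lists, and `column`, `sliced`, and `looks_mutated` are hypothetical helpers, not pandas internals. It only shows how a check based on `is` would flag an equal copy as "changed".

```python
# Hypothetical model: a "group" is a dict mapping column names to lists.
# This is NOT how pandas stores data; it only mirrors the identity point.
group = {"key": [1, 1, 1], "value": [0, 1, 2]}


def column(frame, name):
    # Like x.key: hands back the stored column object itself.
    return frame[name]


def sliced(frame):
    # Like x[:]: a new frame whose columns are copies of the originals.
    return {name: values[:] for name, values in frame.items()}


def looks_mutated(frame, name, result):
    # Hypothetical identity-based check: any object other than the stored
    # column is treated as "changed", even if its values are equal.
    return result is not frame[name]


exact = column(group, "key")            # the very same list object
copied = column(sliced(group), "key")   # an equal but distinct list

print(exact == copied)                        # True
print(exact is copied)                        # False
print(looks_mutated(group, "key", exact))     # False
print(looks_mutated(group, "key", copied))    # True
```

A check like this cannot tell "returned an untouched copy" apart from "returned modified data", which is the ambiguity described above.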

Impossible to disambiguate. Not even sure why you would.

What are you actually trying to do?

@kay1793
Author

kay1793 commented Oct 5, 2014

Actually, I asked why the x case doesn't behave like what you call the mutated case, not the other way round.
Also, I'm returning a Series when I'm given a DataFrame in the apply function. How is x[:2] "mutation"
while x.foo is not?
If it's impossible to "disambiguate" even when the return types are different, then I don't know what you mean.

The docstring for group_keys is:

group_keys : boolean, default True
    When calling apply, add group keys to index to identify pieces

What I'm trying to do is to get the group keys included in the index, when I call apply.
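To make the expected behaviour concrete, here is a minimal pure-Python sketch (no pandas) of the labelling the docstring describes. The names `apply_with_group_keys` and `select_key`, and the representation of rows as `(index, record)` pairs, are hypothetical and chosen only for illustration; this is not how pandas implements `group_keys`.

```python
from itertools import groupby


def apply_with_group_keys(rows, key, func):
    """Hypothetical sketch of the documented group_keys=True behaviour.

    `rows` is a list of (index, record) pairs. The rows are split by
    record[key], `func` is called on each piece and must return a list of
    (index, value) pairs, and every output label is prefixed with the
    group key so the pieces stay identifiable.
    """
    def group_of(pair):
        return pair[1][key]

    out = []
    # sorted() is stable, so the original row order is kept inside a group.
    for group_key, members in groupby(sorted(rows, key=group_of), key=group_of):
        piece = list(members)
        for index, value in func(piece):
            out.append(((group_key, index), value))
    return out


def select_key(piece):
    # Like `lambda x: x.key`: return the key column with its original index.
    return [(index, record["key"]) for index, record in piece]


keys = [1, 1, 1, 2, 2, 2, 3, 3, 3]
rows = [(i, {"key": k, "value": i}) for i, k in enumerate(keys)]

labelled = apply_with_group_keys(rows, "key", select_key)
# Like `lambda x: x[:].key`: operate on a copy of the piece instead.
labelled_copy = apply_with_group_keys(rows, "key", lambda piece: select_key(piece[:]))

print(labelled[:3])               # [((1, 0), 1), ((1, 1), 1), ((1, 2), 1)]
print(labelled == labelled_copy)  # True
```

The `(group key, original index)` labels mirror the index shown in Out[87]. Because the sketch never checks object identity, returning a copy yields exactly the same labels, consistent with the maintainer's later note that the mutation question should not have affected the results.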

I already have a workaround; it just seemed like a bug. If you're sure this is all fine, feel free to close.

@jreback
Contributor

jreback commented Oct 6, 2014

@kay1793 ok, this is a bug, fixed by #8484

the mutation issue was a red herring (and shouldn't have affected the results)

@kay1793
Author

kay1793 commented Oct 6, 2014

👍 @jreback

This went from "impossible and useless" to "bug fixed" in record time; my whiplash says to tell you: thanks!

@jreback
Contributor

jreback commented Oct 6, 2014

hahah

np

I may argue about whether there is a bug,
but when one is detected it gets squashed :)
