groupby(group_keys=True) ignored when apply returns unsliced data #8467

kay1793 · 2014-10-05T05:07:32Z

I ran into this unexplained behaviour with groupby when using group_keys=True (the default).
It's not clear why returning x.key vs. x[:].key causes the group_keys argument to be ignored.

In [86]: df = DataFrame({'key': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    ...:                 'value': range(9)})
    ...: df
Out[86]: 
   key  value
0    1      0
1    1      1
2    1      2
3    2      3
4    2      4
5    2      5
6    3      6
7    3      7
8    3      8

In [87]: df.groupby('key', group_keys=True).apply(lambda x: x[:].key)
Out[87]: 
key   
1    0    1
     1    1
     2    1
2    3    2
     4    2
     5    2
3    6    3
     7    3
     8    3
Name: key, dtype: int64

In [88]: df.groupby('key', group_keys=True).apply(lambda x: x.key)
Out[88]: 
0    1
1    1
2    1
3    2
4    2
5    2
6    3
7    3
8    3
Name: key, dtype: int64

jreback · 2014-10-05T13:57:57Z

has nothing to do with group_keys (which is a very odd option anyhow).

Has to do with whether you are returning something that is exactly identical or not

x[:].key is NOT identical to x.key. They are equal, but the 2nd is the exact object, while the first is a copy. These aggregate differently because that's how groupby determines mutation. (e.g. even though you didn't actually mutate, it looks like you are).
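The identical-versus-equal distinction drawn above can be seen in isolation with a minimal sketch. It shows only Python object identity against value equality on a small hypothetical frame; it does not reproduce how groupby itself checks for mutation.

```python
import pandas as pd

# Small hypothetical frame, standing in for one group passed to apply.
df = pd.DataFrame({"key": [1, 1, 1], "value": [0, 1, 2]})

# A full slice returns a new DataFrame object holding the same data.
sliced = df[:]

same_object = sliced is df        # identity: are these the very same object?
same_values = sliced.equals(df)   # equality: do they hold the same values?

print(same_object, same_values)
```

So anything derived from `x[:]` compares equal to the same thing derived from `x`, but is never the same object.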

Impossible to disambiguate. Nor even sure why you would.

What are you actually trying to do?

kay1793 · 2014-10-05T18:07:52Z

Actually, I asked why the x case doesn't behave like what you call the mutated case, not the other way round.
Also, I'm returning a Series when I get a DataFrame in the apply function. How is x[:2] a "mutation"
while x.foo is not?
If it's impossible to "disambiguate" when the return types are different, well, I don't know what you mean.

The docstring for group_keys is:

group_keys : boolean, default True
    When calling apply, add group keys to index to identify pieces
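As a rough illustration of what that docstring describes, the sketch below compares group_keys=True and group_keys=False for a function that returns only part of each group. The helper name `first_two` is hypothetical, and the case is deliberately different from the one in this issue (where the function returns the whole group), since that case has behaved differently across pandas versions.

```python
import pandas as pd

df = pd.DataFrame({"key": [1, 1, 1, 2, 2, 2, 3, 3, 3],
                   "value": range(9)})


def first_two(s):
    # Hypothetical helper: keep the first two rows of each group.
    return s.head(2)


# With group_keys=True the group key becomes an outer index level.
with_keys = df.groupby("key", group_keys=True)["value"].apply(first_two)

# With group_keys=False only the original row labels remain.
without_keys = df.groupby("key", group_keys=False)["value"].apply(first_two)

print(with_keys.index.nlevels, without_keys.index.nlevels)
```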

What I'm trying to do is to get the group keys included in the index, when I call apply.

I already have a workaround; it just seemed like a bug. If you're sure this is all just fine, feel free to close.
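The workaround is not shown in the thread. One possible way to get the group keys into the index without relying on group_keys, offered only as an assumption and not as what was actually used here, is to build the outer level explicitly with pd.concat:

```python
import pandas as pd

df = pd.DataFrame({"key": [1, 1, 1, 2, 2, 2, 3, 3, 3],
                   "value": range(9)})

# Map each group key to the piece we want to keep for that group.
# Passing a dict to pd.concat uses its keys as the outer index level.
pieces = {k: g["key"] for k, g in df.groupby("key")}
result = pd.concat(pieces, names=["key"])

print(result)
```

This gives a Series indexed by (group key, original row label) regardless of whether the per-group piece is the same object as the group or a copy of it.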

jreback · 2014-10-06T13:20:38Z

@kay1793 ok this is a bug, fixed by #8484

the mutation issue was a red herring (and shouldn't have affected the results)

kay1793 · 2014-10-06T22:54:57Z

👍 @jreback

This went from impossible and useless to bug fixed in record time, my whiplash says to tell you: Thanks!

jreback · 2014-10-06T22:56:15Z

hahah

np

I may argue about whether there is a bug,
but when detected they get squashed :)

jreback added Groupby Usage Question labels Oct 5, 2014

jreback added this to the 0.15.0 milestone Oct 6, 2014

jreback mentioned this issue Oct 6, 2014

BUG: Bug in groupby .apply with a non-affecting mutation in the function (GH8467) #8484

Merged

jreback closed this as completed in #8484 Oct 6, 2014

jreback mentioned this issue Oct 29, 2014

updating row data within apply from other rows doesn't work #8662

Closed
