ENH: groupby().apply(f) accepts combine=0 arg, to return results unmolested #3241 #3242

ghost · 2013-04-02T17:42:34Z

Right now:

In [16]: df=mkdf(10,2,data_gen_f=lambda x,y: randint(1,10))
    ...: df
    ...: 
    ...: 
    ...: 
Out[16]: 
C0       C_l0_g0  C_l0_g1
R0                       
R_l0_g0        9        1
R_l0_g1        3        7
R_l0_g2        8        1
R_l0_g3        4        3
R_l0_g4        5        3
R_l0_g5        7        2
R_l0_g6        4        1
R_l0_g7        5        4
R_l0_g8        9        7
R_l0_g9        4        8

In [17]: def f1(g):
    ...:     return g.sort('C_l0_g0')
    ...: # group on whether the row label's numeric suffix is >= 5
    ...: g=df.groupby(lambda key: int(key.split("g")[-1]) >= 5)
    ...: r=g.apply(f1)
    ...: 

# we want each group's dataframe returned sorted, but they get concatenated against our will
In [18]: r
Out[18]: 
C0             C_l0_g0  C_l0_g1
      R0                       
False R_l0_g1        3        7
      R_l0_g3        4        3
      R_l0_g4        5        3
      R_l0_g2        8        1
      R_l0_g0        9        1
True  R_l0_g6        4        1
      R_l0_g9        4        8
      R_l0_g7        5        4
      R_l0_g5        7        2
      R_l0_g8        9        7

# what we really want is a couple of sorted dataframes:

In [20]: map(lambda r: r[1].sort('C_l0_g0'),g)
Out[20]: 
[C0       C_l0_g0  C_l0_g1
R0                       
R_l0_g1        3        7
R_l0_g3        4        3
R_l0_g4        5        3
R_l0_g2        8        1
R_l0_g0        9        1,
 C0       C_l0_g0  C_l0_g1
R0                       
R_l0_g6        4        1
R_l0_g9        4        8
R_l0_g7        5        4
R_l0_g5        7        2
R_l0_g8        9        7]
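
For comparison, the "one sorted frame per group" result of `In [20]` can be sketched on a current pandas by iterating the GroupBy object directly. This assumes a pandas recent enough to have `DataFrame.sort_values` (the `sort` method used above has since been removed); the values are copied from `Out[16]` so the sketch runs without the `mkdf` test helper.

```python
import pandas as pd

# Same data as Out[16], built by hand instead of with mkdf().
df = pd.DataFrame(
    {"C_l0_g0": [9, 3, 8, 4, 5, 7, 4, 5, 9, 4],
     "C_l0_g1": [1, 7, 1, 3, 3, 2, 1, 4, 7, 8]},
    index=[f"R_l0_g{i}" for i in range(10)],
)

# Group rows on whether the numeric suffix of the row label is >= 5.
grouped = df.groupby(lambda key: int(key.split("g")[-1]) >= 5)

# Keep the groups separate: a list of (key, sorted sub-frame) pairs,
# rather than one frame concatenated by apply().
pieces = [(key, group.sort_values("C_l0_g0")) for key, group in grouped]

for key, frame in pieces:
    print(key)
    print(frame)
```

Iteration yields groups in sorted key order (`False`, then `True`), which is why the pieces line up with the two frames in `Out[20]`.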

With this PR:

In [21]: def f1(g): # same f1 as above
    ...:     return g.sort('C_l0_g0')
    ...: def f2(g,raw=None):
    ...:     return g.sort('C_l0_g0')
    ...: def f3(g,**kwds):
    ...:     return g.sort('C_l0_g0')
    ...: # the `raw` keyword is the new bit
    ...: r1=g.apply(f1,raw=True)
    ...: r2=g.apply(f2,raw=True)
    ...: r3=g.apply(f3,raw=True)
    ...: 
# a list of (key, sorted frame) pairs
In [22]: print r1
[(False, C0       C_l0_g0  C_l0_g1
R0                       
R_l0_g1        3        7
R_l0_g3        4        3
R_l0_g4        5        3
R_l0_g2        8        1
R_l0_g0        9        1), (True, C0       C_l0_g0  C_l0_g1
R0                       
R_l0_g6        4        1
R_l0_g9        4        8
R_l0_g7        5        4
R_l0_g5        7        2
R_l0_g8        9        7)]

# but not if the function's signature already accepts `raw`, either explicitly or via **kwds
In [23]: print r2
C0             C_l0_g0  C_l0_g1
      R0                       
False R_l0_g1        3        7
      R_l0_g3        4        3
      R_l0_g4        5        3
      R_l0_g2        8        1
      R_l0_g0        9        1
True  R_l0_g6        4        1
      R_l0_g9        4        8
      R_l0_g7        5        4
      R_l0_g5        7        2
      R_l0_g8        9        7

In [24]: print r3
C0             C_l0_g0  C_l0_g1
      R0                       
False R_l0_g1        3        7
      R_l0_g3        4        3
      R_l0_g4        5        3
      R_l0_g2        8        1
      R_l0_g0        9        1
True  R_l0_g6        4        1
      R_l0_g9        4        8
      R_l0_g7        5        4
      R_l0_g5        7        2
      R_l0_g8        9        7
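
The behaviour in `In [21]` through `In [24]` comes down to one decision: does the extra keyword belong to `apply` or to the user's function? Below is a minimal sketch of that decision, assuming it is made by inspecting the function's signature with the standard `inspect` module. `grouped_apply` and `accepts_keyword` are hypothetical names, this is not the code in the PR, and it uses `sort_values` in place of the since-removed `DataFrame.sort`.

```python
import inspect

import pandas as pd


def accepts_keyword(func, name):
    """True if func declares `name` explicitly or takes arbitrary **kwargs."""
    return any(
        p.name == name or p.kind is inspect.Parameter.VAR_KEYWORD
        for p in inspect.signature(func).parameters.values()
    )


def grouped_apply(grouped, func, raw=False, **kwargs):
    """Hypothetical wrapper around GroupBy.apply, not the PR's actual code."""
    if accepts_keyword(func, "raw"):
        # The keyword belongs to func: forward it and combine as usual.
        return grouped.apply(func, raw=raw, **kwargs)
    if raw:
        # Uncombined mode: one (key, result) pair per group, left untouched.
        return [(key, func(group, **kwargs)) for key, group in grouped]
    return grouped.apply(func, **kwargs)


# Usage mirroring In [21] (data values taken from Out[16]).
df = pd.DataFrame(
    {"C_l0_g0": [9, 3, 8, 4, 5, 7, 4, 5, 9, 4],
     "C_l0_g1": [1, 7, 1, 3, 3, 2, 1, 4, 7, 8]},
    index=[f"R_l0_g{i}" for i in range(10)],
)
g = df.groupby(lambda key: int(key.split("g")[-1]) >= 5)


def f1(g):
    return g.sort_values("C_l0_g0")


def f2(g, raw=None):
    return g.sort_values("C_l0_g0")


def f3(g, **kwds):
    return g.sort_values("C_l0_g0")


r1 = grouped_apply(g, f1, raw=True)  # list of (key, sorted frame) pairs
r2 = grouped_apply(g, f2, raw=True)  # one combined frame
r3 = grouped_apply(g, f3, raw=True)  # one combined frame
```

Forwarding the keyword whenever the function could receive it keeps existing callers working, at the cost that a function written with `**kwds` can never opt into the uncombined mode.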

jreback · 2013-04-02T17:54:20Z

doesn't apply already have a raw argument?

    def apply(self, func, axis=0, broadcast=False, raw=False,
              args=(), **kwds):
        """
        Applies function along input axis of DataFrame. Objects passed to
        functions are Series objects having index either the DataFrame's index
        (axis=0) or the columns (axis=1). Return type depends on whether passed
        function aggregates

        Parameters
        ----------
        func : function
            Function to apply to each column
        axis : {0, 1}
            0 : apply function to each column
            1 : apply function to each row
        broadcast : bool, default False
            For aggregation functions, return object of same size with values
            propagated
        raw : boolean, default False
            If False, convert each row or column into a Series. If raw=True the
            passed function will receive ndarray objects instead. If you are
            just applying a NumPy reduction function this will achieve much
            better performance
        args : tuple
            Positional arguments to pass to function in addition to the
            array/series
        Additional keyword arguments will be passed as keywords to the function
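
To make the clash concrete: on `DataFrame.apply`, `raw` only controls what type each row or column arrives as. A small check, assuming a reasonably recent pandas in which `raw` still has the meaning documented above; `col_sum` is an illustrative helper, not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

seen = set()


def col_sum(col):
    # Record the type of object apply() hands us, then reduce it.
    seen.add(type(col).__name__)
    return col.sum()


as_series = df.apply(col_sum)            # raw=False (default): each column is a Series
series_types = set(seen)

seen.clear()
as_array = df.apply(col_sum, raw=True)   # raw=True: each column is a NumPy ndarray
array_types = set(seen)

print(series_types, array_types)
```

The results are identical; only the argument type (and hence the speed of simple NumPy reductions) changes, which is a different meaning from the "don't combine" switch proposed for `groupby().apply`.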

jreback · 2013-04-02T17:59:05Z

sorry....you mean the groupby apply....retract my comment

ghost · 2013-04-02T18:10:07Z

No, that's a good comment; I wasn't aware of that arg, which has a different meaning.
I would prefer consistent arg names across pandas: if `raw` means something else in
another apply function, it's probably better to find another name. Are you aware of similar
functionality + name elsewhere in the API?

jreback · 2013-04-02T18:19:25Z

NTMK (not to my knowledge). Maybe rename yours to combine=True? (and the behavior you exhibit would be combine=False)

ghost · 2013-04-02T18:26:40Z

sold.


jreback · 2013-04-02T19:32:24Z

this looks good....maybe add to whatsnew/docs? (as it's an interesting case)

ghost · 2013-04-02T19:36:13Z

will do, but in 0.12. I'm not putting in anything new at this point in the release cycle.

ghost · 2013-04-02T19:40:13Z

Now that I think of it, this would be partly mitigated by a df.split_on(nlevels=1) or something similar
(related #3066), although I think this is still useful.
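
`df.split_on(nlevels=1)` is only floated here as an idea, not an existing pandas method. Assuming the combined result carries the group key as its outer index level (as `r` does in `Out[18]`), something close can be sketched with existing calls: group on the outer level and drop that level from each piece. `split_on` below is a hypothetical function, and the frames are rebuilt by hand from `Out[18]`.

```python
import pandas as pd


def split_on(df, nlevels=1):
    """Hypothetical helper: split df into (key, sub-frame) pairs on its outer index level(s)."""
    levels = 0 if nlevels == 1 else list(range(nlevels))
    return [(key, sub.droplevel(levels)) for key, sub in df.groupby(level=levels)]


# Rebuild the combined result r from Out[18]: group key as the outer index level.
sorted_false = pd.DataFrame(
    {"C_l0_g0": [3, 4, 5, 8, 9], "C_l0_g1": [7, 3, 3, 1, 1]},
    index=["R_l0_g1", "R_l0_g3", "R_l0_g4", "R_l0_g2", "R_l0_g0"],
)
sorted_true = pd.DataFrame(
    {"C_l0_g0": [4, 4, 5, 7, 9], "C_l0_g1": [1, 8, 4, 2, 7]},
    index=["R_l0_g6", "R_l0_g9", "R_l0_g7", "R_l0_g5", "R_l0_g8"],
)
r = pd.concat({False: sorted_false, True: sorted_true})

# Split the combined frame back into one frame per group.
pieces = split_on(r)
```

`groupby` keeps the original row order inside each group, so each piece stays sorted the way `apply` left it.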

jreback · 2013-04-02T19:42:44Z

yep....

jreback · 2013-09-21T23:51:20Z

@y-p forgot you did this....hmm....let's resurrect in 0.14.....

ghost · 2013-12-19T22:58:06Z

pandas/core/groupby.py

@@ -307,6 +307,9 @@ def apply(self, func, *args, **kwargs):
        Parameters
        ----------
        func : function
+        combine : (default: True), You may pass in a combine=True argument to get back


should be combine=False

ghost · 2013-12-19T23:02:01Z

like #5655, I'm hesitant because this jumps through hoops to compensate for a fundamentally problematic
choice that's too established to correct (opinionated apply: meh; but capturing all kwds, which blocks extensions
to the signature, is nasty).

Still, I think this is solid and I'm for brushing off the dust and merging in 0.14.
Perhaps a nice synergy with #4059 (comment) if it makes it in as well.

ENH: groupby().apply(f) accepts combine=False arg, to return results unmolested (77dd9a1)

ghost reviewed Dec 19, 2013

ghost closed this Jan 26, 2014

ghost deleted the feature/groupby_apply_raw_mode branch January 26, 2014 21:32

This pull request was closed.
