select() within a function closure not working as agg function #1423

Closed
@dalejung


I'm running into a strange issue with groupby and function closures. For some reason, the closure doesn't work unless I access the grouped series inside it. In agg_before I added a fix flag that does nothing except index into the data argument, and with it set the results come out correct.

```python
from pandas import *
import numpy as np
from datetime import datetime  # needed for the groupby key below

periods = 1000
ind = DatetimeIndex(start='2012/1/1', freq='5min', periods=periods)
df = DataFrame({'high': np.arange(periods), 'low': np.arange(periods)}, index=ind)

def agg_before(hour, func, fix=False):
    """
        Run an aggregate func on the subset of data before `hour`.
    """
    def _func(data):
        d = data.select(lambda x: x.hour < hour).dropna()
        if fix:
            # Workaround: merely touching the grouped series makes the result correct
            data[data.index[0]]
        if len(d) == 0:
            return None
        return func(d)
    return _func

# Same logic as a plain (non-closure) function
def afunc(data):
    d = data.select(lambda x: x.hour < 11).dropna()
    return np.max(d)

# Group by calendar day
grouped = df.groupby(lambda x: datetime(x.year, x.month, x.day))

closure_bad = grouped.agg({'high': agg_before(11, np.max)})         # NaN after the first day
closure_good = grouped.agg({'high': agg_before(11, np.max, True)})  # correct
lambda_good = grouped.agg({'high': afunc})                           # correct
```

```
In [33]: np.__version__
Out[39]: '1.6.2'

In [34]: pandas.__version__
Out[34]: '0.8.0.dev-dc6ce90'

In [35]: closure_bad
Out[35]:
            high
2012-01-01   131
2012-01-02   NaN
2012-01-03   NaN
2012-01-04   NaN

In [36]: closure_good
Out[36]:
            high
2012-01-01   131
2012-01-02   419
2012-01-03   707
2012-01-04   995

In [37]: lambda_good
Out[37]:
            high
2012-01-01   131
2012-01-02   419
2012-01-03   707
2012-01-04   995
```

Passing a plain function that isn't a closure (afunc) to agg works fine. Any ideas on what's going on?
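For anyone re-checking this against a recent pandas, here is a minimal sketch of the same comparison, assuming the pandas 2.x API. `DataFrame.select` and the `DatetimeIndex(start=..., periods=...)` constructor arguments no longer exist there, so this swaps in `pd.date_range`, a boolean mask on `index.hour`, and `index.normalize()` for the per-day grouping. It only checks that the closure and the plain function agree; it says nothing about why 0.8.0.dev behaved differently.

```python
import numpy as np
import pandas as pd

periods = 1000
ind = pd.date_range("2012-01-01", freq="5min", periods=periods)
df = pd.DataFrame({"high": np.arange(periods), "low": np.arange(periods)}, index=ind)


def agg_before(hour, func):
    """Return a closure that applies `func` to the rows before `hour`."""
    def _func(data):
        # Boolean mask on the group's own DatetimeIndex stands in for the removed .select()
        d = data[data.index.hour < hour].dropna()
        if len(d) == 0:
            return np.nan
        return func(d)
    return _func


def afunc(data):
    """Same logic as a plain module-level function (no closure)."""
    d = data[data.index.hour < 11].dropna()
    return np.max(d)


# normalize() truncates each timestamp to midnight, i.e. groups by calendar day
grouped = df.groupby(df.index.normalize())

closure_result = grouped.agg({"high": agg_before(11, np.max)})
plain_result = grouped.agg({"high": afunc})

print(closure_result)
print(closure_result.equals(plain_result))
```

With this data both results should match the `closure_good` and `lambda_good` output above (131, 419, 707, 995 for the four days), since each day has 288 five-minute rows and the first 132 fall before 11:00.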
