Description
I'm running into a weird issue with groupby and function closures. For some reason, an aggregation function built as a closure doesn't work unless it also accesses the grouped series. In agg_before, the fix flag does nothing except index into the data variable, yet it changes the result.
from pandas import *
import numpy as np

periods = 1000
ind = DatetimeIndex(start='2012/1/1', freq='5min', periods=periods)
df = DataFrame({'high': np.arange(periods), 'low': np.arange(periods)}, index=ind)

def agg_before(hour, func, fix=False):
    """
    Run an aggregate func on the subset of data.
    """
    def _func(data):
        d = data.select(lambda x: x.hour < 11).dropna()
        if fix:
            data[data.index[0]]
        if len(d) == 0:
            return None
        return func(d)
    return _func

def afunc(data):
    d = data.select(lambda x: x.hour < 11).dropna()
    return np.max(d)

grouped = df.groupby(lambda x: datetime(x.year, x.month, x.day))
closure_bad = grouped.agg({'high': agg_before(11, np.max)})
closure_good = grouped.agg({'high': agg_before(11, np.max, True)})
lambda_good = grouped.agg({'high': afunc})
In [33]: np.__version__
Out[39]: '1.6.2'
In [34]: pandas.__version__
Out[34]: '0.8.0.dev-dc6ce90'
In [35]: closure_bad
Out[35]:
            high
2012-01-01   131
2012-01-02   NaN
2012-01-03   NaN
2012-01-04   NaN
In [36]: closure_good
Out[36]:
            high
2012-01-01   131
2012-01-02   419
2012-01-03   707
2012-01-04   995
In [37]: lambda_good
Out[37]:
            high
2012-01-01   131
2012-01-02   419
2012-01-03   707
2012-01-04   995
Running an agg function that isn't a closure works fine. Any ideas on this?
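For anyone trying to reproduce this on a recent pandas: as far as I know, `DatetimeIndex(start=...)` and `Series.select` have since been removed, so the snippet above will not run unchanged. Below is a rough port, not the original repro. It assumes `pd.date_range`, boolean masking on `index.hour`, and grouping by `index.normalize()` are acceptable stand-ins for the removed calls, and it passes `hour` through instead of hard-coding 11. It only shows what the closure version is expected to return (the same values as closure_good above); it does not explain the 0.8.0.dev behaviour.

```python
import numpy as np
import pandas as pd

periods = 1000
# Assumption: pd.date_range stands in for the old DatetimeIndex(start=...) constructor.
ind = pd.date_range(start="2012-01-01", freq="5min", periods=periods)
df = pd.DataFrame({"high": np.arange(periods), "low": np.arange(periods)}, index=ind)


def agg_before(hour, func):
    """Return an aggregator that applies func to the rows earlier than `hour`."""
    def _func(data):
        # Assumption: a boolean mask on the index stands in for Series.select.
        d = data[data.index.hour < hour].dropna()
        if len(d) == 0:
            return None
        return func(d)
    return _func


# Group by calendar day; normalize() sets each timestamp to midnight.
grouped = df.groupby(df.index.normalize())
closure_result = grouped.agg({"high": agg_before(11, np.max)})
print(closure_result)
```

If the closure is handled correctly, every day should get a value and no row should be NaN.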