sparse resampling not working with dictionary of columns? #15386

randomgambit · 2017-02-13T16:25:55Z

Hello there,

Have I said that Pandas is awesome? yes, many times ;-)

I have a question. I am working with a very large dataframe of trades, timestamped at millisecond precision. Latest pandas (0.19.2) here.

I need to resample the dataframe every 200 ms, but given that my data spans several years and I am only interested in resampling data between 10:00 am and 12:00 am every day (handled by between_time()), using a plain resample will crash and burn my machine.

Instead, I tried the sparse resampling approach shown in the docs (http://pandas.pydata.org/pandas-docs/stable/timeseries.html#sparse-resampling), but it fails when I provide it with a dictionary of columns.

Is that expected? Is it a bug?

import pandas as pd
import numpy as np

rng = pd.date_range('2014-1-1', periods=100, freq='D') + pd.Timedelta('1s')
ts = pd.DataFrame({'value' : range(100)}, index=rng)


from functools import partial
from pandas.tseries.frequencies import to_offset

def round(t, freq):
 freq = to_offset(freq)
 return pd.Timestamp((t.value // freq.delta.value) * freq.delta.value)

# works
ts.groupby(partial(round, freq='3T')).value.sum()

# does not work
ts.groupby(partial(round, freq='3T')).apply({'value' : 'sum'})

ts.groupby(partial(round, freq='3T')).apply({'value' : 'sum'})
Traceback (most recent call last):

  File "<ipython-input-104-6004b307a469>", line 1, in <module>
    ts.groupby(partial(round, freq='3T')).apply({'value' : 'sum'})

  File "C:\Users\m1hxb02\AppData\Local\Continuum\Anaconda2\lib\site-packages\pandas\core\groupby.py", line 674, in apply
    func = self._is_builtin_func(func)

  File "C:\Users\m1hxb02\AppData\Local\Continuum\Anaconda2\lib\site-packages\pandas\core\base.py", line 644, in _is_builtin_func
    return self._builtin_table.get(arg, arg)

TypeError: unhashable type: 'dict'

The problem is that I need to resample several columns of my dataframe at once, possibly with a different function for each (sum, mean, max). Is anything wrong here?

Thanks~


chris-b1 · 2017-02-13T16:50:40Z

You want to be using .agg here. e.g.

ts.groupby(partial(round, freq='3T')).agg({'value' : ['sum', 'mean']})

To re-purpose this issue: I'm not sure when it was added, but DatetimeIndex now has a vectorized round method, which is significantly faster than the per-timestamp helper. The doc example should be updated.

In [149]: %timeit ts.groupby(partial(round, freq='3T')).agg({'value' : 'sum'})
100 loops, best of 3: 6.56 ms per loop

In [150]: %timeit ts.groupby(ts.index.round('3T')).agg({'value' : 'sum'})
1000 loops, best of 3: 1.83 ms per loop
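
For readers landing here with the original multi-column question, the two suggestions above (use .agg with a dictionary, and group on a vectorized rounding of the index) can be combined. This is only a minimal sketch: the `price` and `volume` columns are hypothetical additions for illustration, and the `'3min'` alias is used in place of `'3T'` on the assumption that a recent pandas is installed. Note that the `round` helper in the question actually floors each timestamp, so `ts.index.floor('3min')` is the closer equivalent if timestamps can fall in the second half of a bin.

```python
import numpy as np
import pandas as pd

# Same index as the example in the question, plus two hypothetical columns.
rng = pd.date_range("2014-01-01", periods=100, freq="D") + pd.Timedelta("1s")
ts = pd.DataFrame(
    {
        "value": np.arange(100),
        "price": np.linspace(10.0, 20.0, 100),  # hypothetical column
        "volume": np.arange(100, 200),          # hypothetical column
    },
    index=rng,
)

# Round every timestamp to the nearest 3-minute mark in one vectorized call.
keys = ts.index.round("3min")

# One aggregation function per column, passed to .agg (not .apply).
result = ts.groupby(keys).agg({"value": "sum", "price": "mean", "volume": "max"})
print(result.head())
```

Only bins that actually contain data appear in `result`, which is the point of the sparse approach.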

randomgambit · 2017-02-13T17:01:38Z

@chris-b1 thanks! but the syntax for the regular resample is with apply right?

ts.resample('5Min').apply({'value' : 'sum'})

seems to work correctly

chris-b1 · 2017-02-13T17:10:43Z

To be honest I had no idea that worked, I think .agg would also be the idiomatic way with resample. @jreback ?

randomgambit · 2017-02-13T19:38:26Z

@chris-b1 summoning the great master @jreback
in my experience, pandas is smart enough (most of the time) to guess what apply is doing. That is, an agg or a transform. But Jeff knows better here

jreback · 2017-02-14T13:36:29Z

this will be handled in #14668

.apply does not accept a dictionary, see #14464

randomgambit · 2017-02-14T13:42:57Z

@chris-b1 @jreback nice. it DOES appear to work, though, in the case of resample

ts.resample('5Min').apply({'value' : 'sum'}) gives the same output as
ts.resample('5Min').agg({'value' : 'sum'})
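
Whichever spelling is used with resample, the result is dense: every bin between the first and last timestamp is materialized, which is what made a plain resample impractical in the original question. The sketch below, assuming a recent pandas and the `'3min'` alias, compares the size of the dense resample output with the sparse groupby output on the sample frame from this thread.

```python
import pandas as pd

# Sample frame from the question: one observation per day.
rng = pd.date_range("2014-01-01", periods=100, freq="D") + pd.Timedelta("1s")
ts = pd.DataFrame({"value": range(100)}, index=rng)

# Dense: one row per 3-minute bin across the whole date range.
dense = ts.resample("3min").agg({"value": "sum"})

# Sparse: one row per 3-minute bin that actually contains data.
sparse = ts.groupby(ts.index.floor("3min")).agg({"value": "sum"})

print(len(dense), len(sparse))
```

The two results agree on the bins they share; the dense one simply carries the empty bins as well.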

jreback closed this as completed Feb 14, 2017

jreback added this to the No action milestone Feb 14, 2017

jreback added API Design Groupby Resample resample method labels Feb 14, 2017

jreback mentioned this issue Feb 14, 2017

ENH: add Series & DataFrame .agg/.aggregate #14668

Merged
