API: change .resample to be a groupby-like API #11732

Closed
jreback opened this issue Dec 1, 2015 · 7 comments
Labels
API Design Resample resample method
Comments

@jreback
Contributor

jreback commented Dec 1, 2015

similar to #11603

this would transform:

s.resample('D',how='max')

to

s.resample('D').max()

This would be a breaking API change: the default is how='mean', so s.resample('D') currently returns the mean of the resampled data. However, the change would at least be visible, rather than silently altering the behavior of working code.

This would bring .resample (which is just a groupby-type operation under the hood anyhow) in line with the API syntax of .groupby, .rolling, et al.

Furthermore, this would allow getitem / aggregate-type operations with minimal effort, e.g.

s.resample('D').agg(['min','max'])
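A minimal sketch of the method-chained form proposed above, assuming a pandas version in which .resample returns a deferred, groupby-like object. The sample data and the column name "value" are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 48 hourly observations spanning two days.
idx = pd.date_range("2015-12-01", periods=48, freq="h")
s = pd.Series(np.arange(48, dtype="float64"), index=idx)

# Method-chained form: choose the bins first, then the aggregation.
daily_max = s.resample("D").max()

# Several aggregations at once; the result has one column per function.
daily_stats = s.resample("D").agg(["min", "max"])

# getitem-style column selection on a DataFrame before aggregating.
df = s.to_frame("value")  # "value" is an illustrative column name
daily_mean = df.resample("D")["value"].mean()

print(daily_max)
print(daily_stats)
print(daily_mean)
```

Because the aggregation is now an explicit method call, there is no hidden default to remember when reading the code.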

@jreback jreback added this to the 0.18.0 milestone Dec 1, 2015
@jreback jreback changed the title from "API: change .resample to be a groupby-like operation" to "API: change .resample to be a groupby-like API" Dec 1, 2015
@shoyer
Member

shoyer commented Dec 1, 2015

This change would also eliminate the need for many of the current use cases of pd.TimeGrouper, which is a nice thing because that API is pretty well hidden right now.
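To illustrate the point about time-based grouping, here is a hedged sketch comparing an explicit time grouper with the chained resample form. It uses pd.Grouper(freq=...), the public grouper available in current pandas, as a stand-in for pd.TimeGrouper; the sample data are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 72 hourly observations spanning three days.
idx = pd.date_range("2015-12-01", periods=72, freq="h")
s = pd.Series(np.arange(72, dtype="float64"), index=idx)

# Grouping by an explicit time grouper ...
via_grouper = s.groupby(pd.Grouper(freq="D")).max()

# ... and the chained resample form discussed in this issue.
via_resample = s.resample("D").max()

print(via_grouper)
print(via_resample)
```

For a simple downsampling case like this one, both spellings produce the same daily maxima, so the grouper rarely needs to be constructed by hand.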

This API will work well for downsampling (to a coarser time resolution), but it's not clear to me how it would work for upsampling or combined down/upsampling. For example, how would you upsample from daily to hourly data using forward filling with the new API? s.resample('H').mean(fill_method='pad')? Using a method like mean is a bit confusing in this context.

@jreback
Contributor Author

jreback commented Dec 1, 2015

s.resample('H').pad()

@jreback
Contributor Author

jreback commented Dec 1, 2015

I am not sure that combined up/downsampling is even possible now?

@jreback
Contributor Author

jreback commented Dec 1, 2015

or maybe to be more in-line

s.resample('H').ffill()
s.resample('H').fillna(method='pad')

(or all the above)

I guess

s.upsample('H').ffill() is also possible :)
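A minimal sketch of upsampling with forward filling, using the .ffill() spelling from the options above (the one this sketch assumes is available on the resampler object in current pandas). The other spellings floated in this thread are not exercised here. The data and the 6-hour target frequency are made up to keep the output short.

```python
import pandas as pd

# Hypothetical daily series with three observations.
daily = pd.Series(
    [1.0, 2.0, 3.0],
    index=pd.date_range("2000-01-01", periods=3, freq="D"),
)

# Upsample to a finer grid without filling: new slots are NaN.
unfilled = daily.resample("6h").asfreq()

# Upsample and carry the last known value forward.
six_hourly = daily.resample("6h").ffill()

print(unfilled)
print(six_hourly)
```

No aggregation method such as mean appears in the chain, which avoids the confusion raised earlier about what an aggregation means when upsampling.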

@shoyer
Member

shoyer commented Dec 1, 2015

Here's a simple example of combined up/downsampling:

In [25]: idx = pd.to_datetime(['2000-01-01T06', '2000-01-01T12', '2000-01-03T00'])

In [26]: s = pd.Series(range(3), idx)

In [27]: s
Out[27]:
2000-01-01 06:00:00    0
2000-01-01 12:00:00    1
2000-01-03 00:00:00    2
dtype: int64

In [28]: s.resample('1D')
Out[28]:
2000-01-01    0.5
2000-01-02    NaN
2000-01-03    2.0
Freq: D, dtype: float64

In [29]: s.resample('1D', fill_method='pad')
Out[29]:
2000-01-01    0.5
2000-01-02    0.5
2000-01-03    2.0
Freq: D, dtype: float64

@jreback
Contributor Author

jreback commented Dec 1, 2015

I suppose we could have an optional fill_method kw in the Resample object, e.g. s.resample('D', fill_method='pad'), if necessary (similar to how .reindex has this, though normally you would do .reindex().ffill()).

e.g.

In [23]: s.resample('1D',how='mean').ffill()
Out[23]: 
2000-01-01    0.5
2000-01-02    0.5
2000-01-03    2.0
Freq: D, dtype: float64

which I would write as:
s.resample('1D').mean().ffill()

I guess fill_method would apply while doing the mean intra-day (though I don't think I can see a case for this).
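The combined down/upsampling case can be sketched end to end with the chained form. This uses the same three timestamps as the example earlier in the thread, written out as full date-time strings, and assumes a pandas version with the method-chained resample API.

```python
import pandas as pd

# Same three observations as in the combined up/downsampling example.
idx = pd.to_datetime(
    ["2000-01-01 06:00", "2000-01-01 12:00", "2000-01-03 00:00"]
)
s = pd.Series(range(3), index=idx)

# Step 1: downsample to daily means; the empty day (2000-01-02) becomes NaN.
daily_mean = s.resample("1D").mean()

# Step 2: fill the gap on the already-aggregated result.
filled = daily_mean.ffill()

print(daily_mean)
print(filled)
```

Splitting the operation into two explicit steps makes it clear that the fill is applied after the aggregation, not inside each daily bin.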

@jreback
Contributor Author

jreback commented Dec 2, 2015

POC

In [3]: s = Series(np.random.rand(1000), pd.date_range('20130101 09:00:00',freq='Min',periods=1000))

In [4]: r = s.resample2('H')

In [5]: r
Out[5]: DatetimeIndexResampler [freq-><Hour>,axis->0,closed->left,label->left,convention->start,base->0]

In [6]: r.
r.agg        r.aggregate  r.ax         r.mean       r.name       

In [6]: r.mean()
Out[6]: 
2013-01-01 09:00:00    0.463474
2013-01-01 10:00:00    0.496552
2013-01-01 11:00:00    0.467690
2013-01-01 12:00:00    0.542037
2013-01-01 13:00:00    0.500808
2013-01-01 14:00:00    0.541115
2013-01-01 15:00:00    0.549489
2013-01-01 16:00:00    0.567870
2013-01-01 17:00:00    0.466067
2013-01-01 18:00:00    0.468675
2013-01-01 19:00:00    0.520051
2013-01-01 20:00:00    0.495800
2013-01-01 21:00:00    0.496541
2013-01-01 22:00:00    0.437051
2013-01-01 23:00:00    0.514727
2013-01-02 00:00:00    0.517313
2013-01-02 01:00:00    0.501945
Freq: H, dtype: float64
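A rough equivalent of the proof of concept above, assuming a pandas version in which the deferred object is returned by .resample itself (the resample2 name was only the POC's temporary spelling). The random values will differ from the output shown above; the seed is arbitrary.

```python
import numpy as np
import pandas as pd

# 1000 one-minute observations starting at 09:00, as in the POC.
rng = np.random.default_rng(0)  # arbitrary seed, for repeatability only
s = pd.Series(
    rng.random(1000),
    index=pd.date_range("2013-01-01 09:00:00", freq="min", periods=1000),
)

# Nothing is computed yet: r only records how the data will be binned.
r = s.resample("h")
print(type(r).__name__)

# The aggregation runs when a method is called on the deferred object.
hourly = r.mean()
print(len(hourly))

# The same object can be reused for other aggregations.
hourly_range = r.agg(["min", "max"])
print(hourly_range.head())
```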

jreback added a commit to jreback/pandas that referenced this issue Feb 2, 2016
original API detection & warning

support for isinstance / numeric ops

support for comparison ops

DOC: documentation updates w.r.t. aggregation
@jreback jreback closed this as completed in 1dc49f5 Feb 2, 2016