API: Expanded resample #13961
Changes from 6 commits
def74de
c4db0e7
b55309a
7f9add4
5fd97d9
c7b299e
384026b
e203fcf
10c7280
b8dd114
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -377,6 +377,20 @@ Other enhancements | |
|
||
pd.Timestamp(year=2012, month=1, day=1, hour=8, minute=30) | ||
|
||
- the ``.resample()`` function now accepts a ``on=`` or ``level=`` parameter for resampling on a column or ``MultiIndex`` level (:issue:`13500`) | ||
Comment: again would say datetimelike as well. Further I think the doc-string of what we have now
|
||
|
||
.. ipython:: python | ||
|
||
df = pd.DataFrame({'date': pd.date_range('2015-01-01', freq='W', periods=5), | ||
'a': np.arange(5)}, | ||
index=pd.MultiIndex.from_arrays([ | ||
[1,2,3,4,5], | ||
Comment: would add to the main docs a similar example
||
pd.date_range('2015-01-01', freq='W', periods=5)], | ||
names=['v','d'])) | ||
df | ||
df.resample('M', on='date').sum() | ||
df.resample('M', level='d').sum() | ||
|
||
- The ``pd.read_csv()`` with ``engine='python'`` has gained support for the ``decimal`` option (:issue:`12933`) | ||
- The ``pd.read_csv()`` with ``engine='python'`` has gained support for the ``na_filter`` option (:issue:`13321`) | ||
- The ``pd.read_csv()`` with ``engine='python'`` has gained support for the ``memory_map`` option (:issue:`13381`) | ||
|
@@ -934,7 +948,7 @@ Bug Fixes | |
- Bug in ``pd.read_hdf()`` returns incorrect result when a ``DataFrame`` with a ``categorical`` column and a query which doesn't match any values (:issue:`13792`) | ||
- Bug in ``pd.to_datetime()`` raise ``AttributeError`` with NaN and the other string is not valid when errors='ignore' (:issue:`12424`) | ||
|
||
|
||
- Bug in ``groupby`` where a ``TimeGrouper`` selection is used with the ``key`` or ``level`` arguments with a ``PeriodIndex`` (:issue:`14008`) | ||
- Bug in ``Series`` comparison operators when dealing with zero dim NumPy arrays (:issue:`13006`) | ||
- Bug in ``groupby`` where ``apply`` returns different result depending on whether first result is ``None`` or not (:issue:`12824`) | ||
- Bug in ``groupby(..).nth()`` where the group key is included inconsistently if called after ``.head()/.tail()`` (:issue:`12839`) | ||
|
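The `on=` and `level=` keywords described in the whatsnew entry can be tried with a small sketch like the one below. It assumes a recent pandas release. Two things differ from the diff and are assumptions of this sketch: it uses the `'MS'` (month start) alias instead of `'M'`, since `'M'` has been deprecated in later pandas versions, and it selects column `a` explicitly so that the datetime column itself is not aggregated.

```python
import numpy as np
import pandas as pd

# Same layout as the whatsnew example: a datetime column plus a
# MultiIndex whose second level ('d') is also datetime-like.
dates = pd.date_range('2015-01-01', freq='W', periods=5)
df = pd.DataFrame(
    {'date': dates, 'a': np.arange(5)},
    index=pd.MultiIndex.from_arrays([[1, 2, 3, 4, 5], dates],
                                    names=['v', 'd']),
)

# Resample on a column instead of the index.
by_col = df.resample('MS', on='date')['a'].sum()

# Resample on a MultiIndex level (by name; a level number also works).
by_lvl = df.resample('MS', level='d')['a'].sum()

# Equivalent "manual" form: make the column the index first.
by_idx = df.set_index('date')['a'].resample('MS').sum()

print(by_col)
print(by_lvl)
```

All three results should hold the same monthly sums; only the name of the resulting index differs (`date` versus `d`).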
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -4038,10 +4038,12 @@ def between_time(self, start_time, end_time, include_start=True, | |
|
||
def resample(self, rule, how=None, axis=0, fill_method=None, closed=None, | ||
label=None, convention='start', kind=None, loffset=None, | ||
limit=None, base=0): | ||
limit=None, base=0, on=None, level=None): | ||
""" | ||
Convenience method for frequency conversion and resampling of regular | ||
time-series data. | ||
Convenience method for frequency conversion and resampling of time | ||
series. Object must have a datetime-like index (DatetimeIndex, | ||
PeriodIndex, or TimedeltaIndex), or pass datetime-like values | ||
to the on or level keyword. | ||
|
||
Parameters | ||
---------- | ||
|
@@ -4059,7 +4061,12 @@ def resample(self, rule, how=None, axis=0, fill_method=None, closed=None, | |
For frequencies that evenly subdivide 1 day, the "origin" of the | ||
aggregated intervals. For example, for '5min' frequency, base could | ||
range from 0 through 4. Defaults to 0 | ||
|
||
on : string, optional | ||
For a DataFrame, column to use instead of index for resampling. | ||
Column must be datetime-like. | ||
Comment: add versionadded tags
||
level : string or int, optional | ||
For a MultiIndex, level (name or number) to use for | ||
resampling. Level must be datetime-like. | ||
|
||
To learn more about the offset strings, please see `this link | ||
<http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases>`__. | ||
|
@@ -4164,12 +4171,11 @@ def resample(self, rule, how=None, axis=0, fill_method=None, closed=None, | |
""" | ||
from pandas.tseries.resample import (resample, | ||
_maybe_process_deprecations) | ||
|
||
axis = self._get_axis_number(axis) | ||
r = resample(self, freq=rule, label=label, closed=closed, | ||
axis=axis, kind=kind, loffset=loffset, | ||
convention=convention, | ||
base=base) | ||
base=base, key=on, level=level) | ||
return _maybe_process_deprecations(r, | ||
how=how, | ||
fill_method=fill_method, | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -247,15 +247,18 @@ def _get_grouper(self, obj): | |
sort=self.sort) | ||
return self.binner, self.grouper, self.obj | ||
|
||
def _set_grouper(self, obj, sort=False): | ||
def _set_grouper(self, obj, sort=False, converter=None): | ||
""" | ||
given an object and the specifications, setup the internal grouper | ||
for this particular specification | ||
|
||
Parameters | ||
---------- | ||
obj : the subject object | ||
|
||
sort : bool, default False | ||
whether the resulting grouper should be sorted | ||
Comment: this was missing I guess?
||
converter : callable, optional | ||
Comment: what is this? this is very confusing now.
Comment: The root (existing) problem is that sometimes In the case where we are not using the index, but a selection, there needs to be a way to convert the selected column or level. I'm using a callback to the selection function to get there - do you see a better way?
Comment: this needs to be addressed separately in another which can fix the core issues
||
conversion to apply the grouper after selection | ||
""" | ||
|
||
if self.key is not None and self.level is not None: | ||
|
@@ -295,6 +298,8 @@ def _set_grouper(self, obj, sort=False): | |
convert=False, is_copy=False) | ||
|
||
self.obj = obj | ||
if converter is not None: | ||
ax = converter(ax) | ||
self.grouper = ax | ||
return self.grouper | ||
|
||
|
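The converter callback debated in the comments above can be illustrated outside pandas internals. The following is a minimal sketch of the pattern only, not the PR's implementation: `select_grouping_axis` is a hypothetical helper that picks the index, a column, or a MultiIndex level, and then applies an optional conversion to the selected values, which is the step `_set_grouper(..., converter=...)` adds in this diff.

```python
import pandas as pd


def select_grouping_axis(obj, key=None, level=None, converter=None):
    """Hypothetical helper: choose the axis to group on, then convert it.

    key       -- column name to use instead of the index
    level     -- MultiIndex level (name or number) to use
    converter -- optional callable applied to the selected axis
    """
    if key is not None and level is not None:
        raise ValueError("Pass either key or level, not both")

    if key is not None:
        ax = pd.Index(obj[key], name=key)
    elif level is not None:
        ax = obj.index.get_level_values(level)
    else:
        ax = obj.index

    # The conversion runs on the selection, not on the object's index.
    if converter is not None:
        ax = converter(ax)
    return ax


df = pd.DataFrame({
    'p': pd.period_range('2015-01', periods=3, freq='M'),
    'a': range(3),
})

# Convert the selected period column to timestamps, as the
# PeriodIndexResampler converter in the diff does for selections.
ax = select_grouping_axis(df, key='p',
                          converter=lambda x: x.to_timestamp(how='start'))
print(ax)
```

The point of the callback is that when grouping on a column or level, converting the object's own index would not touch the values actually used for binning.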
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -64,7 +64,7 @@ class Resampler(_GroupBy): | |
'binner', 'grouper', 'groupby', | ||
'sort', 'kind', 'squeeze', 'keys', | ||
'group_keys', 'as_index', 'exclusions', | ||
'_groupby'] | ||
'_groupby', 'from_selection'] | ||
|
||
# don't raise deprecation warning on attributes starting with these | ||
# patterns - prevents warnings caused by IPython introspection | ||
|
@@ -85,9 +85,14 @@ def __init__(self, obj, groupby=None, axis=0, kind=None, **kwargs): | |
self.exclusions = set() | ||
self.binner = None | ||
self.grouper = None | ||
self.from_selection = False | ||
|
||
if self.groupby is not None: | ||
self.groupby._set_grouper(self._convert_obj(obj), sort=True) | ||
# bookkeeping to disallow upsampling if not resampling on index | ||
self.from_selection = (self.groupby.key is not None or | ||
Comment: what the heck is this?
Comment: As noted in the discussion above, I'm disallowing upsampling if using a selection. This is used to catch and raise an error.
Comment: same here don't need to address these in this PR
Comment: I'm with you on the period stuff, but are you sure about this? If I pull this out, it's going to break error reporting. with:
without:
Comment: I would make this a private property,
||
self.groupby.level is not None) | ||
obj, converter = self._convert_obj(obj) | ||
self.groupby._set_grouper(obj, sort=True, converter=converter) | ||
|
||
def __unicode__(self): | ||
""" provide a nice str repr of our rolling object """ | ||
|
@@ -203,13 +208,20 @@ def __setitem__(self, attr, value): | |
def _convert_obj(self, obj): | ||
""" | ||
provide any conversions for the object in order to correctly handle | ||
and returns a converter function to be applied to grouping selection | ||
|
||
Parameters | ||
---------- | ||
obj : the object to be resampled | ||
|
||
Returns | ||
------- | ||
obj : converted object | ||
converter : callable, optional | ||
converter to apply after selection | ||
""" | ||
obj = obj.consolidate() | ||
return obj | ||
return obj, None | ||
|
||
def _get_binner_for_time(self): | ||
raise AbstractMethodError(self) | ||
|
@@ -706,6 +718,11 @@ def _upsample(self, method, limit=None): | |
self._set_binner() | ||
if self.axis: | ||
raise AssertionError('axis must be 0') | ||
if self.from_selection: | ||
raise NotImplementedError("Upsampling from level= or on= selection" | ||
" is not supported, use .set_index(...)" | ||
" to explicitly set index to" | ||
" datetime-like") | ||
|
||
ax = self.ax | ||
obj = self._selected_obj | ||
|
@@ -751,7 +768,7 @@ def _resampler_for_grouping(self): | |
return PeriodIndexResamplerGroupby | ||
|
||
def _convert_obj(self, obj): | ||
obj = super(PeriodIndexResampler, self)._convert_obj(obj) | ||
obj, _ = super(PeriodIndexResampler, self)._convert_obj(obj) | ||
|
||
offset = to_offset(self.freq) | ||
if offset.n > 1: | ||
|
@@ -761,10 +778,17 @@ def _convert_obj(self, obj): | |
# Cannot have multiple of periods, convert to timestamp | ||
self.kind = 'timestamp' | ||
|
||
converter = None | ||
# convert to timestamp | ||
if not (self.kind is None or self.kind == 'period'): | ||
obj = obj.to_timestamp(how=self.convention) | ||
return obj | ||
# if PeriodIndex is the actual index obj, just convert it | ||
# otherwise, converter callback will be used on selection | ||
if self.from_selection: | ||
converter = lambda x: x.to_timestamp(how=self.convention) | ||
else: | ||
obj = obj.to_timestamp(how=self.convention) | ||
|
||
return obj, converter | ||
|
||
def aggregate(self, arg, *args, **kwargs): | ||
result, how = self._aggregate(arg, *args, **kwargs) | ||
|
@@ -840,6 +864,11 @@ def _upsample(self, method, limit=None): | |
.fillna | ||
|
||
""" | ||
if self.from_selection: | ||
raise NotImplementedError("Upsampling from level= or on= selection" | ||
Comment: this should be a
||
" is not supported, use .set_index(...)" | ||
" to explicitly set index to" | ||
" datetime-like") | ||
# we may need to actually resample as if we are timestamps | ||
if self.kind == 'timestamp': | ||
return super(PeriodIndexResampler, self)._upsample(method, | ||
|
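The error message added in `_upsample` points users to `.set_index(...)` when they want to upsample on a column or level. A minimal sketch of that workaround, assuming a recent pandas and a made-up three-row frame, could look like this:

```python
import pandas as pd

# Weekly observations stored in a regular column, not in the index.
df = pd.DataFrame({
    'date': pd.date_range('2015-01-04', periods=3, freq='7D'),
    'a': [0, 1, 2],
})

# Upsampling via on='date' is rejected by the check in this diff, so the
# datetime-like column is promoted to the index explicitly first.
upsampled = df.set_index('date').resample('D').ffill()

print(upsampled.head(9))
```

Downsampling (for example a monthly sum) does not need this step and can use `on=` or `level=` directly.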
Comment: make it clear that the `on` (currently) still must be a datetimelike (so we of course accept PeriodIndex/TimedeltaIndex here as well); add tests if we don't have them for those as well
Comment: use `datetimelike` rather than `DatetimeIndex`