I start with a dataframe (df) containing staggered measurements. When I try to aggregate the measurements into 5-second intervals using df.resample('5s').median(), I get this traceback:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
C:\Users\pokeeffe\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\groupby.py in median(self)
980 try:
--> 981 return self._cython_agg_general('median')
982 except GroupByError:
C:\Users\pokeeffe\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\groupby.py in _cython_agg_general(self, how, numeric_only)
3047 new_items, new_blocks = self._cython_agg_blocks(
-> 3048 how, numeric_only=numeric_only)
3049 return self._wrap_agged_blocks(new_items, new_blocks)
C:\Users\pokeeffe\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\groupby.py in _cython_agg_blocks(self, how, numeric_only)
3084 result, _ = self.grouper.aggregate(
-> 3085 block.values, how, axis=agg_axis)
3086
C:\Users\pokeeffe\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\groupby.py in aggregate(self, values, how, axis)
1821 def aggregate(self, values, how, axis=0):
-> 1822 return self._cython_operation('aggregate', values, how, axis)
1823
C:\Users\pokeeffe\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\groupby.py in _cython_operation(self, kind, values, how, axis)
1757 func, dtype_str = self._get_cython_function(
-> 1758 kind, how, values, is_numeric)
1759 except NotImplementedError:
C:\Users\pokeeffe\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\groupby.py in _get_cython_function(self, kind, how, values, is_numeric)
1698
-> 1699 ftype = self._cython_functions[kind][how]
1700
KeyError: 'median'
During handling of the above exception, another exception occurred:
AssertionError Traceback (most recent call last)
<ipython-input-55-c17a77e187f3> in <module>()
1 df = chamber_k30['20160908':'20160908T0000']
----> 2 df.resample('5s').median()
C:\Users\pokeeffe\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\tseries\resample.py in f(self, _method)
508
509 def f(self, _method=method):
--> 510 return self._downsample(_method)
511 f.__doc__ = getattr(GroupBy, method).__doc__
512 setattr(Resampler, method, f)
C:\Users\pokeeffe\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\tseries\resample.py in _downsample(self, how, **kwargs)
661 # we want to call the actual grouper method here
662 result = obj.groupby(
--> 663 self.grouper, axis=self.axis).aggregate(how, **kwargs)
664
665 result = self._apply_loffset(result)
C:\Users\pokeeffe\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\groupby.py in aggregate(self, arg, *args, **kwargs)
3595 @Appender(SelectionMixin._agg_doc)
3596 def aggregate(self, arg, *args, **kwargs):
-> 3597 return super(DataFrameGroupBy, self).aggregate(arg, *args, **kwargs)
3598
3599 agg = aggregate
C:\Users\pokeeffe\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\groupby.py in aggregate(self, arg, *args, **kwargs)
3112
3113 _level = kwargs.pop('_level', None)
-> 3114 result, how = self._aggregate(arg, _level=_level, *args, **kwargs)
3115 if how is None:
3116 return result
C:\Users\pokeeffe\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\base.py in _aggregate(self, arg, *args, **kwargs)
426 _level = kwargs.pop('_level', None)
427 if isinstance(arg, compat.string_types):
--> 428 return getattr(self, arg)(*args, **kwargs), None
429
430 if isinstance(arg, dict):
C:\Users\pokeeffe\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\groupby.py in median(self)
990 x = Series(x)
991 return x.median(axis=self.axis)
--> 992 return self._python_agg_general(f)
993
994 @Substitution(name='groupby')
C:\Users\pokeeffe\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\groupby.py in _python_agg_general(self, func, *args, **kwargs)
775 for name, obj in self._iterate_slices():
776 try:
--> 777 result, counts = self.grouper.agg_series(obj, f)
778 output[name] = self._try_cast(result, obj)
779 except TypeError:
C:\Users\pokeeffe\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\groupby.py in agg_series(self, obj, func)
2063 dummy = obj[:0]
2064 grouper = lib.SeriesBinGrouper(obj, func, self.bins, dummy)
-> 2065 return grouper.get_result()
2066
2067 # ----------------------------------------------------------------------
pandas\src\reduce.pyx in pandas.lib.SeriesBinGrouper.get_result (pandas\lib.c:35367)()
pandas\src\reduce.pyx in pandas.lib.Slider.__init__ (pandas\lib.c:40335)()
AssertionError:
The other documented dispatching methods (sum, mean, std, sem, max, min, first, last) work just fine (except for ohlc, which produces an InvalidIndexError).
I can work around the problem like so: df.resample('5s').apply(lambda x: x.median()). But it seems like dispatching should work here...
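As a minimal sketch of that workaround, the snippet below compares the `apply` form against the dispatched `median()` on a small synthetic frame. The frame, its column names, and the seed are assumptions made for illustration (the original measurements are not shown here), and the columns are deliberately unique so that both calls run on any pandas version.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the measurement data (an assumption, not the original df).
rng = np.random.RandomState(0)
index = pd.date_range("2016-09-08", periods=20, freq="s")
df = pd.DataFrame(rng.randn(20, 3), columns=["x", "y", "z"], index=index)

# Dispatched form: the call that raised in the report above.
dispatched = df.resample("5s").median()

# Workaround form: compute the median of each 5-second bin through apply.
workaround = df.resample("5s").apply(lambda x: x.median())

# Both forms should describe the same four 5-second bins.
same_values = np.allclose(dispatched.values, workaround.values)
same_index = dispatched.index.equals(workaround.index)
```

When the dispatched call works, the two results should agree; the `apply` form is expected to be slower because it runs a Python function for every bin.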
In [21]: df = pd.DataFrame(np.random.randn(20,3), columns=list('abc'), index=pd.date_range('2012-01-01', periods=20, freq='s'))
In [23]: df.resample('5s').median()
Out[23]:
                            a         b         c
2012-01-01 00:00:00 -0.209421 -0.649436 -0.857474
2012-01-01 00:00:05  0.304136  0.335305  0.639129
2012-01-01 00:00:10 -0.228682 -0.803259 -0.615048
2012-01-01 00:00:15  0.121994 -0.214258  0.520752
In [24]: df = pd.DataFrame(np.random.randn(20,3), columns=list('aaa'), index=pd.date_range('2012-01-01', periods=20, freq='s'))
In [25]: df.resample('5s').median()
...
KeyError: 'median'
So it is caused by the duplicate column names.
This looks like a bug, but for now you can work around it by renaming the columns so that they are unique.
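One way the renaming workaround might look is sketched below. The positional suffix scheme (`a_0`, `a_1`, ...) is just an assumption for illustration; any set of unique names would do. The original labels are restored after resampling.

```python
import numpy as np
import pandas as pd

# Frame with duplicate column names, as in the reproduction above.
rng = np.random.RandomState(0)
index = pd.date_range("2012-01-01", periods=20, freq="s")
df = pd.DataFrame(rng.randn(20, 3), columns=list("aaa"), index=index)

# Remember the original (duplicated) labels so they can be restored later.
original_columns = df.columns

# Give every column a unique temporary name by appending its position.
unique = df.copy()
unique.columns = ["{}_{}".format(name, i) for i, name in enumerate(original_columns)]

# Resample on the uniquely named frame, then put the original labels back.
medians = unique.resample("5s").median()
medians.columns = original_columns
```

The copy keeps `df` itself untouched, so the temporary names never leak into the rest of the analysis.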
Simple fix for median issue. Should use cython implementation.
closes pandas-dev#14233
Author: Dr-Irv <[email protected]>
Closes pandas-dev#15202 from Dr-Irv/Issue14233 and squashes the following commits:
6e0d900 [Dr-Irv] Use randn in test
1a3b4aa [Dr-Irv] BUG: GH14233 resample().median() failed if duplicate column names were present
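A regression check for this fix could look roughly like the sketch below. It is not the test from the commit above; the helper name `resample_median_matches_numpy` and the seed are hypothetical. It assumes a pandas version that includes the fix and compares the resampled medians against medians computed directly with NumPy, once with unique and once with duplicate column names.

```python
import numpy as np
import pandas as pd


def resample_median_matches_numpy(columns):
    """Return True if resample('5s').median() agrees with per-bin NumPy medians."""
    index = pd.date_range("2012-01-01", periods=20, freq="s")
    values = np.random.RandomState(42).randn(20, len(columns))
    df = pd.DataFrame(values, columns=list(columns), index=index)

    result = df.resample("5s").median()

    # 20 one-second rows form 4 consecutive bins of 5 rows each.
    expected = np.median(values.reshape(4, 5, len(columns)), axis=1)
    return result.shape == expected.shape and np.allclose(result.values, expected)


unique_ok = resample_median_matches_numpy("abc")
duplicate_ok = resample_median_matches_numpy("aaa")
```

On a version affected by the bug, the `"aaa"` call would be expected to raise instead of returning a value.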