BUG: resample doesn't work for non-numeric types #3087
Comments
Since the actual use case that exposed the issue was a TimeSeries of dates, the test below also covers that case, in case it takes a different code path.
def test_resample_nonnumeric():
    import numpy as np
    import pandas as pd
    dates = pd.date_range('01-Jan-2014','05-Jan-2014', freq='D')
    series = pd.TimeSeries(['a','b','c','d','e'], index=dates)
    resampled_series = series[[0,1,3,4]].resample('D', fill_method='ffill')
    assert (resampled_series.index == dates).all()
    # forward fill repeats 'b' for the dropped day, matching series[[0,1,1,3,4]] below
    assert (resampled_series.values == np.asarray(['a','b','b','d','e'], dtype=object)).all()
    series = pd.TimeSeries(dates, index=dates)
    resampled_series = series[[0,1,3,4]].resample('D', fill_method='ffill')
    assert (resampled_series.index == dates).all()
    assert (resampled_series.values == series[[0,1,1,3,4]]).all()
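For readers without a pandas build at hand, the behaviour the test asks for can be sketched in plain Python. This is only an illustration of forward-fill upsampling under simple assumptions (daily frequency, a dict of observations); `ffill_daily` is a hypothetical helper, not part of pandas and not how `resample` is implemented. The point is that forward fill only copies values, so it has no reason to care whether they are numeric.

```python
from datetime import date, timedelta


def ffill_daily(observations):
    """Upsample {date: value} to one entry per day, carrying the last value forward.

    Hypothetical helper for illustration only; values are copied,
    never combined arithmetically, so any type (str, date, ...) works.
    """
    if not observations:
        return []
    days = sorted(observations)
    current, last = days[0], days[-1]
    value = observations[current]
    filled = []
    while current <= last:
        # Use the observed value when there is one, else keep the previous one.
        value = observations.get(current, value)
        filled.append((current, value))
        current += timedelta(days=1)
    return filled


start = date(2014, 1, 1)
# The third day is missing, mirroring series[[0,1,3,4]] in the test.
letters = {start + timedelta(days=i): v
           for i, v in [(0, 'a'), (1, 'b'), (3, 'd'), (4, 'e')]}
result = ffill_daily(letters)
print([v for _, v in result])  # ['a', 'b', 'b', 'd', 'e']
```

The same function works unchanged when the values are themselves dates, which is the second case the test covers.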
@dhirschfeld What would the |
The docstring says
OT: This has always bothered me - can't one argument suffice? I'm constantly having to check the docstring to find out which argument I'm supposed to be using depending on whether I happen to be upsampling or downsampling.
Also, In either case I'd expect that if you do something that's not well defined for the type of your object it's reasonable to let whatever exception is created filter through. i.e. if you passed a (theoretical) linear argument to
In [28]: ('a'+'b')/2
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-28-5523b85b2ca5> in <module>()
----> 1 ('a'+'b')/2
TypeError: unsupported operand type(s) for /: 'str' and 'int'
If it's possible for it to work, it should. Because summing strings is possible, you'd expect that to work, but not mean. This in fact seems to be the case:
In [39]: dates = pd.date_range('01-Jan-2014', periods=26, freq='D')
...: series = pd.TimeSeries(map(chr, range(97, 123)), index=dates)
...:
In [40]: series
Out[40]:
2014-01-01 a
2014-01-02 b
2014-01-03 c
2014-01-04 d
2014-01-05 e
2014-01-06 f
2014-01-07 g
2014-01-08 h
2014-01-09 i
2014-01-10 j
2014-01-11 k
2014-01-12 l
2014-01-13 m
2014-01-14 n
2014-01-15 o
2014-01-16 p
2014-01-17 q
2014-01-18 r
2014-01-19 s
2014-01-20 t
2014-01-21 u
2014-01-22 v
2014-01-23 w
2014-01-24 x
2014-01-25 y
2014-01-26 z
Freq: D, dtype: object
In [41]: series.resample('W', how='sum')
Out[41]:
2014-01-05 abcde
2014-01-12 fghijkl
2014-01-19 mnopqrs
2014-01-26 tuvwxyz
Freq: W-SUN, dtype: object
In [42]: series.resample('W', how='mean')
---------------------------------------------------------------------------
DataError Traceback (most recent call last)
<ipython-input-42-e4ce7d0b1edd> in <module>()
----> 1 series.resample('W', how='mean')
C:\dev\bin\Python27\lib\site-packages\pandas\core\generic.pyc in resample(self, rule, how, axis, fill_method, closed, label, convention, kind, loffset, limit, base)
255 fill_method=fill_method, convention=convention,
256 limit=limit, base=base)
--> 257 return sampler.resample(self)
258
259 def first(self, offset):
C:\dev\bin\Python27\lib\site-packages\pandas\tseries\resample.pyc in resample(self, obj)
81
82 if isinstance(axis, DatetimeIndex):
---> 83 rs = self._resample_timestamps(obj)
84 elif isinstance(axis, PeriodIndex):
85 offset = to_offset(self.freq)
C:\dev\bin\Python27\lib\site-packages\pandas\tseries\resample.pyc in _resample_timestamps(self, obj)
206 if len(grouper.binlabels) < len(axlabels) or self.how is not None:
207 grouped = obj.groupby(grouper, axis=self.axis)
--> 208 result = grouped.aggregate(self._agg_method)
209 else:
210 # upsampling shortcut
C:\dev\bin\Python27\lib\site-packages\pandas\core\groupby.pyc in aggregate(self, func_or_funcs, *args, **kwargs)
1409 """
1410 if isinstance(func_or_funcs, basestring):
-> 1411 return getattr(self, func_or_funcs)(*args, **kwargs)
1412
1413 if hasattr(func_or_funcs, '__iter__'):
C:\dev\bin\Python27\lib\site-packages\pandas\core\groupby.pyc in mean(self)
351 """
352 try:
--> 353 return self._cython_agg_general('mean')
354 except GroupByError:
355 raise
C:\dev\bin\Python27\lib\site-packages\pandas\core\groupby.pyc in _cython_agg_general(self, how, numeric_only)
466
467 if len(output) == 0:
--> 468 raise DataError('No numeric types to aggregate')
469
470 return self._wrap_aggregated_output(output, names)
DataError: No numeric types to aggregate
Stranger, it seems that
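The sum-versus-mean asymmetry in the session above follows from the element type rather than from anything specific to resampling. A minimal plain-Python sketch, assuming Sunday-ending weekly bins like the `W-SUN` labels in the output; `week_end`, `aggregate_weekly`, `total` and `mean` are hypothetical names for illustration, not pandas internals:

```python
from datetime import date, timedelta
from functools import reduce
import operator


def week_end(day):
    """Return the Sunday closing the week that contains `day`."""
    return day + timedelta(days=(6 - day.weekday()) % 7)


def aggregate_weekly(pairs, func):
    """Group (date, value) pairs into Sunday-ending weeks and apply `func`."""
    groups = {}
    for day, value in pairs:
        groups.setdefault(week_end(day), []).append(value)
    return {label: func(values) for label, values in sorted(groups.items())}


def total(values):
    # '+' is defined for str (concatenation), so this works for letters.
    return reduce(operator.add, values)


def mean(values):
    # Dividing a str by an int is undefined, so this raises TypeError.
    return total(values) / len(values)


start = date(2014, 1, 1)
pairs = [(start + timedelta(days=i), chr(97 + i)) for i in range(26)]

sums = aggregate_weekly(pairs, total)
print(list(sums.values()))  # ['abcde', 'fghijkl', 'mnopqrs', 'tuvwxyz']

try:
    aggregate_weekly(pairs, mean)
except TypeError as exc:
    print("mean failed:", exc)
```

Here the natural `TypeError` simply propagates, which is the behaviour argued for earlier in this comment.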
Original example is no longer reproducible - if you have an updated example feel free to reopen!
After discussion on the mailing list, it was decided that the function below should work in pandas.
It fails on the current master (0.11.0.dev-308beb1) with 32-bit Python 2.7 on Windows.
https://groups.google.com/forum/?fromgroups=#!topic/pydata/NFA10wTVNu8
Additional (related) example, with how doing weird things: