Skip to content

BUG: resample doesn't work for non-numeric types #3087

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
dhirschfeld opened this issue Mar 19, 2013 · 4 comments
Closed

BUG: resample doesn't work for non-numeric types #3087

dhirschfeld opened this issue Mar 19, 2013 · 4 comments
Labels
Bug Datetime Datetime data dtype Dtype Conversions Unexpected or buggy dtype conversions Resample resample method

Comments

@dhirschfeld
Copy link
Contributor

After discussion on the ML it was decided the below function should work in pandas.

It fails on the current master (0.11.0.dev-308beb1) with 32bit Python 2.7 on windows

def test_resample_nonnumeric():
    import numpy as np
    import pandas as pd
    dates = pd.date_range('01-Jan-2014','05-Jan-2014', freq='D')
    series = pd.TimeSeries(['a','b','c','d','e'], index=dates)
    resampled_series = series[[0,1,3,4]].resample('D', fill_method='ffill')
    assert (resampled_series.index == dates).all()
    assert (resampled_series.values == np.asarray(['a','b','b','d','e'], dtype=object)).all()

https://groups.google.com/forum/?fromgroups=#!topic/pydata/NFA10wTVNu8

addtl example (related), how doing weird things:

In [30]: df = pd.DataFrame({
'Buyer': 'Carl Mark Carl Joe Joe Carl'.split(),
'Quantity': [1,3,5,8,9,3],
'Date' : [
    DT.datetime(2013,9,1,13,0),
    DT.datetime(2013,9,1,13,5),
    DT.datetime(2013,10,1,20,0),
    DT.datetime(2013,10,3,10,0),
    DT.datetime(2013,12,2,12,0),                                      
    DT.datetime(2013,12,2,14,0),
    ]}).set_index(['Date'])

In [31]: df
Out[31]: 
                    Buyer  Quantity
Date                               
2013-09-01 13:00:00  Carl         1
2013-09-01 13:05:00  Mark         3
2013-10-01 20:00:00  Carl         5
2013-10-03 10:00:00   Joe         8
2013-12-02 12:00:00   Joe         9
2013-12-02 14:00:00  Carl         3


In [33]: df['Quantity'].resample('10D',how=sum)
Out[33]: 
Date
2013-09-01 13:00:00     4.000000e+00
2013-09-11 13:00:00              NaN
2013-09-21 13:00:00              NaN
2013-10-01 13:00:00     1.300000e+01
2013-10-11 13:00:00              NaN
2013-10-21 13:00:00              NaN
2013-10-31 13:00:00              NaN
2013-11-10 13:00:00              NaN
2013-11-20 13:00:00              NaN
2013-11-30 13:00:00     1.200000e+01
2013-12-10 13:00:00    7.637868e-317
Freq: 10D, dtype: float64
@dhirschfeld
Copy link
Contributor Author

Since the actual usecase which exposed the issue was a TimeSeries of dates the below test also tests that functionality in case it takes a different in codepath.

def test_resample_nonnumeric():
    import numpy as np
    import pandas as pd
    dates = pd.date_range('01-Jan-2014','05-Jan-2014', freq='D')
    series = pd.TimeSeries(['a','b','c','d','e'], index=dates)
    resampled_series = series[[0,1,3,4]].resample('D', fill_method='ffill')
    assert (resampled_series.index == dates).all()
    assert (resampled_series.values == np.asarray(['a','b','c','d','e'], dtype=object)).all()
    series = pd.TimeSeries(dates, index=dates)
    resampled_series = series[[0,1,3,4]].resample('D', fill_method='ffill')
    assert (resampled_series.index == dates).all()
    assert (resampled_series.values == series[[0,1,1,3,4]]).all()

@cpcloud
Copy link
Member

cpcloud commented May 13, 2013

@dhirschfeld What would the how argument do in this case? Be ignored?

@dhirschfeld
Copy link
Contributor Author

The docstring says how is for downsampling whilst fill_method is for upsampling.

OT: This has always bothered me - can't one argument suffice? I'm constantly having to check the docstring to find out which argument I'm supposed to be using depending on whether I happen to be upsampling or downsampling. Also, reindex calls the same argument method further adding to the confusion.

In either case I'd expect that if you do something that's not well defined for the type of your object it's reasonable to let whatever exception is created filter through. i.e. if you passed a (theoretical) linear argument to fill_method you'd expect a TypeError because you can't divide a string by a numeric value:

In [28]: ('a'+'b')/2
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-28-5523b85b2ca5> in <module>()
----> 1 ('a'+'b')/2

TypeError: unsupported operand type(s) for /: 'str' and 'int'

if it's possible for it to work it should. Because summing strings is possible you'd expect that to work, but not mean. This in fact seems to be the case:

In [39]: dates = pd.date_range('01-Jan-2014', periods=26, freq='D')
    ...: series = pd.TimeSeries(map(chr, range(97, 123)), index=dates)
    ...: 

In [40]: series
Out[40]: 
2014-01-01    a
2014-01-02    b
2014-01-03    c
2014-01-04    d
2014-01-05    e
2014-01-06    f
2014-01-07    g
2014-01-08    h
2014-01-09    i
2014-01-10    j
2014-01-11    k
2014-01-12    l
2014-01-13    m
2014-01-14    n
2014-01-15    o
2014-01-16    p
2014-01-17    q
2014-01-18    r
2014-01-19    s
2014-01-20    t
2014-01-21    u
2014-01-22    v
2014-01-23    w
2014-01-24    x
2014-01-25    y
2014-01-26    z
Freq: D, dtype: object

In [41]: series.resample('W', how='sum')
Out[41]: 
2014-01-05      abcde
2014-01-12    fghijkl
2014-01-19    mnopqrs
2014-01-26    tuvwxyz
Freq: W-SUN, dtype: object

series.resample('W', how='mean')
---------------------------------------------------------------------------
DataError                                 Traceback (most recent call last)
<ipython-input-42-e4ce7d0b1edd> in <module>()
----> 1 series.resample('W', how='mean')

C:\dev\bin\Python27\lib\site-packages\pandas\core\generic.pyc in resample(self, rule, how, axis, fill_method, closed, label, convention, kind, loffset, limit, base)
    255                               fill_method=fill_method, convention=convention,
    256                               limit=limit, base=base)
--> 257         return sampler.resample(self)
    258 
    259     def first(self, offset):

C:\dev\bin\Python27\lib\site-packages\pandas\tseries\resample.pyc in resample(self, obj)
     81 
     82         if isinstance(axis, DatetimeIndex):
---> 83             rs = self._resample_timestamps(obj)
     84         elif isinstance(axis, PeriodIndex):
     85             offset = to_offset(self.freq)

C:\dev\bin\Python27\lib\site-packages\pandas\tseries\resample.pyc in _resample_timestamps(self, obj)
    206             if len(grouper.binlabels) < len(axlabels) or self.how is not None:
    207                 grouped = obj.groupby(grouper, axis=self.axis)
--> 208                 result = grouped.aggregate(self._agg_method)
    209             else:
    210                 # upsampling shortcut

C:\dev\bin\Python27\lib\site-packages\pandas\core\groupby.pyc in aggregate(self, func_or_funcs, *args, **kwargs)
   1409         """
   1410         if isinstance(func_or_funcs, basestring):
-> 1411             return getattr(self, func_or_funcs)(*args, **kwargs)
   1412 
   1413         if hasattr(func_or_funcs, '__iter__'):

C:\dev\bin\Python27\lib\site-packages\pandas\core\groupby.pyc in mean(self)
    351         """
    352         try:
--> 353             return self._cython_agg_general('mean')
    354         except GroupByError:
    355             raise

C:\dev\bin\Python27\lib\site-packages\pandas\core\groupby.pyc in _cython_agg_general(self, how, numeric_only)
    466 
    467         if len(output) == 0:
--> 468             raise DataError('No numeric types to aggregate')
    469 
    470         return self._wrap_aggregated_output(output, names)

DataError: No numeric types to aggregate

Stranger, it seems that ffill does work, but only if you don't index it first:

In [43]: dates = pd.date_range('01-Jan-2014','05-Jan-2014', freq='D')
    ...: series = pd.TimeSeries(['a','b','c','d','e'], index=dates)
    ...: series.resample('H', fill_method='ffill')
    ...: 
Out[43]: 
2014-01-01 00:00:00    a
2014-01-01 01:00:00    a
2014-01-01 02:00:00    a
2014-01-01 03:00:00    a
2014-01-01 04:00:00    a
2014-01-01 05:00:00    a
2014-01-01 06:00:00    a
2014-01-01 07:00:00    a
2014-01-01 08:00:00    a
2014-01-01 09:00:00    a
2014-01-01 10:00:00    a
2014-01-01 11:00:00    a
2014-01-01 12:00:00    a
2014-01-01 13:00:00    a
2014-01-01 14:00:00    a
...
2014-01-04 10:00:00    d
2014-01-04 11:00:00    d
2014-01-04 12:00:00    d
2014-01-04 13:00:00    d
2014-01-04 14:00:00    d
2014-01-04 15:00:00    d
2014-01-04 16:00:00    d
2014-01-04 17:00:00    d
2014-01-04 18:00:00    d
2014-01-04 19:00:00    d
2014-01-04 20:00:00    d
2014-01-04 21:00:00    d
2014-01-04 22:00:00    d
2014-01-04 23:00:00    d
2014-01-05 00:00:00    e
Freq: H, Length: 97, dtype: object

In [44]: series[[0,1,3,4]].resample('H', fill_method='ffill')
---------------------------------------------------------------------------
DataError                                 Traceback (most recent call last)
<ipython-input-44-6621beea2243> in <module>()
----> 1 series[[0,1,3,4]].resample('H', fill_method='ffill')

C:\dev\bin\Python27\lib\site-packages\pandas\core\generic.pyc in resample(self, rule, how, axis, fill_method, closed, label, convention, kind, loffset, limit, base)
    255                               fill_method=fill_method, convention=convention,
    256                               limit=limit, base=base)
--> 257         return sampler.resample(self)
    258 
    259     def first(self, offset):

C:\dev\bin\Python27\lib\site-packages\pandas\tseries\resample.pyc in resample(self, obj)
     81 
     82         if isinstance(axis, DatetimeIndex):
---> 83             rs = self._resample_timestamps(obj)
     84         elif isinstance(axis, PeriodIndex):
     85             offset = to_offset(self.freq)

C:\dev\bin\Python27\lib\site-packages\pandas\tseries\resample.pyc in _resample_timestamps(self, obj)
    221             # Irregular data, have to use groupby
    222             grouped = obj.groupby(grouper, axis=self.axis)
--> 223             result = grouped.aggregate(self._agg_method)
    224 
    225             if self.fill_method is not None:

C:\dev\bin\Python27\lib\site-packages\pandas\core\groupby.pyc in aggregate(self, func_or_funcs, *args, **kwargs)
   1409         """
   1410         if isinstance(func_or_funcs, basestring):
-> 1411             return getattr(self, func_or_funcs)(*args, **kwargs)
   1412 
   1413         if hasattr(func_or_funcs, '__iter__'):

C:\dev\bin\Python27\lib\site-packages\pandas\core\groupby.pyc in mean(self)
    351         """
    352         try:
--> 353             return self._cython_agg_general('mean')
    354         except GroupByError:
    355             raise

C:\dev\bin\Python27\lib\site-packages\pandas\core\groupby.pyc in _cython_agg_general(self, how, numeric_only)
    466 
    467         if len(output) == 0:
--> 468             raise DataError('No numeric types to aggregate')
    469 
    470         return self._wrap_aggregated_output(output, names)

DataError: No numeric types to aggregate

In [45]: pd.__version__
Out[45]: '0.11.0'

@jreback jreback modified the milestones: 0.15.0, 0.14.0 Mar 9, 2014
@jreback jreback modified the milestones: Someday, 0.16.0 Jan 26, 2015
@jreback jreback modified the milestones: 0.18.2, Someday May 9, 2016
@jreback jreback modified the milestones: 0.18.2, Next Major Release Jul 6, 2016
@jreback jreback added 32bit 32-bit systems and removed 32bit 32-bit systems labels Mar 17, 2017
@WillAyd
Copy link
Member

WillAyd commented Jul 6, 2018

Original example is no longer reproducible - if you have an updated example feel free to reopen!

@WillAyd WillAyd closed this as completed Jul 6, 2018
@WillAyd WillAyd modified the milestones: Contributions Welcome, No action Jul 6, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Datetime Datetime data dtype Dtype Conversions Unexpected or buggy dtype conversions Resample resample method
Projects
None yet
Development

No branches or pull requests

4 participants