
df.rolling(...).sum() on DataFrame containing string columns returns a TypeError #23467


Closed
alporter08 opened this issue Nov 2, 2018 · 1 comment
Labels
Duplicate Report Duplicate issue or pull request

Comments

@alporter08

Code Sample

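import numpy as np
import pandas as pd

# Time-indexed frame with a single numeric column
df = pd.DataFrame({'B': [0, 1, 2, np.nan, 4]},
                  index=[pd.Timestamp('20130101 09:00:00'),
                         pd.Timestamp('20130101 09:00:02'),
                         pd.Timestamp('20130101 09:00:03'),
                         pd.Timestamp('20130101 09:00:05'),
                         pd.Timestamp('20130101 09:00:06')])

# Move the index into a regular 'timestamp' column
df = df.reset_index().rename(columns={'index': 'timestamp'})

# Add a string (object-dtype) column
df['id'] = 'abcd'

# Raises TypeError in 0.23.4
df.rolling('2s', on='timestamp').sum()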

Original dataframe:

timestamp            B    id
2013-01-01 09:00:00  0.0  abcd
2013-01-01 09:00:02  1.0  abcd
2013-01-01 09:00:03  2.0  abcd
2013-01-01 09:00:05  NaN  abcd
2013-01-01 09:00:06  4.0  abcd

Problem description

In earlier versions (e.g. 0.20.3), rolling worked on a DataFrame that included string columns. Running the code above in version 0.23.4 raises the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/Users/aporter/anaconda3/lib/python3.6/site-packages/pandas/core/window.py in _prep_values(self, values, kill_inf)
    221             try:
--> 222                 values = _ensure_float64(values)
    223             except (ValueError, TypeError):

pandas/_libs/algos_common_helper.pxi in pandas._libs.algos.ensure_float64()

pandas/_libs/algos_common_helper.pxi in pandas._libs.algos.ensure_float64()

ValueError: could not convert string to float: 'abcd'

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
<ipython-input-11-e98474769a85> in <module>()
----> 1 df.rolling('2s', on='timestamp').sum()

/Users/aporter/anaconda3/lib/python3.6/site-packages/pandas/core/window.py in sum(self, *args, **kwargs)
   1584     def sum(self, *args, **kwargs):
   1585         nv.validate_rolling_func('sum', args, kwargs)
-> 1586         return super(Rolling, self).sum(*args, **kwargs)
   1587 
   1588     @Substitution(name='rolling')

/Users/aporter/anaconda3/lib/python3.6/site-packages/pandas/core/window.py in sum(self, *args, **kwargs)
   1005     def sum(self, *args, **kwargs):
   1006         nv.validate_window_func('sum', args, kwargs)
-> 1007         return self._apply('roll_sum', 'sum', **kwargs)
   1008 
   1009     _shared_docs['max'] = dedent("""

/Users/aporter/anaconda3/lib/python3.6/site-packages/pandas/core/window.py in _apply(self, func, name, window, center, check_minp, **kwargs)
    842         results = []
    843         for b in blocks:
--> 844             values = self._prep_values(b.values)
    845 
    846             if values.size == 0:

/Users/aporter/anaconda3/lib/python3.6/site-packages/pandas/core/window.py in _prep_values(self, values, kill_inf)
    223             except (ValueError, TypeError):
    224                 raise TypeError("cannot handle this type -> {0}"
--> 225                                 "".format(values.dtype))
    226 
    227         if kill_inf:

TypeError: cannot handle this type -> object
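The two chained exceptions can be reproduced outside of pandas. The following is a minimal sketch, assuming NumPy's `astype("float64")` is a fair stand-in for the `_ensure_float64` call in the traceback; `prep_values_sketch` is a hypothetical helper written for illustration, not the pandas implementation.

```python
import numpy as np


def prep_values_sketch(values):
    """Hypothetical stand-in for the conversion step shown in the traceback."""
    try:
        # Object arrays holding non-numeric strings cannot be cast to float64.
        return values.astype("float64")
    except (ValueError, TypeError):
        # Same message format as the TypeError at the end of the traceback.
        raise TypeError("cannot handle this type -> {0}".format(values.dtype))


# Object-dtype values, like the 'id' column in the code sample.
string_values = np.array(["abcd"] * 5, dtype=object)

first_error = None
try:
    string_values.astype("float64")
except ValueError as exc:
    first_error = exc  # the inner "could not convert string to float" error

second_error = None
try:
    prep_values_sketch(string_values)
except TypeError as exc:
    second_error = exc  # the outer error that reaches the user

print(type(first_error).__name__, first_error)
print(type(second_error).__name__, second_error)
```

Numeric input passes through the same helper unchanged, which is consistent with the failure being triggered only by the string column.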

Expected Output

In 0.20.3, the output is:

timestamp            B    id
2013-01-01 09:00:00  0.0  abcd
2013-01-01 09:00:02  1.0  abcd
2013-01-01 09:00:03  3.0  abcd
2013-01-01 09:00:05  NaN  abcd
2013-01-01 09:00:06  4.0  abcd
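One possible workaround, sketched here under the assumption that the string columns should simply pass through untouched, is to roll over only the numeric columns and write the result back into a copy of the frame. The names `numeric_cols`, `rolled`, and `result` are illustrative and do not come from the report.

```python
import numpy as np
import pandas as pd

# Same data as the code sample, built directly with a 'timestamp' column.
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2013-01-01 09:00:00",
        "2013-01-01 09:00:02",
        "2013-01-01 09:00:03",
        "2013-01-01 09:00:05",
        "2013-01-01 09:00:06",
    ]),
    "B": [0, 1, 2, np.nan, 4],
    "id": "abcd",
})

# Pick out the numeric columns; 'timestamp' and 'id' are excluded.
numeric_cols = df.select_dtypes(include="number").columns.tolist()

# Roll over the time column plus numeric columns only.
rolled = df[["timestamp"] + numeric_cols].rolling("2s", on="timestamp").sum()

# Write the rolled values back, leaving the string column as it was.
result = df.copy()
result[numeric_cols] = rolled[numeric_cols]

print(result)
```

This avoids passing the object-dtype column to the rolling aggregation at all, at the cost of an explicit column selection.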

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.4
pytest: 3.0.7
pip: 18.1
setuptools: 40.0.0
Cython: 0.25.2
numpy: 1.14.2
scipy: 0.19.0
pyarrow: None
xarray: None
IPython: 5.3.0
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.3.0
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@alporter08 changed the title from "Rolling().sum() on DataFrame containing string columns returns a TypeError" to "df.rolling(...).sum() on DataFrame containing string columns returns a TypeError" Nov 2, 2018
@mroeschke
Member

Thanks for the report. Another user reported a similar issue in #23002. Closing in favor of that issue, but feel free to chime in there.

@mroeschke mroeschke added the Duplicate Report Duplicate issue or pull request label Nov 2, 2018