rolling.apply or rolling.agg doesn't work on string columns #20773

enfeizhan · 2018-04-20T23:14:44Z

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd

dat = pd.DataFrame(
    {'number': [2, 2, 3], 'string': ['a', 'a', 'b']},
    index=pd.date_range('2018-01-01', periods=3, freq='1s')
)
dat.rolling('2min').apply(lambda x: len(np.unique(x)))
                     number string
2018-01-01 00:00:00     1.0      a
2018-01-01 00:00:01     1.0      a
2018-01-01 00:00:02     2.0      b

Problem description

I am trying to count the unique values in a rolling window. This works for the numeric column dat.number, but the string column dat.string is returned unchanged.

In the example above, I expect the two output columns to be identical, since the number of unique values is 1, 1, 2 starting from the first row. However, the string column returns a, a, b.

Expected Output

                     number  string
2018-01-01 00:00:00     1.0     1.0
2018-01-01 00:00:01     1.0     1.0
2018-01-01 00:00:02     2.0     2.0

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Linux
OS-release: 4.13.0-38-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_AU.UTF-8
LOCALE: en_AU.UTF-8

pandas: 0.22.0
pytest: None
pip: 9.0.3
setuptools: 39.0.1
Cython: None
numpy: 1.14.2
scipy: 1.0.1
pyarrow: None
xarray: None
IPython: 6.3.0
sphinx: 1.7.2
patsy: 0.5.0
dateutil: 2.7.2
pytz: 2018.3
blosc: None
bottleneck: None
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.2.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None


jreback · 2018-04-21T17:47:18Z

Added it to the master tracker, but this would require some work. You are welcome to submit a PR.

jreback · 2018-04-21T17:55:02Z

You can do this (raw=True is new in 0.23, but the rest should work in other versions):

In [28]:  dat.apply(lambda x: x.astype('category').cat.codes).rolling('2min').apply(lambda x: len(pd.unique(x)), raw=True)
Out[28]: 
                     number  string
2018-01-01 00:00:00     1.0     1.0
2018-01-01 00:00:01     1.0     1.0
2018-01-01 00:00:02     2.0     2.0
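
For anyone who wants to run the workaround above end to end, here is a minimal self-contained sketch. It assumes pandas 0.23 or later (for raw=True); the variable names codes and result are only illustrative.

```python
import pandas as pd

dat = pd.DataFrame(
    {'number': [2, 2, 3], 'string': ['a', 'a', 'b']},
    index=pd.date_range('2018-01-01', periods=3, freq='1s'),
)

# Encode every column as integer category codes so that rolling
# only ever sees numeric data.
codes = dat.apply(lambda col: col.astype('category').cat.codes)

# raw=True hands each window to the function as a NumPy array.
result = codes.rolling('2min').apply(lambda x: len(pd.unique(x)), raw=True)
print(result)
```

One caveat with this approach: category codes use -1 for missing values, so a NaN in the original column would be counted as one more distinct value unless it is handled separately.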

enfeizhan · 2018-04-23T06:10:07Z

@jreback Thanks. I think another solution is to hash the string column and then count the unique hash values.
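
A rough sketch of that hashing idea, assuming pd.util.hash_pandas_object as the hash function (the comment above does not name one) and pandas 0.23 or later for raw=True. Rolling works on float64, which represents integers exactly only up to 2**53, so this sketch keeps the top 53 bits of each 64-bit hash; the names hashed and trimmed are illustrative.

```python
import numpy as np
import pandas as pd

dat = pd.DataFrame(
    {'number': [2, 2, 3], 'string': ['a', 'a', 'b']},
    index=pd.date_range('2018-01-01', periods=3, freq='1s'),
)

# One uint64 hash per string; index=False hashes the values only,
# not the index labels.
hashed = pd.util.hash_pandas_object(dat['string'], index=False)

# Drop the low 11 bits so every hash fits exactly in a float64.
trimmed = pd.Series(
    np.right_shift(hashed.to_numpy(), np.uint64(11)).astype('float64'),
    index=dat.index,
)

# Count distinct hash values in each time-based window.
unique_counts = trimmed.rolling('2min').apply(
    lambda x: len(np.unique(x)), raw=True
)
print(unique_counts)
```

Unlike category codes, hashes can in principle collide, so two different strings could be counted as one; for most data this is very unlikely, but the count is not guaranteed exact.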

mroeschke · 2020-05-08T19:36:44Z

This looks like a duplicate of #23002

leeong05 mentioned this issue Apr 21, 2018

API/ENH: master issue for pd.rolling_apply #8659

Closed

14 tasks

jreback added the Reshaping, Dtype Conversions, API Design, and Difficulty Intermediate labels Apr 21, 2018

jreback added this to the Next Major Release milestone Apr 21, 2018

mroeschke mentioned this issue Jul 26, 2019

rolling agg raise error #27601

Closed

jbrockmendel removed Effort Medium labels Oct 21, 2019

mroeschke added the Apply and Window labels and removed the API Design and Reshaping labels Oct 27, 2019

mroeschke added the Enhancement label May 8, 2020

mroeschke closed this as completed May 8, 2020
