rolling.apply or rolling.agg doesn't work on string columns #20773

Closed
enfeizhan opened this issue Apr 20, 2018 · 4 comments
Labels
Apply (Apply, Aggregate, Transform, Map); Dtype Conversions (Unexpected or buggy dtype conversions); Enhancement; Window (rolling, ewma, expanding)

Comments

@enfeizhan

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd

dat = pd.DataFrame(
    {'number': [2, 2, 3], 'string': ['a', 'a', 'b']},
    index=pd.date_range('2018-01-01', periods=3, freq='1s')
)
# Count the distinct values in each trailing 2-minute window.
dat.rolling('2min').apply(lambda x: len(np.unique(x)))

                     number string
2018-01-01 00:00:00     1.0      a
2018-01-01 00:00:01     1.0      a
2018-01-01 00:00:02     2.0      b

Problem description

What I am trying to do is count the number of unique values in a rolling window. This works for the numeric column dat.number, but the string column dat.string is returned unchanged.

In the above example, I expect the two output columns to be identical, since the number of unique values is 1, 1, 2 starting from the first row. However, the string column returns a, a, b.

Expected Output

                     number  string
2018-01-01 00:00:00     1.0     1.0
2018-01-01 00:00:01     1.0     1.0
2018-01-01 00:00:02     2.0     2.0
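
For reference, the expected output can be reproduced without rolling.apply at all. The following is only a minimal sketch, not anything pandas provides: rolling_nunique is a hypothetical helper name, and it assumes a sorted DatetimeIndex and the default right-closed window (t - window, t]. It loops over every timestamp, so it is quadratic and only suitable for checking small examples.

```python
import pandas as pd


def rolling_nunique(series: pd.Series, window: str) -> pd.Series:
    """Count distinct non-missing values in each trailing window (t - window, t].

    Hypothetical helper for illustration; works for any dtype because it
    never passes the values through the float-only rolling machinery.
    """
    offset = pd.Timedelta(window)
    index = series.index
    counts = []
    for t in index:
        # Select the rows whose timestamp falls inside the window ending at t.
        in_window = series[(index > t - offset) & (index <= t)]
        counts.append(in_window.nunique())
    return pd.Series(counts, index=index, dtype="float64")


dat = pd.DataFrame(
    {'number': [2, 2, 3], 'string': ['a', 'a', 'b']},
    index=pd.date_range('2018-01-01', periods=3, freq='1s'),
)
# Apply the helper column by column; `window` is forwarded as a keyword argument.
expected = dat.apply(rolling_nunique, window='2min')
print(expected)
```

Because the helper slices the original Series, string and numeric columns are handled the same way.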

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Linux
OS-release: 4.13.0-38-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_AU.UTF-8
LOCALE: en_AU.UTF-8

pandas: 0.22.0
pytest: None
pip: 9.0.3
setuptools: 39.0.1
Cython: None
numpy: 1.14.2
scipy: 1.0.1
pyarrow: None
xarray: None
IPython: 6.3.0
sphinx: 1.7.2
patsy: 0.5.0
dateutil: 2.7.2
pytz: 2018.3
blosc: None
bottleneck: None
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.2.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@jreback
Contributor

jreback commented Apr 21, 2018

Added it to the master tracker, but this would require some work. You are welcome to submit a PR.

@jreback added the Reshaping, Dtype Conversions, API Design, and Difficulty Intermediate labels on Apr 21, 2018
@jreback added this to the Next Major Release milestone on Apr 21, 2018

@jreback
Contributor

jreback commented Apr 21, 2018

You can do this (raw=True is new in 0.23, but the rest should work in other versions):

In [28]:  dat.apply(lambda x: x.astype('category').cat.codes).rolling('2min').apply(lambda x: len(pd.unique(x)), raw=True)
Out[28]: 
                     number  string
2018-01-01 00:00:00     1.0     1.0
2018-01-01 00:00:01     1.0     1.0
2018-01-01 00:00:02     2.0     2.0
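
One thing to watch with the category-code workaround is missing data: cat.codes marks a missing value as -1, which would then be counted as a distinct value. The sketch below is a variation under that assumption, not part of the original suggestion. to_codes and count_unique are hypothetical helper names, and raw=True requires pandas 0.23 or later.

```python
import numpy as np
import pandas as pd


def to_codes(col: pd.Series) -> pd.Series:
    """Replace each value with a float category code; missing values stay NaN."""
    codes = col.astype('category').cat.codes.astype('float64')
    return codes.where(codes >= 0)  # cat.codes marks missing values as -1


def count_unique(window: np.ndarray) -> float:
    """Number of distinct non-NaN codes in one raw window."""
    return float(len(np.unique(window[~np.isnan(window)])))


dat = pd.DataFrame(
    {'number': [2, 2, 3], 'string': ['a', 'a', 'b']},
    index=pd.date_range('2018-01-01', periods=3, freq='1s'),
)
# Encode every column, then count the distinct codes per 2-minute window.
result = dat.apply(to_codes).rolling('2min').apply(count_unique, raw=True)
print(result)
```

The codes only serve as stand-ins for equality comparison, so their numeric values carry no meaning beyond "same" or "different".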

@enfeizhan
Author

enfeizhan commented Apr 23, 2018

@jreback Thanks. I think another solution is to hash the string column and then count the unique hash values.
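
A rough sketch of that hashing idea, assuming pandas 0.23 or later for raw=True. It uses pd.util.hash_pandas_object to get one uint64 hash per value; the explicit cast to float64 is an assumption made here so the values fit what the rolling machinery operates on. Note that the cast rounds the 64-bit hashes to 53 bits of precision, so two different values could in principle collapse into one, although that is very unlikely.

```python
import pandas as pd

dat = pd.DataFrame(
    {'number': [2, 2, 3], 'string': ['a', 'a', 'b']},
    index=pd.date_range('2018-01-01', periods=3, freq='1s'),
)

# One hash per value; index=False keeps the index out of the hash, so equal
# values in different rows get equal hashes.
hashed = dat.apply(
    lambda col: pd.util.hash_pandas_object(col, index=False).astype('float64')
)

# Count the distinct hashes in each trailing 2-minute window.
result = hashed.rolling('2min').apply(lambda x: len(pd.unique(x)), raw=True)
print(result)
```

Unlike category codes, hashes do not depend on which other values are present in the column, which may be convenient when the data arrives in chunks.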

@mroeschke added the Apply and Window labels and removed the API Design and Reshaping labels on Oct 27, 2019

@mroeschke
Member

This looks like a duplicate of #23002
