DataFrame.rolling does nothing when values are in a list #18129
Comments
@tdpetrou : Thanks for reporting this! Wow, this is a lot to unpack here:
Lists are not first-class as elements; there is almost no support in rolling for what you are trying to do, nor is any likely to be added, as this is non-idiomatic and non-performant.
This is also a duplicate issue for much of this; @gfyoung, if you could find the original, that would be appreciated.
@jreback There is no other easy way to do what I am trying to do.
And why doesn't rolling.mean do it? That seems to be exactly what you are trying to do.
rolling.mean is not nearly as functional as I need it to be. It only uses rows up to the current row, not all the rows with the current date, whenever there are multiple rows with the same date.
You see how all the rows that have the same date have different numbers? There are four rows for October 28 and they each have a different mean. I can't specify to use all the rows of the current date. I also can't specify to look ahead at the next n days and simultaneously back at the previous m days. Nor is there a size method, and apply gets sent a one-dimensional array. Seems very broken to me.
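The asymmetric window described above (previous m rows plus next n rows) can be sketched by hand. This is only an illustration under assumptions: `asymmetric_rolling_mean` is a hypothetical helper, not a pandas API, it counts rows rather than calendar days, and it truncates windows at the edges of the data.

```python
import pandas as pd


def asymmetric_rolling_mean(series, before, after):
    """Mean over `before` rows back, the current row, and `after` rows ahead.

    Hypothetical helper, not part of pandas. Windows are truncated at the
    start and end of the series instead of producing NaN.
    """
    values = series.to_numpy(dtype=float)
    n = len(values)
    means = [
        values[max(0, i - before): min(n, i + after + 1)].mean()
        for i in range(n)
    ]
    return pd.Series(means, index=series.index)


s = pd.Series([1, 2, 3, 4, 5])
result = asymmetric_rolling_mean(s, before=1, after=2)
print(result.tolist())  # [2.0, 2.5, 3.5, 4.0, 4.5]
```

A plain Python loop like this is slow on large data; it is meant to show the window definition, not to be a performant replacement for rolling.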
Of course, this is how rolling works.
This wouldn't work if you wanted an evenly-weighted mean, which is why I wanted to collect all the values together first in a list. You could do it in a roundabout way with
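One roundabout route to an evenly-weighted mean, not necessarily the one this comment had in mind, is to roll the daily sums and the daily counts separately and divide. A minimal sketch on made-up data (the dates and values below are invented for illustration, not taken from the Stack Overflow post):

```python
import pandas as pd

# Made-up sample with several rows per date and one missing day (Oct 27).
idx = pd.to_datetime([
    "2012-10-25", "2012-10-25",
    "2012-10-26",
    "2012-10-28", "2012-10-28", "2012-10-28",
])
df = pd.DataFrame({"value": [1.0, 3.0, 5.0, 2.0, 4.0, 6.0]}, index=idx)

# Collapse each calendar day to its sum and its number of rows.
daily = df["value"].resample("D").agg(["sum", "count"])

# Roll both over five calendar days, then divide, so every original row
# carries the same weight regardless of how many rows share its date.
rolled = daily.rolling("5D").sum()
even_mean = rolled["sum"] / rolled["count"]
print(even_mean)
```

Because each day is collapsed before rolling, every row that shares a date ends up with the same result, which is the behavior asked for above.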
I have a similar problem, but with sets:
import pandas as pd
pd.Series(data=[{1},{2},{3},{4}], index=[1,2,3,4]).rolling(2).apply(list)
# yields:
# 1 {1}
# 2 {2}
# 3 {3}
# 4 {4}
# dtype: object
# yet it should be something like:
# 1 None
# 2 [{1},{2}]
# 3 [{2},{3}]
# 4 [{3},{4}]
# dtype: object
@jreback if
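Since rolling is built around numeric data, the expected output above can be produced by slicing the values by hand. A minimal sketch, assuming a fixed row-count window; `rolling_collect` is a hypothetical helper, not a pandas function:

```python
import pandas as pd


def rolling_collect(series, window):
    """Collect each trailing window of `series` into a list.

    Hypothetical helper, not part of pandas. Positions that do not yet
    have a full window get None.
    """
    values = series.tolist()
    windows = [
        values[i - window + 1: i + 1] if i >= window - 1 else None
        for i in range(len(values))
    ]
    return pd.Series(windows, index=series.index, dtype=object)


s = pd.Series(data=[{1}, {2}, {3}, {4}], index=[1, 2, 3, 4])
result = rolling_collect(s, 2)
print(result)
```

This works for any element type (sets, strings, lists) because it never tries to convert the values to floats.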
Having the same issue with string data, no mention of what is wrong anywhere, no mention of unsupported datatypes in the docs, and no error, just wrong output. I disagree very much with the "resample" tag and request the "bug" tag be added again.
@NightFantomJ2 I am building a similar library to pandas called dexplo that will allow you to specify a window size in either direction for all rolling operations. For instance, you could want a window that was from 3 to the left to 5 to the right, or from 5 to the right to 10 to the right. It's about 1 month from official release.
@tdpetrou was there an official release?
Not yet. Should happen middle of 2019.
This issue is based on data from this Stack Overflow post.
First, I get all the values for each day into a list using resample. I then try to apply a five-day rolling function, and the original DataFrame is returned. No calculation happens.
Problem description
To give more context, I wanted to find the five-day rolling average, but the rolling method does not include all the rows for the current date if there are multiple rows with the same date. See this output:
The first row for 10-25-2012 has a count of 1 and the second has a count of 2. You can see this pattern continue for all rows that have the same date. Because of this, I decided to group all the values of the same day into a list with resample and then use rolling on that frame to get the desired result. Strangely, the original DataFrame is being returned when there are lists as values.
Also, I think rolling has lots of room for improvement. I think it would be great to have the following:
groupby and resample - have same methods (there is no size method) and have it pass in pandas objects to the agg/apply methods. Currently, it passes in numpy arrays.
I believe SAS has the capability to do the custom window size.
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Darwin
OS-release: 15.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.21.0
pytest: 3.2.1
pip: 9.0.1
setuptools: 36.5.0.post20170921
Cython: 0.26.1
numpy: 1.13.3
scipy: 0.19.1
pyarrow: None
xarray: None
IPython: 6.1.0
sphinx: 1.6.3
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.1.0
openpyxl: 2.4.8
xlrd: 1.1.0
xlwt: 1.2.0
xlsxwriter: 1.0.2
lxml: 4.1.0
bs4: 4.6.0
html5lib: 0.999999999
sqlalchemy: 1.1.13
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None