Rolling sum (closed='left') with duplicate Timestamp indices. #20712
Comments
I'm not sure I agree with your expected output - is there a generalizable use case you are trying to cover here? |
Let me be more explicit in my logic. I have events (indexed by time) and each event has a duration. For every event I'm interested in calculating the sum of the durations of all preceding events over a four hour window. I am also interested in excluding the duration of the present event from that running sum. Thus, I use the rolling method with closed='left'. Some events occur simultaneously, but they are not identical as they have different durations. Yet, they share the same prior events, and thus should have the same total duration of all preceding events. However, this is not the result that I get when I apply rolling. I get that simultaneously occurring events are treated as consecutive. If I switched their order I would get a different result, which is absurd. The total duration of all events preceding two simultaneous events should be the same, and the calculation should reflect that. |
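A possible workaround for the case described above, as a minimal sketch rather than a fix in pandas itself: collapse simultaneous events to one row per timestamp, take the `closed='left'` rolling sum on that unique index, then broadcast the result back to the original rows. The data mirrors the code sample in this issue; the names `durations`, `per_time`, `prior` and `result` are made up for illustration.

```python
import pandas as pd

# Ten events, two per hour, with durations 0..9 (same layout as the issue's sample).
index = pd.date_range("2011-01-01", periods=5, freq="h").repeat(2)
durations = pd.Series(range(10), index=index, dtype="float64")

# 1. One row per distinct timestamp: total duration of the simultaneous events.
per_time = durations.groupby(level=0).sum()

# 2. Rolling sum over [t - 4h, t) on the now-unique index, so the current
#    timestamp (and every event sharing it) is excluded.
prior = per_time.rolling("4h", closed="left").sum()

# 3. Broadcast back: simultaneous events receive the same value.
result = prior.reindex(durations.index)
print(result)
```

With this approach the order of simultaneous events no longer matters, because they are summed before the window is applied. The first timestamp has no preceding events, so its rows come out as NaN.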
I have the same issue here. The 'closed' parameter is not really working when the value/date is duplicated. Input: rolling(window='90d', on='date', closed='neither') Output: Expected output: |
I am having the same problem. Has anyone discussed it already? |
Same issue here. Multiple entries on the same day, and would like to calculate the sum of the last 30 days, EXCLUDING the entries on that day (having an open right parameter here). If I have two entries on 2019-05-10, I would then expect their 30 day rolling sums to be equal, but right now the second entry on 2019-05-10's 30d sum includes the previous entry on 2019-05-10. |
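For the 30-day case above, one way to sidestep `rolling` entirely is a cumulative sum combined with `numpy.searchsorted`. This is only a sketch under assumptions: the window is taken to be `[t - 30 days, t)`, the column names `date` and `amount` and the sample values are invented, and rows with no prior entries get 0 instead of NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2019-04-20", "2019-05-01", "2019-05-10", "2019-05-10", "2019-05-25"]
    ),
    "amount": [10.0, 20.0, 5.0, 7.0, 1.0],
}).sort_values("date", kind="stable").reset_index(drop=True)

dates = df["date"].to_numpy()

# Prefix sums with a leading 0, so csum[j] - csum[i] is the sum of rows i..j-1.
csum = np.concatenate(([0.0], df["amount"].to_numpy().cumsum()))

# First row inside the window [t - 30 days, t) ...
lo = np.searchsorted(dates, dates - np.timedelta64(30, "D"), side="left")
# ... and first row with the same date as t; side="left" is what keeps
# every same-day entry out of the sum.
hi = np.searchsorted(dates, dates, side="left")

df["prior_30d_sum"] = csum[hi] - csum[lo]
print(df)
```

Both entries on 2019-05-10 get the same value here (30.0), since neither of them is counted in the other's window.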
Same issue here. Would really appreciate a workaround/solution. |
This does look like a bug. Let us know if you're interested in investigating. |
Same issue as explained in this comment. I think the original behavior also has its use cases, so it would be better to add a new keyword argument to support the expected behaviour. |
I'm also facing this problem. In my case I want to compute the number of unique values in each window. The workaround proposed by OP doesn't work because I have to use the unique function in the groupby, which returns lists. I can create a lambda that counts the number of unique values in the lists, but rolling().apply() doesn't support object columns. So, currently there's no way of doing that without two nested for loops, which is O(n^2). |
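For the unique-count case, a rough sketch that avoids `rolling().apply()` on object columns: locate each row's window with `numpy.searchsorted` and count distinct values in that slice. It still loops over rows in Python, so the cost is roughly rows times window size rather than a true vectorised solution. The 4-hour window `[t - 4h, t)`, the column names `time` and `user`, and the data are all assumptions for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "time": pd.to_datetime([
        "2011-01-01 00:00", "2011-01-01 00:00",
        "2011-01-01 01:00", "2011-01-01 01:00",
        "2011-01-01 02:00",
    ]),
    "user": ["a", "b", "a", "c", "b"],
}).sort_values("time", kind="stable").reset_index(drop=True)

times = df["time"].to_numpy()
values = df["user"].to_numpy()

# Window bounds per row: [lo, hi) covers rows with t - 4h <= time < t,
# so rows sharing the current timestamp are excluded.
lo = np.searchsorted(times, times - np.timedelta64(4, "h"), side="left")
hi = np.searchsorted(times, times, side="left")

# One pass over the rows; each window is a contiguous slice of `values`.
df["prior_unique"] = [len(set(values[a:b])) for a, b in zip(lo, hi)]
print(df)
```

Rows with the same timestamp share the same `lo` and `hi`, so they always get the same count.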
I would also like the feature whereas setting |
Code Sample, a copy-pastable example if possible
2011-01-01 00:00:00 0
2011-01-01 00:00:00 1
2011-01-01 01:00:00 2
2011-01-01 01:00:00 3
2011-01-01 02:00:00 4
2011-01-01 02:00:00 5
2011-01-01 03:00:00 6
2011-01-01 03:00:00 7
2011-01-01 04:00:00 8
2011-01-01 04:00:00 9
Unexpected behavior: rolling treats duplicate times as consecutive
Problem description
Rolling treats duplicate times as consecutive rather than equal.
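To make the difference concrete, here is a hedged sketch that rebuilds the sample series above and computes, by brute force, what the reporter's description in the comments implies: for each row, the sum of all values with a timestamp in `[t - 4h, t)`. The helper `prior_sum` and the variable names are hypothetical. The `rolling` result is printed alongside for comparison; what it shows depends on the installed pandas version, so no output is claimed for it.

```python
import pandas as pd

# Same layout as the sample: two rows per hour, values 0..9.
index = pd.date_range("2011-01-01", periods=5, freq="h").repeat(2)
s = pd.Series(range(10), index=index, dtype="float64")
window = pd.Timedelta(hours=4)


def prior_sum(t):
    """Sum of values strictly before t and no older than t - window."""
    mask = (s.index >= t - window) & (s.index < t)
    return s[mask].sum() if mask.any() else float("nan")


# Reference result: rows with equal timestamps get equal sums.
expected = pd.Series([prior_sum(t) for t in s.index], index=s.index)

# What rolling reports on the installed pandas version.
observed = s.rolling("4h", closed="left").sum()

print(pd.DataFrame({"value": s, "expected": expected, "observed": observed}))
```

If `observed` differs between two rows with the same timestamp, the duplicates are being treated as consecutive rather than equal.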
Expected Output
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Darwin
OS-release: 17.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.20.1
pytest: 3.4.0
pip: 9.0.3
setuptools: 36.6.0
Cython: None
numpy: 1.13.3
scipy: 0.19.0
xarray: 0.9.5
IPython: 5.5.0
sphinx: None
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.1.0
openpyxl: None
xlrd: 1.0.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.6.0
html5lib: 0.9999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: 0.1.0
pandas_gbq: None
pandas_datareader: None