Business day resampling #11123
Please show a specific example.
In [2]: pd.__version__
Out[2]: '0.16.2'
In [3]: s = pd.Series(list(range(10)), pd.date_range('2015-09-01', '2015-09-10'))
In [4]: s
Out[4]:
2015-09-01 0
2015-09-02 1
2015-09-03 2
2015-09-04 3 <--- Friday
2015-09-05 4
2015-09-06 5
2015-09-07 6 <--- Monday
2015-09-08 7
2015-09-09 8
2015-09-10 9
Freq: D, dtype: int64
In [5]: s.resample('B', how='last')
Out[5]:
2015-09-01 0
2015-09-02 1
2015-09-03 2
2015-09-04 5 <--- expected 3
2015-09-07 6
2015-09-08 7
2015-09-09 8
2015-09-10 9
Freq: B, dtype: int64
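The how= keyword used above was deprecated in pandas 0.18 and later removed. A sketch of the same reproduction with the method-chaining resample API, assuming a pandas version that has it, is below. With the default left-closed bins for 'B', the Friday bin spans Friday through Sunday, so last() returns Sunday's value.

```python
import pandas as pd

# Same daily series as in the report: 2015-09-01 (Tue) .. 2015-09-10 (Thu)
s = pd.Series(list(range(10)), index=pd.date_range("2015-09-01", "2015-09-10"))

# Method-chaining spelling of s.resample('B', how='last')
result = s.resample("B").last()

# The Friday bin is [Fri, Mon), so it also holds Saturday (4) and Sunday (5)
print(result.loc[pd.Timestamp("2015-09-04")])  # 5
```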
see also #11128
This is equivalent to
You can do this
I think you would need to create another business-day offset to actually do this with a freq. E.g. you could sub-class
I haven't dug into the code yet, but maybe we can use the convention keyword arg. On a more general level, I'm not sure the present behaviour should be the default. It really doesn't feel right to peek into the future in a time-series context.
You misunderstand: you would need to create a new frequency. This is a long-standing convention. You could add to it, but not change it.
However,
In [35]: s2.to_period('B')
Out[35]:
value weekday
2015-09-01 0 tuesday
2015-09-02 1 wednesday
2015-09-03 2 thursday
2015-09-04 3 friday
2015-09-07 4 saturday
2015-09-07 5 sunday
2015-09-07 6 monday
2015-09-08 7 tuesday
2015-09-09 8 wednesday
2015-09-10 9 thursday
groups together Saturdays, Sundays and Mondays, which is the sane default behaviour IMO. Don't you agree this is a bit inconsistent?
@0x0L you are talking about two different things here. However, resample is a point-in-time operation (at least on a DatetimeIndex), so the issue is which convention to use. The existing convention is to group Fri-Sat-Sun together, as indicated. Doing
@jreback I understand. That's what I was trying to say when I mentioned earlier that we could use the convention keyword arg. Right now,
In [10]: x = s2.resample('B', how='last', convention='start')
In [11]: y = s2.resample('B', how='last', convention='end')
are the same. Why not use it to make
The convention is implied by the freq for a timestamp, e.g.
OK, so I guess we really do need a new frequency. I'll get to it ASAP; thanks for the clarification.
@0x0L Did a new frequency ever get added that fixes this?
@andyljones No, sorry, I never found the time to do it. I fell back to using groupby.
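The groupby code itself is not shown in the thread. A minimal sketch of one such workaround, assuming the goal is the Saturday/Sunday/Monday grouping that to_period('B') produces, rolls each non-business day forward to the next business day with pd.offsets.BDay().rollforward and groups on the result. The names bday, keys and result are illustrative only.

```python
import pandas as pd

s = pd.Series(list(range(10)), index=pd.date_range("2015-09-01", "2015-09-10"))

# Business days map to themselves; Saturday and Sunday map to the next Monday.
bday = pd.offsets.BDay()
keys = s.index.map(bday.rollforward)

# Friday keeps its own value; the Monday group collects Sat, Sun and Mon,
# so last() never pulls a later value back onto an earlier date.
result = s.groupby(keys).last()
print(result.loc[pd.Timestamp("2015-09-04")])  # 3
```

Because the labels only ever move forward in time, this avoids the look-ahead discussed above, at the cost of not being expressible as a single resample frequency.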
Hi @jreback, that behavior of resample with freq='B' is exactly what I need (adding non-business days' values to the previous business day).
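For this use case, a sketch with the default 'B' binning and sum(), assuming the same daily series as in the report: the weekend values are added into the preceding Friday's bin.

```python
import pandas as pd

s = pd.Series(list(range(10)), index=pd.date_range("2015-09-01", "2015-09-10"))

# Default 'B' bins are closed on the left: [Fri, Mon) collects Fri, Sat and Sun.
totals = s.resample("B").sum()
print(totals.loc[pd.Timestamp("2015-09-04")])  # 3 + 4 + 5 = 12
```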
Actually, @jreback, the default behaviour might be a source of confusion for some users (me included). Should a note be added somewhere in the
You can certainly add an example or note to the docstring if you think it would be helpful.
Hi everyone,
I just stumbled across this odd behaviour:
For a daily time series, I would expect
resample('B', how='last')
to be equivalent to selecting rows whose weekday is not Saturday or Sunday. However, resample groups together Fridays, Saturdays and Sundays, and hence we get Sunday's value on Friday.
In the time series context, I guess most people would expect
resample('B', ...)
to bin Saturdays, Sundays and Mondays together, so that no looking into the future occurs.
Am I missing something?
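What the report expects from resample('B', how='last') on daily data can be written directly as a weekday filter. A sketch, assuming the same daily series and using DatetimeIndex.dayofweek (Monday=0 ... Sunday=6):

```python
import pandas as pd

s = pd.Series(list(range(10)), index=pd.date_range("2015-09-01", "2015-09-10"))

# Keep Monday..Friday (dayofweek 0..4); drop Saturday (5) and Sunday (6)
weekdays_only = s[s.index.dayofweek < 5]
print(weekdays_only.loc[pd.Timestamp("2015-09-04")])  # Friday keeps its own value, 3
```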