Skip to content

Business day resampling #11123

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
0x0L opened this issue Sep 16, 2015 · 16 comments · Fixed by #26754
Closed

Business day resampling #11123

0x0L opened this issue Sep 16, 2015 · 16 comments · Fixed by #26754
Labels
Enhancement Frequency DateOffsets Resample resample method

Comments

@0x0L
Copy link
Contributor

0x0L commented Sep 16, 2015

Hi everyone,

I just stumbled across this odd behaviour:

For a daily time series, I would expect resample('B', how='last') to be equivalent to selecting rows whose weekday is not Saturday or Sunday. However, resample groups together Fridays, Saturdays and Sundays and hence we get Sunday's value on Friday.

In the time series context, I guess most people would expect resample('B', ...) to bin Saturdays, Sundays and Mondays, so that no looking into the future occurs.

Am I missing something?

@jreback
Copy link
Contributor

jreback commented Sep 16, 2015

pls show a specific example

@0x0L
Copy link
Contributor Author

0x0L commented Sep 16, 2015

In [2]: pd.__version__
Out[2]: '0.16.2'

In [3]: s = pd.Series(list(range(10)), pd.date_range('2015-09-01', '2015-09-10'))

In [4]: s
Out[4]: 
2015-09-01    0
2015-09-02    1
2015-09-03    2
2015-09-04    3  <--- Friday
2015-09-05    4
2015-09-06    5
2015-09-07    6  <--- Monday
2015-09-08    7
2015-09-09    8
2015-09-10    9
Freq: D, dtype: int64

In [5]: s.resample('B', how='last')
Out[5]: 
2015-09-01    0
2015-09-02    1
2015-09-03    2
2015-09-04    5  <--- expected 3
2015-09-07    6
2015-09-08    7
2015-09-09    8
2015-09-10    9
Freq: B, dtype: int64

@jreback
Copy link
Contributor

jreback commented Sep 16, 2015

see also #11128

In [34]: import calendar
In [35]: weekdays = [calendar.day_name[i].lower() for i in range(7)]
In [36]: s = pd.Series(list(range(10)), pd.date_range('2015-09-01', '2015-09-10'))
In [37]: s2 = s.to_frame('value').assign(weekday=Series(s.index.weekday).map(Series(weekdays)).values)

In [38]: s2
Out[38]: 
            value    weekday
2015-09-01      0    tuesday
2015-09-02      1  wednesday
2015-09-03      2   thursday
2015-09-04      3     friday
2015-09-05      4   saturday
2015-09-06      5     sunday
2015-09-07      6     monday
2015-09-08      7    tuesday
2015-09-09      8  wednesday
2015-09-10      9   thursday

This is equivalent to .resample('B')

In [39]: s2.groupby(pd.Grouper(freq='B')).last()
Out[39]: 
            value    weekday
2015-09-01      0    tuesday
2015-09-02      1  wednesday
2015-09-03      2   thursday
2015-09-04      5     sunday
2015-09-07      6     monday
2015-09-08      7    tuesday
2015-09-09      8  wednesday
2015-09-10      9   thursday

You can do this

In [40]: def f(x):
    # x is a Timestamp
    if x.weekday() < 5:
        return x
    return x + pd.offsets.BusinessDay(0)
   ....: 

In [41]: s2.groupby(f).last()
Out[41]: 
            value    weekday
2015-09-01      0    tuesday
2015-09-02      1  wednesday
2015-09-03      2   thursday
2015-09-04      3     friday
2015-09-07      6     monday
2015-09-08      7    tuesday
2015-09-09      8  wednesday
2015-09-10      9   thursday

@jreback
Copy link
Contributor

jreback commented Sep 16, 2015

I think you should need to create another business day offset to actually do this with a freq. E.g. you could sub-class BusinessDay and create 2 new offsets, e.g. BF and BM (or somesuch), where BF would be the same as BusinessDay. So i'll mark this as an enhancement.

@jreback jreback added Frequency DateOffsets Resample resample method labels Sep 16, 2015
@jreback jreback added this to the Next Major Release milestone Sep 16, 2015
@0x0L
Copy link
Contributor Author

0x0L commented Sep 17, 2015

I didn't dig into the code yet, but maybe we can use the keyword arg convention ?

On a more general level, I'm not sure the present behaviour should be the default one. It really doesn't feel right to have a peek into the future in the time series context.

@jreback
Copy link
Contributor

jreback commented Sep 17, 2015

you misunderstand, you would need to create a new frequency. This is a long time convention. You could add to, but not change this.

@0x0L
Copy link
Contributor Author

0x0L commented Oct 5, 2015

However,

In [35]: s2.to_period('B')
Out[35]: 
            value    weekday
2015-09-01      0    tuesday
2015-09-02      1  wednesday
2015-09-03      2   thursday
2015-09-04      3     friday
2015-09-07      4   saturday
2015-09-07      5     sunday
2015-09-07      6     monday
2015-09-08      7    tuesday
2015-09-09      8  wednesday
2015-09-10      9   thursday

groups together Saturdays, Sundays and Mondays which is the sane default behaviour IMO.

Don't you agree this is a bit inconsistent ?

@jreback
Copy link
Contributor

jreback commented Oct 5, 2015

@0x0L you are talking about 2 different things here. periods are timespans which encompass a start-end. timestamps are points-in-time. conversions between one and another must use a convention (e.g. start/end).

However, resample is a point-in-time operation (at least on a DatetimeIndex). So the issue is what convention to use. The existing convention is to include fri-sat-sun as indicated.

doing to_period just marks the intervals, and is NOT a resample

@0x0L
Copy link
Contributor Author

0x0L commented Oct 5, 2015

@jreback I understand. That's what I was trying to say when I earlier mentioned we could use the convention keyword arg. Right now,

In [10]: x = s2.resample('B', how='last', convention='start')

In [11]: y = s2.resample('B', how='last', convention='end')

are the same. Why not use it to make resample group sat-sun-mon ?

@jreback
Copy link
Contributor

jreback commented Oct 5, 2015

convention is implied by the freq for a time-stamp, e.g B -> implies include fri-sat-sunday. So its a part of the frequency, not a keyword. A keyword is needed however for Period because that is a Timespan and it is ambiguous which frequency to go to.

@0x0L
Copy link
Contributor Author

0x0L commented Oct 5, 2015

Ok. So I guess we really do need a new frequency. I'll get to it ASAP, thanks for the clarification

@andyljones
Copy link

andyljones commented Feb 20, 2017

@0x0L Did a new frequency ever get added that fixes this?

@0x0L
Copy link
Contributor Author

0x0L commented Feb 20, 2017

@andyljones No sorry, I never found the time to do. I defaulted to using groupby

@rambo-yuanbo
Copy link

hi @jreback , that behavior of resmple with freq='B' is exactly what i need (adding non-business day 's value to the previous business day).
HOWEVER, business days might be quite different from place to place. Is there any way to pass a custom list of business day into the resample() function? Alghouth I find the freq='C' for custom business day in the possible offset alieases list here (http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases), how do i psss in the actual business day list?
right now, i can only iterate over the serie and do it myself.

@0x0L
Copy link
Contributor Author

0x0L commented Jun 9, 2019

Actually s.resample('B', label='right', closed='right') does exactly what we wanted all along :)

@jreback the default behaviour might be a source of a confusion for some users (me included). Should a note be added somewhere in the Resampling section ?

@0x0L 0x0L closed this as completed Jun 9, 2019
@jreback
Copy link
Contributor

jreback commented Jun 9, 2019

you can certainly add an example / note to the doc string if u think it would be helpful

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement Frequency DateOffsets Resample resample method
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants