Skip to content

Week Start/End Resamping: WS/WE frequencies #7049

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
quasiben opened this issue May 5, 2014 · 20 comments
Open

Week Start/End Resamping: WS/WE frequencies #7049

quasiben opened this issue May 5, 2014 · 20 comments
Labels
Enhancement Frequency DateOffsets Resample resample method

Comments

@quasiben
Copy link

quasiben commented May 5, 2014

There are a number of offset aliases listed here:
http://pandas.pydata.org/pandas-docs/dev/timeseries.html#offset-aliases

I'm currently trying to resample a timeseries where the buckets are return with the beginning and/or end of the week. There are a number of Start Month/End Month frequency aliases. Am I missing something or should we include Week End/Week Start resampling aliases as well?

@jreback
Copy link
Contributor

jreback commented May 5, 2014

look at the next section
u can specify w-mon for example

@quasiben
Copy link
Author

quasiben commented May 5, 2014

Yes, the anchored offsets are nice. But these will change the values:

In [1]: import pandas as pd

In [2]: import datetime as dt

In [3]: daily_daterange = pd.date_range(start=dt.datetime(2013, 1, 31),
   ...:                                 end=dt.datetime(2013,2,28),
   ...:                                 freq='B')

In [4]: daily_dataset = pd.DataFrame({'value' : range(len(daily_daterange))}, index=daily_daterange)

In [5]: daily_dataset.resample('W-MON')
Out[5]: 
            value
2013-02-04      1
2013-02-11      5
2013-02-18     10
2013-02-25     15
2013-03-04     19

In [6]: daily_dataset.resample('W-SUN')
Out[6]: 
            value
2013-02-03    0.5
2013-02-10    4.0
2013-02-17    9.0
2013-02-24   14.0
2013-03-03   18.5

I want the values to stay the same and the buckets returned to be the beginning of the week.

@jreback
Copy link
Contributor

jreback commented May 5, 2014

ok

so give an example then of what u want

@quasiben
Copy link
Author

quasiben commented May 6, 2014

Thu  2013-01-31  val_1
Fri  2013-02-01  val_2
Sat  2013-02-02  val_3
Sun  2013-02-03  val_4
In [4]: daily_dataset.resample('W-THU')
Out[4]: 
            value
2013-01-31      0
2013-02-07      3
2013-02-14      8
2013-02-21     13
2013-02-28     18

In [5]: daily_dataset.resample('W-MON')
Out[5]: 
            value
2013-02-04      1
2013-02-11      5
2013-02-18     10
2013-02-25     15
2013-03-04     19

In [6]: daily_dataset.resample('W-SUN')
Out[6]: 
            value
2013-02-03    0.5
2013-02-10    4.0
2013-02-17    9.0
2013-02-24   14.0
2013-03-03   18.5

I want to resample weekly but the bucket returned should be the weeks beginning:

daily_dataset.resample('WBEGIN') 
            value
2013-02-03    0
2013-02-10    3
2013-02-17    8
2013-02-24   13
2013-03-03   18

Essentially, adding a timedelta to the index.

@jreback
Copy link
Contributor

jreback commented May 6, 2014

how is this diffferent from W-SUN ? (I mean your values are), but conceptually

@quasiben
Copy link
Author

quasiben commented May 6, 2014

Conceptually, nothing is different but the resampling of the values is significant. I believe month end and month start frequency don't change the values but it does return the a bucket at the beginning of the month or the end of month

You could do something like:

daily_dataset.resample('7D')

Then fix the index with an appropriate offset. Maybe I should submit a PR and not waste too much more time :)

@jreback
Copy link
Contributor

jreback commented May 16, 2014

why wouldn't you reindex to a daily freq, then simply divide by 7 starting on the day you want?

@jreback jreback added this to the Someday milestone Mar 8, 2015
@aiguofer
Copy link
Contributor

aiguofer commented Jan 27, 2017

+1 on this. I keep running into issues, and none of the solutions are nice.... a simple WS would make my life easier.

Take this for example. Feb in 2016 started on a MON, and the last week of march 31 was on THURS.

I want to group by week (starting on MON and ending on SUN), count the number of days in the week, but use the first day of the week as the label. That means that for the month of Feb in 2016, my expected output is:

2016-02-01    7
2016-02-08    7
2016-02-15    7
2016-02-22    7
2016-02-29    1

I can do this a few different ways, but they all seem cumbersome and not very intuitive.

import pandas as pd
df = pd.DataFrame(pd.date_range('2016-02-01', '2016-02-29').rename('dt'))
df['val'] = 1

# option 1
a = df.groupby(pd.Grouper(key='dt',freq='W-MON', closed='left', label='left'))['val'].sum()

# option 2
a = df.groupby(pd.Grouper(key='dt',freq='W'))['val'].sum()
a.index = a.index + pd.DateOffset(weekday=0, days=-7)

# option 3
a = df.groupby(pd.Grouper(key='dt',freq='W-MON', closed='left'))['val'].sum()
a.index = a.index + pd.DateOffset(days=-7)

# option 4
a = df.groupby(pd.Grouper(key='dt',freq='W', label='left'))['val'].sum()
a.index = a.index + pd.DateOffset(days=1)

# ideal solution
a = df.groupby(pd.Grouper(key='dt',freq='WS'))['val'].sum()

@jreback jreback modified the milestones: Next Major Release, Someday Jan 27, 2017
@jreback
Copy link
Contributor

jreback commented Jan 31, 2017

FYI you can do this with .resample directly (looking at your option 1). I will comment more following

In [30]: df.groupby(pd.Grouper(key='dt',freq='W-MON', closed='left', label='left'))['val'].sum()
    ...: 
Out[30]: 
dt
2016-02-01    7
2016-02-08    7
2016-02-15    7
2016-02-22    7
2016-02-29    1
Freq: W-MON, Name: val, dtype: int64

In [32]: df.resample('W-MON', on='dt', closed='left', label='left').val.sum()
Out[32]: 
dt
2016-02-01    7
2016-02-08    7
2016-02-15    7
2016-02-22    7
2016-02-29    1
Freq: W-MON, Name: val, dtype: int64

@jreback
Copy link
Contributor

jreback commented Jan 31, 2017

why is this not the solution you want?

In [65]: df.resample('7D', on='dt').val.sum()
Out[65]: 
dt
2016-02-01    7
2016-02-08    7
2016-02-15    7
2016-02-22    7
2016-02-29    1
Freq: 7D, Name: val, dtype: int64

@aiguofer
Copy link
Contributor

Thanks! I actually used groupby because in my real example I'm also adding other groupby cols, I just simplified for posting here.

The issue with the 7D approach is that I would have to make sure the first date available is on a Monday.

For example:

df = pd.DataFrame(pd.date_range('2016-01-29', '2016-02-29').rename('dt'))
df['val'] = 1
df.resample('7D', on='dt').val.sum()

dt
2016-01-29    7
2016-02-05    7
2016-02-12    7
2016-02-19    7
2016-02-26    4
Freq: 7D, Name: val, dtype: int64

I want a solution that will always group by weeks starting on Monday and using that Monday as the label, regardless of what dates are available (in my real data, the dates come from elsewhere).

@jreback
Copy link
Contributor

jreback commented Jan 31, 2017

@aiguofer

why is this not reasonable? (or the .resample version)

# option 1
a = df.groupby(pd.Grouper(key='dt',freq='W-MON', closed='left', label='left'))['val'].sum()

you are esentially saying that the defaults for anchored are not what you want.

@aiguofer
Copy link
Contributor

aiguofer commented Jan 31, 2017

In my mind, that's essentialy saying, "end the week on Monday, but close the inclusion on the left (so that the monday is included on the 'following' week), and use the previous week's label"

Additionally, since there is a MS and QS, it seems odd to have to do it differently with weeks.

In my app, I have a dropdown choice for how to group and I have to add a check for W to then add the other parameters, it'd be much simpler to do WS.

@jreback
Copy link
Contributor

jreback commented Jan 31, 2017

ahh, I c, you are angling for a WS and WE frequencies. IOW, like W-SUN, but have a tiny bit of special handling on inclusions in resampling.

@jreback
Copy link
Contributor

jreback commented Jan 31, 2017

this would not be very hard to implement actually. Its much more work to properly test this :>

ok if you would like to put up a PR for WS/WE (I actually think these should just work in resampling directly). that would be great!

@aiguofer
Copy link
Contributor

aiguofer commented Jan 31, 2017

Exactly. And I think it could even work with a specific day anchoring.

So for example, if someone wanted to bucket on weeks starting on SUN using the first day as label, they could do WS-SUN. If they wanted the end then they'd do W-SAT.

@jreback jreback changed the title Week Start/End Resamping Week Start/End Resamping: WS/WE frequencies Jan 31, 2017
aiguofer pushed a commit to aiguofer/pandas that referenced this issue Feb 17, 2017
@vipasha-cashify
Copy link

Thanks...it helped a lot :)

@jgrajos
Copy link

jgrajos commented Jan 22, 2021

Hi, Thanks for these comments. @aiguofer @jreback
Where i can find this changes? I would like add this feature to my code.

@aiguofer
Copy link
Contributor

Hey, unfortunately I never had time to finish this, and I believe a lot has changed since then. You can see the original PR here: #15449

@jgrajos
Copy link

jgrajos commented Feb 7, 2021

Thanks for your response.
I have used this lines for the purpose:

data = data.groupby(pd.Grouper(freq='W-SUN')).last()
new_index = data.index - pd.DateOffset(days=6)
data.index = new_index

It is works for me!

@mroeschke mroeschke removed API Design Datetime Datetime data dtype labels Apr 11, 2021
@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement Frequency DateOffsets Resample resample method
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants