-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Week Start/End Resamping: WS/WE frequencies #7049
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
look at the next section |
Yes, the anchored offsets are nice. But these will change the values: In [1]: import pandas as pd
In [2]: import datetime as dt
In [3]: daily_daterange = pd.date_range(start=dt.datetime(2013, 1, 31),
...: end=dt.datetime(2013,2,28),
...: freq='B')
In [4]: daily_dataset = pd.DataFrame({'value' : range(len(daily_daterange))}, index=daily_daterange)
In [5]: daily_dataset.resample('W-MON')
Out[5]:
value
2013-02-04 1
2013-02-11 5
2013-02-18 10
2013-02-25 15
2013-03-04 19
In [6]: daily_dataset.resample('W-SUN')
Out[6]:
value
2013-02-03 0.5
2013-02-10 4.0
2013-02-17 9.0
2013-02-24 14.0
2013-03-03 18.5 I want the values to stay the same and the buckets returned to be the beginning of the week. |
ok so give an example then of what u want |
In [4]: daily_dataset.resample('W-THU')
Out[4]:
value
2013-01-31 0
2013-02-07 3
2013-02-14 8
2013-02-21 13
2013-02-28 18
In [5]: daily_dataset.resample('W-MON')
Out[5]:
value
2013-02-04 1
2013-02-11 5
2013-02-18 10
2013-02-25 15
2013-03-04 19
In [6]: daily_dataset.resample('W-SUN')
Out[6]:
value
2013-02-03 0.5
2013-02-10 4.0
2013-02-17 9.0
2013-02-24 14.0
2013-03-03 18.5 I want to resample weekly but the bucket returned should be the weeks beginning: daily_dataset.resample('WBEGIN')
value
2013-02-03 0
2013-02-10 3
2013-02-17 8
2013-02-24 13
2013-03-03 18 Essentially, adding a timedelta to the index. |
how is this diffferent from |
Conceptually, nothing is different but the resampling of the values is significant. I believe month end and month start frequency don't change the values but it does return the a bucket at the beginning of the month or the end of month You could do something like: daily_dataset.resample('7D') Then fix the index with an appropriate offset. Maybe I should submit a PR and not waste too much more time :) |
why wouldn't you reindex to a daily freq, then simply divide by 7 starting on the day you want? |
+1 on this. I keep running into issues, and none of the solutions are nice.... a simple Take this for example. Feb in 2016 started on a MON, and the last week of march 31 was on THURS. I want to group by week (starting on MON and ending on SUN), count the number of days in the week, but use the first day of the week as the label. That means that for the month of Feb in 2016, my expected output is:
I can do this a few different ways, but they all seem cumbersome and not very intuitive. import pandas as pd
df = pd.DataFrame(pd.date_range('2016-02-01', '2016-02-29').rename('dt'))
df['val'] = 1
# option 1
a = df.groupby(pd.Grouper(key='dt',freq='W-MON', closed='left', label='left'))['val'].sum()
# option 2
a = df.groupby(pd.Grouper(key='dt',freq='W'))['val'].sum()
a.index = a.index + pd.DateOffset(weekday=0, days=-7)
# option 3
a = df.groupby(pd.Grouper(key='dt',freq='W-MON', closed='left'))['val'].sum()
a.index = a.index + pd.DateOffset(days=-7)
# option 4
a = df.groupby(pd.Grouper(key='dt',freq='W', label='left'))['val'].sum()
a.index = a.index + pd.DateOffset(days=1)
# ideal solution
a = df.groupby(pd.Grouper(key='dt',freq='WS'))['val'].sum() |
FYI you can do this with
|
why is this not the solution you want?
|
Thanks! I actually used The issue with the For example: df = pd.DataFrame(pd.date_range('2016-01-29', '2016-02-29').rename('dt'))
df['val'] = 1
df.resample('7D', on='dt').val.sum()
dt
2016-01-29 7
2016-02-05 7
2016-02-12 7
2016-02-19 7
2016-02-26 4
Freq: 7D, Name: val, dtype: int64 I want a solution that will always group by weeks starting on Monday and using that Monday as the label, regardless of what dates are available (in my real data, the dates come from elsewhere). |
why is this not reasonable? (or the
you are esentially saying that the defaults for anchored are not what you want. |
In my mind, that's essentialy saying, "end the week on Monday, but close the inclusion on the left (so that the monday is included on the 'following' week), and use the previous week's label" Additionally, since there is a In my app, I have a dropdown choice for how to group and I have to add a check for |
ahh, I c, you are angling for a |
this would not be very hard to implement actually. Its much more work to properly test this :> ok if you would like to put up a PR for WS/WE (I actually think these should just work in resampling directly). that would be great! |
Exactly. And I think it could even work with a specific day anchoring. So for example, if someone wanted to bucket on weeks starting on SUN using the first day as label, they could do |
Thanks...it helped a lot :) |
Hey, unfortunately I never had time to finish this, and I believe a lot has changed since then. You can see the original PR here: #15449 |
Thanks for your response.
It is works for me! |
There are a number of offset aliases listed here:
http://pandas.pydata.org/pandas-docs/dev/timeseries.html#offset-aliases
I'm currently trying to resample a timeseries where the buckets are return with the beginning and/or end of the week. There are a number of Start Month/End Month frequency aliases. Am I missing something or should we include Week End/Week Start resampling aliases as well?
The text was updated successfully, but these errors were encountered: