BUG: Timestamp origin takes no effect in resample for 'MS' frequency #53662
Comments
Looks interesting; will pick this one up.
take
After digging in, I've got a few questions, @MarcoGorelli. Your comment regarding the window only makes sense to me if you're referring to It does appear that there is an issue where See these results:
import datetime as dt
import pandas as pd

df = pd.DataFrame({
'ts': [dt.datetime(1999, 12, 31, 13, 0,0), dt.datetime(1999, 12, 31, 14,0,0)],
'values': [10., 20.]
})
print(df.resample('2MS', on='ts', origin=pd.Timestamp('1999-11-01'))['values'].sum())
>> ts
>> 1999-12-01 30.0
>> Freq: 2MS, Name: values, dtype: float64
Similarly:
df = pd.DataFrame({
'ts': [dt.datetime(1999, 12, 31, 13, 0,0), dt.datetime(1999, 12, 31, 14,0,0)],
'values': [10., 20.]
})
print(df.resample('2MS', on='ts')['values'].sum())
>> ts
>> 1999-12-01 30.0
>> Freq: 2MS, Name: values, dtype: float64
Meanwhile, running:
df = pd.DataFrame({
'ts': [dt.datetime(1999, 12, 31, 13, 0,0), dt.datetime(1999, 12, 31, 14,0,0)],
'values': [10., 20.]
})
print(df.resample('2S', on='ts', origin=pd.Timestamp('1999-11-01'))['values'].sum())
>> ts
>> 1999-12-31 13:00:00 10.0
>> 1999-12-31 13:00:02 0.0
>> 1999-12-31 13:00:04 0.0
>> 1999-12-31 13:00:06 0.0
>> 1999-12-31 13:00:08 0.0
>> ...
>> 1999-12-31 13:59:52 0.0
>> 1999-12-31 13:59:54 0.0
>> 1999-12-31 13:59:56 0.0
>> 1999-12-31 13:59:58 0.0
>> 1999-12-31 14:00:00 20.0
>> Freq: 2S, Name: values, Length: 1801, dtype: float64
I'm going to focus on why
Thanks for looking into this.
MS is "month start" (see https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#dateoffset-objects), and M isn't "month" but "month end". If you think that's confusing, I agree! There's a big effort going on to change that: #52064
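To make the "month start" vs "month end" distinction concrete, here is a minimal sketch using the offset classes directly rather than the string aliases. The sample date and variable names are only for illustration.

```python
import pandas as pd

ts = pd.Timestamp("1999-12-15")

# "MS" corresponds to MonthBegin, "M" to MonthEnd.
month_begin = pd.offsets.MonthBegin()
month_end = pd.offsets.MonthEnd()

# Roll the date back to the closest month start at or before it.
start_of_month = month_begin.rollback(ts)

# Roll the date forward to the closest month end at or after it.
end_of_month = month_end.rollforward(ts)

print(start_of_month)  # 1999-12-01 00:00:00
print(end_of_month)    # 1999-12-31 00:00:00
```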
Haha, got it. Your expected output makes a lot more sense; thanks for the context.
Noticed while reviewing: here's another test case which needs to pass. If we have
In [39]: df = pd.DataFrame({'ts': [datetime(1999, 12, 31, 20)], 'values': [10.]})
In [40]: df
Out[40]:
                   ts  values
0 1999-12-31 20:00:00    10.0
and do
df.resample('3YS', on='ts', closed='left', label='left', origin=datetime(1995, 1, 1))['values'].sum()
then I'd expect the windows to be:
But instead, I see:
In [42]: df.resample('3YS', on='ts', closed='left', label='left', origin=datetime(1995, 1, 1))['values'].sum()
Out[42]:
ts
1999-01-01 10.0
Freq: 3AS-JAN, Name: values, dtype: float64
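One way to check which left-closed window a timestamp should land in, if bins were anchored at `origin`, is to build the bin edges by hand and look the timestamp up. This is only a sketch of that reading of `origin`, not what `resample` does internally; `edges`, `i`, and `window` are names made up for the illustration.

```python
from datetime import datetime

import pandas as pd

origin = datetime(1995, 1, 1)
ts = pd.Timestamp("1999-12-31 20:00")

# Three-year steps starting exactly at origin (YearBegin(3) is the offset behind '3YS').
edges = pd.date_range(start=origin, periods=3, freq=pd.offsets.YearBegin(3))

# Index of the last edge that is <= ts, i.e. the left edge of a left-closed window.
i = edges.searchsorted(ts, side="right") - 1
window = (edges[i], edges[i + 1])

print(window[0], window[1])  # 1998-01-01 00:00:00 2001-01-01 00:00:00
```

Under this assumption the label would be 1998-01-01 rather than the 1999-01-01 shown in the output above.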
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
The code above produces
I'm confused though - if origin was set to 1999-11-01, then shouldn't the window be [1999-11-01, 2000-01-01)?
Expected Behavior
ts
1999-11-01 10.0
Freq: 2MS, Name: values, dtype: float64
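For reference, the expected window can be worked out with plain date arithmetic. The helper below is hypothetical (it is not part of pandas) and assumes that `origin` falls on the first day of a month and that bins are left-closed.

```python
from datetime import datetime


def month_start_bin(ts, origin, step_months):
    """Return (start, end) of the origin-anchored bin [start, end) containing ts.

    Hypothetical helper for illustration; assumes origin is a month start.
    """
    # Whole calendar months between origin's month and ts's month.
    months_from_origin = (ts.year - origin.year) * 12 + (ts.month - origin.month)
    # Floor division also gives the right bin for timestamps before origin.
    bin_index = months_from_origin // step_months

    def shift(months):
        # Move origin forward (or backward) by a whole number of months.
        total = origin.year * 12 + (origin.month - 1) + months
        return datetime(total // 12, total % 12 + 1, 1)

    return shift(bin_index * step_months), shift((bin_index + 1) * step_months)


origin = datetime(1999, 11, 1)
start, end = month_start_bin(datetime(1999, 12, 31, 13), origin, 2)
print(start, end)  # 1999-11-01 00:00:00 2000-01-01 00:00:00
```

Both sample timestamps (13:00 and 14:00 on 1999-12-31) fall in the same two-month bin under this calculation.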