
BUG: Is offsets.YearBegin working as expected? #52105


Closed
3 tasks done
mirkosavasta opened this issue Mar 21, 2023 · 7 comments · Fixed by #52352
Labels
Bug Frequency DateOffsets

Comments

@mirkosavasta

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

pd.to_datetime("2017-09-18") - pd.offsets.YearBegin()
#> Timestamp('2017-01-01 00:00:00') - matches my expectation

pd.to_datetime("2018-01-01") - pd.offsets.YearBegin()
#> Timestamp('2017-01-01 00:00:00') - unexpected: I expected Timestamp('2018-01-01 00:00:00')

Issue Description

Hello,

The documentation for offsets.YearBegin states "DateOffset increments between calendar year begin dates."
I do not understand if my expectations about the behaviour of offsets.YearBegin are correct or not. Please see the code below:

pd.to_datetime("2017-09-18") - pd.offsets.YearBegin()
#> Timestamp('2017-01-01 00:00:00') - matches my expectation

pd.to_datetime("2018-01-01") - pd.offsets.YearBegin()
#> Timestamp('2017-01-01 00:00:00') - unexpected: I expected Timestamp('2018-01-01 00:00:00')

Why, when I subtract offsets.YearBegin() from the first day of a year, do I get the first day of the previous year? Is this expected?
If so, I could improve the documentation. If not, I guess this could be a bug.

Thanks in advance.

Expected Behavior

pd.to_datetime("2018-01-01") - pd.offsets.YearBegin()
#> Timestamp('2018-01-01 00:00:00')

Installed Versions

pandas 1.4.4
Python 3.9.16

mirkosavasta added the Bug and Needs Triage labels Mar 21, 2023
@MarcoGorelli
Member

MarcoGorelli commented Mar 21, 2023

Thanks for the report,

I'd have expected - pd.offsets.YearBegin(n=0) to go to the beginning of the current year (just like how + pd.offsets.YearBegin(n=0) goes to the beginning of the next year).

+ pd.offsets.YearBegin() - pd.offsets.YearBegin() seems to work for what you're trying to do:

In [74]: to_datetime(["2017-09-18", "2018-01-01"]) + pd.offsets.YearBegin() - pd.offsets.YearBegin()
Out[74]: DatetimeIndex(['2017-01-01', '2018-01-01'], dtype='datetime64[ns]', freq=None)
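Splitting the workaround above into its two steps may make it clearer why it works. This is only a sketch; the variable names are illustrative and not part of pandas:

```python
import pandas as pd

idx = pd.to_datetime(["2017-09-18", "2018-01-01"])

# Adding YearBegin() always moves strictly forward to the next January 1,
# even when the date is already a year start.
stepped_forward = idx + pd.offsets.YearBegin()

# Subtracting YearBegin() then moves back exactly one year start,
# which lands on the start of the original date's year.
year_starts = stepped_forward - pd.offsets.YearBegin()

print(stepped_forward)
print(year_starts)
```

Because the intermediate dates are always year starts, the subtraction never hits the "already on the offset" case from the original report.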

> Is this expected?

Not sure, I'll take a look this week


ps. Ciao Mirko!

@jbrockmendel
Member

instead of addition/subtraction you want rollforward/rollback
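For scalar timestamps, the suggestion could look like the following minimal sketch. The variable names are illustrative:

```python
import pandas as pd

offset = pd.offsets.YearBegin()
mid_year = pd.Timestamp("2017-09-18")
year_start = pd.Timestamp("2018-01-01")

# rollback moves a date back to the previous year start,
# but leaves it untouched if it is already on a year start.
rolled_mid = offset.rollback(mid_year)      # Timestamp('2017-01-01 00:00:00')
rolled_start = offset.rollback(year_start)  # Timestamp('2018-01-01 00:00:00')

# rollforward is the mirror image: forward to the next year start,
# unchanged if already on one.
forward_mid = offset.rollforward(mid_year)  # Timestamp('2018-01-01 00:00:00')

print(rolled_mid, rolled_start, forward_mid)
```

This differs from subtraction, which always moves a date that is already on the offset back by a full year.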

@MarcoGorelli
Member

sure, but that can't be vectorised, can it? I can do

to_datetime(["2017-09-18", "2018-01-01"]) + pd.offsets.YearBegin()

but

pd.offsets.YearBegin().rollback(to_datetime(["2017-09-18", "2018-01-01"]))

doesn't work, I'd need to do it element-by-element
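An element-by-element version might look like this sketch. It loops in Python, so it is likely to be slow on large indexes:

```python
import pandas as pd

offset = pd.offsets.YearBegin()
idx = pd.to_datetime(["2017-09-18", "2018-01-01"])

# rollback accepts one timestamp at a time, so apply it per element
# and rebuild a DatetimeIndex from the results.
rolled = pd.DatetimeIndex([offset.rollback(ts) for ts in idx])

print(rolled)
```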

@jbrockmendel
Member

Yah we do need rollforward/rollback for arrays, xref #7449. Also it would help for some of the recent issues with users trying to do dt64.astype("M8[Y]") expecting that to round down to the nearest YearBegin.
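For context, this is what the M8[Y] cast does at the NumPy level. The sketch uses plain NumPy arrays only; how pandas objects handle the same cast depends on the pandas version and is not shown here:

```python
import numpy as np

dates = np.array(["2017-09-18", "2018-01-01"], dtype="datetime64[ns]")

# Casting to year resolution truncates each value to its calendar year.
years = dates.astype("datetime64[Y]")

# Casting back to nanoseconds yields January 1 of each year.
year_starts = years.astype("datetime64[ns]")

print(years)
print(year_starts)
```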

@MarcoGorelli
Member

Right, the docs for DateOffset say

> Zero presents a problem. Should it roll forward or back? We arbitrarily have it rollforward:
> date + BDay(0) == BDay.rollforward(date)
> Since 0 is a bit weird, we suggest avoiding its use.

So, we probably shouldn't recommend it in the docs.
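The n=0 behaviour quoted from the docs can be checked with a short sketch. The dates are chosen for illustration, and rollforward is called on an offset instance rather than on the class as the docs' shorthand writes it:

```python
import pandas as pd
from pandas.tseries.offsets import BDay, YearBegin

saturday = pd.Timestamp("2023-03-25")  # a Saturday, so not a business day

# Adding a zero-sized offset behaves like rollforward.
bday_zero = saturday + BDay(0)
bday_rolled = BDay().rollforward(saturday)

# The same holds for YearBegin: forward to the next year start,
# or unchanged if the date is already a year start.
mid_year_zero = pd.Timestamp("2017-09-18") + YearBegin(0)
year_start_zero = pd.Timestamp("2018-01-01") + YearBegin(0)

print(bday_zero, bday_rolled, mid_year_zero, year_start_zero)
```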

From #7449, it seems the simplest solution is

In [28]: to_datetime(["2017-09-18", "2018-01-01"]).to_period('Y').to_timestamp()
Out[28]: DatetimeIndex(['2017-01-01', '2018-01-01'], dtype='datetime64[ns]', freq=None)
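The period round trip could be wrapped in a small helper and checked against the element-wise rollback. This is a sketch: year_start is a hypothetical name, not a pandas function, and only timezone-naive input is considered here:

```python
import pandas as pd


def year_start(index: pd.DatetimeIndex) -> pd.DatetimeIndex:
    """Map each timestamp to January 1 of its year (hypothetical helper)."""
    # to_period("Y") drops everything below the year;
    # to_timestamp() defaults to the start of each period.
    return index.to_period("Y").to_timestamp()


idx = pd.to_datetime(["2017-09-18", "2018-01-01"])
vectorised = year_start(idx)

# Cross-check against the scalar rollback applied element by element.
offset = pd.offsets.YearBegin()
elementwise = pd.DatetimeIndex([offset.rollback(ts) for ts in idx])

print(vectorised)
print(vectorised.equals(elementwise))
```

Note that the round trip also discards any time-of-day component, whereas rollback on a date that is not a year start keeps it, so the two approaches only agree on midnight timestamps like the ones used here.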

> If so, I could improve the documentation.

I think it'd be really good to include this example in the YearBegin docs (at least, until rollback and rollforward are available for arrays)

@jbrockmendel jbrockmendel mentioned this issue Mar 25, 2023
5 tasks
phofl added the Frequency DateOffsets label and removed the Needs Triage label Mar 26, 2023
@MarcoGorelli
Member

Re - the docs, it might be good to mirror the MonthBegin ones, which mention rollback https://pandas.pydata.org/docs/dev/reference/api/pandas.tseries.offsets.MonthBegin.html

@natmokval
Contributor

natmokval commented Mar 27, 2023

Hi @mirkosavasta, do you want to do the PR for this issue? If you don't want to, I'll add examples to the YearBegin docs. Could you let me know please?

5 participants