-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: DateTimeIndex.is_year_start unexpected behavior when constructed with freq 'MS' date_range (#57377) #57494
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this. |
Thanks for the pull request, but it appears to have gone stale. If interested in continuing, please merge in the main branch, address any review comments and/or failing tests, and we can reopen. |
wait sorry, I think this is my fault for not having got round to reviewing it reopening for now, hoping to get to it at some point before 3.0 @natmokval fancy taking a look too? |
Thank you for working on this @mattheeter. It looks good to me. I think the suggested changes will fix the bug in
These changes won't fix a problem with some other frequencies such as:
It may not be necessary to correct this in the current pull request. |
Of course! @natmokval Yes, the original issue did not have the QS freq mentioned. I did point it out on the issue, and went ahead with what I thought would fix it there. I think that when I tested with other types of QS (e.g. November) I assumed that these should set the year start attribute to that of the quarter start, making me miss the fact that it was incorrect. Should I continue to work on the QS bug, or just remove any changes that attempted to fix that one since the issue didn't reference it? |
I think it's okay to leave the fix for This PR is still |
Hi @natmokval, I was wondering if you knew anything about the Numpy Dev tests that are failing? I know when I submitted this PR a while ago there were no issues with unit tests, but I just got some time to continue working on this but haven't made any code changes since. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for your PR - I think a more general fix is needed here, rather than papering over already-broken logic
EDIT: I've made a separate issue for the other problem: #58523
pandas/_libs/tslibs/fields.pyx
Outdated
if (freqstr[0:2] in ["MS", "QS", "YS"]) or ( | ||
freqstr[1:3] in ["MS", "QS", "YS"]): | ||
# According to the above comment, the first conditional is for QS and YS only | ||
if (freqstr[0:2] in ["QS", "YS"]) or ( | ||
freqstr[1:3] in ["QS", "YS"]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this logic is already broken on main
- it seems to assume that n
is only a single digit
and so it produces this kind of nonsense:
In [39]: dr = pd.date_range("2017-01-01", periods=2, freq="7YS")
In [40]: dr
Out[40]: DatetimeIndex(['2017-01-01', '2024-01-01'], dtype='datetime64[ns]', freq='7YS-JAN')
In [41]: dr.is_year_start
Out[41]: array([ True, True])
In [42]: dr = pd.date_range("2017-01-01", periods=2, freq="10YS")
In [43]: dr
Out[43]: DatetimeIndex(['2017-01-01', '2027-01-01'], dtype='datetime64[ns]', freq='10YS-JAN')
In [44]: dr.is_year_start
Out[44]: array([False, False])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wow, looks like this broken logic was introduced 13 years ago... 😭
https://github.com/pandas-dev/pandas/pull/4823/files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @mattheeter for your PR, thanks for your patience here, and sorry that this took a while
Looks good to me. I've added a hypothesis test too to make sure
closes BUG:
DatetimeIndex.is_year_start
returns incorrect array if.freq == 'MS'
#49606Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added an entry in the latest
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.A DateTimeIndex created via date_range with the freq parameter as 'MS' would yield a wrong is_year_start (and later it was discovered is_quarter_start) accessor. In
pandas/_libs/tslibs/fields.pyx
, the way in which the start and end months were set was one way for freq 'MS', 'QS', and 'YS' and another way for other offsets. However, only the latter two had the necessary properties to set start and end this way. As such, removing 'MS' from the list fixed the issue.