-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: DatetimeIndex.is_year_start
and DatetimeIndex.is_quarter_start
always return False on double-digit frequencies
#58549
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
natmokval
commented
May 3, 2024
•
edited
Loading
edited
- closes BUG: DatetimeIndex.is_year_start breaks on double-digit frequencies #58523
pandas/_libs/tslibs/fields.pyx
Outdated
@@ -253,8 +254,7 @@ def get_start_end_field( | |||
# month of year. Other offsets use month, startingMonth as ending | |||
# month of year. | |||
|
|||
if (freqstr[0:2] in ["MS", "QS", "YS"]) or ( | |||
freqstr[1:3] in ["MS", "QS", "YS"]): | |||
if re.split("[0-9]*", freqstr, maxsplit=1)[1][0:2] in ["MS", "QS", "YS"]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i know this is still in draft, but is there any existing function that can be re-used here? it seems like a fairly common thing to get the frequency out of a string, I'm sure there's other places that do it - can something be reused? I don't think it's really feasible to have a regex each time this needs doing
pandas/_libs/tslibs/fields.pyx
Outdated
if (freqstr[0:2] in ["MS", "QS", "YS"]) or ( | ||
freqstr[1:3] in ["MS", "QS", "YS"]): | ||
offset = to_offset(freqstr) | ||
if offset.freqstr.replace(str(offset.n), "")[0:2] in ["MS", "QS", "YS"]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about offset.name
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for updating, this is getting closer
can we do to_offset(freqstr).name
even further up, and pass that down to this function? as in, find where get_start_end_field
is being called, calculate the frequency name from there, and pass that to the function - this way we avoid repeatedly calling to_offset
, which is a bit expensive
pandas/_libs/tslibs/timestamps.pyx
Outdated
@@ -587,7 +587,8 @@ cdef class _Timestamp(ABCTimestamp): | |||
val = self._maybe_convert_value_to_local() | |||
|
|||
out = get_start_end_field(np.array([val], dtype=np.int64), | |||
field, freqstr, month_kw, self._creso) | |||
field, to_offset(freqstr).name , |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's already freq
a few lines above, can we just take it from there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, I am not sure if I understand correctly. I replaced freqstr = freq.freqstr
with freqstr = to_offset(freq.freqstr).name
a few lines above.
pandas/_libs/tslibs/fields.pyx
Outdated
if to_offset(freqstr).name[0:2] in ["MS", "QS", "YS"]: | ||
if freqstr[0:2] in ["MS", "QS", "YS"]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should also rename the variable name now if something else is being passed in (freq_name
?)
pandas/core/arrays/datetimes.py
Outdated
if self.freqstr is not None: | ||
freqstr = to_offset(self.freqstr).name | ||
else: | ||
freqstr = self.freqstr | ||
result = fields.get_start_end_field( | ||
values, field, self.freqstr, month_kw, reso=self._creso | ||
values, field, freqstr, month_kw, reso=self._creso |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
pandas/_libs/tslibs/fields.pyx
Outdated
if freqstr[0:2] in ["MS", "QS", "YS"]: | ||
freq_name = freqstr.lstrip("B")[0:2] | ||
if freq_name in ["MS", "QS", "YS"]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry i meant that the argument freqstr
in get_start_end_field
needs renaming, because you're no longer passing in freq.freqstr
but freq.name
, so function argument (line 213) needs renaming
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, it's clear now. I renamed the argument freqstr
in get_start_end_field
pandas/core/arrays/datetimes.py
Outdated
if freq is not None: | ||
freqstr = to_offset(freq.freqstr).name | ||
else: | ||
freqstr = freq |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this needs renaming too?
And in the else
branch, you can just set it to =None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, I agree, here freqstr
needs renaming too. I replaced it with freq_name
@MarcoGorelli could you please take a look at this PR? I think CI failures are unrelated to my changes. |
pandas/_libs/tslibs/timestamps.pyx
Outdated
freqstr = freq.freqstr | ||
freqstr = to_offset(freq.freqstr).name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does it work to directly do freq.name
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, it works with freq.name
indeed. I simplified to_offset(freq.freqstr).name
pandas/_libs/tslibs/timestamps.pyx
Outdated
freqstr = freq.freqstr | ||
freqstr = freq.name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
variable name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, it's my mistake. I renamed the variable freqstr
to freq_name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good to me on green, thanks @natmokval !
thank you for reviewing this PR! |
@@ -419,6 +419,7 @@ Interval | |||
Indexing | |||
^^^^^^^^ | |||
- Bug in :meth:`DataFrame.__getitem__` returning modified columns when called with ``slice`` in Python 3.12 (:issue:`57500`) | |||
- Bug in :meth:`DatetimeIndex.is_year_start` and :meth:`DatetimeIndex.is_quarter_start` returning ``False`` on double-digit frequencies (:issue:`58523`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should have been in Datetimelike
, "indexing" is more like .loc
/ .iloc
/ get/setitem stuff
but OK to address this as part of https://github.com/pandas-dev/pandas/pull/58665/files, as that one needs updating anyway
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, I addressed this comment in the PR you suggested