BUG: `DatetimeIndex.is_year_start` and `DatetimeIndex.is_quarter_start` always return False on double-digit frequencies #58549

natmokval · 2024-05-03T12:37:53Z

closes BUG: DatetimeIndex.is_year_start breaks on double-digit frequencies #58523

MarcoGorelli · 2024-05-03T20:59:33Z

pandas/_libs/tslibs/fields.pyx

@@ -253,8 +254,7 @@ def get_start_end_field(
        # month of year. Other offsets use month, startingMonth as ending
        # month of year.

-        if (freqstr[0:2] in ["MS", "QS", "YS"]) or (
-                freqstr[1:3] in ["MS", "QS", "YS"]):
+        if re.split("[0-9]*", freqstr, maxsplit=1)[1][0:2] in ["MS", "QS", "YS"]:


I know this is still in draft, but is there an existing function that can be reused here? Getting the frequency out of a string seems like a fairly common task, so I'm sure other places already do it. I don't think it's really feasible to write a regex each time this needs doing.
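To make the failure concrete, here is a minimal pure-Python sketch of the two checks. It is not the pandas implementation: `old_check` mirrors the removed condition from the diff, while `strip_multiple` and `new_check` are hypothetical helpers that stand in for whatever parsing routine ends up being reused.

```python
START_PREFIXES = ["MS", "QS", "YS"]


def old_check(freqstr: str) -> bool:
    # Mirrors the removed condition: inspects characters 0-1 and 1-2 only,
    # so at most one leading character (a digit or "B") can be skipped.
    return freqstr[0:2] in START_PREFIXES or freqstr[1:3] in START_PREFIXES


def strip_multiple(freqstr: str) -> str:
    # Hypothetical helper: drop the leading integer multiple, e.g. "10YS" -> "YS".
    return freqstr.lstrip("0123456789")


def new_check(freqstr: str) -> bool:
    # Compare the first two characters after the multiple has been removed.
    return strip_multiple(freqstr)[0:2] in START_PREFIXES


for freqstr in ["YS", "2YS", "10YS", "10QS", "10ME"]:
    print(freqstr, old_check(freqstr), new_check(freqstr))
```

With a two-digit multiple such as `10YS`, both slices in `old_check` land on `"10"` and `"0Y"`, so the check is false even though the frequency is anchored at the year start.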

MarcoGorelli · 2024-05-07T16:07:31Z

pandas/_libs/tslibs/fields.pyx

-        if (freqstr[0:2] in ["MS", "QS", "YS"]) or (
-                freqstr[1:3] in ["MS", "QS", "YS"]):
+        offset = to_offset(freqstr)
+        if offset.freqstr.replace(str(offset.n), "")[0:2]  in ["MS", "QS", "YS"]:


how about offset.name?
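A rough model of why `offset.name` is the simpler choice: the frequency string is the name prefixed by the multiple, so the name already excludes the digits. The `FakeOffset` class below is a hypothetical stand-in, not a pandas class. It only imitates the `n`, `name`, and `freqstr` attributes used in the diff, and it assumes the multiple is omitted from the string when `n == 1`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeOffset:
    # Hypothetical stand-in for a parsed offset; not a pandas class.
    n: int
    name: str

    @property
    def freqstr(self) -> str:
        # Assumption: the multiple is only shown when it is not 1.
        return self.name if self.n == 1 else f"{self.n}{self.name}"


offset = FakeOffset(n=10, name="YS-JAN")

# The approach from the diff: rebuild the name by deleting the digits.
via_replace = offset.freqstr.replace(str(offset.n), "")[0:2]
# The reviewer's suggestion: read the name directly.
via_name = offset.name[0:2]

print(offset.freqstr, via_replace, via_name)
```

Both routes give the same prefix here, but reading `name` skips the string manipulation entirely.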

MarcoGorelli

thanks for updating, this is getting closer

Can we call to_offset(freqstr).name further up and pass the result down to this function? That is, find where get_start_end_field is being called, calculate the frequency name there, and pass that to the function. This way we avoid repeatedly calling to_offset, which is a bit expensive.
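The idea of parsing once in the caller and passing the name down can be sketched as follows. This is an illustration only: `parse_name`, `starts_period`, and `check_fields` are hypothetical names, and `parse_name` is a cheap stand-in for the comparatively expensive `to_offset(freqstr).name`, with a counter added to show how often it runs.

```python
from typing import Optional

START_PREFIXES = ("MS", "QS", "YS")
parse_calls = 0


def parse_name(freqstr: str) -> str:
    # Hypothetical stand-in for to_offset(freqstr).name; counts its invocations.
    global parse_calls
    parse_calls += 1
    return freqstr.lstrip("0123456789")


def starts_period(freq_name: Optional[str]) -> bool:
    # Inner helper: receives the already-parsed name and does no parsing itself.
    return freq_name is not None and freq_name[0:2] in START_PREFIXES


def check_fields(fields: list, freqstr: Optional[str]) -> dict:
    # Caller: parse once, then reuse the name for every field.
    freq_name = parse_name(freqstr) if freqstr is not None else None
    return {field: starts_period(freq_name) for field in fields}


result = check_fields(["is_year_start", "is_quarter_start"], "10YS-JAN")
print(result, parse_calls)
```

The parsing function runs once per call to `check_fields`, regardless of how many fields are checked, and not at all when no frequency is set.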

MarcoGorelli · 2024-05-09T08:52:54Z

pandas/_libs/tslibs/timestamps.pyx

@@ -587,7 +587,8 @@ cdef class _Timestamp(ABCTimestamp):
        val = self._maybe_convert_value_to_local()

        out = get_start_end_field(np.array([val], dtype=np.int64),
-                                  field, freqstr, month_kw, self._creso)
+                                  field, to_offset(freqstr).name ,


there's already freq a few lines above, can we just take it from there?

sorry, I am not sure if I understand correctly. I replaced freqstr = freq.freqstr with freqstr = to_offset(freq.freqstr).name a few lines above.

MarcoGorelli · 2024-05-09T08:53:20Z

pandas/_libs/tslibs/fields.pyx

-        if to_offset(freqstr).name[0:2]  in ["MS", "QS", "YS"]:
+        if freqstr[0:2]  in ["MS", "QS", "YS"]:


we should also rename the variable now that something else is being passed in (freq_name?)

MarcoGorelli · 2024-05-09T08:53:39Z

pandas/core/arrays/datetimes.py

+                if self.freqstr is not None:
+                    freqstr = to_offset(self.freqstr).name
+                else:
+                    freqstr = self.freqstr
                result = fields.get_start_end_field(
-                    values, field, self.freqstr, month_kw, reso=self._creso
+                    values, field, freqstr, month_kw, reso=self._creso


MarcoGorelli · 2024-05-09T10:25:13Z

pandas/_libs/tslibs/fields.pyx

-        if freqstr[0:2]  in ["MS", "QS", "YS"]:
+        freq_name = freqstr.lstrip("B")[0:2]
+        if freq_name  in ["MS", "QS", "YS"]:


Sorry, I meant that the argument freqstr of get_start_end_field (line 213) needs renaming, because you're no longer passing in freq.freqstr but freq.name.

thanks, it's clear now. I renamed the argument freqstr in get_start_end_field
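A small sketch of what the `lstrip("B")` step in the diff above does to a few offset names. The names listed are illustrative inputs chosen for the example, and `start_prefix` is a hypothetical helper, not a function in pandas.

```python
START_PREFIXES = ["MS", "QS", "YS"]


def start_prefix(freq_name: str) -> str:
    # Drop any leading "B" characters (business variants), then keep
    # the first two characters of what remains.
    return freq_name.lstrip("B")[0:2]


for freq_name in ["YS-JAN", "BYS-JAN", "BQS-JAN", "BMS", "ME", "B"]:
    prefix = start_prefix(freq_name)
    print(freq_name, repr(prefix), prefix in START_PREFIXES)
```

Note that `str.lstrip("B")` removes every leading `"B"`, not just one, so a name consisting only of `"B"` reduces to the empty string and falls through to the non-start branch.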

MarcoGorelli · 2024-05-09T11:06:06Z

pandas/core/arrays/datetimes.py

+                if freq is not None:
+                    freqstr = to_offset(freq.freqstr).name
+                else:
+                    freqstr = freq


I think this needs renaming too?

And in the else branch, you can just set it to None

thanks, I agree, here freqstr needs renaming too. I replaced it with freq_name

natmokval · 2024-05-09T17:03:44Z

@MarcoGorelli could you please take a look at this PR? I think CI failures are unrelated to my changes.

MarcoGorelli · 2024-05-09T18:24:40Z

pandas/_libs/tslibs/timestamps.pyx

-            freqstr = freq.freqstr
+            freqstr = to_offset(freq.freqstr).name


does it work to directly do freq.name?

thanks, it works with freq.name indeed. I simplified to_offset(freq.freqstr).name

MarcoGorelli · 2024-05-10T14:29:51Z

pandas/_libs/tslibs/timestamps.pyx

-            freqstr = freq.freqstr
+            freqstr = freq.name


variable name

sorry, it's my mistake. I renamed the variable freqstr to freq_name

MarcoGorelli

looks good to me on green, thanks @natmokval !

natmokval · 2024-05-10T15:14:10Z

thank you for reviewing this PR!

MarcoGorelli · 2024-05-10T20:29:02Z

doc/source/whatsnew/v3.0.0.rst

@@ -419,6 +419,7 @@ Interval
 Indexing
 ^^^^^^^^
 - Bug in :meth:`DataFrame.__getitem__` returning modified columns when called with ``slice`` in Python 3.12 (:issue:`57500`)
+- Bug in :meth:`DatetimeIndex.is_year_start` and :meth:`DatetimeIndex.is_quarter_start` returning ``False`` on double-digit frequencies (:issue:`58523`)


this should have been under Datetimelike; "Indexing" is more for .loc / .iloc / get/setitem stuff

but OK to address this as part of https://github.com/pandas-dev/pandas/pull/58665/files, as that one needs updating anyway

thanks, I addressed this comment in the PR you suggested

correct def get_start_end_field, add test, add a note to v3.0.0

8c9aaa8

natmokval added Bug Frequency DateOffsets labels May 3, 2024

MarcoGorelli reviewed May 3, 2024

MarcoGorelli mentioned this pull request May 3, 2024

BUG: pandas-dev#58523 & pandas-dev#58524 #58545

Closed

replace regex with to_offset

21bd212

MarcoGorelli reviewed May 7, 2024

replace offset.freqstr.replace with offset.name

9f094d7