-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
DEPR: Timestamp.freq #41586
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DEPR: Timestamp.freq #41586
Conversation
Changed your post to avoid closing the other issue too |
@@ -1309,6 +1515,12 @@ Methods | |||
Nano.__call__ | |||
Nano.apply | |||
Nano.apply_index | |||
Nano.is_month_start |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm this is going to generate a huge amount of pages, IIRC we do this just on the base class and that becomes generic. cc @jorisvandenbossche @datapythonista
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I would prefer not adding those duplicated for each of the offsets (that actually also applies to the existing duplicated methods/attributes). We should follow the pattern used eg by the Index classes where only the subclass-specific ones are listed for non-base Index classes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
opened #41745 to try to trim down the existing properties
@@ -1647,12 +1703,18 @@ default 'raise' | |||
value = tz_localize_to_utc_single(self.value, tz, | |||
ambiguous=ambiguous, | |||
nonexistent=nonexistent) | |||
return Timestamp(value, tz=tz, freq=self.freq) | |||
out = Timestamp(value, tz=tz) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we some just avoid setting this?
pandas/core/arrays/datetimes.py
Outdated
# filter out warnings about Timestamp.freq | ||
warnings.filterwarnings("ignore", category=FutureWarning) | ||
|
||
return Timestamp(x, freq=self.freq, tz=self.tz) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we not just set this to None? (or we have to because you are deprecating)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we could change this without a deprecation cycle itd be convenient, but that would change what you get from dti[0]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you use the internal _set_freq
after construction instead of the catch_warnings?
Iteratively calling catch_warnings many times will give a lot of overhead (although we don't use this ourselves in something like __iter__
where we need to box many objects, but the user could still do something similar themselves with getitem?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Accessing the .freq
attribute of a Timestamp instance itself is not yet deprecated? That should also be deprecated to be able to remove it?
@@ -1309,6 +1515,12 @@ Methods | |||
Nano.__call__ | |||
Nano.apply | |||
Nano.apply_index | |||
Nano.is_month_start |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I would prefer not adding those duplicated for each of the offsets (that actually also applies to the existing duplicated methods/attributes). We should follow the pattern used eg by the Index classes where only the subclass-specific ones are listed for non-base Index classes.
@@ -682,6 +682,7 @@ Deprecations | |||
- Deprecated passing arguments as positional in :meth:`DataFrame.clip` and :meth:`Series.clip` (other than ``"upper"`` and ``"lower"``) (:issue:`41485`) | |||
- Deprecated special treatment of lists with first element a Categorical in the :class:`DataFrame` constructor; pass as ``pd.DataFrame({col: categorical, ...})`` instead (:issue:`38845`) | |||
- Deprecated passing arguments as positional (except for ``"method"``) in :meth:`DataFrame.interpolate` and :meth:`Series.interpolate` (:issue:`41485`) | |||
- Deprecated the :attr:`Timestamp.freq` attribute. For the properties that use it (``is_month_start``, ``is_month_end``, ``is_quarter_start``, ``is_quarter_end``, ``is_year_start``, ``is_year_end``), when you have a ``freq``, use e.g. ``freq.is_month_start(ts)`` (:issue:`15146`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't find the explanation of "for properties that use it .., when you have a freq
, ..." very clear. Adding an actual example might clarify.
First, I assume people who do ts.is_month_start
(with ts
being a Timestamp) don't necessarily know that they are relying on some freq
.
Also, the attributes are not going away, right? It's only the fastpath for computing it based on the freq that goes away? Are there actually cases that will change behaviour, or is it only performance? (and if only performance, that might not matter much for scalars, or maybe not worth warning about).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there actually cases that will change behaviour, or is it only performance?
Yes, e.g.
dti = pd.date_range("2017-01-01", periods=30, freq="B")
dti[0].is_month_start # <- True now, False once freq is removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And in principle we can know when this will change? (depending on the freq
, we can know whether it can give different answers?)
Could we warn only in those cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, will do
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to warn in fewer cases. Still haven't updated the whatsnew note per the first comment here. suggestions on wording?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i agree here. i think we are simply deprecating freq. why do we need to even say anything about is_month_start
for example?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bc the result of is_month_start is going to change in some cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One consequence of deprecating Timestamp.freq is that when having a timeseries with a freq, the following equivalency won't no longer always hold:
dtidx.is_month_start[0] == dtidx[0].is_month_start
That's a bit unfortunate, I would say, and probably something we should explicitly document.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok for this as a followup documentation (but for 1.3)
pandas/_libs/tslibs/timestamps.pyx
Outdated
"Timestamp.freq is deprecated and will be removed in a future version. " | ||
"When you have a freq, use freq.is_month_start(timestamp) instead", | ||
FutureWarning, | ||
stacklevel=2, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above, I think this warning message is rather vague / unclear (but let's discuss above)
pandas/core/arrays/datetimes.py
Outdated
# filter out warnings about Timestamp.freq | ||
warnings.filterwarnings("ignore", category=FutureWarning) | ||
|
||
return Timestamp(x, freq=self.freq, tz=self.tz) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you use the internal _set_freq
after construction instead of the catch_warnings?
Iteratively calling catch_warnings many times will give a lot of overhead (although we don't use this ourselves in something like __iter__
where we need to box many objects, but the user could still do something similar themselves with getitem?)
this would be good to get in the rc |
what's the soft-deadline for this? |
soonish |
Repeating my non-inline review comment again here to ensure it was noticed:
|
Thanks, will get to that shortly. |
I'll put the blocker on this also for now. will release 1.2.5 first. so still got a few days for this. |
@@ -50,6 +50,7 @@ def test_set_index_reset_index_dt64tz(self): | |||
df = result.set_index("foo") | |||
tm.assert_index_equal(df.index, idx) | |||
|
|||
@pytest.mark.filterwarnings("ignore:Timestamp.freq is deprecated:FutureWarning") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's causing this test to raise a warning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like it doesnt after one of the more recent comments, good catch, will remove
@@ -16,6 +16,10 @@ | |||
|
|||
import pandas._testing as tm | |||
|
|||
pytestmark = pytest.mark.filterwarnings( | |||
"ignore:Timestamp.freq is deprecated:FutureWarning" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you apply this mark to only the tests that need it? (I would expect not all tests to need it?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good idea, will do
@@ -682,6 +682,7 @@ Deprecations | |||
- Deprecated passing arguments as positional in :meth:`DataFrame.clip` and :meth:`Series.clip` (other than ``"upper"`` and ``"lower"``) (:issue:`41485`) | |||
- Deprecated special treatment of lists with first element a Categorical in the :class:`DataFrame` constructor; pass as ``pd.DataFrame({col: categorical, ...})`` instead (:issue:`38845`) | |||
- Deprecated passing arguments as positional (except for ``"method"``) in :meth:`DataFrame.interpolate` and :meth:`Series.interpolate` (:issue:`41485`) | |||
- Deprecated the :attr:`Timestamp.freq` attribute. For the properties that use it (``is_month_start``, ``is_month_end``, ``is_quarter_start``, ``is_quarter_end``, ``is_year_start``, ``is_year_end``), when you have a ``freq``, use e.g. ``freq.is_month_start(ts)`` (:issue:`15146`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok for this as a followup documentation (but for 1.3)
start_i = i * chunksize | ||
end_i = min((i + 1) * chunksize, length) | ||
converted = ints_to_pydatetime( | ||
data[start_i:end_i], tz=self.tz, freq=self.freq, box="timestamp" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you just NOT pass self.freq?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not without changing the behavior. ATM next(iter(dti)).freq
matches dti.freq
. Not passing freq
here would change that
@@ -1189,7 +1189,11 @@ def _addsub_object_array(self, other: np.ndarray, op): | |||
# Caller is responsible for broadcasting if necessary | |||
assert self.shape == other.shape, (self.shape, other.shape) | |||
|
|||
res_values = op(self.astype("O"), np.asarray(other)) | |||
with warnings.catch_warnings(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouln't these not have freq to begin with?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self
here is DTA/TDA/PA, which all have freq. there is an issue about removing freq from DTA/TDA, which this deprecation is a blocker for
merging, @jbrockmendel can you followup with respect to any open comments (i think just the whatsnew but maybe more) |
@@ -421,6 +421,7 @@ def test_reset_index_multiindex_columns(self): | |||
result = df2.rename_axis([("c", "ii")]).reset_index(col_level=1, col_fill="C") | |||
tm.assert_frame_equal(result, expected) | |||
|
|||
@pytest.mark.filterwarnings("ignore:Timestamp.freq is deprecated:FutureWarning") | |||
def test_reset_index_datetime(self, tz_naive_fixture): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar question as before: what's causing this test to raise a warning?
@@ -403,6 +403,7 @@ def test_concat_multiple_tzs(self): | |||
expected = DataFrame({"time": [ts2, ts3]}) | |||
tm.assert_frame_equal(results, expected) | |||
|
|||
@pytest.mark.filterwarnings("ignore:Timestamp.freq is deprecated:FutureWarning") | |||
def test_concat_multiindex_with_tz(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, what's causing this test to raise a warning? (I don't directly see a .freq
access)
@@ -516,6 +516,7 @@ def test_pivot_index_with_nan(self, method): | |||
result = pd.pivot(df, "b", "a", "c") | |||
tm.assert_frame_equal(result, pv.T) | |||
|
|||
@pytest.mark.filterwarnings("ignore:Timestamp.freq is deprecated:FutureWarning") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like in all three cases they no longer are needed. will remove in my next follow-up branch
This resulted in a performance regression https://pandas.pydata.org/speed/pandas/#tslibs.timestamp.TimestampProperties.time_freqstr?Cython=0.29.21&jinja2=&python=3.8 however, we will be ignoring this #40008 (comment)
|
This in turn will allow us to remove DatetimeArray.freq and TimedeltaArray.freq, which is needed for #31218