Skip to content

DEPR: Timestamp.freq #41586

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 20 commits into from
Jun 7, 2021
Merged

DEPR: Timestamp.freq #41586

merged 20 commits into from
Jun 7, 2021

Conversation

jbrockmendel
Copy link
Member

@jbrockmendel jbrockmendel commented May 20, 2021

This in turn will allow us to remove DatetimeArray.freq and TimedeltaArray.freq, which is needed for #31218

@jbrockmendel jbrockmendel added Deprecate Functionality to remove in pandas Frequency DateOffsets labels May 20, 2021
@phofl
Copy link
Member

phofl commented May 20, 2021

Changed your post to avoid closing the other issue too

@@ -1309,6 +1515,12 @@ Methods
Nano.__call__
Nano.apply
Nano.apply_index
Nano.is_month_start
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm this is going to generate a huge amount of pages, IIRC we do this just on the base class and that becomes generic. cc @jorisvandenbossche @datapythonista

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I would prefer not adding those duplicated for each of the offsets (that actually also applies to the existing duplicated methods/attributes). We should follow the pattern used eg by the Index classes where only the subclass-specific ones are listed for non-base Index classes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

opened #41745 to try to trim down the existing properties

@@ -1647,12 +1703,18 @@ default 'raise'
value = tz_localize_to_utc_single(self.value, tz,
ambiguous=ambiguous,
nonexistent=nonexistent)
return Timestamp(value, tz=tz, freq=self.freq)
out = Timestamp(value, tz=tz)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we some just avoid setting this?

# filter out warnings about Timestamp.freq
warnings.filterwarnings("ignore", category=FutureWarning)

return Timestamp(x, freq=self.freq, tz=self.tz)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we not just set this to None? (or we have to because you are deprecating)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we could change this without a deprecation cycle itd be convenient, but that would change what you get from dti[0]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use the internal _set_freq after construction instead of the catch_warnings?
Iteratively calling catch_warnings many times will give a lot of overhead (although we don't use this ourselves in something like __iter__ where we need to box many objects, but the user could still do something similar themselves with getitem?)

@simonjayhawkins simonjayhawkins added this to the 1.3 milestone May 24, 2021
Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Accessing the .freq attribute of a Timestamp instance itself is not yet deprecated? That should also be deprecated to be able to remove it?

@@ -1309,6 +1515,12 @@ Methods
Nano.__call__
Nano.apply
Nano.apply_index
Nano.is_month_start
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I would prefer not adding those duplicated for each of the offsets (that actually also applies to the existing duplicated methods/attributes). We should follow the pattern used eg by the Index classes where only the subclass-specific ones are listed for non-base Index classes.

@@ -682,6 +682,7 @@ Deprecations
- Deprecated passing arguments as positional in :meth:`DataFrame.clip` and :meth:`Series.clip` (other than ``"upper"`` and ``"lower"``) (:issue:`41485`)
- Deprecated special treatment of lists with first element a Categorical in the :class:`DataFrame` constructor; pass as ``pd.DataFrame({col: categorical, ...})`` instead (:issue:`38845`)
- Deprecated passing arguments as positional (except for ``"method"``) in :meth:`DataFrame.interpolate` and :meth:`Series.interpolate` (:issue:`41485`)
- Deprecated the :attr:`Timestamp.freq` attribute. For the properties that use it (``is_month_start``, ``is_month_end``, ``is_quarter_start``, ``is_quarter_end``, ``is_year_start``, ``is_year_end``), when you have a ``freq``, use e.g. ``freq.is_month_start(ts)`` (:issue:`15146`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't find the explanation of "for properties that use it .., when you have a freq, ..." very clear. Adding an actual example might clarify.

First, I assume people who do ts.is_month_start (with ts being a Timestamp) don't necessarily know that they are relying on some freq.
Also, the attributes are not going away, right? It's only the fastpath for computing it based on the freq that goes away? Are there actually cases that will change behaviour, or is it only performance? (and if only performance, that might not matter much for scalars, or maybe not worth warning about).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there actually cases that will change behaviour, or is it only performance?

Yes, e.g.

dti = pd.date_range("2017-01-01", periods=30, freq="B")
dti[0].is_month_start  # <- True now, False once freq is removed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And in principle we can know when this will change? (depending on the freq, we can know whether it can give different answers?)
Could we warn only in those cases?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, will do

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to warn in fewer cases. Still haven't updated the whatsnew note per the first comment here. suggestions on wording?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i agree here. i think we are simply deprecating freq. why do we need to even say anything about is_month_start for example?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bc the result of is_month_start is going to change in some cases

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One consequence of deprecating Timestamp.freq is that when having a timeseries with a freq, the following equivalency won't no longer always hold:

dtidx.is_month_start[0] == dtidx[0].is_month_start

That's a bit unfortunate, I would say, and probably something we should explicitly document.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok for this as a followup documentation (but for 1.3)

"Timestamp.freq is deprecated and will be removed in a future version. "
"When you have a freq, use freq.is_month_start(timestamp) instead",
FutureWarning,
stacklevel=2,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, I think this warning message is rather vague / unclear (but let's discuss above)

# filter out warnings about Timestamp.freq
warnings.filterwarnings("ignore", category=FutureWarning)

return Timestamp(x, freq=self.freq, tz=self.tz)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use the internal _set_freq after construction instead of the catch_warnings?
Iteratively calling catch_warnings many times will give a lot of overhead (although we don't use this ourselves in something like __iter__ where we need to box many objects, but the user could still do something similar themselves with getitem?)

@jreback
Copy link
Contributor

jreback commented May 31, 2021

this would be good to get in the rc

@jbrockmendel
Copy link
Member Author

this would be good to get in the rc

what's the soft-deadline for this?

@jreback
Copy link
Contributor

jreback commented May 31, 2021

this would be good to get in the rc

what's the soft-deadline for this?

soonish

cc @simonjayhawkins

@jorisvandenbossche
Copy link
Member

Repeating my non-inline review comment again here to ensure it was noticed:

Accessing the .freq attribute of a Timestamp instance itself is not yet deprecated? That should also be deprecated to be able to remove it?

@jbrockmendel
Copy link
Member Author

Repeating my non-inline review comment again here to ensure it was noticed:

Thanks, will get to that shortly.

@simonjayhawkins
Copy link
Member

this would be good to get in the rc

what's the soft-deadline for this?

soonish

I'll put the blocker on this also for now.

will release 1.2.5 first. so still got a few days for this.

@simonjayhawkins simonjayhawkins added the Blocker Blocking issue or pull request for an upcoming release label Jun 2, 2021
@@ -50,6 +50,7 @@ def test_set_index_reset_index_dt64tz(self):
df = result.set_index("foo")
tm.assert_index_equal(df.index, idx)

@pytest.mark.filterwarnings("ignore:Timestamp.freq is deprecated:FutureWarning")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's causing this test to raise a warning?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like it doesnt after one of the more recent comments, good catch, will remove

@@ -16,6 +16,10 @@

import pandas._testing as tm

pytestmark = pytest.mark.filterwarnings(
"ignore:Timestamp.freq is deprecated:FutureWarning"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you apply this mark to only the tests that need it? (I would expect not all tests to need it?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good idea, will do

@@ -682,6 +682,7 @@ Deprecations
- Deprecated passing arguments as positional in :meth:`DataFrame.clip` and :meth:`Series.clip` (other than ``"upper"`` and ``"lower"``) (:issue:`41485`)
- Deprecated special treatment of lists with first element a Categorical in the :class:`DataFrame` constructor; pass as ``pd.DataFrame({col: categorical, ...})`` instead (:issue:`38845`)
- Deprecated passing arguments as positional (except for ``"method"``) in :meth:`DataFrame.interpolate` and :meth:`Series.interpolate` (:issue:`41485`)
- Deprecated the :attr:`Timestamp.freq` attribute. For the properties that use it (``is_month_start``, ``is_month_end``, ``is_quarter_start``, ``is_quarter_end``, ``is_year_start``, ``is_year_end``), when you have a ``freq``, use e.g. ``freq.is_month_start(ts)`` (:issue:`15146`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok for this as a followup documentation (but for 1.3)

start_i = i * chunksize
end_i = min((i + 1) * chunksize, length)
converted = ints_to_pydatetime(
data[start_i:end_i], tz=self.tz, freq=self.freq, box="timestamp"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you just NOT pass self.freq?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not without changing the behavior. ATM next(iter(dti)).freq matches dti.freq. Not passing freq here would change that

@@ -1189,7 +1189,11 @@ def _addsub_object_array(self, other: np.ndarray, op):
# Caller is responsible for broadcasting if necessary
assert self.shape == other.shape, (self.shape, other.shape)

res_values = op(self.astype("O"), np.asarray(other))
with warnings.catch_warnings():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouln't these not have freq to begin with?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self here is DTA/TDA/PA, which all have freq. there is an issue about removing freq from DTA/TDA, which this deprecation is a blocker for

@simonjayhawkins simonjayhawkins mentioned this pull request Jun 7, 2021
@jreback jreback merged commit dae5c59 into pandas-dev:master Jun 7, 2021
@jreback
Copy link
Contributor

jreback commented Jun 7, 2021

merging, @jbrockmendel can you followup with respect to any open comments (i think just the whatsnew but maybe more)

@jbrockmendel jbrockmendel deleted the depr-freq branch June 7, 2021 16:51
@@ -421,6 +421,7 @@ def test_reset_index_multiindex_columns(self):
result = df2.rename_axis([("c", "ii")]).reset_index(col_level=1, col_fill="C")
tm.assert_frame_equal(result, expected)

@pytest.mark.filterwarnings("ignore:Timestamp.freq is deprecated:FutureWarning")
def test_reset_index_datetime(self, tz_naive_fixture):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar question as before: what's causing this test to raise a warning?

@@ -403,6 +403,7 @@ def test_concat_multiple_tzs(self):
expected = DataFrame({"time": [ts2, ts3]})
tm.assert_frame_equal(results, expected)

@pytest.mark.filterwarnings("ignore:Timestamp.freq is deprecated:FutureWarning")
def test_concat_multiindex_with_tz(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, what's causing this test to raise a warning? (I don't directly see a .freq access)

@@ -516,6 +516,7 @@ def test_pivot_index_with_nan(self, method):
result = pd.pivot(df, "b", "a", "c")
tm.assert_frame_equal(result, pv.T)

@pytest.mark.filterwarnings("ignore:Timestamp.freq is deprecated:FutureWarning")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like in all three cases they no longer are needed. will remove in my next follow-up branch

@jorisvandenbossche jorisvandenbossche removed the Blocker Blocking issue or pull request for an upcoming release label Jun 15, 2021
@simonjayhawkins
Copy link
Member

This resulted in a performance regression https://pandas.pydata.org/speed/pandas/#tslibs.timestamp.TimestampProperties.time_freqstr?Cython=0.29.21&jinja2=&python=3.8

however, we will be ignoring this #40008 (comment)

BTW looking at the regressions listed, I think we can hit "ignore" for the TimestampProperties.time_freqstr ones, since those are driven by a deprecation of freqstr (and are still measured in ns)

JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Jul 3, 2021
@jbrockmendel jbrockmendel mentioned this pull request Jul 4, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Deprecate Functionality to remove in pandas Frequency DateOffsets
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DEPR: remove freq attribute of Timestamp?
5 participants