Skip to content

DEPR: Timestamp.freq #41586

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 20 commits into from
Jun 7, 2021
Merged
Show file tree
Hide file tree
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
212 changes: 212 additions & 0 deletions doc/source/reference/offset_frequency.rst

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.3.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -649,6 +649,7 @@ Deprecations
- Deprecated behavior of :meth:`DatetimeIndex.union` with mixed timezones; in a future version both will be cast to UTC instead of object dtype (:issue:`39328`)
- Deprecated using ``usecols`` with out of bounds indices for ``read_csv`` with ``engine="c"`` (:issue:`25623`)
- Deprecated passing arguments as positional (except for ``"method"``) in :meth:`DataFrame.interpolate` and :meth:`Series.interpolate` (:issue:`41485`)
- Deprecated the :attr:`Timestamp.freq` attribute. For the properties that use it (``is_month_start``, ``is_month_end``, ``is_quarter_start``, ``is_quarter_end``, ``is_year_start``, ``is_year_end``), when you have a ``freq``, use e.g. ``freq.is_month_start(ts)`` (:issue:`15146`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't find the explanation of "for properties that use it .., when you have a freq, ..." very clear. Adding an actual example might clarify.

First, I assume people who do ts.is_month_start (with ts being a Timestamp) don't necessarily know that they are relying on some freq.
Also, the attributes are not going away, right? It's only the fastpath for computing it based on the freq that goes away? Are there actually cases that will change behaviour, or is it only performance? (and if only performance, that might not matter much for scalars, or maybe not worth warning about).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there actually cases that will change behaviour, or is it only performance?

Yes, e.g.

dti = pd.date_range("2017-01-01", periods=30, freq="B")
dti[0].is_month_start  # <- True now, False once freq is removed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And in principle we can know when this will change? (depending on the freq, we can know whether it can give different answers?)
Could we warn only in those cases?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, will do

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to warn in fewer cases. Still haven't updated the whatsnew note per the first comment here. suggestions on wording?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i agree here. i think we are simply deprecating freq. why do we need to even say anything about is_month_start for example?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bc the result of is_month_start is going to change in some cases

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One consequence of deprecating Timestamp.freq is that when having a timeseries with a freq, the following equivalency won't no longer always hold:

dtidx.is_month_start[0] == dtidx[0].is_month_start

That's a bit unfortunate, I would say, and probably something we should explicitly document.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok for this as a followup documentation (but for 1.3)


.. ---------------------------------------------------------------------------

Expand Down
20 changes: 20 additions & 0 deletions pandas/_libs/tslibs/offsets.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -716,6 +716,26 @@ cdef class BaseOffset:
# if there were a canonical docstring for what is_anchored means.
return self.n == 1

# ------------------------------------------------------------------

def is_month_start(self, _Timestamp ts):
return ts._get_start_end_field("is_month_start", self)

def is_month_end(self, _Timestamp ts):
return ts._get_start_end_field("is_month_end", self)

def is_quarter_start(self, _Timestamp ts):
return ts._get_start_end_field("is_quarter_start", self)

def is_quarter_end(self, _Timestamp ts):
return ts._get_start_end_field("is_quarter_end", self)

def is_year_start(self, _Timestamp ts):
return ts._get_start_end_field("is_year_start", self)

def is_year_end(self, _Timestamp ts):
return ts._get_start_end_field("is_year_end", self)


cdef class SingleConstructorOffset(BaseOffset):
@classmethod
Expand Down
3 changes: 2 additions & 1 deletion pandas/_libs/tslibs/timestamps.pxd
Original file line number Diff line number Diff line change
Expand Up @@ -18,11 +18,12 @@ cdef class _Timestamp(ABCTimestamp):
int64_t value, nanosecond
object freq

cdef bint _get_start_end_field(self, str field)
cdef bint _get_start_end_field(self, str field, freq)
cdef _get_date_name_field(self, str field, object locale)
cdef int64_t _maybe_convert_value_to_local(self)
cdef bint _can_compare(self, datetime other)
cpdef to_datetime64(self)
cpdef datetime to_pydatetime(_Timestamp self, bint warn=*)
cdef bint _compare_outside_nanorange(_Timestamp self, datetime other,
int op) except -1
cpdef void _set_freq(self, freq)
97 changes: 81 additions & 16 deletions pandas/_libs/tslibs/timestamps.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -155,6 +155,12 @@ cdef class _Timestamp(ABCTimestamp):
dayofweek = _Timestamp.day_of_week
dayofyear = _Timestamp.day_of_year

cpdef void _set_freq(self, freq):
# set the .freq attribute without going through the constructor,
# which would issue a warning
# Caller is responsible for validation
self.freq = freq

def __hash__(_Timestamp self):
if self.nanosecond:
return hash(self.value)
Expand Down Expand Up @@ -263,7 +269,9 @@ cdef class _Timestamp(ABCTimestamp):

if is_any_td_scalar(other):
nanos = delta_to_nanoseconds(other)
result = type(self)(self.value + nanos, tz=self.tzinfo, freq=self.freq)
result = type(self)(self.value + nanos, tz=self.tzinfo)
if result is not NaT:
result._set_freq(self.freq) # avoid warning in constructor
return result

elif is_integer_object(other):
Expand Down Expand Up @@ -361,14 +369,13 @@ cdef class _Timestamp(ABCTimestamp):
val = self.value
return val

cdef bint _get_start_end_field(self, str field):
cdef bint _get_start_end_field(self, str field, freq):
cdef:
int64_t val
dict kwds
ndarray[uint8_t, cast=True] out
int month_kw

freq = self.freq
if freq:
kwds = freq.kwds
month_kw = kwds.get('startingMonth', kwds.get('month', 12))
Expand Down Expand Up @@ -400,7 +407,13 @@ cdef class _Timestamp(ABCTimestamp):
if self.freq is None:
# fast-path for non-business frequencies
return self.day == 1
return self._get_start_end_field("is_month_start")
warnings.warn(
"Timestamp.freq is deprecated and will be removed in a future version. "
"When you have a freq, use freq.is_month_start(timestamp) instead",
FutureWarning,
stacklevel=2,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, I think this warning message is rather vague / unclear (but let's discuss above)

)
return self._get_start_end_field("is_month_start", self.freq)

@property
def is_month_end(self) -> bool:
Expand All @@ -420,7 +433,13 @@ cdef class _Timestamp(ABCTimestamp):
if self.freq is None:
# fast-path for non-business frequencies
return self.day == self.days_in_month
return self._get_start_end_field("is_month_end")
warnings.warn(
"Timestamp.freq is deprecated and will be removed in a future version. "
"When you have a freq, use freq.is_month_end(timestamp) instead",
FutureWarning,
stacklevel=2,
)
return self._get_start_end_field("is_month_end", self.freq)

@property
def is_quarter_start(self) -> bool:
Expand All @@ -440,7 +459,13 @@ cdef class _Timestamp(ABCTimestamp):
if self.freq is None:
# fast-path for non-business frequencies
return self.day == 1 and self.month % 3 == 1
return self._get_start_end_field("is_quarter_start")
warnings.warn(
"Timestamp.freq is deprecated and will be removed in a future version. "
"When you have a freq, use freq.is_quarter_start(timestamp) instead",
FutureWarning,
stacklevel=2,
)
return self._get_start_end_field("is_quarter_start", self.freq)

@property
def is_quarter_end(self) -> bool:
Expand All @@ -460,7 +485,13 @@ cdef class _Timestamp(ABCTimestamp):
if self.freq is None:
# fast-path for non-business frequencies
return (self.month % 3) == 0 and self.day == self.days_in_month
return self._get_start_end_field("is_quarter_end")
warnings.warn(
"Timestamp.freq is deprecated and will be removed in a future version. "
"When you have a freq, use freq.is_quarter_end(timestamp) instead",
FutureWarning,
stacklevel=2,
)
return self._get_start_end_field("is_quarter_end", self.freq)

@property
def is_year_start(self) -> bool:
Expand All @@ -480,7 +511,13 @@ cdef class _Timestamp(ABCTimestamp):
if self.freq is None:
# fast-path for non-business frequencies
return self.day == self.month == 1
return self._get_start_end_field("is_year_start")
warnings.warn(
"Timestamp.freq is deprecated and will be removed in a future version. "
"When you have a freq, use freq.is_year_start(timestamp) instead",
FutureWarning,
stacklevel=2,
)
return self._get_start_end_field("is_year_start", self.freq)

@property
def is_year_end(self) -> bool:
Expand All @@ -500,7 +537,13 @@ cdef class _Timestamp(ABCTimestamp):
if self.freq is None:
# fast-path for non-business frequencies
return self.month == 12 and self.day == 31
return self._get_start_end_field("is_year_end")
warnings.warn(
"Timestamp.freq is deprecated and will be removed in a future version. "
"When you have a freq, use freq.is_year_end(timestamp) instead",
FutureWarning,
stacklevel=2,
)
return self._get_start_end_field("is_year_end", self.freq)

cdef _get_date_name_field(self, str field, object locale):
cdef:
Expand Down Expand Up @@ -878,6 +921,12 @@ cdef class _Timestamp(ABCTimestamp):

if freq is None:
freq = self.freq
warnings.warn(
"Timestamp.freq is deprecated and will be removed in a future "
"version. Pass `freq` to Timestamp.to_period explicitly",
FutureWarning,
stacklevel=2,
)

return Period(self, freq=freq)

Expand Down Expand Up @@ -1147,7 +1196,7 @@ class Timestamp(_Timestamp):
nanosecond=None,
tzinfo_type tzinfo=None,
*,
fold=None
fold=None,
):
# The parameter list folds together legacy parameter names (the first
# four) and positional and keyword parameter names from pydatetime.
Expand Down Expand Up @@ -1277,8 +1326,15 @@ class Timestamp(_Timestamp):
if freq is None:
# GH 22311: Try to extract the frequency of a given Timestamp input
freq = getattr(ts_input, 'freq', None)
elif not is_offset_object(freq):
freq = to_offset(freq)
else:
warnings.warn(
"The 'freq' argument in Timestamp is deprecated and will be "
"removed in a future version.",
FutureWarning,
stacklevel=1,
)
if not is_offset_object(freq):
freq = to_offset(freq)

return create_timestamp_from_ts(ts.value, ts.dts, ts.tzinfo, freq, ts.fold)

Expand Down Expand Up @@ -1647,12 +1703,18 @@ default 'raise'
value = tz_localize_to_utc_single(self.value, tz,
ambiguous=ambiguous,
nonexistent=nonexistent)
return Timestamp(value, tz=tz, freq=self.freq)
out = Timestamp(value, tz=tz)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we some just avoid setting this?

if out is not NaT:
out._set_freq(self.freq) # avoid warning in constructor
return out
else:
if tz is None:
# reset tz
value = tz_convert_from_utc_single(self.value, self.tz)
return Timestamp(value, tz=tz, freq=self.freq)
out = Timestamp(value, tz=tz)
if out is not NaT:
out._set_freq(self.freq) # avoid warning in constructor
return out
else:
raise TypeError(
"Cannot localize tz-aware Timestamp, use tz_convert for conversions"
Expand Down Expand Up @@ -1707,7 +1769,10 @@ default 'raise'
)
else:
# Same UTC timestamp, different time zone
return Timestamp(self.value, tz=tz, freq=self.freq)
out = Timestamp(self.value, tz=tz)
if out is not NaT:
out._set_freq(self.freq) # avoid warning in constructor
return out

astimezone = tz_convert

Expand All @@ -1722,7 +1787,7 @@ default 'raise'
microsecond=None,
nanosecond=None,
tzinfo=object,
fold=None,
fold=None
):
"""
Implements datetime.replace, handles nanoseconds.
Expand Down
6 changes: 5 additions & 1 deletion pandas/core/arrays/datetimelike.py
Original file line number Diff line number Diff line change
Expand Up @@ -1208,7 +1208,11 @@ def _addsub_object_array(self, other: np.ndarray, op):
# Caller is responsible for broadcasting if necessary
assert self.shape == other.shape, (self.shape, other.shape)

res_values = op(self.astype("O"), np.asarray(other))
with warnings.catch_warnings():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouln't these not have freq to begin with?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self here is DTA/TDA/PA, which all have freq. there is an issue about removing freq from DTA/TDA, which this deprecation is a blocker for

# filter out warnings about Timestamp.freq
warnings.filterwarnings("ignore", category=FutureWarning)
res_values = op(self.astype("O"), np.asarray(other))

result = pd_array(res_values.ravel())
# error: Item "ExtensionArray" of "Union[Any, ExtensionArray]" has no attribute
# "reshape"
Expand Down
25 changes: 17 additions & 8 deletions pandas/core/arrays/datetimes.py
Original file line number Diff line number Diff line change
Expand Up @@ -510,7 +510,11 @@ def _check_compatible_with(self, other, setitem: bool = False):
# Descriptive Properties

def _box_func(self, x) -> Timestamp | NaTType:
return Timestamp(x, freq=self.freq, tz=self.tz)
with warnings.catch_warnings():
# filter out warnings about Timestamp.freq
warnings.filterwarnings("ignore", category=FutureWarning)

return Timestamp(x, freq=self.freq, tz=self.tz)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we not just set this to None? (or we have to because you are deprecating)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we could change this without a deprecation cycle itd be convenient, but that would change what you get from dti[0]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use the internal _set_freq after construction instead of the catch_warnings?
Iteratively calling catch_warnings many times will give a lot of overhead (although we don't use this ourselves in something like __iter__ where we need to box many objects, but the user could still do something similar themselves with getitem?)


@property
# error: Return type "Union[dtype, DatetimeTZDtype]" of "dtype"
Expand Down Expand Up @@ -603,13 +607,18 @@ def __iter__(self):
length = len(self)
chunksize = 10000
chunks = (length // chunksize) + 1
for i in range(chunks):
start_i = i * chunksize
end_i = min((i + 1) * chunksize, length)
converted = ints_to_pydatetime(
data[start_i:end_i], tz=self.tz, freq=self.freq, box="timestamp"
)
yield from converted

with warnings.catch_warnings():
# filter out warnings about Timestamp.freq
warnings.filterwarnings("ignore", category=FutureWarning)

for i in range(chunks):
start_i = i * chunksize
end_i = min((i + 1) * chunksize, length)
converted = ints_to_pydatetime(
data[start_i:end_i], tz=self.tz, freq=self.freq, box="timestamp"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you just NOT pass self.freq?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not without changing the behavior. ATM next(iter(dti)).freq matches dti.freq. Not passing freq here would change that

)
yield from converted

def astype(self, dtype, copy: bool = True):
# We handle
Expand Down
4 changes: 1 addition & 3 deletions pandas/tests/frame/methods/test_reset_index.py
Original file line number Diff line number Diff line change
Expand Up @@ -509,9 +509,7 @@ def test_reset_index_datetime(self, tz_naive_fixture):
},
columns=["level_0", "level_1", "a"],
)
expected["level_1"] = expected["level_1"].apply(
lambda d: Timestamp(d, freq="D", tz=tz)
)
expected["level_1"] = expected["level_1"].apply(lambda d: Timestamp(d, tz=tz))
result = df.reset_index()
tm.assert_frame_equal(result, expected)

Expand Down
8 changes: 4 additions & 4 deletions pandas/tests/indexes/datetimes/methods/test_repeat.py
Original file line number Diff line number Diff line change
Expand Up @@ -62,10 +62,10 @@ def test_repeat(self, tz_naive_fixture):

expected_rng = DatetimeIndex(
[
Timestamp("2016-01-01 00:00:00", tz=tz, freq="30T"),
Timestamp("2016-01-01 00:00:00", tz=tz, freq="30T"),
Timestamp("2016-01-01 00:30:00", tz=tz, freq="30T"),
Timestamp("2016-01-01 00:30:00", tz=tz, freq="30T"),
Timestamp("2016-01-01 00:00:00", tz=tz),
Timestamp("2016-01-01 00:00:00", tz=tz),
Timestamp("2016-01-01 00:30:00", tz=tz),
Timestamp("2016-01-01 00:30:00", tz=tz),
]
)

Expand Down
5 changes: 5 additions & 0 deletions pandas/tests/indexes/datetimes/methods/test_to_period.py
Original file line number Diff line number Diff line change
Expand Up @@ -147,6 +147,9 @@ def test_to_period_tz(self, tz):

with tm.assert_produces_warning(UserWarning):
# GH#21333 warning that timezone info will be lost
# filter warning about freq deprecation
warnings.filterwarnings("ignore", category=FutureWarning)

result = ts.to_period()[0]
expected = ts[0].to_period()

Expand All @@ -165,6 +168,8 @@ def test_to_period_tz_utc_offset_consistency(self, tz):
# GH#22905
ts = date_range("1/1/2000", "2/1/2000", tz="Etc/GMT-1")
with tm.assert_produces_warning(UserWarning):
warnings.filterwarnings("ignore", category=FutureWarning)

result = ts.to_period()[0]
expected = ts[0].to_period()
assert result == expected
Expand Down
Loading