DEPR: lower/uppercase strings such as 'y', 'q', 'H', 'MIN', etc. denoting freqs/units for time series, period, and timedelta #56346

Closed
30 commits
c7d82ec
deprecate lowercase strings denoting freq for week, month, monthend, …
natmokval Dec 5, 2023
614710b
fix tests
natmokval Dec 5, 2023
458a9c1
fix tests
natmokval Dec 6, 2023
e9c8af5
correct def _get_offset, fix tests
natmokval Dec 7, 2023
73f68f0
add tests, fix tests
natmokval Dec 7, 2023
eea9b69
fix tests
natmokval Dec 11, 2023
386eb57
correct parse_timedelta_unit, to_offset, fix tests, add tests
natmokval Dec 11, 2023
5dcb918
resolve conflicts, fix tests, add tests
natmokval Dec 12, 2023
25bfbef
fix tests
natmokval Dec 12, 2023
eb418eb
resolve conflicts, depr 'MIN' from to_timedelta, fix tests
natmokval Dec 14, 2023
943b898
Merge branch 'main' into depr-uppercasing-in-get-offset
natmokval Dec 15, 2023
435db76
deprecate 'Min' in favour of 'min'
natmokval Dec 15, 2023
3cc94d2
correct docs
natmokval Dec 15, 2023
457ae96
show depr warning in test_construction() for Period
natmokval Dec 19, 2023
3837b7f
resolve conflict
natmokval Dec 19, 2023
f709557
correct warning message in test_construction()
natmokval Dec 20, 2023
7cd909f
remove from to_offset() unnecessary check, fix test_to_offset_invalid
natmokval Dec 20, 2023
8d1d6dd
fix pre-commit error
natmokval Dec 20, 2023
6c24541
Merge branch 'main' into depr-uppercasing-in-get-offset
natmokval Dec 20, 2023
286b618
add notes to /v2.2.0.rst
natmokval Dec 21, 2023
606d0c5
add filterwarnings to test_to_offset_invalid, correct notes in v2.2.0…
natmokval Dec 21, 2023
32993a9
improve the headline in v2.2.0.rst
natmokval Dec 21, 2023
dcf1994
correct depr note in v2.2.0.rst
natmokval Dec 21, 2023
5443b1e
Merge branch 'main' into depr-uppercasing-in-get-offset
natmokval Dec 21, 2023
7889c7d
correct to_offset() for freqs such us ys, qs, and add tests
natmokval Dec 27, 2023
00e80c0
Merge branch 'main' into depr-uppercasing-in-get-offset
natmokval Dec 27, 2023
0eb098e
resolve conflicts, fix tests
natmokval Jan 4, 2024
b6074b1
resolve conflicts
natmokval Feb 8, 2024
530f056
deprecate lowercase freq 'w', 'd' from timeseries
natmokval Feb 8, 2024
f600538
fix tests for 'D'
natmokval Feb 8, 2024
2 changes: 1 addition & 1 deletion doc/source/user_guide/10min.rst
@@ -612,7 +612,7 @@ financial applications. See the :ref:`Time Series section <timeseries>`.

rng = pd.date_range("1/1/2012", periods=100, freq="s")
ts = pd.Series(np.random.randint(0, 500, len(rng)), index=rng)
ts.resample("5Min").sum()
ts.resample("5min").sum()

:meth:`Series.tz_localize` localizes a time series to a time zone:

16 changes: 8 additions & 8 deletions doc/source/user_guide/timeseries.rst
@@ -1645,7 +1645,7 @@ Basics

ts = pd.Series(np.random.randint(0, 500, len(rng)), index=rng)

ts.resample("5Min").sum()
ts.resample("5min").sum()

The ``resample`` function is very flexible and allows you to specify many
different parameters to control the frequency conversion and resampling
@@ -1657,31 +1657,31 @@ a method of the returned object, including ``sum``, ``mean``, ``std``, ``sem``,

.. ipython:: python

ts.resample("5Min").mean()
ts.resample("5min").mean()

ts.resample("5Min").ohlc()
ts.resample("5min").ohlc()

ts.resample("5Min").max()
ts.resample("5min").max()


For downsampling, ``closed`` can be set to 'left' or 'right' to specify which
end of the interval is closed:

.. ipython:: python

ts.resample("5Min", closed="right").mean()
ts.resample("5min", closed="right").mean()

ts.resample("5Min", closed="left").mean()
ts.resample("5min", closed="left").mean()

Parameters like ``label`` are used to manipulate the resulting labels.
``label`` specifies whether the result is labeled with the beginning or
the end of the interval.

.. ipython:: python

ts.resample("5Min").mean() # by default label='left'
ts.resample("5min").mean() # by default label='left'

ts.resample("5Min", label="left").mean()
ts.resample("5min", label="left").mean()

.. warning::

35 changes: 35 additions & 0 deletions doc/source/whatsnew/v2.2.0.rst
@@ -408,6 +408,41 @@ For example:

pd.date_range('2020-01-01', periods=3, freq='QE-NOV')

Deprecate lowercase strings ``'w'``, ``'m'``, ``'q'``, etc. and uppercase strings ``'H'``, ``'MIN'``, ``'S'``, etc. for time series, period, and timedelta
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Previously, both ``'h'`` and ``'H'`` were allowed for 'hour' offset/period alias. We now require the case to be correct - check the :ref:`offset aliases <timeseries.offset_aliases>` and the :ref:`period aliases <timeseries.period_aliases>` parts of the docs and make sure you're using the correct one (:issue:`56346`)

Deprecated lowercase strings in favour of uppercase strings denoting:

- period aliases weekly, monthly and bigger frequency
- offsets aliases representing :class:`DateOffset` subclasses that are a week or bigger (``Week``, ``MonthBegin``, ``MonthEnd``, etc.)
- timedelta units for week, month or bigger

Deprecated uppercase strings in favour of lowercase strings denoting:

- period aliases hourly, minutely or smaller frequency
- offsets aliases representing :class:`DateOffset` subclasses that are an hour or smaller (``Hour``, ``Minute``, etc.)
- timedelta units for hour, minute or smaller

For example:

*Previous behavior*:

.. code-block:: ipython

In [9]: pd.date_range('2020-01-01', periods=3, freq='H')
Out[9]:
DatetimeIndex(['2020-01-01 00:00:00', '2020-01-01 01:00:00',
'2020-01-01 02:00:00'],
dtype='datetime64[ns]', freq='H')

*Future behavior*:

.. ipython:: python

pd.date_range('2020-01-01', periods=3, freq='h')

Deprecated automatic downcasting
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

4 changes: 3 additions & 1 deletion pandas/_libs/tslibs/dtypes.pyx
@@ -407,8 +407,10 @@ cdef dict c_DEPR_ABBREVS = {
"L": "ms",
"l": "ms",
"U": "us",
"US": "us",
"u": "us",
"N": "ns",
"NS": "ns",
"n": "ns",
}

@@ -509,7 +511,7 @@ class Resolution(Enum):
warnings.warn(
f"\'{freq}\' is deprecated and will be removed in a future "
f"version. Please use \'{abbrev}\' "
"instead of \'{freq}\'.",
f"instead of \'{freq}\'.",
FutureWarning,
stacklevel=find_stack_level(),
)
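The one-character change in this hunk matters because Python concatenates adjacent string literals but interpolates only those that carry the `f` prefix. A minimal sketch of the difference follows; the helper names `message_before` and `message_after` and the sample values are illustrative assumptions, not part of the PR:

```python
def message_before(freq: str, abbrev: str) -> str:
    # The last literal has no f prefix, so "{freq}" is emitted verbatim.
    return (
        f"'{freq}' is deprecated and will be removed in a future "
        f"version. Please use '{abbrev}' "
        "instead of '{freq}'."
    )


def message_after(freq: str, abbrev: str) -> str:
    # With the f prefix restored, the placeholder is interpolated.
    return (
        f"'{freq}' is deprecated and will be removed in a future "
        f"version. Please use '{abbrev}' "
        f"instead of '{freq}'."
    )


print(message_before("U", "us"))
print(message_after("U", "us"))
```

The first message ends with the literal text `{freq}`, while the second ends with the deprecated alias that was actually passed in.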
97 changes: 58 additions & 39 deletions pandas/_libs/tslibs/offsets.pyx
@@ -1766,7 +1766,7 @@ cdef class BusinessDay(BusinessMixin):
s -= hrs * 3600
mts = int(s / 60)
if mts != 0:
off_str += str(mts) + "Min"
off_str += str(mts) + "min"
s -= mts * 60
if s != 0:
off_str += str(s) + "s"
@@ -4654,36 +4654,13 @@ _lite_rule_alias = {
"BYE": "BYE-DEC", # BYearEnd(month=12),
"BYS": "BYS-JAN", # BYearBegin(month=1),

"Min": "min",
"min": "min",
"ms": "ms",
"us": "us",
"ns": "ns",
}

_dont_uppercase = {
"h",
"bh",
"cbh",
"MS",
"ms",
"s",
"me",
"qe",
"qe-dec",
"qe-jan",
"qe-feb",
"qe-mar",
"qe-apr",
"qe-may",
"qe-jun",
"qe-jul",
"qe-aug",
"qe-sep",
"qe-oct",
"qe-nov",
"ye",
}
_dont_uppercase = {"h", "bh", "cbh", "MS", "ms", "s"}


INVALID_FREQ_ERR_MSG = "Invalid frequency: {0}"
@@ -4702,7 +4679,29 @@ def _get_offset(name: str) -> BaseOffset:
--------
_get_offset('EOM') --> BMonthEnd(1)
"""
if name.lower() not in _dont_uppercase:
if (
name not in _lite_rule_alias
and (name.upper() in _lite_rule_alias)
and name != "ms"
Member:

Separate to this PR, but can we remove 'ms', 'us', 'ns', and 'min' from `_lite_rule_alias`? `_lite_rule_alias` is only ever used with `.get(name, name)`, so I don't think they need to be in it.

Contributor Author:

Thank you, I did as you suggested and removed "min", "ms", "ns", and "us" from `_lite_rule_alias` in a separate PR, #56516. I added these aliases to the `_dont_uppercase` set, because otherwise we would uppercase them in `_get_offset()`. Could you please take a look at that PR?

I am not sure whether we need "Min" in `_lite_rule_alias`. We want to deprecate the alias "Min" and leave only lowercase "min" for minutes. Am I correct?

):
warnings.warn(
f"\'{name}\' is deprecated and will be removed "
f"in a future version, please use \'{name.upper()}\' instead.",
FutureWarning,
stacklevel=find_stack_level(),
)
elif (
name not in _lite_rule_alias
and (name.lower() in _lite_rule_alias)
and name != "MS"
):
warnings.warn(
f"\'{name}\' is deprecated and will be removed "
f"in a future version, please use \'{name.lower()}\' instead.",
FutureWarning,
stacklevel=find_stack_level(),
)
if name not in _dont_uppercase:
name = name.upper()
name = _lite_rule_alias.get(name, name)
name = _lite_rule_alias.get(name.lower(), name)
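The case check added to `_get_offset` in this hunk can be sketched in isolation. The version below is a simplified model under stated assumptions: `LITE_RULE_ALIAS` is a hypothetical three-entry stand-in for `_lite_rule_alias`, and `suggested_spelling` is an illustrative helper that returns the recommended spelling instead of emitting a `FutureWarning`.

```python
from typing import Optional

# Hypothetical stand-in for _lite_rule_alias; the real table is larger.
LITE_RULE_ALIAS = {"W": "W-SUN", "min": "min", "ms": "ms"}


def suggested_spelling(name: str) -> Optional[str]:
    """Return the spelling a warning would recommend, or None if no warning."""
    if name in LITE_RULE_ALIAS:
        # Already spelled with the accepted case.
        return None
    if name.upper() in LITE_RULE_ALIAS and name != "ms":
        return name.upper()
    if name.lower() in LITE_RULE_ALIAS and name != "MS":
        return name.lower()
    return None


for alias in ["w", "W", "MIN", "Min", "min", "MS", "ms"]:
    print(alias, "->", suggested_spelling(alias))
```

In this model the `name != "MS"` guard is what keeps the month-start alias from being reported as a miscased millisecond alias, since `"MS".lower()` is a key in the table.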
@@ -4795,40 +4794,60 @@ cpdef to_offset(freq, bint is_period=False):

tups = zip(split[0::4], split[1::4], split[2::4])
for n, (sep, stride, name) in enumerate(tups):
if is_period is False and name in c_OFFSET_DEPR_FREQSTR:
if is_period is False and name.upper() in c_OFFSET_DEPR_FREQSTR:
Member:

Also, a little nitpick: `if not is_period` is generally preferred over `if is_period is False` (and similarly for the True counterparts).

Contributor Author:

Thanks, `if not is_period` looks much better indeed. I replaced `if is_period is False` with it (and `if is_period is True` with `if is_period`).

warnings.warn(
f"\'{name}\' is deprecated and will be removed "
f"in a future version, please use "
f"\'{c_OFFSET_DEPR_FREQSTR.get(name)}\' instead.",
f"\'{c_OFFSET_DEPR_FREQSTR.get(name.upper())}\' instead.",
FutureWarning,
stacklevel=find_stack_level(),
)
name = c_OFFSET_DEPR_FREQSTR[name]
if is_period is True and name in c_REVERSE_OFFSET_DEPR_FREQSTR:
if name.startswith("Y"):
name = c_OFFSET_DEPR_FREQSTR[name.upper()]
elif (is_period is False and
name != name.upper() and
name.upper() in c_REVERSE_OFFSET_DEPR_FREQSTR):
warnings.warn(
f"\'{name}\' is deprecated and will be removed "
f"in a future version, please use "
f"\'{name.upper()}\' instead.",
FutureWarning,
stacklevel=find_stack_level(),
)
name = name.upper()
Member:

Is the idea here that if someone passes 'me' instead of 'ME', then it will warn?

I'm not sure this is the right place for this warning, to be honest. As written, it will only show the warning for lowercase versions of aliases which have been renamed. For example:

In [5]: to_offset('qs')
Out[5]: <QuarterBegin: startingMonth=1>

doesn't emit any warning, and similarly for other aliases which aren't part of that list.

Contributor Author:

Thank you for noticing this, it's my mistake. I fixed it and updated the PR.
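The gap the reviewer describes can be reproduced with a small model of the `elif` condition for the non-period case. This is a sketch under assumptions: the two tables below are hypothetical subsets standing in for `c_OFFSET_DEPR_FREQSTR` and `c_REVERSE_OFFSET_DEPR_FREQSTR`, and `warns_in_branch` is an illustrative helper name.

```python
# Hypothetical subset of renamed aliases (old -> new) and its inverse.
OFFSET_DEPR_FREQSTR = {"M": "ME", "Q": "QE", "Y": "YE"}
REVERSE_OFFSET_DEPR_FREQSTR = {new: old for old, new in OFFSET_DEPR_FREQSTR.items()}


def warns_in_branch(name: str) -> bool:
    """Mirror the elif test: miscased input whose uppercase form is a renamed alias."""
    return name != name.upper() and name.upper() in REVERSE_OFFSET_DEPR_FREQSTR


print(warns_in_branch("me"))  # True: 'ME' is a renamed alias
print(warns_in_branch("qs"))  # False: 'QS' was never renamed, so no warning
```

Because membership is tested only against the table of renamed aliases, a lowercase alias such as 'qs' falls through without a warning in this model.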

if is_period is True and name.upper() in c_REVERSE_OFFSET_DEPR_FREQSTR:
if name.upper().startswith("Y"):
raise ValueError(
f"for Period, please use \'Y{name[2:]}\' "
f"for Period, please use \'Y{name.upper()[2:]}\' "
f"instead of \'{name}\'"
)
if (name.startswith("B") or
name.startswith("S") or name.startswith("C")):
if (name.upper().startswith("B") or
name.upper().startswith("S") or
name.upper().startswith("C")):
raise ValueError(INVALID_FREQ_ERR_MSG.format(name))
else:
raise ValueError(
f"for Period, please use "
f"\'{c_REVERSE_OFFSET_DEPR_FREQSTR.get(name)}\' "
f"\'{c_REVERSE_OFFSET_DEPR_FREQSTR.get(name.upper())}\' "
f"instead of \'{name}\'"
)
elif is_period is True and name in c_OFFSET_DEPR_FREQSTR:
if name.startswith("A"):
elif is_period is True and name.upper() in c_OFFSET_DEPR_FREQSTR:
if name.upper().startswith("A"):
warnings.warn(
f"\'{name}\' is deprecated and will be removed in a future "
f"version, please use \'{c_DEPR_ABBREVS.get(name)}\' "
f"version, please use "
f"\'{c_DEPR_ABBREVS.get(name.upper())}\' instead.",
FutureWarning,
stacklevel=find_stack_level(),
)
if name.upper() != name:
warnings.warn(
f"\'{name}\' is deprecated and will be removed in "
f"a future version, please use \'{name.upper()}\' "
f"instead.",
FutureWarning,
stacklevel=find_stack_level(),
)
name = c_OFFSET_DEPR_FREQSTR.get(name)
name = c_OFFSET_DEPR_FREQSTR.get(name.upper())

if sep != "" and not sep.isspace():
raise ValueError("separator must be spaces")
15 changes: 12 additions & 3 deletions pandas/_libs/tslibs/timedeltas.pyx
@@ -719,15 +719,24 @@ cpdef inline str parse_timedelta_unit(str unit):
return "ns"
elif unit == "M":
return unit
elif unit in c_DEPR_ABBREVS:
elif unit.upper() == "MIN" and unit.lower() != unit:
warnings.warn(
f"\'{unit}\' is deprecated and will be removed in a "
f"future version. Please use \'{c_DEPR_ABBREVS.get(unit)}\' "
f"future version. Please use \'{unit.lower()}\' "
f"instead of \'{unit}\'.",
FutureWarning,
stacklevel=find_stack_level(),
)
unit = c_DEPR_ABBREVS[unit]
unit = unit.lower()
elif unit.upper() in c_DEPR_ABBREVS and unit != c_DEPR_ABBREVS.get(unit.upper()):
warnings.warn(
f"\'{unit}\' is deprecated and will be removed in a "
f"future version. Please use \'{c_DEPR_ABBREVS.get(unit.upper())}\' "
f"instead of \'{unit}\'.",
FutureWarning,
stacklevel=find_stack_level(),
)
unit = c_DEPR_ABBREVS[unit.upper()]
try:
return timedelta_abbrevs[unit.lower()]
except KeyError:
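The two new branches in `parse_timedelta_unit` can be modelled in plain Python. This is a sketch under assumptions: `DEPR_ABBREVS` contains only the entries visible in the `dtypes.pyx` hunk of this PR (the real `c_DEPR_ABBREVS` table is larger), and `normalize_unit` is an illustrative name rather than a pandas function.

```python
import warnings

# Only the entries visible in the dtypes.pyx hunk; the real table is larger.
DEPR_ABBREVS = {
    "L": "ms", "l": "ms",
    "U": "us", "US": "us", "u": "us",
    "N": "ns", "NS": "ns", "n": "ns",
}


def normalize_unit(unit: str) -> str:
    """Return the accepted spelling of a unit, warning if it was deprecated."""
    if unit.upper() == "MIN" and unit.lower() != unit:
        replacement = unit.lower()
    elif unit.upper() in DEPR_ABBREVS and unit != DEPR_ABBREVS[unit.upper()]:
        replacement = DEPR_ABBREVS[unit.upper()]
    else:
        # Already in the accepted form, or not covered by this sketch.
        return unit
    warnings.warn(
        f"'{unit}' is deprecated and will be removed in a future version. "
        f"Please use '{replacement}' instead of '{unit}'.",
        FutureWarning,
        stacklevel=2,
    )
    return replacement


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    print(normalize_unit("MIN"), normalize_unit("US"), normalize_unit("us"))
print(len(caught), "warnings recorded")
```

In this model the `unit != DEPR_ABBREVS[unit.upper()]` comparison is what lets the already-lowercase 'us' pass silently even though its uppercase form 'US' is a key in the table.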
37 changes: 37 additions & 0 deletions pandas/tests/arrays/test_datetimes.py
@@ -784,6 +784,43 @@ def test_date_range_frequency_M_Q_Y_A_deprecated(self, freq, freq_depr):
result = pd.date_range("1/1/2000", periods=4, freq=freq_depr)
tm.assert_index_equal(result, expected)

@pytest.mark.parametrize(
"freq, freq_depr",
[
("2h", "2H"),
("2s", "2S"),
],
)
def test_date_range_uppercase_frequency_deprecated(self, freq, freq_depr):
# GH#9586, GH#54939
depr_msg = f"'{freq_depr[1:]}' is deprecated and will be removed in a "
f"future version. Please use '{freq[1:]}' instead."

expected = pd.date_range("1/1/2000", periods=4, freq=freq)
with tm.assert_produces_warning(FutureWarning, match=depr_msg):
result = pd.date_range("1/1/2000", periods=4, freq=freq_depr)
tm.assert_index_equal(result, expected)

@pytest.mark.parametrize(
"freq, freq_depr",
[
("2ME", "2me"),
("2ME", "2m"),
("2QE-SEP", "2q-sep"),
("2W", "2w"),
("2min", "2MIN"),
],
)
def test_date_range_lowercase_frequency_deprecated(self, freq, freq_depr):
# GH#9586, GH#54939
depr_msg = f"'{freq_depr[1:]}' is deprecated and will be removed in a "
f"future version, please use '{freq[1:]}' instead."

expected = pd.date_range("1/1/2000", periods=4, freq=freq)
with tm.assert_produces_warning(FutureWarning, match=depr_msg):
result = pd.date_range("1/1/2000", periods=4, freq=freq_depr)
tm.assert_index_equal(result, expected)


def test_factorize_sort_without_freq():
dta = DatetimeArray._from_sequence([0, 2, 1], dtype="M8[ns]")
2 changes: 1 addition & 1 deletion pandas/tests/frame/methods/test_at_time.py
@@ -58,7 +58,7 @@ def test_at_time_midnight(self, frame_or_series):

def test_at_time_nonexistent(self, frame_or_series):
# time doesn't exist
rng = date_range("1/1/2012", freq="23Min", periods=384)
rng = date_range("1/1/2012", freq="23min", periods=384)
ts = DataFrame(np.random.default_rng(2).standard_normal(len(rng)), rng)
ts = tm.get_obj(ts, frame_or_series)
rs = ts.at_time("16:00")
2 changes: 1 addition & 1 deletion pandas/tests/indexes/datetimes/methods/test_repeat.py
@@ -63,7 +63,7 @@ def test_repeat(self, tz_naive_fixture, unit):
reps = 2
msg = "the 'axis' parameter is not supported"

rng = date_range(start="2016-01-01", periods=2, freq="30Min", tz=tz, unit=unit)
rng = date_range(start="2016-01-01", periods=2, freq="30min", tz=tz, unit=unit)

expected_rng = DatetimeIndex(
[
4 changes: 2 additions & 2 deletions pandas/tests/indexes/datetimes/methods/test_round.py
@@ -42,7 +42,7 @@ def test_round_invalid(self, freq, error_msg):

def test_round(self, tz_naive_fixture, unit):
tz = tz_naive_fixture
rng = date_range(start="2016-01-01", periods=5, freq="30Min", tz=tz, unit=unit)
rng = date_range(start="2016-01-01", periods=5, freq="30min", tz=tz, unit=unit)
elt = rng[1]

expected_rng = DatetimeIndex(
@@ -104,7 +104,7 @@ def test_round4(self, tz_naive_fixture):
def test_no_rounding_occurs(self, tz_naive_fixture):
# GH 21262
tz = tz_naive_fixture
rng = date_range(start="2016-01-01", periods=5, freq="2Min", tz=tz)
rng = date_range(start="2016-01-01", periods=5, freq="2min", tz=tz)

expected_rng = DatetimeIndex(
[
2 changes: 1 addition & 1 deletion pandas/tests/indexes/datetimes/test_partial_slicing.py
@@ -236,7 +236,7 @@ def test_partial_slice_second_precision(self):
rng = date_range(
start=datetime(2005, 1, 1, 0, 0, 59, microsecond=999990),
periods=20,
freq="US",
freq="us",
)
s = Series(np.arange(20), rng)
