Commit 05a7229

mroeschke authored and jreback committed
API: Add CalendarDay ('CD') offset (#22288)
1 parent 14ccdd2 commit 05a7229

16 files changed: +338 -200 lines changed
doc/source/timeseries.rst

+24 -2
@@ -369,7 +369,7 @@ In practice this becomes very cumbersome because we often need a very long
 index with a large number of timestamps. If we need timestamps on a regular
 frequency, we can use the :func:`date_range` and :func:`bdate_range` functions
 to create a ``DatetimeIndex``. The default frequency for ``date_range`` is a
-**calendar day** while the default for ``bdate_range`` is a **business day**:
+**day** while the default for ``bdate_range`` is a **business day**:

 .. ipython:: python

@@ -886,6 +886,27 @@ normalized after the function is applied.
 hour.apply(pd.Timestamp('2014-01-01 23:00'))


+.. _timeseries.dayvscalendarday:
+
+Day vs. CalendarDay
+~~~~~~~~~~~~~~~~~~~
+
+:class:`Day` (``'D'``) is a timedelta-like offset that respects absolute time
+arithmetic and is an alias for 24 :class:`Hour`. This offset is the default
+argument to many pandas time-related functions like :func:`date_range` and :func:`timedelta_range`.
+
+:class:`CalendarDay` (``'CD'``) is a relativedelta-like offset that respects
+calendar time arithmetic. :class:`CalendarDay` is useful for preserving calendar day
+semantics for datetimes that have daylight saving time transitions, i.e. :class:`CalendarDay`
+will preserve the wall-clock hour across a daylight saving time transition.
+
+.. ipython:: python
+
+ts = pd.Timestamp('2016-10-30 00:00:00', tz='Europe/Helsinki')
+ts + pd.offsets.Day(1)
+ts + pd.offsets.CalendarDay(1)
+
+
 Parametric Offsets
 ~~~~~~~~~~~~~~~~~~

@@ -1176,7 +1197,8 @@ frequencies. We will refer to these aliases as *offset aliases*.

 "B", "business day frequency"
 "C", "custom business day frequency"
-"D", "calendar day frequency"
+"D", "day frequency"
+"CD", "calendar day frequency"
 "W", "weekly frequency"
 "M", "month end frequency"
 "SM", "semi-month end frequency (15th and end of month)"
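
Note on the ``Day`` vs. ``CalendarDay`` documentation above: the two semantics can be illustrated without pandas. The following is a minimal sketch, assuming Python 3.9+ with ``zoneinfo`` and time zone data available; it is a standard-library analogue, not how pandas implements either offset. The helper names ``add_absolute_day`` and ``add_calendar_day`` are hypothetical and do not appear in the commit.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

HELSINKI = ZoneInfo("Europe/Helsinki")


def add_absolute_day(ts):
    """Add exactly 24 elapsed hours (the timedelta-like, 'D' semantics)."""
    # Do the arithmetic in UTC so a DST transition cannot distort it.
    shifted = ts.astimezone(timezone.utc) + timedelta(hours=24)
    return shifted.astimezone(ts.tzinfo)


def add_calendar_day(ts):
    """Advance the wall-clock date by one day (the 'CD'-like semantics)."""
    # Stdlib arithmetic on an aware datetime keeps the wall-clock time
    # and re-evaluates the UTC offset for the new date.
    return ts + timedelta(days=1)


# Midnight on the day Helsinki leaves daylight saving time (a 25-hour day).
ts = datetime(2016, 10, 30, 0, 0, tzinfo=HELSINKI)
print(add_absolute_day(ts).isoformat())  # 2016-10-30T23:00:00+02:00
print(add_calendar_day(ts).isoformat())  # 2016-10-31T00:00:00+02:00
```

The absolute result lands at 23:00 on the same date because the day of the transition is 25 hours long, which matches the ``Out[4]`` value quoted in the whatsnew entry of this commit.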

doc/source/whatsnew/v0.24.0.txt

+40
@@ -285,6 +285,46 @@ that the dates have been converted to UTC
 .. ipython:: python
 pd.to_datetime(["2015-11-18 15:30:00+05:30", "2015-11-18 16:30:00+06:30"], utc=True)

+.. _whatsnew_0240.api_breaking.calendarday:
+
+CalendarDay Offset
+^^^^^^^^^^^^^^^^^^
+
+:class:`Day` and associated frequency alias ``'D'`` were documented to represent
+a calendar day; however, arithmetic and operations with :class:`Day` sometimes
+respected absolute time instead (i.e. ``Day(n)`` acted identically to ``Timedelta(days=n)``).
+
+*Previous Behavior*:
+
+.. code-block:: ipython
+
+
+In [2]: ts = pd.Timestamp('2016-10-30 00:00:00', tz='Europe/Helsinki')
+
+# Respects calendar arithmetic
+In [3]: pd.date_range(start=ts, freq='D', periods=3)
+Out[3]:
+DatetimeIndex(['2016-10-30 00:00:00+03:00', '2016-10-31 00:00:00+02:00',
+'2016-11-01 00:00:00+02:00'],
+dtype='datetime64[ns, Europe/Helsinki]', freq='D')
+
+# Respects absolute arithmetic
+In [4]: ts + pd.tseries.frequencies.to_offset('D')
+Out[4]: Timestamp('2016-10-30 23:00:00+0200', tz='Europe/Helsinki')
+
+:class:`CalendarDay` and associated frequency alias ``'CD'`` are now available
+and respect calendar day arithmetic while :class:`Day` and frequency alias ``'D'``
+will now respect absolute time (:issue:`22274`, :issue:`20596`, :issue:`16980`, :issue:`8774`).
+See the :ref:`documentation here <timeseries.dayvscalendarday>` for more information.
+
+Addition with :class:`CalendarDay` across a daylight saving time transition:
+
+.. ipython:: python
+
+ts = pd.Timestamp('2016-10-30 00:00:00', tz='Europe/Helsinki')
+ts + pd.offsets.Day(1)
+ts + pd.offsets.CalendarDay(1)
+
 .. _whatsnew_0240.api_breaking.period_end_time:

 Time values in ``dt.end_time`` and ``to_timestamp(how='end')``
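
Note on the "Previous Behavior" listing in the CalendarDay whatsnew entry: range generation and single-step addition disagreed about what a day meant. The sketch below reproduces both readings for a three-point range using only the standard library, assuming Python 3.9+ with ``zoneinfo`` and time zone data. The function names ``calendar_day_range`` and ``absolute_day_range`` are hypothetical, and this is an illustration of the two semantics rather than the ``date_range`` implementation.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo


def calendar_day_range(start, periods):
    """Step the wall-clock date; the UTC offset is re-evaluated per point."""
    return [start + timedelta(days=i) for i in range(periods)]


def absolute_day_range(start, periods):
    """Step exactly 24 elapsed hours per point, computed in UTC."""
    start_utc = start.astimezone(timezone.utc)
    return [
        (start_utc + timedelta(hours=24 * i)).astimezone(start.tzinfo)
        for i in range(periods)
    ]


start = datetime(2016, 10, 30, tzinfo=ZoneInfo("Europe/Helsinki"))
calendar = [t.isoformat() for t in calendar_day_range(start, 3)]
absolute = [t.isoformat() for t in absolute_day_range(start, 3)]
print(calendar)
print(absolute)
```

The calendar-style list keeps midnight on every date, as in ``Out[3]`` of the listing; the absolute-style list drifts to 23:00 once the 25-hour transition day has passed.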

pandas/core/arrays/datetimes.py

+59 -53
@@ -32,7 +32,7 @@
 from pandas.core import ops

 from pandas.tseries.frequencies import to_offset
-from pandas.tseries.offsets import Tick, Day, generate_range
+from pandas.tseries.offsets import Tick, generate_range

 from pandas.core.arrays import datetimelike as dtl

@@ -239,56 +239,33 @@ def _generate_range(cls, start, end, periods, freq, tz=None,
 start, end, _normalized = _maybe_normalize_endpoints(start, end,
 normalize)

-tz, inferred_tz = _infer_tz_from_endpoints(start, end, tz)
-
-if hasattr(freq, 'delta') and freq != Day():
-# sub-Day Tick
-if inferred_tz is None and tz is not None:
-# naive dates
-if start is not None and start.tz is None:
-start = start.tz_localize(tz, ambiguous=False)
-
-if end is not None and end.tz is None:
-end = end.tz_localize(tz, ambiguous=False)
-
-if start and end:
-if start.tz is None and end.tz is not None:
-start = start.tz_localize(end.tz, ambiguous=False)
-
-if end.tz is None and start.tz is not None:
-end = end.tz_localize(start.tz, ambiguous=False)
-
+tz, _ = _infer_tz_from_endpoints(start, end, tz)
+
+if tz is not None:
+# Localize the start and end arguments
+start = _maybe_localize_point(
+start, getattr(start, 'tz', None), start, freq, tz
+)
+end = _maybe_localize_point(
+end, getattr(end, 'tz', None), end, freq, tz
+)
+if start and end:
+# Make sure start and end have the same tz
+start = _maybe_localize_point(
+start, start.tz, end.tz, freq, tz
+)
+end = _maybe_localize_point(
+end, end.tz, start.tz, freq, tz
+)
+if freq is not None:
 if cls._use_cached_range(freq, _normalized, start, end):
+# Currently always False; never hit
+# Should be reimplemented as a part of GH 17914
 index = cls._cached_range(start, end, periods=periods,
 freq=freq)
 else:
 index = _generate_regular_range(cls, start, end, periods, freq)

-else:
-
-if tz is not None:
-# naive dates
-if start is not None and start.tz is not None:
-start = start.replace(tzinfo=None)
-
-if end is not None and end.tz is not None:
-end = end.replace(tzinfo=None)
-
-if start and end:
-if start.tz is None and end.tz is not None:
-end = end.replace(tzinfo=None)
-
-if end.tz is None and start.tz is not None:
-start = start.replace(tzinfo=None)
-
-if freq is not None:
-if cls._use_cached_range(freq, _normalized, start, end):
-index = cls._cached_range(start, end, periods=periods,
-freq=freq)
-else:
-index = _generate_regular_range(cls, start, end,
-periods, freq)
-
 if tz is not None and getattr(index, 'tz', None) is None:
 arr = conversion.tz_localize_to_utc(
 ensure_int64(index.values),
@@ -302,12 +279,12 @@ def _generate_range(cls, start, end, periods, freq, tz=None,
 start = start.tz_localize(tz).asm8
 if end is not None:
 end = end.tz_localize(tz).asm8
-else:
-# Create a linearly spaced date_range in local time
-start = start.tz_localize(tz)
-end = end.tz_localize(tz)
-arr = np.linspace(start.value, end.value, periods)
-index = cls._simple_new(arr.astype('M8[ns]'), freq=None, tz=tz)
+else:
+# Create a linearly spaced date_range in local time
+arr = np.linspace(start.value, end.value, periods)
+index = cls._simple_new(
+arr.astype('M8[ns]', copy=False), freq=None, tz=tz
+)

 if not left_closed and len(index) and index[0] == start:
 index = index[1:]
@@ -1256,10 +1233,10 @@ def _generate_regular_range(cls, start, end, periods, freq):
 data = cls._simple_new(data.view(_NS_DTYPE), None, tz=tz)
 else:
 tz = None
+# start and end should have the same timezone by this point
 if isinstance(start, Timestamp):
 tz = start.tz
-
-if isinstance(end, Timestamp):
+elif isinstance(end, Timestamp):
 tz = end.tz

 xdr = generate_range(start=start, end=end,
@@ -1330,3 +1307,32 @@ def _maybe_normalize_endpoints(start, end, normalize):
 _normalized = _normalized and end.time() == _midnight

 return start, end, _normalized
+
+
+def _maybe_localize_point(ts, is_none, is_not_none, freq, tz):
+"""
+Localize a start or end Timestamp to the timezone of the corresponding
+start or end Timestamp
+
+Parameters
+----------
+ts : start or end Timestamp to potentially localize
+is_none : argument that should be None
+is_not_none : argument that should not be None
+freq : Tick, DateOffset, or None
+tz : str, timezone object or None
+
+Returns
+-------
+ts : Timestamp
+"""
+# Make sure start and end are timezone localized if:
+# 1) freq = a Timedelta-like frequency (Tick)
+# 2) freq = None i.e. generating a linspaced range
+if isinstance(freq, Tick) or freq is None:
+localize_args = {'tz': tz, 'ambiguous': False}
+else:
+localize_args = {'tz': None}
+if is_none is None and is_not_none is not None:
+ts = ts.tz_localize(**localize_args)
+return ts
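
Note on ``_maybe_localize_point``: the ``is_none`` / ``is_not_none`` parameters act as a guard, so the point is only touched when the first is ``None`` and the second is not. The following is a rough standard-library model of that control flow, under loud assumptions: ``datetime.replace(tzinfo=...)`` stands in for ``Timestamp.tz_localize`` (it has no ``ambiguous`` handling), the boolean ``tick_like`` stands in for ``isinstance(freq, Tick) or freq is None``, and the name ``maybe_localize_point_model`` is hypothetical. It is not the pandas helper itself.

```python
from datetime import datetime, timedelta, timezone


def maybe_localize_point_model(ts, is_none, is_not_none, tick_like, tz):
    """Model of the guard-then-localize flow using stdlib datetimes."""
    # Tick-like (or missing) frequencies attach the target tz; other
    # frequencies keep the point naive so arithmetic stays in wall time.
    target_tz = tz if tick_like else None
    if is_none is None and is_not_none is not None:
        ts = ts.replace(tzinfo=target_tz)
    return ts


tz = timezone(timedelta(hours=2))  # fixed offset, for illustration only
naive = datetime(2016, 10, 30)
aware = naive.replace(tzinfo=tz)

# Mirrors the first call pattern in the diff: (start, start.tz, start, ...)
localized = maybe_localize_point_model(naive, naive.tzinfo, naive, True, tz)
kept_naive = maybe_localize_point_model(naive, naive.tzinfo, naive, False, tz)
untouched = maybe_localize_point_model(aware, aware.tzinfo, aware, True, tz)
print(localized.tzinfo, kept_naive.tzinfo, untouched is aware)
```

In this model an already-aware point always passes through unchanged, because its ``is_none`` argument is not ``None``.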

pandas/core/indexes/datetimes.py

+6 -13
@@ -385,7 +385,10 @@ def _generate_range(cls, start, end, periods, name=None, freq=None,

 @classmethod
 def _use_cached_range(cls, freq, _normalized, start, end):
-return _use_cached_range(freq, _normalized, start, end)
+# Note: This always returns False
+return (freq._should_cache() and
+not (freq._normalize_cache and not _normalized) and
+_naive_in_cache_range(start, end))

 def _convert_for_op(self, value):
 """ Convert value to be insertable to ndarray """
@@ -1580,7 +1583,7 @@ def date_range(start=None, end=None, periods=None, freq=None, tz=None,
 Right bound for generating dates.
 periods : integer, optional
 Number of periods to generate.
-freq : str or DateOffset, default 'D' (calendar daily)
+freq : str or DateOffset, default 'D'
 Frequency strings can have multiples, e.g. '5H'. See
 :ref:`here <timeseries.offset_aliases>` for a list of
 frequency aliases.
@@ -1861,17 +1864,7 @@ def _naive_in_cache_range(start, end):
 else:
 if start.tzinfo is not None or end.tzinfo is not None:
 return False
-return _in_range(start, end, _CACHE_START, _CACHE_END)
-
-
-def _in_range(start, end, rng_start, rng_end):
-return start > rng_start and end < rng_end
-
-
-def _use_cached_range(freq, _normalized, start, end):
-return (freq._should_cache() and
-not (freq._normalize_cache and not _normalized) and
-_naive_in_cache_range(start, end))
+return start > _CACHE_START and end < _CACHE_END


 def _time_to_micros(time):

pandas/core/indexes/interval.py

+1 -1
@@ -1052,7 +1052,7 @@ def interval_range(start=None, end=None, periods=None, freq=None,
 freq : numeric, string, or DateOffset, default None
 The length of each interval. Must be consistent with the type of start
 and end, e.g. 2 for numeric, or '5H' for datetime-like. Default is 1
-for numeric and 'D' (calendar daily) for datetime-like.
+for numeric and 'D' for datetime-like.
 name : string, default None
 Name of the resulting IntervalIndex
 closed : {'left', 'right', 'both', 'neither'}, default 'right'

pandas/core/indexes/period.py

+1 -1
@@ -840,7 +840,7 @@ def period_range(start=None, end=None, periods=None, freq='D', name=None):
 Right bound for generating periods
 periods : integer, default None
 Number of periods to generate
-freq : string or DateOffset, default 'D' (calendar daily)
+freq : string or DateOffset, default 'D'
 Frequency alias
 name : string, default None
 Name of the resulting PeriodIndex

pandas/core/indexes/timedeltas.py

+1 -1
@@ -737,7 +737,7 @@ def timedelta_range(start=None, end=None, periods=None, freq=None,
 Right bound for generating timedeltas
 periods : integer, default None
 Number of periods to generate
-freq : string or DateOffset, default 'D' (calendar daily)
+freq : string or DateOffset, default 'D'
 Frequency strings can have multiples, e.g. '5H'
 name : string, default None
 Name of the resulting TimedeltaIndex
