API: Add CalendarDay ('CD') offset #22288
Changes from 28 commits
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -369,7 +369,7 @@ In practice this becomes very cumbersome because we often need a very long | |
index with a large number of timestamps. If we need timestamps on a regular | ||
frequency, we can use the :func:`date_range` and :func:`bdate_range` functions | ||
to create a ``DatetimeIndex``. The default frequency for ``date_range`` is a | ||
**calendar day** while the default for ``bdate_range`` is a **business day**: | ||
**day** while the default for ``bdate_range`` is a **business day**: | ||
|
||
.. ipython:: python | ||
|
||
|
@@ -886,6 +886,27 @@ normalized after the function is applied. | |
hour.apply(pd.Timestamp('2014-01-01 23:00')) | ||
|
||
|
||
.. _timeseries.dayvscalendarday: | ||
|
||
Day vs. CalendarDay | ||
~~~~~~~~~~~~~~~~~~~ | ||
|
||
:class:`Day` (``'D'``) is a timedelta-like offset that respects absolute time | ||
arithmetic and is an alias for 24 :class:`Hour`. This offset is the default | ||
argument to many pandas time-related functions such as :func:`date_range` and :func:`timedelta_range`. | |
|
||
:class:`CalendarDay` (``'CD'``) is a relativedelta-like offset that respects | ||
calendar time arithmetic. :class:`CalendarDay` is useful for preserving calendar day | |
meaning that we land on the same DST-preserved hour, right? |
||
semantics with datetimes that have daylight saving time transitions, i.e. :class:`CalendarDay` | |
will preserve the hour before the daylight saving time transition. | |
|
||
.. ipython:: python | ||
|
||
ts = pd.Timestamp('2016-10-30 00:00:00', tz='Europe/Helsinki') | ||
ts + pd.offsets.Day(1) | ||
ts + pd.offsets.CalendarDay(1) | ||
|
||
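As a rough illustration of the distinction described above, the following sketch reproduces the two kinds of arithmetic with only the Python standard library. It does not use pandas. The two fixed offsets, the switch instant, and the helper names (`to_local`, `from_wall`, `add_absolute_day`, `add_calendar_day`) are assumptions made for this example: a toy model of Europe/Helsinki that is valid only around the 2016-10-30 transition.

```python
from datetime import datetime, timedelta, timezone

# Toy model of Europe/Helsinki for autumn 2016 only (an assumption for this
# sketch): UTC+3 before 2016-10-30 01:00 UTC, UTC+2 from then on.
EEST = timezone(timedelta(hours=3), "EEST")
EET = timezone(timedelta(hours=2), "EET")
SWITCH_UTC = datetime(2016, 10, 30, 1, 0, tzinfo=timezone.utc)


def to_local(instant):
    """Express an aware instant using the offset valid at that instant."""
    return instant.astimezone(EEST if instant < SWITCH_UTC else EET)


def from_wall(wall):
    """Attach the valid offset to a naive wall-clock time.

    Ambiguous wall times resolve to the first (EEST) occurrence.
    """
    for zone in (EEST, EET):
        candidate = wall.replace(tzinfo=zone)
        if (candidate < SWITCH_UTC) == (zone is EEST):
            return candidate
    raise ValueError("wall time does not exist in this toy model")


def add_absolute_day(ts):
    """Timedelta-like: advance by exactly 24 elapsed hours."""
    return to_local(ts + timedelta(hours=24))


def add_calendar_day(ts):
    """relativedelta-like: advance the date, keep the wall-clock time."""
    return from_wall(ts.replace(tzinfo=None) + timedelta(days=1))


ts = from_wall(datetime(2016, 10, 30, 0, 0))
absolute = add_absolute_day(ts)
calendar = add_calendar_day(ts)
print(absolute.isoformat())  # 2016-10-30T23:00:00+02:00
print(calendar.isoformat())  # 2016-10-31T00:00:00+02:00
```

The absolute result lands at 23:00 on the same date because the day of the transition is 25 hours long, while the calendar result keeps midnight and therefore spans 25 elapsed hours.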
|
||
Parametric Offsets | ||
~~~~~~~~~~~~~~~~~~ | ||
|
||
|
@@ -1176,7 +1197,8 @@ frequencies. We will refer to these aliases as *offset aliases*. | |
|
||
"B", "business day frequency" | ||
"C", "custom business day frequency" | ||
"D", "calendar day frequency" | ||
"D", "day frequency" | ||
"CD", "calendar day frequency" | ||
"W", "weekly frequency" | ||
"M", "month end frequency" | ||
"SM", "semi-month end frequency (15th and end of month)" | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -285,6 +285,46 @@ that the dates have been converted to UTC | |
.. ipython:: python | ||
pd.to_datetime(["2015-11-18 15:30:00+05:30", "2015-11-18 16:30:00+06:30"], utc=True) | ||
|
||
.. _whatsnew_0240.api_breaking.calendarday: | ||
|
||
can you make a new section header here (or is there a reason you don't need one?) |
||
CalendarDay Offset | ||
^^^^^^^^^^^^^^^^^^ | ||
|
||
:class:`Day` and associated frequency alias ``'D'`` were documented to represent | ||
a calendar day; however, arithmetic and operations with :class:`Day` sometimes | ||
respected absolute time instead (i.e. ``Day(n)`` acted identically to ``Timedelta(days=n)``). | |
|
||
*Previous Behavior*: | ||
|
||
.. code-block:: ipython | ||
|
||
|
||
In [2]: ts = pd.Timestamp('2016-10-30 00:00:00', tz='Europe/Helsinki') | ||
|
||
# Respects calendar arithmetic | ||
In [3]: pd.date_range(start=ts, freq='D', periods=3) | ||
Out[3]: | ||
DatetimeIndex(['2016-10-30 00:00:00+03:00', '2016-10-31 00:00:00+02:00', | ||
'2016-11-01 00:00:00+02:00'], | ||
dtype='datetime64[ns, Europe/Helsinki]', freq='D') | ||
|
||
# Respects absolute arithmetic | ||
In [4]: ts + pd.tseries.frequencies.to_offset('D') | ||
I would make this section a bit smaller and push things to the docs themselves |
||
Out[4]: Timestamp('2016-10-30 23:00:00+0200', tz='Europe/Helsinki') | ||
|
||
:class:`CalendarDay` and associated frequency alias ``'CD'`` are now available | ||
can you put a ref to the section in timeseries.rst |
||
and respect calendar day arithmetic while :class:`Day` and frequency alias ``'D'`` | ||
will now respect absolute time (:issue:`22274`, :issue:`20596`, :issue:`16980`, :issue:`8774`). | |
See the :ref:`documentation here <timeseries.dayvscalendarday>` for more information. | ||
|
||
Addition with :class:`CalendarDay` across a daylight saving time transition: | |
|
||
.. ipython:: python | ||
|
||
ts = pd.Timestamp('2016-10-30 00:00:00', tz='Europe/Helsinki') | ||
ts + pd.offsets.Day(1) | ||
ts + pd.offsets.CalendarDay(1) | ||
|
||
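The *Previous Behavior* block above shows range generation following calendar days while scalar addition followed elapsed time. The sketch below contrasts the two stepping rules for a three-element range, using only the Python standard library rather than pandas. The offsets, switch instant, and the `day_range` helper are hypothetical names for this example and model Europe/Helsinki only around 2016-10-30.

```python
from datetime import datetime, timedelta, timezone

# Toy model of Europe/Helsinki for autumn 2016 only (an assumption):
# UTC+3 before 2016-10-30 01:00 UTC, UTC+2 from then on.
EEST = timezone(timedelta(hours=3), "EEST")
EET = timezone(timedelta(hours=2), "EET")
SWITCH_UTC = datetime(2016, 10, 30, 1, 0, tzinfo=timezone.utc)


def to_local(instant):
    """Express an aware instant using the offset valid at that instant."""
    return instant.astimezone(EEST if instant < SWITCH_UTC else EET)


def from_wall(wall):
    """Attach the valid offset to a naive wall-clock time."""
    for zone in (EEST, EET):
        candidate = wall.replace(tzinfo=zone)
        if (candidate < SWITCH_UTC) == (zone is EEST):
            return candidate
    raise ValueError("wall time does not exist in this toy model")


def day_range(start, periods, calendar):
    """Build `periods` points, stepping by calendar days or by 24 hours."""
    points = [start]
    for _ in range(periods - 1):
        prev = points[-1]
        if calendar:
            # Keep the wall-clock time, move the date forward.
            nxt = from_wall(prev.replace(tzinfo=None) + timedelta(days=1))
        else:
            # Move forward by exactly 24 elapsed hours.
            nxt = to_local(prev + timedelta(hours=24))
        points.append(nxt)
    return points


start = from_wall(datetime(2016, 10, 30, 0, 0))
calendar_days = day_range(start, 3, calendar=True)
absolute_days = day_range(start, 3, calendar=False)
for c, a in zip(calendar_days, absolute_days):
    print(c.isoformat(), "|", a.isoformat())
```

The calendar column matches the three midnights in `Out[3]` above; the absolute column drifts to 23:00 once the transition has passed, consistent with `Out[4]`.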
.. _whatsnew_0240.api_breaking.period_end_time: | ||
|
||
Time values in ``dt.end_time`` and ``to_timestamp(how='end')`` | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -32,7 +32,7 @@ | |
from pandas.core import ops | ||
|
||
from pandas.tseries.frequencies import to_offset | ||
from pandas.tseries.offsets import Tick, Day, generate_range | ||
from pandas.tseries.offsets import Tick, generate_range | ||
|
||
from pandas.core.arrays import datetimelike as dtl | ||
|
||
|
@@ -239,56 +239,38 @@ def _generate_range(cls, start, end, periods, freq, tz=None, | |
start, end, _normalized = _maybe_normalize_endpoints(start, end, | ||
normalize) | ||
|
||
tz, inferred_tz = _infer_tz_from_endpoints(start, end, tz) | ||
tz, _ = _infer_tz_from_endpoints(start, end, tz) | ||
|
||
if hasattr(freq, 'delta') and freq != Day(): | ||
# sub-Day Tick | ||
if inferred_tz is None and tz is not None: | ||
# naive dates | ||
if start is not None and start.tz is None: | ||
start = start.tz_localize(tz, ambiguous=False) | ||
# Make sure start and end are timezone localized if: | ||
# 1) freq = a Timedelta-like frequency (Tick) | ||
# 2) freq = None i.e. generating a linspaced range | ||
if isinstance(freq, Tick) or freq is None: | ||
localize_args = {'tz': tz, 'ambiguous': False} | ||
I think it's worthwhile to add freq to |
||
else: | ||
localize_args = {'tz': None} | ||
|
||
if end is not None and end.tz is None: | ||
end = end.tz_localize(tz, ambiguous=False) | ||
if tz is not None: | ||
add a blank line here |
||
# Localize the start and end arguments | ||
if start is not None and start.tz is None: | ||
you could use |
Hmm I think that should work! |
||
start = start.tz_localize(**localize_args) | ||
|
||
if start and end: | ||
if start.tz is None and end.tz is not None: | ||
start = start.tz_localize(end.tz, ambiguous=False) | ||
if end is not None and end.tz is None: | ||
end = end.tz_localize(**localize_args) | ||
|
||
if end.tz is None and start.tz is not None: | ||
end = end.tz_localize(start.tz, ambiguous=False) | ||
if start and end: | ||
# Make sure start and end have the same tz | ||
start = _maybe_localize_point(start, start.tz, end.tz, | ||
localize_args) | ||
end = _maybe_localize_point(end, end.tz, start.tz, localize_args) | ||
|
||
if freq is not None: | ||
if cls._use_cached_range(freq, _normalized, start, end): | ||
# Currently always False; never hit | ||
Are we expecting this to be hit in the near future? |
Probably not for a while. There's an existing issue regarding caching offsets (#17914), and I think this infrastructure might be useful for that? |
||
index = cls._cached_range(start, end, periods=periods, | ||
freq=freq) | ||
else: | ||
index = _generate_regular_range(cls, start, end, periods, freq) | ||
|
||
else: | ||
|
||
if tz is not None: | ||
# naive dates | ||
if start is not None and start.tz is not None: | ||
start = start.replace(tzinfo=None) | ||
|
||
if end is not None and end.tz is not None: | ||
end = end.replace(tzinfo=None) | ||
|
||
if start and end: | ||
if start.tz is None and end.tz is not None: | ||
end = end.replace(tzinfo=None) | ||
|
||
if end.tz is None and start.tz is not None: | ||
start = start.replace(tzinfo=None) | ||
|
||
if freq is not None: | ||
if cls._use_cached_range(freq, _normalized, start, end): | ||
index = cls._cached_range(start, end, periods=periods, | ||
freq=freq) | ||
else: | ||
index = _generate_regular_range(cls, start, end, | ||
periods, freq) | ||
|
||
if tz is not None and getattr(index, 'tz', None) is None: | ||
arr = conversion.tz_localize_to_utc( | ||
ensure_int64(index.values), | ||
|
@@ -302,12 +284,10 @@ def _generate_range(cls, start, end, periods, freq, tz=None, | |
start = start.tz_localize(tz).asm8 | ||
if end is not None: | ||
end = end.tz_localize(tz).asm8 | ||
else: | ||
# Create a linearly spaced date_range in local time | ||
start = start.tz_localize(tz) | ||
end = end.tz_localize(tz) | ||
arr = np.linspace(start.value, end.value, periods) | ||
index = cls._simple_new(arr.astype('M8[ns]'), freq=None, tz=tz) | ||
else: | ||
# Create a linearly spaced date_range in local time | ||
arr = np.linspace(start.value, end.value, periods) | ||
can do |
can you do this (does it work?) |
This failed on the numpy compat build unfortunately. |
||
index = cls._simple_new(arr.astype('M8[ns]'), freq=None, tz=tz) | ||
can add |
||
|
||
if not left_closed and len(index) and index[0] == start: | ||
index = index[1:] | ||
|
@@ -1255,10 +1235,10 @@ def _generate_regular_range(cls, start, end, periods, freq): | |
data = cls._simple_new(data.view(_NS_DTYPE), None, tz=tz) | ||
else: | ||
tz = None | ||
# start and end should have the same timezone by this point | ||
if isinstance(start, Timestamp): | ||
tz = start.tz | ||
|
||
if isinstance(end, Timestamp): | ||
elif isinstance(end, Timestamp): | ||
tz = end.tz | ||
|
||
xdr = generate_range(start=start, end=end, | ||
|
@@ -1329,3 +1309,24 @@ def _maybe_normalize_endpoints(start, end, normalize): | |
_normalized = _normalized and end.time() == _midnight | ||
|
||
return start, end, _normalized | ||
|
||
|
||
def _maybe_localize_point(ts, is_none, is_not_none, localize_args): | ||
""" | ||
Localize a start or end Timestamp to the timezone of the corresponding | ||
start or end Timestamp | ||
|
||
Parameters | ||
---------- | ||
ts : start or end Timestamp to potentially localize | ||
is_none : tz argument that should be None | ||
is_not_none : tz argument that should not be None | ||
localize_args : dict to pass to tz_localize | ||
|
||
Returns | ||
------- | ||
ts : Timestamp | ||
""" | ||
e.g. figure out localize_args here (from a passed freq) |
||
if is_none is None and is_not_none is not None: | ||
ts = ts.tz_localize(**localize_args) | ||
return ts |
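To make the intent of `_maybe_localize_point` concrete, here is a small standard-library analogue. It is a sketch under assumptions: `datetime.replace(tzinfo=...)` stands in for `Timestamp.tz_localize`, a fixed UTC+2 offset stands in for a real time zone, and the function and variable names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def maybe_localize_point(ts, own_tz, other_tz, tz):
    """Attach `tz` to `ts` only when `ts` is naive and the other endpoint is aware."""
    if own_tz is None and other_tz is not None:
        ts = ts.replace(tzinfo=tz)
    return ts


utc_plus_2 = timezone(timedelta(hours=2))
start = datetime(2016, 10, 31)                  # naive endpoint
end = datetime(2016, 11, 2, tzinfo=utc_plus_2)  # aware endpoint

new_start = maybe_localize_point(start, start.tzinfo, end.tzinfo, utc_plus_2)
new_end = maybe_localize_point(end, end.tzinfo, new_start.tzinfo, utc_plus_2)

# Both endpoints are now aware, so they can be compared directly.
print(new_start.isoformat(), new_end.isoformat(), new_start < new_end)
```

Only the naive endpoint is changed; the already-aware endpoint is returned untouched, which mirrors the "make sure start and end have the same tz" step in `_generate_range`.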
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -385,7 +385,10 @@ def _generate_range(cls, start, end, periods, name=None, freq=None, | |
|
||
@classmethod | ||
def _use_cached_range(cls, freq, _normalized, start, end): | ||
return _use_cached_range(freq, _normalized, start, end) | ||
# Note: This always returns False | ||
return (freq._should_cache() and | ||
not (freq._normalize_cache and not _normalized) and | ||
Not unique to this PR, but I think freq._should_cache and freq._normalize_cache may be unnecessary; those predate DateOffsets being immutable |
||
_naive_in_cache_range(start, end)) | ||
|
||
def _convert_for_op(self, value): | ||
""" Convert value to be insertable to ndarray """ | ||
|
@@ -1580,7 +1583,7 @@ def date_range(start=None, end=None, periods=None, freq=None, tz=None, | |
Right bound for generating dates. | ||
periods : integer, optional | ||
Number of periods to generate. | ||
freq : str or DateOffset, default 'D' (calendar daily) | ||
freq : str or DateOffset, default 'D' | ||
Frequency strings can have multiples, e.g. '5H'. See | ||
:ref:`here <timeseries.offset_aliases>` for a list of | ||
frequency aliases. | ||
|
@@ -1861,17 +1864,7 @@ def _naive_in_cache_range(start, end): | |
else: | ||
if start.tzinfo is not None or end.tzinfo is not None: | ||
return False | ||
return _in_range(start, end, _CACHE_START, _CACHE_END) | ||
|
||
|
||
def _in_range(start, end, rng_start, rng_end): | ||
return start > rng_start and end < rng_end | ||
|
||
|
||
def _use_cached_range(freq, _normalized, start, end): | ||
return (freq._should_cache() and | ||
not (freq._normalize_cache and not _normalized) and | ||
_naive_in_cache_range(start, end)) | ||
return start > _CACHE_START and end < _CACHE_END | ||
|
||
|
||
def _time_to_micros(time): | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -259,88 +259,18 @@ def test_to_period_microsecond(self): | |
assert period[0] == Period('2007-01-01 10:11:12.123456Z', 'U') | ||
assert period[1] == Period('2007-01-01 10:11:13.789123Z', 'U') | ||
|
||
def test_to_period_tz_pytz(self): | ||
from pytz import utc as UTC | ||
|
||
xp = date_range('1/1/2000', '4/1/2000').to_period() | ||
|
||
ts = date_range('1/1/2000', '4/1/2000', tz='US/Eastern') | ||
|
||
result = ts.to_period()[0] | ||
expected = ts[0].to_period() | ||
|
||
assert result == expected | ||
tm.assert_index_equal(ts.to_period(), xp) | ||
|
||
ts = date_range('1/1/2000', '4/1/2000', tz=UTC) | ||
|
||
result = ts.to_period()[0] | ||
expected = ts[0].to_period() | ||
|
||
assert result == expected | ||
tm.assert_index_equal(ts.to_period(), xp) | ||
|
||
ts = date_range('1/1/2000', '4/1/2000', tz=tzlocal()) | ||
|
||
result = ts.to_period()[0] | ||
expected = ts[0].to_period() | ||
|
||
assert result == expected | ||
tm.assert_index_equal(ts.to_period(), xp) | ||
|
||
def test_to_period_tz_explicit_pytz(self): | ||
xp = date_range('1/1/2000', '4/1/2000').to_period() | ||
|
||
ts = date_range('1/1/2000', '4/1/2000', tz=pytz.timezone('US/Eastern')) | ||
|
||
result = ts.to_period()[0] | ||
expected = ts[0].to_period() | ||
|
||
assert result == expected | ||
tm.assert_index_equal(ts.to_period(), xp) | ||
|
||
ts = date_range('1/1/2000', '4/1/2000', tz=pytz.utc) | ||
|
||
result = ts.to_period()[0] | ||
expected = ts[0].to_period() | ||
|
||
assert result == expected | ||
tm.assert_index_equal(ts.to_period(), xp) | ||
|
||
ts = date_range('1/1/2000', '4/1/2000', tz=tzlocal()) | ||
|
||
@pytest.mark.parametrize('tz', [ | ||
worth making a fixture out of this? (dst_tz_fixture)? |
||
'US/Eastern', pytz.utc, tzlocal(), 'dateutil/US/Eastern', | ||
dateutil.tz.tzutc()]) | ||
def test_to_period_tz(self, tz): | ||
ts = date_range('1/1/2000', '2/1/2000', tz=tz) | ||
result = ts.to_period()[0] | ||
expected = ts[0].to_period() | ||
|
||
assert result == expected | ||
tm.assert_index_equal(ts.to_period(), xp) | ||
|
||
def test_to_period_tz_dateutil(self): | ||
xp = date_range('1/1/2000', '4/1/2000').to_period() | ||
|
||
ts = date_range('1/1/2000', '4/1/2000', tz='dateutil/US/Eastern') | ||
|
||
result = ts.to_period()[0] | ||
expected = ts[0].to_period() | ||
|
||
assert result == expected | ||
tm.assert_index_equal(ts.to_period(), xp) | ||
|
||
ts = date_range('1/1/2000', '4/1/2000', tz=dateutil.tz.tzutc()) | ||
|
||
result = ts.to_period()[0] | ||
expected = ts[0].to_period() | ||
|
||
assert result == expected | ||
tm.assert_index_equal(ts.to_period(), xp) | ||
|
||
ts = date_range('1/1/2000', '4/1/2000', tz=tzlocal()) | ||
|
||
result = ts.to_period()[0] | ||
expected = ts[0].to_period() | ||
|
||
assert result == expected | ||
tm.assert_index_equal(ts.to_period(), xp) | ||
expected = date_range('1/1/2000', '2/1/2000').to_period() | ||
result = ts.to_period() | ||
tm.assert_index_equal(result, expected) | ||
|
||
def test_to_period_nofreq(self): | ||
idx = DatetimeIndex(['2000-01-01', '2000-01-02', '2000-01-04']) | ||
|
can you add a label here