-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: DatetimeIndex + arraylike of DateOffsets #18849
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: DatetimeIndex + arraylike of DateOffsets #18849
Conversation
@jbrockmendel : As this PR stands, I'm a little uneasy about it. Besides the |
@pytest.mark.parametrize('box', [np.array, pd.Index]) | ||
def test_dti_add_offset_array(self, tz, box): | ||
dti = pd.date_range('2017-01-01', periods=2, tz=tz) | ||
# TODO: check that `name` propogates correctly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reference issue number below all of your new tests.
pandas/core/indexes/datetimelike.py
Outdated
return self - other[0] | ||
else: | ||
from pandas.errors import PerformanceWarning | ||
warnings.warn("Adding/subtracting array of DateOffsets to " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see that you added this yourself. Still, feel a little uneasy about this.
This is explicitly copying the Series behavior. |
Hmmm...I see. Theoretically, it makes sense. Though if we can avoid adding this warning, that would be great. Thus, either you can confirm that there is indeed a performance difference OR the code is rewritten so that we avoid this overhead. |
I've been working on offsets a lot recently (and upcoming, see #18854). Perf has improved quite a bit, but I can absolutely confirm that this is not a speedy operation. |
pandas/core/indexes/datetimelike.py
Outdated
if len(other) == 1: | ||
return self + other[0] | ||
else: | ||
from pandas.errors import PerformanceWarning |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so you are adding code here that already exists in datetimes.py:_add_offset
? is there a reason you are not dispatching to that (which may need to handle array-likes in addition to scalars). Further I am not in love with explicit type checking for DTI here. again why are you not handling that at a higher level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_add_offset is for a single offset which does not have an implementation of apply_index
. This is for an array of offsets. It's a case handled by Series but not by DatetimeIndex.
I can make a _add_offset_array and dispatch to that.
Codecov Report
@@ Coverage Diff @@
## master #18849 +/- ##
==========================================
- Coverage 91.6% 91.58% -0.03%
==========================================
Files 150 150
Lines 48939 48966 +27
==========================================
+ Hits 44833 44845 +12
- Misses 4106 4121 +15
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. just some additional testing requested around series + offset series names.
name=dti.name, freq='infer') | ||
tm.assert_index_equal(res, expected) | ||
|
||
def test_dti_with_offset_series(self, tz): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you parametrize this with name (None, same name, other name) to make sure they are propagated correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call. Looks like Series.op(Index) was always taking on the name of the Series.
doc/source/whatsnew/v0.23.0.txt
Outdated
@@ -282,6 +282,7 @@ Conversion | |||
- Bug in :meth:`Index.astype` with a categorical dtype where the resultant index is not converted to a :class:`CategoricalIndex` for all types of index (:issue:`18630`) | |||
- Bug in :meth:`Series.astype` and ``Categorical.astype()`` where an existing categorical data does not get updated (:issue:`10696`, :issue:`18593`) | |||
- Bug in :class:`Series` constructor with an int or float list where specifying ``dtype=str``, ``dtype='str'`` or ``dtype='U'`` failed to convert the data elements to strings (:issue:`16605`) | |||
- Bug in :class:`DatetimeIndex` where adding or subtracting an array-like of ``DateOffset`` objects either raised (``np.array``, ``pd.Index``) or broadcast incorrectly (``pd.Series``) (:issue:`18224`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use this PR number here (as 18224 is a very general reference). Is there any issue for this one specifically? (don't create one, just if there is an open one).
…i_add_offset_array
…i_add_offset_array
…i_add_offset_array
thanks! |
Before:
After:
Caveat This will need a follow-up to make sure
name
attribute is propogated correctly.git diff upstream/master -u -- "*.py" | flake8 --diff