
REF/TST: Fix remaining DatetimeArray with DateOffset arithmetic ops #23789


Merged — 25 commits, merged Nov 28, 2018
Changes from 12 commits
Commits (25)
986fdbc
unrelated, change _window->libwindow
jbrockmendel Nov 19, 2018
fd75931
revert non-central
jbrockmendel Nov 19, 2018
66c866b
implement remaining methods to fix dateoffset arithmetic with DTA/TDA
jbrockmendel Nov 19, 2018
4dc17e2
xfail dataframe case
jbrockmendel Nov 19, 2018
da3459c
Merge branch 'master' of https://github.com/pandas-dev/pandas into et…
jbrockmendel Nov 20, 2018
348a8b2
Merge branch 'master' of https://github.com/pandas-dev/pandas into et…
jbrockmendel Nov 20, 2018
dd7e873
dont use verify_integrity, push one more level of test into parametrize
jbrockmendel Nov 20, 2018
c8351bc
Merge branch 'master' of https://github.com/pandas-dev/pandas into et…
jbrockmendel Nov 20, 2018
23a25d1
fix broken tests
jbrockmendel Nov 20, 2018
b4ae288
comment clarification
jbrockmendel Nov 22, 2018
d1ebdbf
Merge branch 'master' of https://github.com/pandas-dev/pandas into et…
jbrockmendel Nov 22, 2018
711ee61
dummy commit to force CI
jbrockmendel Nov 22, 2018
9338b5b
Merge branch 'master' of https://github.com/pandas-dev/pandas into et…
jbrockmendel Nov 25, 2018
5fbe9c8
comments about cached and non-cached implementations
jbrockmendel Nov 25, 2018
c7db0e4
Merge branch 'master' of https://github.com/pandas-dev/pandas into et…
jbrockmendel Nov 26, 2018
a4f9733
coverage
jbrockmendel Nov 26, 2018
b50fedf
special casing in freq_infer
jbrockmendel Nov 26, 2018
7e951e4
special casing followup
jbrockmendel Nov 26, 2018
317e1e7
fix simple_new args
jbrockmendel Nov 26, 2018
5de2d42
Merge branch 'master' of https://github.com/pandas-dev/pandas into et…
jbrockmendel Nov 27, 2018
5433a71
privatize
jbrockmendel Nov 27, 2018
dc137f3
Merge branch 'master' of https://github.com/pandas-dev/pandas into et…
jbrockmendel Nov 27, 2018
2c65f3b
more kludges
jbrockmendel Nov 27, 2018
31c5c0b
Merge branch 'master' of https://github.com/pandas-dev/pandas into et…
jbrockmendel Nov 27, 2018
c3d775e
typo fixup
jbrockmendel Nov 27, 2018
6 changes: 4 additions & 2 deletions pandas/_libs/tslibs/timestamps.pyx
@@ -376,13 +376,15 @@ cdef class _Timestamp(datetime):
neg_other = -other
return self + neg_other

typ = getattr(other, '_typ', None)

# a Timestamp-DatetimeIndex -> yields a negative TimedeltaIndex
elif getattr(other, '_typ', None) == 'datetimeindex':
if typ in ('datetimeindex', 'datetimearray'):
# timezone comparison is performed in DatetimeIndex._sub_datelike
return -other.__sub__(self)

# a Timestamp-TimedeltaIndex -> yields a negative TimedeltaIndex
elif getattr(other, '_typ', None) == 'timedeltaindex':
elif typ in ('timedeltaindex', 'timedeltaarray'):
return (-other).__add__(self)

elif other is NaT:
43 changes: 36 additions & 7 deletions pandas/core/arrays/timedeltas.py
@@ -7,7 +7,7 @@

import numpy as np

from pandas._libs import tslibs
from pandas._libs import algos, tslibs
from pandas._libs.tslibs import NaT, Timedelta, Timestamp, iNaT
from pandas._libs.tslibs.fields import get_timedelta_field
from pandas._libs.tslibs.timedeltas import (
@@ -24,7 +24,7 @@
from pandas.core.dtypes.missing import isna

from pandas.core import ops
from pandas.core.algorithms import checked_add_with_arr
from pandas.core.algorithms import checked_add_with_arr, unique1d
import pandas.core.common as com

from pandas.tseries.frequencies import to_offset
@@ -162,15 +162,32 @@ def _simple_new(cls, values, freq=None, dtype=_TD_DTYPE):
result._freq = freq
return result

def __new__(cls, values, freq=None, dtype=_TD_DTYPE):
def __new__(cls, values, freq=None, dtype=_TD_DTYPE, copy=False):

freq, freq_infer = dtl.maybe_infer_freq(freq)

values = np.array(values, copy=False)
if values.dtype == np.object_:
values = array_to_timedelta64(values)
values, inferred_freq = sequence_to_td64ns(
values, copy=copy, unit=None)
if inferred_freq is not None:
if freq is not None and freq != inferred_freq:
raise ValueError('Inferred frequency {inferred} from passed '
'values does not conform to passed frequency '
'{passed}'
.format(inferred=inferred_freq,
passed=freq.freqstr))
elif freq is None:
# TODO: should this be the stronger condition `if freq_infer`?
Member
Can you clarify this TODO comment?

Member Author
Sure. We might also be able to resolve the answer as "no" in this thread.

ATM if we get an inferred_freq here, we are setting freq = inferred_freq regardless of whether the user passed freq="infer". In particular, we are ignoring the possibility that the user passed freq=None specifically wanting the result to have no freq.

# i.e what if the user passed `freq=None` and specifically
# wanted freq=None in the result?
Contributor
I am not sure this is a valid condition. What are you trying to do here?

Member Author
> I am not sure this is a valid condition. What are you trying to do here?

To be clear, this is the same as the existing logic in TimedeltaIndex.__new__. The question is whether we're sure this is the desired behavior. ATM:

tdi = pd.timedelta_range('1D', periods=5, freq='D')
tdi2 = pd.TimedeltaIndex(tdi, freq=None)
tdi3 = pd.TimedeltaIndex(tdi, freq="infer")

tdi2 and tdi3 both have daily freq, and there is no extant way for the user to say "no, seriously, I want freq=None" except by setting it manually.
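A minimal sketch of the resulting frequencies (the reprs are assumed, not copied from a session):

>>> tdi2.freq, tdi3.freq
(<Day>, <Day>)   # both pick up the inferred daily freq; freq=None did not suppress inference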

Contributor
I doubt a user setting a freq is actually used anywhere; I'm not even sure of the purpose of it, nor is it likely tested.

This is just adding complexity.

Member Author
OK, I'll remove the comment

freq = inferred_freq
freq_infer = False

result = cls._simple_new(values, freq=freq)
# check that we are matching freqs
Contributor
I think you could simplify a lot of these checks if you also pass inferred_freq (optionally) to ._validate_frequency, and, for example, handle the fact that result could be a scalar (just ignore the validation) / sequence (do the validation).
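A rough sketch of this suggestion, assuming a hypothetical inferred_freq keyword on _validate_frequency (not the current signature):

result = cls._simple_new(values, freq=freq)
# hypothetical: let the validator see the inferred freq and decide whether to skip
# (scalar/empty input, or freq already matching inferred_freq) or run the full check
cls._validate_frequency(result, freq, inferred_freq=inferred_freq)
if freq_infer:
    result.freq = to_offset(result.inferred_freq)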

Member Author
I'll take a look at this.

Member Author
I think you're right that we can change this from a 3-4 liner into a 1-2 liner. Since this pattern shows up in all four of the TDA/DTA/TDI/DTI constructors (actually, future tense for DTA), I'd like to change them all at once in a dedicated follow-up.
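One hypothetical shape for that follow-up (helper name and signature invented for illustration):

# hypothetical shared helper for the TDA/DTA/TDI/DTI constructors:
# reconcile a user-passed freq with one inferred from the data
freq, freq_infer = dtl.validate_inferred_freq(freq, inferred_freq, freq_infer)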

if inferred_freq is None and len(result) > 0:
if freq is not None and not freq_infer:
cls._validate_frequency(result, freq)

if freq_infer:
result.freq = to_offset(result.inferred_freq)

@@ -227,6 +244,18 @@ def _validate_fill_value(self, fill_value):
"Got '{got}'.".format(got=fill_value))
return fill_value

@property
def is_monotonic_increasing(self):
Member
Where is this used in this PR?

Member Author
When calling the constructor with freq="infer" we end up calling tseries.frequencies.infer_freq, which accesses this property
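Roughly, the call chain being described (a sketch; the intermediate steps are assumed rather than traced):

TimedeltaArray.__new__(values, freq="infer")
    -> result.inferred_freq                       # property on the datetimelike mixin
        -> pandas.tseries.frequencies.infer_freq  # builds a _FrequencyInferer over the values
            -> is_monotonic_increasing            # the property added in this diff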

Member
I think I already expressed this previously, but I would prefer that we have this discussion for all the Arrays, not just for TimedeltaArray (or DatetimeArray), as this is not a datetime-specific attribute. If you don't want to have that discussion first, you can always make it a private attribute here, and check for that in infer_freq (or call the algos methods there if the passed values don't have such an attribute).
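A sketch of that alternative (attribute name and placement hypothetical): keep the property private on the array and have infer_freq fall back to the algos routine when it is absent:

is_mono = getattr(values, '_is_monotonic_increasing', None)
if is_mono is None:
    is_mono = algos.is_monotonic(values.asi8, timelike=True)[0]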

Member Author
If this is a discussion you'd like to have for EAs more generally, go ahead and open an issue for it.

I would be +1 on putting these attributes in the mixin class so that they are available on all three of DTA/TDA/PA, but for now they are only needed on TDA.

Special-casing inside infer_freq (or more specifically FrequencyInferer) is a pretty ugly hack that I'd rather avoid. That said, if you're going to insist on it, I might as well get it over with.

Member
@jbrockmendel I would really prefer if you leave this out of this PR (I mean adding the attributes, which means special-casing this in infer_freq).

Looking at the code again, I think the easiest thing to do is to ensure that what is passed to _FrequencyInferer is an actual Index. So if we have an array, simply convert it to an Index without a copy.
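A minimal sketch of that suggestion, assuming it sits near the top of infer_freq (illustrative only, not the code that was merged):

if not isinstance(index, pd.Index):
    index = pd.Index(index, copy=False)   # wrap the array in an Index without copying the data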

Member Author
Converting to an Index would break the "Array should be ignorant of Index" rule discussed elsewhere.

I'll special-case this to get this over with, but I maintain this is introducing a code smell.

return algos.is_monotonic(self.asi8, timelike=True)[0]

@property
def is_monotonic_decreasing(self):
return algos.is_monotonic(self.asi8, timelike=True)[1]

@property
def is_unique(self):
return len(unique1d(self.asi8)) == len(self)

# ----------------------------------------------------------------
# Arithmetic Methods

@@ -281,7 +310,7 @@ def _add_datetimelike_scalar(self, other):
result = checked_add_with_arr(i8, other.value,
arr_mask=self._isnan)
result = self._maybe_mask_results(result)
return DatetimeArrayMixin(result, tz=other.tz)
return DatetimeArrayMixin(result, tz=other.tz, freq=self.freq)

def _addsub_offset_array(self, other, op):
# Add or subtract Array-like of DateOffset objects
4 changes: 4 additions & 0 deletions pandas/core/indexes/timedeltas.py
@@ -227,6 +227,10 @@ def _format_native_types(self, na_rep=u'NaT', date_format=None, **kwargs):
# -------------------------------------------------------------------
# Wrapping TimedeltaArray

is_monotonic_increasing = Index.is_monotonic_increasing
is_monotonic_decreasing = Index.is_monotonic_decreasing
is_unique = Index.is_unique
Contributor
You need to time this; we have a dedicated routine in the cython engine for this. The main reason for using it is that it is an O(n) op once the hashtable is computed, rather than the non-cached computation which you did above.

Member Author
Right. TimedeltaIndex uses the _engine-based implementation that is available because we know TDI is immutable. TimedeltaArray uses the naive implementation, at least for now. There's an Issue to investigate caching on PeriodArray which can be extended to TDA/DTA when the time comes.
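In short, the two paths look roughly like this (the Index side is described from the engine-backed code, not part of this diff):

# TimedeltaIndex: cached, engine-backed -- the hashtable is built once on the immutable values
TimedeltaIndex.is_unique   ->  self._engine.is_unique
# TimedeltaArray: non-cached -- recomputed on every access
TimedeltaArray.is_unique   ->  len(unique1d(self.asi8)) == len(self)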

Contributor
This is not obvious at all; I would expect a comment on this. Maybe even do this in a separate PR, with testing for this. I am not sure it's relevant to the changes in this PR.

Member Author
I'll add a comment. It is definitely relevant to this PR, since without implementing these on the TDA, calls to infer_freq (via __new__) raise AttributeError.
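For illustration, the failure being described would look roughly like this pre-PR (constructor spelling and message text assumed):

>>> TimedeltaArrayMixin(np.arange(5) * np.timedelta64(1, 'D'), freq="infer")
AttributeError: 'TimedeltaArrayMixin' object has no attribute 'is_monotonic_increasing'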


__mul__ = Index.__mul__
__rmul__ = Index.__rmul__
__truediv__ = Index.__truediv__