
REF/TST: Fix remaining DatetimeArray with DateOffset arithmetic ops #23789


Merged
25 commits merged into pandas-dev:master on Nov 28, 2018

Conversation

jbrockmendel
Member

  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

@pep8speaks

Hello @jbrockmendel! Thanks for submitting the PR.

@jbrockmendel
Member Author

The Travis failure is the test_rank segfault.

@codecov

codecov bot commented Nov 20, 2018

Codecov Report

Merging #23789 into master will decrease coverage by <.01%.
The diff coverage is 94%.


@@            Coverage Diff             @@
##           master   #23789      +/-   ##
==========================================
- Coverage   92.31%   92.31%   -0.01%     
==========================================
  Files         161      161              
  Lines       51471    51501      +30     
==========================================
+ Hits        47515    47542      +27     
- Misses       3956     3959       +3
Flag Coverage Δ
#multiple 90.7% <94%> (-0.01%) ⬇️
#single 42.43% <36%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/tseries/frequencies.py 97.08% <100%> (ø) ⬆️
pandas/core/arrays/timedeltas.py 96.25% <100%> (-0.19%) ⬇️
pandas/core/indexes/datetimes.py 96.48% <100%> (+0.01%) ⬆️
pandas/core/indexes/timedeltas.py 89.28% <100%> (+0.09%) ⬆️
pandas/tseries/offsets.py 96.84% <86.36%> (-0.14%) ⬇️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ee283fa...c3d775e.

@@ -1328,20 +1346,24 @@ def _end_apply_index(self, dtindex):

base, mult = libfrequencies.get_freq_code(self.freqstr)
base_period = dtindex.to_period(base)
if not isinstance(base_period._data, np.ndarray):
Contributor

You seem to be doing this unwrapping a lot; wouldn't it be better to have a ._to_unwrapped_period(...) that just does this (basically to_period() with unwrapping to the base data)?

Member Author

The issue here is that we want to unwrap a PeriodIndex, but leave a PeriodArray unchanged. Or are you suggesting that _to_unwrapped_period() would return self for the PeriodArray? I'm kind of expecting that down the road we'll have some general version of this to allow us to treat EAs and EA-backed Indexes symmetrically.

Contributor

Just a thought. Happy to defer, but I think it does make sense to have methods that unwrap on Index but return self for Array.
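
For illustration, a minimal sketch of the unwrap-only-if-Index pattern being discussed; the _unwrap_period name is hypothetical and this is not the PR's code, just the idea behind the isinstance(..., np.ndarray) check in the hunk above (a PeriodIndex stores a PeriodArray in ._data, while a PeriodArray stores a plain ndarray there):

import pandas as pd

def _unwrap_period(obj):
    # Hypothetical helper: return the underlying PeriodArray for a
    # PeriodIndex, but pass a PeriodArray through unchanged.
    if isinstance(obj, pd.PeriodIndex):
        return obj._data
    return obj

pidx = pd.period_range("2018-01", periods=3, freq="M")
parr = _unwrap_period(pidx)            # PeriodArray
parr is _unwrap_period(parr)           # True: already unwrapped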

@@ -227,6 +242,18 @@ def _validate_fill_value(self, fill_value):
"Got '{got}'.".format(got=fill_value))
return fill_value

@property
def is_monotonic_increasing(self):
Member

Where is this used in this PR?

Member Author

When calling the constructor with freq="infer" we end up calling tseries.frequencies.infer_freq, which accesses this property.

Member

I think I already expressed this previously, but I would prefer that we have this discussion for all the Arrays, not just for TimedeltaArray (or DatetimeArray), as this is not a datetime-specific attribute. If you don't want to have that discussion first, you can always make it a private attribute here, and check that in infer_freq (or call the algos methods there if the passed values don't have such an attribute)

Member Author

If this is a discussion you'd like to have for EAs more generally, go ahead and open an issue for it.

I would be +1 on putting these attributes in the mixin class so that they are available on all three of DTA/TDA/PA, but for now they are only needed on TDA.

Special-casing inside infer_freq (or more specifically FrequencyInferer) is a pretty ugly hack that I'd rather avoid. That said, if you're going to insist on it, I might as well get it over with.

Member

@jbrockmendel I would really prefer if you leave this out of this PR (I mean adding the attributes, so which means special casing this in infer_freq).

Looking at the code again, I think the easiest thing to do is to ensure that what is passed to _FrequencyInferrer is an actual Index. So if we have an array, simply convert it to an Index without copy.

Member Author

Converting to an Index would break the "Array should be ignorant of Index" rule discussed elsewhere.

I'll special-case this to get this over with, but I maintain this is introducing a code smell.
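
For reference, the no-copy Index wrapping suggested above would look roughly like this (the _ensure_index helper is made up for illustration, not the merged code):

import pandas as pd

def _ensure_index(values):
    # Hypothetical helper: leave an Index alone, otherwise wrap the
    # array-like in an Index without copying, so Index-only cached
    # properties (is_monotonic_increasing, is_unique, ...) are available.
    if isinstance(values, pd.Index):
        return values
    return pd.Index(values, copy=False)

tdi = _ensure_index(pd.timedelta_range("1D", periods=5, freq="D"))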

.format(inferred=inferred_freq,
passed=freq.freqstr))
elif freq is None:
# TODO: should this be the stronger condition `if freq_infer`?
Member

Can you clarify this TODO comment?

Member Author

Sure. We might also be able to resolve the answer as "no" in this thread.

ATM if we get an inferred_freq here, we are setting freq = inferred_freq regardless of whether the user passed freq="infer". In particular, we are ignoring the possibility that the user passed freq=None specifically wanting the result to have no freq.
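
A paraphrase of that branch as a standalone sketch (hypothetical helper, not the constructor itself), assuming, per the freq_infer mention in the TODO, that freq="infer" has already been normalized to freq=None plus a separate freq_infer flag by this point:

def _resolve_freq(freq, inferred_freq):
    # Once a freq has been inferred from the values, freq=None and
    # freq="infer" are indistinguishable here, so an explicit request
    # for "no freq" cannot be expressed.
    if inferred_freq is not None:
        if freq is not None and freq != inferred_freq:
            raise ValueError("Inferred frequency does not conform "
                             "to passed frequency")
        if freq is None:
            return inferred_freq
    return freq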


result = cls._simple_new(values, freq=freq)
# check that we are matching freqs
Contributor

I think you could simplify a lot of these checks if you also pass inferred_freq (optionally) to ._validate_frequency, and for example handle the fact that result could be a scalar (just skip the validation) or a sequence (do the validation).

Member Author

I'll take a look at this.

Member Author

I think you're right that we can change this from a 3-4 liner into a 1-2 liner. Since this pattern shows up in all four of the TDA/DTA/TDI/DTI constructors (actually, future tense for DTA), I'd like to change them all at once in a dedicated follow-up.
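
A rough sketch of what that consolidation might look like; the helper name and the _validate_frequency calling convention are assumptions for illustration, not the follow-up's actual code:

def _maybe_validate_freq(result, freq, inferred_freq):
    # Hypothetical helper folding the repeated 3-4 line constructor
    # pattern into one place: skip validation when the freq was inferred
    # from the values themselves, otherwise check the passed freq
    # against the data.
    if freq is not None and inferred_freq is None:
        type(result)._validate_frequency(result, freq)
    return result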

@jbrockmendel
Member Author

gentle ping

elif freq is None:
# TODO: should this be the stronger condition `if freq_infer`?
# i.e what if the user passed `freq=None` and specifically
# wanted freq=None in the result?
Contributor

I am not sure this is a valid condition. what are you trying to do here?

Member Author

I am not sure this is a valid condition. what are you trying to do here?

To be clear, this is the same as the existing logic in TimedeltaIndex.__new__. The question is whether we're sure this is the desired behavior. ATM:

tdi = pd.timedelta_range('1D', periods=5, freq='D')
tdi2 = pd.TimedeltaIndex(tdi, freq=None)
tdi3 = pd.TimedeltaIndex(tdi, freq="infer")

tdi2 and tdi3 both have daily freq, and there is no extant way for the user to say "no, seriously, I want freq=None" except by setting it manually.

Contributor

I doubt a user setting a freq is actually used anywhere; I'm not even sure of the purpose of it, nor is it likely tested.

This is just adding complexity.

Member Author

OK, I'll remove the comment

@@ -227,6 +227,10 @@ def _format_native_types(self, na_rep=u'NaT', date_format=None, **kwargs):
# -------------------------------------------------------------------
# Wrapping TimedeltaArray

is_monotonic_increasing = Index.is_monotonic_increasing
is_monotonic_decreasing = Index.is_monotonic_decreasing
is_unique = Index.is_unique
Contributor

You need to time this; we have a dedicated routine in the cython engine for this. The main reason for using it is that it's an O(n) op once the hashtable is computed, rather than the non-cached computation you did above.

Member Author

Right. TimedeltaIndex uses the _engine-based implementation that is available because we know TDI is immutable. TimedeltaArray uses the naive implementation, at least for now. There's an Issue to investigate caching on PeriodArray which can be extended to TDA/DTA when the time comes.

Contributor

This is not obvious at all. I would expect a comment on this. Maybe even do this in a separate PR, with testing for this. I am not sure it's relevant to the changes in this PR.

Member Author

I'll add a comment. It is definitely relevant to this PR, since without implementing these on the TDA, calls to infer_freq (via __new__) raise AttributeError.
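
To make the trade-off above concrete, a small sketch of the non-cached check an Array can fall back on; the Index version instead delegates to its cached engine machinery. This is an illustration, not the PR's exact implementation:

import numpy as np

def naive_is_monotonic_increasing(asi8):
    # O(n) and recomputed on every call, which is acceptable for a
    # (mutable) Array, whereas an immutable Index can compute this once
    # in its engine and cache the result.
    asi8 = np.asarray(asi8, dtype="i8")
    return bool(np.all(asi8[1:] >= asi8[:-1]))

naive_is_monotonic_increasing([1, 2, 3])   # True
naive_is_monotonic_increasing([3, 1, 2])   # False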


offset_cls = getattr(pd.offsets, cls_name)

with warnings.catch_warnings(record=True):
Member

I don't think the record=True is needed? (as you are not checking the captured warnings)
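
For context on the record=True remark, a minimal comparison in plain Python (not the test file itself):

import warnings

# record=True returns the caught warnings for later assertions:
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warnings.warn("example", FutureWarning)
assert len(caught) == 1

# if the captured warnings are never inspected, plain suppression is enough:
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    warnings.warn("example", FutureWarning)   # silently ignored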


weeks = (kwds.get('weeks', 0)) * self.n
if weeks:
# integer addition on PeriodIndex is deprecated,
# so we directly use _time_shift instead
asper = i.to_period('W')
shifted = asper._data._time_shift(weeks)
if not isinstance(asper._data, np.ndarray):
Member

when is this not the case?

(or why was this change needed?)

Member

And something else: this full if weeks block does not seem to be covered (don't know if it is a false positive or not).

Member Author

asper at this point may be either a PeriodIndex or a PeriodArray. If it is a PeriodIndex we want to unwrap it to get a PeriodArray. If it is a PeriodArray, we don't want to unwrap it, since that would give an ndarray.

Member Author

Good note on the if weeks block. Adding coverage is out of scope, but I'll add a note in the offsets TODO list.

@@ -912,7 +919,9 @@ def apply(self, other):
@apply_index_wraps
def apply_index(self, i):
shifted = liboffsets.shift_months(i.asi8, self.n, self._day_opt)
return i._shallow_copy(shifted)
# TODO: going through __new__ raises on call to _validate_frequency;
# are we passing incorrect freq?
Member

Does this TODO need to be discussed / handled now?

Member Author

I'm happy thinking of this as a note to future-me. Really looking forward to getting the current crop of PRs done with and DatetimeArray out the door.

@jbrockmendel
Member Author

@jorisvandenbossche I think your main concern has been addressed. LMK if any others are deal-breakers.

@@ -295,8 +295,19 @@ def __init__(self, index, warn=True):
if len(index) < 3:
raise ValueError('Need at least 3 dates to infer frequency')

self.is_monotonic = (self.index.is_monotonic_increasing or
self.index.is_monotonic_decreasing)
if not hasattr(index, "is_monotonic_increasing"):
Contributor

Why don't you define a method on both TimedeltaArray & TimedeltaIndex which basically does the correct thing here? I find it not very obvious what you are doing, and the difference between TDI and TDA is pretty stark here (in the calling conventions and such).

Member Author

That's what I did originally, but @jorisvandenbossche insisted on not-that.

Contributor

Hmm, this is the wrong place for logic like this. It's a private method, not sure the reason. @jorisvandenbossche

Member

It's a private method, not sure the reason.

Not sure what you mean by this comment. This is a private method, yes, but why would that be a reason that this is not a good place to put it?
For now, it is also the only place where this logic is needed, so I don't think it is necessarily a bad place. Yes, if this started to be used in a lot of places, putting it as a method on the Array class would make sense; for now I don't really care.

What I asked was just to leave the addition of public attributes on the Array classes out of this PR (as that needs a more general discussion). Whether the logic is put here, in a private attribute on the Array, or by doing a self.index = pd.Index(self.index, copy=False) to ensure it has the needed properties (and the rest of the code here can stay the same), I don't care much.

Contributor

Because you are putting logic that clearly has a different implementation outside of the class which defines it, this is breaking all kinds of abstraction here.

The inference engine needs to know some info which is better off done all together for perf reasons, so call a method; that is what they are for.

@jbrockmendel move this to a private method on TDA and TDI

Member Author

Done. Now frequencies.infer_freq is accessing private attrs, but we can revisit that smell later.
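
A sketch of the shape this resolution takes; the private attribute names here are assumptions for illustration, not necessarily the ones in the merged commit:

class FrequencyInfererSketch:
    # Illustration only: the inferer reads monotonicity through private
    # attributes that both TimedeltaIndex and TimedeltaArray expose,
    # instead of the public Index-only properties.
    def __init__(self, index):
        if len(index) < 3:
            raise ValueError("Need at least 3 dates to infer frequency")
        self.index = index
        self.is_monotonic = (index._is_monotonic_increasing or
                             index._is_monotonic_decreasing)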

@jbrockmendel
Member Author

Should be all squared away now.

@jbrockmendel
Member Author

gentle ping. This and #23829 are the main things standing in the way of declaring DTA/TDA "finished" (internally)

@jreback
Contributor

jreback commented Nov 28, 2018

Couple of notes for follow-up, but LGTM.

@jorisvandenbossche

@jreback jreback added this to the 0.24.0 milestone Nov 28, 2018
@jorisvandenbossche jorisvandenbossche merged commit b1ee2df into pandas-dev:master Nov 28, 2018
@jorisvandenbossche
Member

Thanks @jbrockmendel !

@jbrockmendel
Member Author

Booyah. Thanks @jorisvandenbossche, this one is a big win.

Labels: Datetime (Datetime data dtype), Frequency (DateOffsets)