-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
implement truediv, rtruediv directly in TimedeltaArray; tests #23829
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hello @jbrockmendel! Thanks for updating the PR.
Comment last updated on November 29, 2018 at 02:49 Hours UTC |
Codecov Report
@@ Coverage Diff @@
## master #23829 +/- ##
==========================================
- Coverage 92.31% 92.31% -0.01%
==========================================
Files 161 161
Lines 51513 51578 +65
==========================================
+ Hits 47554 47613 +59
- Misses 3959 3965 +6
Continue to review full report at Codecov.
|
rng = tm.box_expected(rng, box_with_array) | ||
for obj in [mismatched, mismatched[:2]]: | ||
# one shorter, one longer | ||
for other in [obj, np.array(obj), pd.Index(obj)]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to parameterize this? Potentially via flags to indicate what transformation to perform on obj
before testing for the error?
Definitely the outer-loop can be parameterized (just use [1, 2, 3, 4]
and [1, 2]
), unless there's a really strong reason for using mismatched
and mismatched[:2]
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's possible, but I'm trying to push back a little bit against over-parametrization. The pytest setup cost is non-trivial
|
||
elif is_timedelta64_dtype(other): | ||
# let numpy handle it | ||
return self._data / other |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to ensure other
is a np.ndarray and not TimedeltaArray here? (meaning, extract the numpy array out of the TimedeltaArray)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. self._data.__div__(other)
would return NotImplemented
if other were a TimedeltaArray
. This PR includes a test that covers TDA/TDA.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but that is another level of redirection, while we know this will happen and can directly do the correct thing here?
(I suppose is_timedelta64_dtype
only passes through those two cases of ndarray or TimedeltaArray?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose is_timedelta64_dtype only passes through those two cases of ndarray or TimedeltaArray?
Yes, since we exclude Series and Index at the start.
Yes, but that is another level of redirection, while we know this will happen and can directly do the correct thing here?
I guess we could replace other
with getattr(other, "_data", other)
(or if/when TimedeltaArray gets an __array__
method, just np.array(other)
, which would be prettier)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the (hopefully not too distant future), TimedeltaArray will no longer be an Index. In this case we would want to explicitly grab the ._data
out of it and proceed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And what's the return type here? Does this need to be wrapped in a a type(self)
so that we return a TimedeltaArray?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And what's the return type here?
float-dtyped ndarray
# let numpy handle it | ||
return self._data / other | ||
|
||
elif is_object_dtype(other): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to allow this?
I would be fine with raising a TypeError here.
(I first wanted to say: can't we dispatch that to numpy, thinking that numpy object dtype would handle that, but they raise a TypeError)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see why this would be first on the chopping block if we had to support fewer cases. Is there a compelling reason not to handle this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can also turn around the question :) Is there a compelling reason to do handle this case?
It's just an extra case to support. And eg, we could discuss whether this should return object dtype data or timedelta, as you are inferring now? Looking at Series behaviour with int64 and object integers, it actually returns object. For datetimes it now raises. So at least, our support is at the moment not very consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a compelling reason to do handle this case?
Because a selling point of pandas is that things Just Work? Because the code and tests are already written, so the marginal cost is \approx zero?
our support is at the moment not very consistent
Fair enough. If a goal is to make things more consistent (which I'm +1 on BTW) then we're probably not going to go around and start breaking the places where it currently is supported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i agree with @jbrockmendel here
pandas/core/indexes/timedeltas.py
Outdated
if isinstance(other, Index): | ||
# TimedeltaArray defers, so we need to unwrap | ||
oth = other._values | ||
result = TimedeltaArray.__truediv__(self, oth) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we do here something like result = self._values / oth
instead of calling the dunder method? (I don't really know the difference in practice, but it looks a bit cleaner)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll double-check. Usually my reasoning for using the dunder methods is to preserve NotImplemented
semantics.
|
||
elif is_object_dtype(other): | ||
result = [self[n] / other[n] for n in range(len(self))] | ||
result = np.array(result) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually this is really close to what soft_convert_objects does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That isn't clear to me. soft_convert_objects
doesn't call lib.infer_dtype
or any analogue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are essentially re-implementing it. i would rather not do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not clear on what you have in mind. Something like:
if lib.infer_dtype(result) == 'timedelta':
result = soft_convert_objects(result, timedelta=True, coerce=False)
return type(self)(result)
return result
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough, will change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed. this ends up changing the behavior of the DataFrame
test case, but that's largely driven by the fact that DataFrame([NaT])
gets inferred as datetime64[ns]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is changed here? shouldn't is_object_type result in a TypeError or a NotImplemented?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two independent questions that have been asked about the object-dtype case:
- should we just raise TypeError instead or should we handle it so it Just Works (the latter being what this PR does)
- Given that we handle this case, do we try to infer the output dtpye or just return object dtype? This PR originally did the former, then changed to do the latter following discussion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok this is fine, you are returning object dtype (which is consistent with how we do for Series now)
Next round implementing floordiv, mod, divmod, and reversed ops is almost ready. Any preference between follow-up vs adding to this PR? |
How many more things do you need to follow-up on? And how straightforward are they to handle? Probably would lean towards merging this and follow-up later, but not sure whether some of the remaining comments are blockers... |
|
||
elif is_object_dtype(other): | ||
result = [self[n] / other[n] for n in range(len(self))] | ||
result = np.array(result) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok reversing here. i agree with @jorisvandenbossche we don't infer object dtypes in ops (only in the constructor), so this seems reasonable
In [2]: pd.Series([1,2, 3]) / pd.Series([1,1,1], dtype=object)
Out[2]:
0 1
1 2
2 3
dtype: object
# let numpy handle it | ||
return other / self._data | ||
|
||
elif is_object_dtype(other): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or can raise NotImplemented here? does that work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that might be fragile; might depend on having __array__
implemented. Either way, better to make it explicit than rely on numpy imlpementation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment as above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, can you add a comment here (and above), were we do the operation but do not infer the output type (just for posterity), otherwise this PR lgtm. ping on green.
I think all comments have been addressed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One remaining question/comment about the testing of the last change for object dtype.
For the rest looks good to me!
result = tdser / vector.astype(object) | ||
expected = [tdser[n] / vector[n] for n in range(len(tdser))] | ||
expected = tm.box_expected(expected, xbox) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't this converted the expected list to an DatetimeArray, while the expected result is on object ndarray? (so it's not really testing that?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. In the case where xbox
is tm.to_array
(the only case that could conceivably give a DatetimeArray), tm.to_array(any_list)
returning np.array(that_list)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, OK, all a bit opaque .. (I checked what get_upcast_box
and box_expected
do, but not box_with_array
:-))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yah, I'm hoping to simplify some of it, and ideally even get rid of box_expected, but it'll be a while before thats feasible.
doc/source/whatsnew/v0.24.0.rst
Outdated
@@ -1227,7 +1227,7 @@ Timedelta | |||
- Bug in :class:`TimedeltaIndex` where adding a timezone-aware datetime scalar incorrectly returned a timezone-naive :class:`DatetimeIndex` (:issue:`23215`) | |||
- Bug in :class:`TimedeltaIndex` where adding ``np.timedelta64('NaT')`` incorrectly returned an all-`NaT` :class:`DatetimeIndex` instead of an all-`NaT` :class:`TimedeltaIndex` (:issue:`23215`) | |||
- Bug in :class:`Timedelta` and :func:`to_timedelta()` have inconsistencies in supported unit string (:issue:`21762`) | |||
|
|||
- Bug in :class:`TimedeltaIndex` division where dividing by another :class:`TimedeltaIndex` raised ``TypeError`` instead of returning a :class:`Float64Index` (:issue:`23829`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think you need the refernces issue number here. do you need to mention the other issue that is refernces in the top or PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will do
|
||
elif is_object_dtype(other): | ||
result = [self[n] / other[n] for n in range(len(self))] | ||
result = np.array(result) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is changed here? shouldn't is_object_type result in a TypeError or a NotImplemented?
# let numpy handle it | ||
return other / self._data | ||
|
||
elif is_object_dtype(other): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment as above
Travis fail is unrelated flake8-rst problem |
ping |
thanks! |
Is the difference in behavior between In [14]: rng = pd.timedelta_range(start='1D', periods=10, freq='D')
In [15]: rng / np.timedelta64('NaT')
Out[15]: Float64Index([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan], dtype='float64')
In [16]: rng / pd.NaT
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-16-0bf699fa1965> in <module>
----> 1 rng / pd.NaT
~/sandbox/pandas-alt/pandas/core/indexes/timedeltas.py in __truediv__(self, other)
259 # TimedeltaArray defers, so we need to unwrap
260 oth = other._values
--> 261 result = TimedeltaArray.__truediv__(self, oth)
262 return wrap_arithmetic_op(self, other, result)
263
~/sandbox/pandas-alt/pandas/core/arrays/timedeltas.py in __truediv__(self, other)
364 elif lib.is_scalar(other):
365 # assume it is numeric
--> 366 result = self._data / other
367 freq = None
368 if self.freq is not None:
TypeError: ufunc true_divide cannot use operands with types dtype('<m8[ns]') and dtype('O') |
Yes. timedelta64("NaT") is unambiguously timedelta-like, whereas pd.NaT is both datetime-like and timedelta-like (and defaults to datetime-like). |
Thanks for the clarification.
…On Tue, Dec 11, 2018 at 11:53 AM jbrockmendel ***@***.***> wrote:
Is the difference in behavior between TimedeltaIndex / timedelta64('NaT')
and TimedeltaIndex / pd.NaT deliberate?
Yes. timedelta64("NaT") is unambiguously timedelta-like, whereas pd.NaT is
both datetime-like and timedelta-like (and defaults to datetime-like).
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#23829 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABQHIir-WB_mF_gczMVbnt2o4SsZUH-oks5u3_ERgaJpZM4YsTAg>
.
|
Follow-up to #23642, implements
__rdiv__
in TimedeltaArray, plus missing tests.After this will be a PR that does the same for floordiv+rfloordiv.
git diff upstream/master -u -- "*.py" | flake8 --diff
Fixes:
tdi / np.timedelta64('NaT')
tdi / tdi
(#22631)tdi / object_dtyped
object_dtyped / tdi