Skip to content

Fix Timedelta.__floordiv__, __rfloordiv__ #18961

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 17 commits into from
Jan 7, 2018

Conversation

jbrockmendel
Copy link
Member

@jbrockmendel jbrockmendel commented Dec 27, 2017

It would not at all surprise me to learn that there are more corner cases that this misses. Needs some eyeballs.

assert (2 * td) // scalar.to_timedelta64() == 2

assert td // np.nan is pd.NaT
assert np.isnan(td // pd.NaT)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these should prob be in test_nat with other nat tests

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TestTimedeltaArithmetic doesn't have a dedicated test/section for nat tests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pandas/tests/scalar/test_nat is where the nat tests go (i don't mind them here), but should have some (even if slightly duplicative)

# also timedelta-like
# We need to watch out for np.timedelta64('NaT')
if np.ndim(other) == 0:
if np.isnat(other):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls try to be DRY
don’t use np.ndim ; we don’t use it anywhere else
don’t use np.isnat

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don’t use np.isnat

This is specifically trying to catch np.timedelta64('NaT') and because of np.NaT rules we can't use is timedelta64('NaT') and we can't use == timedelta64('NaT')


if isinstance(other, float) and np.isnan(other):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

huh?
why are you not just checking isna

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because np.nan (which can't be detected with other is np.nan) gets treated differently from pd.NaT.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why? nan is equiv of nat here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why? nan is equiv of nat here

That's slightly ambiguous, but I lean towards "no it isn't". Timedelta(nan) does return NaT, but Timedelta / float is meaningful on its own, so we should treat nan as a float for the purposes of this operation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checknull(other) and is_float_object(other) is more clear here (and add a comment)

elif not _validate_ops_compat(other):
return NotImplemented

other = Timedelta(other)
if other is NaT:
return NaT
return np.nan
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this what we do for division?
is it not tested now?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this what we do for division?

Yes.

The status quo is... weird.

    def __floordiv__(self, other):
        if hasattr(other, 'dtype'):
            # work with i8
            other = other.astype('m8[ns]').astype('i8')
            return self.value // other

        elif is_integer_object(other):
            # integers only
            return Timedelta(self.value // other, unit='ns')

        elif not _validate_ops_compat(other):
            return NotImplemented

        other = Timedelta(other)
        if other is NaT:
            return np.nan
        return self.value // other.value

    def __rfloordiv__(self, other):
        if hasattr(other, 'dtype'):
            # work with i8
            other = other.astype('m8[ns]').astype('i8')
            return other // self.value

        elif not _validate_ops_compat(other):
            return NotImplemented

        other = Timedelta(other)
        if other is NaT:
            return NaT
        return other.value // self.value

is it not tested now?

Not that I'm aware of.

other = other.astype('m8[ns]').astype('i8')
return other // self.value
if other.dtype.kind == 'm':
# also timedelta-like
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iirc for division we defer to. TDI to handle his logic
does this code path follow?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iirc for division we defer to. TDI to handle his logic

I would find that design choice surprising.

    def __truediv__(self, other):
        if hasattr(other, 'dtype'):
            return self.to_timedelta64() / other

        elif is_integer_object(other) or is_float_object(other):
            # integers or floats
            return Timedelta(self.value / other, unit='ns')

        elif not _validate_ops_compat(other):
            return NotImplemented

        other = Timedelta(other)
        if other is NaT:
            return np.nan
        return self.value / float(other.value)

I think the relevant difference here is for cases where other is an array this returns NotImplemented, but that isn't a viable option for floordiv because numpy doesn't implement floordiv for timedelta64.

@jreback jreback added Bug Timedelta Timedelta data type labels Dec 28, 2017
@codecov
Copy link

codecov bot commented Dec 28, 2017

Codecov Report

Merging #18961 into master will decrease coverage by 49.91%.
The diff coverage is n/a.

Impacted file tree graph

@@             Coverage Diff             @@
##           master   #18961       +/-   ##
===========================================
- Coverage   91.53%   41.61%   -49.92%     
===========================================
  Files         148      148               
  Lines       48688    48754       +66     
===========================================
- Hits        44566    20289    -24277     
- Misses       4122    28465    +24343
Flag Coverage Δ
#multiple ?
#single 41.61% <ø> (-0.02%) ⬇️
Impacted Files Coverage Δ
pandas/tseries/converter.py 0% <0%> (-100%) ⬇️
pandas/lib.py 0% <0%> (-100%) ⬇️
pandas/tools/hashing.py 0% <0%> (-100%) ⬇️
pandas/tslib.py 0% <0%> (-100%) ⬇️
pandas/parser.py 0% <0%> (-100%) ⬇️
pandas/io/sas/sas_constants.py 0% <0%> (-100%) ⬇️
pandas/types/concat.py 0% <0%> (-100%) ⬇️
pandas/io/sas/sas7bdat.py 0% <0%> (-91.08%) ⬇️
pandas/io/sas/sas_xport.py 0% <0%> (-90.28%) ⬇️
pandas/core/tools/numeric.py 10.44% <0%> (-89.56%) ⬇️
... and 109 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e1888f0...f101378. Read the comment docs.

@jbrockmendel
Copy link
Member Author

The linting doesn't like np.testing.assert_array_equal. What is the preferred usage?

@jreback
Copy link
Contributor

jreback commented Dec 29, 2017

we have banned all numpy functions, e.g. np.testing.assert_array_equal instead use tm.assert_numpy_array_equal; they are not generally flexibile enough and some are buggy (don't compare object arrays correctly for example)

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls acknowledge comments individually (generally)
i don’t want to repeat myself


if isinstance(other, float) and np.isnan(other):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why? nan is equiv of nat here


# We need to watch out for np.timedelta64('NaT')
if ndim == 0:
if np.isnat(other):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use isna

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the Travis build with numpy 1.10.4 is failing because there is no np.isnat, so this is necessary even aside from matters of taste/convention.

I'm making the suggested change now, but is there an alternative that doesn't involve reaching into non-cython pandas? I have a strong preference for keeping the dependency graph DAG.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you mean like missing.pyx/is_null_datetimelike

return operation(value, other.astype('m8[ns]').astype('i8'))

else:
mask = np.isnat(other)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

else:
return self.to_timedelta64() // other

else:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just raise you don't need an else

elif other.dtype.kind in ['i', 'u', 'f']:
if np.ndim(other) == 0:
return Timedelta(self.value // other)
else:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't need the else

also pls use other.ndim

if other.dtype.kind == 'm':
# also timedelta-like
return _broadcast_floordiv_td64(self.value, other, _rfloordiv)
else:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same


if isinstance(other, float) and np.isnan(other):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checknull(other) and is_float_object(other) is more clear here (and add a comment)


# We need to watch out for np.timedelta64('NaT').
mask = other.view('i8') == NPY_NAT
# equiv np.isnat, which does not exist in some supported np versions.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this comment, we DO NOT use np.isnat

if ndim == 0:
if mask:
return np.nan
else:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no else

else:
return operation(value, other.astype('m8[ns]').astype('i8'))

else:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no else

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will change, but curious why you have such a strong preference here. I tend to like making these elses explicit (you've no doubt noticed).

assert (2 * td) // scalar.to_timedelta64() == 2

assert td // np.nan is pd.NaT
assert np.isnan(td // pd.NaT)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pandas/tests/scalar/test_nat is where the nat tests go (i don't mind them here), but should have some (even if slightly duplicative)

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test comments, lgtm, ping on green.


dt64 = np.datetime64('2016-01-01', dtype='datetime64[us]')
with pytest.raises(TypeError):
td.__rfloordiv__(dt64)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adding a bit more comments here enumerating the cases would be useful

@jreback jreback added this to the 0.23.0 milestone Jan 6, 2018
@jbrockmendel
Copy link
Member Author

Ping

@jreback jreback merged commit 36a71eb into pandas-dev:master Jan 7, 2018
@jreback
Copy link
Contributor

jreback commented Jan 7, 2018

thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Timedelta Timedelta data type
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: Timedelta.__floordiv__ - need historical context
2 participants