Skip to content

REGR: to_timedelta precision issues with floating data #25651

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Mar 12, 2019
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.24.2.rst
Original file line number Diff line number Diff line change
Expand Up @@ -31,6 +31,7 @@ Fixed Regressions
- Fixed regression in ``IntervalDtype`` construction where passing an incorrect string with 'Interval' as a prefix could result in a ``RecursionError``. (:issue:`25338`)
- Fixed regression in creating a period-dtype array from a read-only NumPy array of period objects. (:issue:`25403`)
- Fixed regression in :class:`Categorical`, where constructing it from a categorical ``Series`` and an explicit ``categories=`` that differed from that in the ``Series`` created an invalid object which could trigger segfaults. (:issue:`25318`)
- Fixed regression in :func:`to_timedelta` loosing precision when converting floating data to ``Timedelta`` data (:issue:`25077`).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

loosing -> losing

- Fixed pip installing from source into an environment without NumPy (:issue:`25193`)

.. _whatsnew_0242.enhancements:
Expand Down
10 changes: 3 additions & 7 deletions pandas/core/arrays/timedeltas.py
Original file line number Diff line number Diff line change
Expand Up @@ -918,13 +918,9 @@ def sequence_to_td64ns(data, copy=False, unit="ns", errors="raise"):
copy = copy and not copy_made

elif is_float_dtype(data.dtype):
# treat as multiples of the given unit. If after converting to nanos,
# there are fractional components left, these are truncated
# (i.e. NOT rounded)
mask = np.isnan(data)
coeff = np.timedelta64(1, unit) / np.timedelta64(1, 'ns')
data = (coeff * data).astype(np.int64).view('timedelta64[ns]')
data[mask] = iNaT
# object_to_td64ns has custom logic for float -> int conversion
# to avoid precision issues
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this have the same perf?

Copy link
Member Author

@jorisvandenbossche jorisvandenbossche Mar 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it is slower. But it is more correct (it is written specifically to handle this case), and it is what was used before 0.24.0 anyway.

I assume we might be able to port the similar logic here to be more performant (to not work element by element, knowing we only have floats), but I would personally leave that for 0.25.0.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well the prior fix was for performance and this is a very narrow minor case

so you are causing a rather large perf regression by changing this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls shown asv before / after

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Converting floats to timedelta is a narrow use case anyway. If you care about performance but not care about precision, you can convert to integers yourself.

data = objects_to_td64ns(data, unit=unit, errors=errors)
copy = False

elif is_timedelta64_dtype(data.dtype):
Expand Down
7 changes: 7 additions & 0 deletions pandas/tests/indexes/timedeltas/test_tools.py
Original file line number Diff line number Diff line change
Expand Up @@ -181,3 +181,10 @@ def test_to_timedelta_on_missing_values(self):

actual = pd.to_timedelta(pd.NaT)
assert actual.value == timedelta_NaT.astype('int64')

def test_to_timedelta_float(self):
# https://github.com/pandas-dev/pandas/issues/25077
arr = np.arange(0, 1, 1e-6)[-10:]
result = pd.to_timedelta(arr, unit='s')
expected_asi8 = np.arange(999990000, int(1e9), 1000, dtype='int64')
tm.assert_numpy_array_equal(result.asi8, expected_asi8)