BUG: to_datetime with M or Y unit and non-round float #50301

jbrockmendel · 2022-12-16T18:23:49Z

closes #xxxx (Replace xxxx with the GitHub issue number)
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

MarcoGorelli · 2022-12-16T18:29:56Z

If non-round values aren't allowed, would this supersede #50183?

jbrockmendel · 2022-12-16T18:44:41Z

> If non-round values aren't allowed, would this supersede #50183?

It's only for unit="M" or unit="Y".

MarcoGorelli

couple of minor comments, other than that looks good

MarcoGorelli

LGTM, thanks @jbrockmendel

MarcoGorelli · 2022-12-26T21:55:38Z

pandas/_libs/tslib.pyx

+                    if is_ym and not fval.is_integer():
+                        # Analogous to GH#47266 for Timestamp
+                        if is_raise:
+                            raise ValueError(
+                                f"Conversion of non-round float with unit={unit} "
+                                "is ambiguous"
+                            )
+                        elif is_ignore:
+                            raise AssertionError
+                        iresult[i] = NPY_NAT
+                        continue


sorry to block this, but is there a test that hits this?

I can see tests when val is a non-round float, but what about when it's a string containing a non-round float (e.g. '1.5'), which I believe is what would reach this branch?
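
To make the question concrete, here is a minimal pure-Python sketch of the check in the hunk above, applied to a string input. It is not the pandas implementation: `check_ym_value`, its `errors` parameter, and the `NAT` stand-in are hypothetical names for illustration, and only the `raise` and `coerce` cases are modelled.

```python
NAT = None  # stand-in for pandas' NaT sentinel (assumption for this sketch)


def check_ym_value(val, unit, errors="raise"):
    """Hypothetical helper mirroring the non-round check for unit "Y"/"M"."""
    # A string such as "1.5" is parsed to a float first, so it reaches
    # the same non-round check as a float input would.
    fval = float(val)
    is_ym = unit in ("Y", "M")
    if is_ym and not fval.is_integer():
        if errors == "raise":
            raise ValueError(
                f"Conversion of non-round float with unit={unit} is ambiguous"
            )
        # errors="coerce": the value becomes the missing-value sentinel
        return NAT
    return fval


print(check_ym_value("2.0", "Y"))            # round value passes through
print(check_ym_value("1.5", "s"))            # other units are unaffected
print(check_ym_value("1.5", "Y", "coerce"))  # coerced to the sentinel
```

A test for the string path would then assert that `"1.5"` with `unit="Y"` raises under `errors="raise"` and yields the missing-value sentinel under `errors="coerce"`.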

MarcoGorelli

Looks good to me, thanks @jbrockmendel !

@mroeschke any thoughts?

mroeschke · 2022-12-28T17:47:29Z

Thanks @jbrockmendel

* add test for float to_datetime near overflow bounds
* fix float to_datetime near overflow bounds
* fix typo and formatting
* fix formatting
* fix test to not fail on rounding differences
* don't use approximate comparison on datetimes, it doesn't work
* also can't convert datetime to float
* match dtypes
* TST: don't try to use non-integer years (see #50301)
* TST: don't cross an integer (tsmax_in_days happens to be close to an integer, and this is a test of rounding)
* PERF: remove unnecessary copy
* add whatsnew

In Pandas 1.5.3, the `float(val)` cast was inline in the `cast_from_unit` call in `array_with_unit_to_datetime`. This caused the intermediate (unnamed) value to be a Python float. In pandas-dev#50301, a temporary variable was added to avoid multiple casts, but with explicit type `cdef float`, which defines a _Cython_ float. This type is 32-bit, and causes a loss of precision and a regression in parsing from 1.5.3. Since `cast_from_unit` takes an `object`, not a more specific Cython type, remove the explicit type from the temporary `fval` variable entirely. This will cause it to be a (64-bit) Python float, and thus not lose precision. Fixes pandas-dev#57051

In Pandas 1.5.3, the `float(val)` cast was inline to the `cast_from_unit` call in `array_with_unit_to_datetime`. This caused the intermediate (unnamed) value to be a Python float. Since pandas-dev#50301, a temporary variable was added to avoid multiple casts, but with explicit type `cdef float`, which defines a _Cython_ float. This type is 32-bit, and causes a loss of precision, and a regression in parsing from 1.5.3. So widen the explicit type of the temporary `fval` variable to (64-bit) `double`, which will not lose precision. Fixes pandas-dev#57051
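
The precision loss described above can be reproduced without Cython. The following is a minimal sketch that uses the standard-library `struct` module to round-trip a value through a 32-bit C float, which is the width a `cdef float` variable has. The helper `to_float32` and the sample timestamp string are assumptions made for illustration and do not come from the pandas code.

```python
import struct


def to_float32(x: float) -> float:
    """Round-trip a Python float (64-bit) through a 32-bit C float."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Hypothetical epoch-seconds string, as might be passed to
# to_datetime(..., unit="s").
val = "1704660000.123"

as_f64 = float(val)          # what an untyped or `double` variable holds
as_f32 = to_float32(as_f64)  # what a 32-bit `float` variable would hold

print(as_f64)
print(as_f32)
print(abs(as_f64 - as_f32))  # error in seconds introduced by narrowing
```

Near 1.7e9, adjacent 32-bit floats are 128 apart, so a seconds-since-epoch value stored at that width can be off by up to about a minute, while a 64-bit value keeps sub-microsecond resolution in this range.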

BUG: to_datetime with M or Y unit and non-round float

7455e87

MarcoGorelli self-requested a review December 16, 2022 18:50

MarcoGorelli suggested changes Dec 17, 2022

jbrockmendel added 3 commits December 17, 2022 10:58

use is_integer

811aec9

Merge branch 'main' into depr-cast_from_unit

f66303d

Merge branch 'main' into depr-cast_from_unit

5ecec46

mroeschke added the Datetime (Datetime data dtype) label Dec 20, 2022

Merge branch 'main' into depr-cast_from_unit

530437c

MarcoGorelli reviewed Dec 22, 2022

doc/source/whatsnew/v2.0.0.rst (outdated)

pandas/tests/tools/test_to_datetime.py

jbrockmendel added 2 commits December 22, 2022 09:21

Merge branch 'main' into depr-cast_from_unit

1bfc2a3

GH ref

3d6d2a0

MarcoGorelli approved these changes Dec 22, 2022

pandas/tests/tools/test_to_datetime.py

MarcoGorelli added this to the 2.0 milestone Dec 22, 2022

Merge branch 'main' into depr-cast_from_unit

d13fcbf

MarcoGorelli suggested changes Dec 26, 2022

jbrockmendel added 2 commits December 27, 2022 15:23

added test

f008c84

Merge branch 'main' into depr-cast_from_unit

2a34cd2

MarcoGorelli approved these changes Dec 28, 2022

mroeschke approved these changes Dec 28, 2022

mroeschke merged commit 83c2a5f into pandas-dev:main Dec 28, 2022

jbrockmendel deleted the depr-cast_from_unit branch December 28, 2022 19:09

rebecca-palmer added a commit to rebecca-palmer/pandas that referenced this pull request Jan 18, 2023

TST: don't try to use non-integer years (see pandas-dev#50301)

8ebc910

rebecca-palmer mentioned this pull request Jan 18, 2023

Improve to_datetime bounds checking #50183

Merged

3 tasks

QuLogic mentioned this pull request Feb 21, 2024

Fix accidental loss-of-precision for to_datetime(str, unit=...) #57548

Merged

4 tasks
