ENH: Make Timestamp implementation bounds match DTA/DTI/Series #39245

Merged
jreback merged 2 commits into pandas-dev:master from enh-bounds on Jan 20, 2021

Conversation

jbrockmendel
Member

@jbrockmendel jbrockmendel commented Jan 17, 2021

@jreback jreback added the Datetime Datetime data dtype label Jan 20, 2021
Contributor

@jreback jreback left a comment

can you add the example from the OP as well, the pd.date_range

@jbrockmendel
Member Author

can you add the example from the OP as well, the pd.date_range

looks like that example now fails for a different reason that will need to be fixed within pd.date_range

@jreback
Contributor

jreback commented Jan 20, 2021

can you add the example from the OP as well, the pd.date_range

looks like that example now fails for a different reason that will need to be fixed within pd.date_range

ok will remove the closes then

@jreback jreback added this to the 1.3 milestone Jan 20, 2021
@jreback jreback merged commit 86e07f2 into pandas-dev:master Jan 20, 2021
@jbrockmendel jbrockmendel deleted the enh-bounds branch January 20, 2021 23:17
nofarm3 pushed a commit to nofarm3/pandas that referenced this pull request Jan 21, 2021
@milotoor

@jreback @jbrockmendel I think this change introduced a minor regression. A simple shell session to demonstrate:

~ cat tslib_bug_demo.py
import pandas as pd

min_timestamp = pd.Timestamp.min
min_pydatetime = min_timestamp.to_pydatetime()
min_datetime = pd.to_datetime(min_pydatetime)
print(min_datetime)

➜  ~ pyenv activate pandas_1_2_5
(pandas_1_2_5) ➜  ~ python --version
Python 3.9.10
(pandas_1_2_5) ➜  ~ pip list | grep pandas
pandas          1.2.5
(pandas_1_2_5) ➜  ~ python tslib_bug_demo.py
1677-09-21 00:12:43.145225
(pandas_1_2_5) ➜  ~ pyenv activate pandas_1_3_0
(pandas_1_3_0) ➜  ~ python --version
Python 3.9.10
(pandas_1_3_0) ➜  ~ pip list | grep pandas
pandas          1.3.0
(pandas_1_3_0) ➜  ~ python tslib_bug_demo.py
sys:1: UserWarning: Discarding nonzero nanoseconds in conversion
Traceback (most recent call last):
  File "/Users/milotoor/.pyenv/versions/pandas_1_3_0/lib/python3.9/site-packages/pandas/core/arrays/datetimes.py", line 2187, in objects_to_datetime64ns
    values, tz_parsed = conversion.datetime_to_datetime64(data.ravel("K"))
  File "pandas/_libs/tslibs/conversion.pyx", line 357, in pandas._libs.tslibs.conversion.datetime_to_datetime64
  File "pandas/_libs/tslibs/np_datetime.pyx", line 120, in pandas._libs.tslibs.np_datetime.check_dts_bounds
pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1677-09-21 00:12:43

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/milotoor/tslib_bug_demo.py", line 5, in <module>
    min_datetime = pd.to_datetime(min_pydatetime)
  File "/Users/milotoor/.pyenv/versions/pandas_1_3_0/lib/python3.9/site-packages/pandas/core/tools/datetimes.py", line 914, in to_datetime
    result = convert_listlike(np.array([arg]), format)[0]
  File "/Users/milotoor/.pyenv/versions/pandas_1_3_0/lib/python3.9/site-packages/pandas/core/tools/datetimes.py", line 401, in _convert_listlike_datetimes
    result, tz_parsed = objects_to_datetime64ns(
  File "/Users/milotoor/.pyenv/versions/pandas_1_3_0/lib/python3.9/site-packages/pandas/core/arrays/datetimes.py", line 2193, in objects_to_datetime64ns
    raise err
  File "/Users/milotoor/.pyenv/versions/pandas_1_3_0/lib/python3.9/site-packages/pandas/core/arrays/datetimes.py", line 2175, in objects_to_datetime64ns
    result, tz_parsed = tslib.array_to_datetime(
  File "pandas/_libs/tslib.pyx", line 379, in pandas._libs.tslib.array_to_datetime
  File "pandas/_libs/tslib.pyx", line 606, in pandas._libs.tslib.array_to_datetime
  File "pandas/_libs/tslib.pyx", line 602, in pandas._libs.tslib.array_to_datetime
  File "pandas/_libs/tslib.pyx", line 474, in pandas._libs.tslib.array_to_datetime
  File "pandas/_libs/tslibs/np_datetime.pyx", line 120, in pandas._libs.tslibs.np_datetime.check_dts_bounds
pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1677-09-21 00:12:43

Going from pd.Timestamp.min to the equivalent datetime.datetime instance and back works with pandas v1.2.5 but fails in v1.3.0 due to an out of bounds error. I'm not sure what to make of the "Discarding nonzero nanoseconds in conversion" warning above the traceback.

The pandas._libs.tslibs.np_datetime.check_dts_bounds function is character-for-character identical between the two versions, but the _NS_MIN_DTS constant has changed slightly. Moreover, this was the only change in the vicinity:

➜  pandas git:(main) git df v1.2.5 v1.3.0 -- pandas/_libs/tslibs/src

diff --git a/pandas/_libs/tslibs/src/datetime/np_datetime.c b/pandas/_libs/tslibs/src/datetime/np_datetime.c
index 8eb995dee6..9ad2ead5f9 100644
--- a/pandas/_libs/tslibs/src/datetime/np_datetime.c
+++ b/pandas/_libs/tslibs/src/datetime/np_datetime.c
@@ -32,7 +32,7 @@ This file is derived from NumPy 1.7. See NUMPY_LICENSE.txt
 #endif  // PyInt_AsLong

 const npy_datetimestruct _NS_MIN_DTS = {
-    1677, 9, 21, 0, 12, 43, 145225, 0, 0};
+    1677, 9, 21, 0, 12, 43, 145224, 193000, 0};
 const npy_datetimestruct _NS_MAX_DTS = {
     2262, 4, 11, 23, 47, 16, 854775, 807000, 0};
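
For reference, the new field values can be reproduced with plain integer arithmetic. This is only a sketch using the standard library, under the assumption that nanosecond timestamps are stored as int64 offsets from 1970-01-01 and that the lowest int64 value is reserved (for NaT), so the smallest usable value is one above it. The helper name `ns_to_fields` is made up for illustration and is not a pandas function; the tuple layout (year, month, day, hour, minute, second, microsecond, picosecond, attosecond) is inferred from the values in the diff.

```python
from datetime import datetime, timedelta

INT64_MIN = -(2**63)
# Assumption: INT64_MIN itself is reserved, so the smallest valid value is one higher.
MIN_NS = INT64_MIN + 1


def ns_to_fields(ns):
    """Split a nanosecond offset from 1970-01-01 into struct-like fields.

    Hypothetical helper: returns (year, month, day, hour, minute, second,
    microsecond, picosecond, attosecond).
    """
    seconds, ns_of_second = divmod(ns, 1_000_000_000)
    microsecond, ns_remainder = divmod(ns_of_second, 1_000)
    moment = datetime(1970, 1, 1) + timedelta(seconds=seconds)
    return (
        moment.year, moment.month, moment.day,
        moment.hour, moment.minute, moment.second,
        microsecond,
        ns_remainder * 1_000,  # nanoseconds expressed as picoseconds
        0,
    )


new_min_fields = ns_to_fields(MIN_NS)
print(new_min_fields)  # (1677, 9, 21, 0, 12, 43, 145224, 193000, 0)

# The old constant (…145225 microseconds, 0 picoseconds) sits 807 ns above this.
old_min_ns = MIN_NS + 807
print(ns_to_fields(old_min_ns))  # (1677, 9, 21, 0, 12, 43, 145225, 0, 0)
```

Under these assumptions the v1.3.0 constant matches the smallest usable int64 value exactly, while the v1.2.5 constant was rounded up to a whole microsecond.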

This feels like a bug to me, but perhaps it's expected behavior? I can confirm that v1.4.1 also raises the same exception. I'm happy to open an actual issue if you think this merits it. Your sagacious opinions would be greatly appreciated and thank you for your contributions to an amazing tool.

@jbrockmendel
Member Author

I'm not sure what to make of the "Discarding nonzero nanoseconds in conversion" warning above the traceback.

It means that when doing .to_pydatetime, the nanosecond portion of the Timestamp is lost, so you should not expect to round-trip losslessly. The behavior you are describing is expected.
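
To make the arithmetic behind this concrete, here is a small standard-library sketch. It assumes the minimum Timestamp is the int64 value -(2**63) + 1 nanoseconds since the epoch, and it models the loss in .to_pydatetime as dropping the sub-microsecond part; `truncate_to_microseconds` is a hypothetical helper, not a pandas API.

```python
NS_PER_US = 1_000
INT64_MIN = -(2**63)
# Assumption: the smallest valid nanosecond value is one above INT64_MIN.
MIN_NS = INT64_MIN + 1


def truncate_to_microseconds(ns):
    """Drop the sub-microsecond part, rounding toward the earlier instant.

    Hypothetical helper modelling what is lost when the nanosecond field
    is discarded.
    """
    return ns - ns % NS_PER_US


truncated_ns = truncate_to_microseconds(MIN_NS)

print(MIN_NS - truncated_ns)     # 193 nanoseconds are discarded
print(truncated_ns < MIN_NS)     # True: the result is earlier than the minimum
print(truncated_ns < INT64_MIN)  # True: it no longer fits in an int64 at all
```

So after the nanoseconds are discarded, the datetime is 193 ns earlier than Timestamp.min, which would explain why converting it back is rejected as out of bounds. With the old bound of ...145225 microseconds there were no nanoseconds to discard, which is consistent with the round trip succeeding on v1.2.5.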
