Skip to content

BUG: Timestamp(pd.NA) raising TypeError #54102

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Jul 17, 2023

Conversation

lukemanley
Copy link
Member

@lukemanley lukemanley added Bug Timestamp pd.Timestamp and associated methods labels Jul 13, 2023
@lukemanley lukemanley added this to the 2.1 milestone Jul 13, 2023
@lukemanley lukemanley requested a review from MarcoGorelli as a code owner July 13, 2023 00:30
MarcoGorelli

This comment was marked as outdated.

Comment on lines 1885 to 1886
elif checknull_with_nat_and_na(ts_input):
return NaT
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can the

        if ts.value == NPY_NAT:
            return NaT

be removed from below now?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't look like it can be removed. Just tried it and this starts to raise if its removed:

pd.Timestamp(np.datetime64("NaT"))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah right, thanks, checknull checks for pandas NaT, not numpy NaT

does it work to do the check a bit deeper? i.e.

diff --git a/pandas/_libs/tslibs/conversion.pyx b/pandas/_libs/tslibs/conversion.pyx
index 57b7754b08..4d494d2282 100644
--- a/pandas/_libs/tslibs/conversion.pyx
+++ b/pandas/_libs/tslibs/conversion.pyx
@@ -1,4 +1,5 @@
 import numpy as np
+from pandas._libs.missing cimport checknull_with_nat_and_na
 
 cimport numpy as cnp
 from libc.math cimport log10
@@ -268,7 +269,7 @@ cdef _TSObject convert_to_tsobject(object ts, tzinfo tz, str unit,
     if isinstance(ts, str):
         return convert_str_to_tsobject(ts, tz, unit, dayfirst, yearfirst)
 
-    if ts is None or ts is NaT:
+    if checknull_with_nat_and_na(ts):
         obj.value = NPY_NAT
     elif is_datetime64_object(ts):
         reso = get_supported_reso(get_datetime64_unit(ts))
diff --git a/pandas/_libs/tslibs/timestamps.pyx b/pandas/_libs/tslibs/timestamps.pyx
index 80b7c2d2f5..59640a3ef4 100644
--- a/pandas/_libs/tslibs/timestamps.pyx
+++ b/pandas/_libs/tslibs/timestamps.pyx
@@ -1882,9 +1882,6 @@ class Timestamp(_Timestamp):
                                 hour or 0, minute or 0, second or 0, fold=fold or 0)
             unit = None
 
-        elif checknull_with_nat_and_na(ts_input):
-            return NaT
-
         if getattr(ts_input, "tzinfo", None) is not None and tz is not None:
             raise ValueError("Cannot pass a datetime or Timestamp with tzinfo with "
                              "the tz parameter. Use tz_convert instead.")

?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes - updated. thanks

@jbrockmendel
Copy link
Member

Is it obvious we want to allow this? If we consider pd.NA as meaningfully distinct from pd.NaT, then silently converting would be wrong

@MarcoGorelli
Copy link
Member

I would've thought so - the docs say

For datetime64[ns] types, NaT represents missing values

and pd.NA is what signals missing values. Will check what's done in other cases, not really sure

@jbrockmendel
Copy link
Member

I would've thought so - the docs say [...] and pd.NA is what signals missing values. Will check what's done in other cases, not really sure

The meaning of "missing" has changed since that doc was written. In particular NaT has nan-like semantics.

@lukemanley
Copy link
Member Author

Is it obvious we want to allow this? If we consider pd.NA as meaningfully distinct from pd.NaT, then silently converting would be wrong

I dont have a strong opinion but I'll note:

  1. Other na-likes already get converted to NaT (e.g. None -> NaT, NaN -> NaT)

  2. The current behavior does seem inconsistent:

pd.to_timedelta(pd.NA)  # -> NaT
pd.to_datetime(pd.NA)   # -> NaT
pd.Timedelta(pd.NA)     # -> NaT
pd.Timestamp(pd.NA)     # -> TypeError

@jbrockmendel
Copy link
Member

The current behavior does seem inconsistent:

fair enough. no objection here.

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool, going to go ahead and approve then, thanks @lukemanley !

@mroeschke
Copy link
Member

This might cause issue if we every introduced a nullable numpy datetime/timedelta types with pd.NA as the missing value, but that's not planned right?

@jbrockmendel
Copy link
Member

This might cause issue if we every introduced a nullable numpy datetime/timedelta types with pd.NA as the missing value, but that's not planned right?

Does the numpy-nullable part matter here? timestamp[pyarrow] already behaves that way

@mroeschke
Copy link
Member

Does the numpy-nullable part matter here? timestamp[pyarrow] already behaves that way

Sure, I was just confirming that there wasn't appetite to have BaseMaskedArray support datetime/timedelta types for example

@mroeschke mroeschke merged commit a74f0a9 into pandas-dev:main Jul 17, 2023
@mroeschke
Copy link
Member

Thanks @lukemanley

@lukemanley lukemanley deleted the timestamp-ctor-na branch July 25, 2023 00:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Timestamp pd.Timestamp and associated methods
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: Cannot convert pd.NA to Timestamp
4 participants