API: Timestamp(pydatetime) microsecond reso #49034


Merged: 41 commits merged into pandas-dev:main on Nov 30, 2022

Conversation

jbrockmendel
Member

  • closes #xxxx (Replace xxxx with the Github issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

I've got one JSON test still failing locally; cc @WillAyd, see the comment in the ujson file. Suggestions?

@WillAyd
Member

WillAyd commented Oct 11, 2022

I think the get_long_attr function currently in the JSON code implicitly expects nanosecond resolution, which would be incorrect and would mangle the subsequent functions that convert to the unit of the encoder. I think the easiest approach would be to update get_long_attr to read the unit of the object and convert to nanoseconds as required.

@WillAyd
Member

WillAyd commented Oct 11, 2022

Also wanted to add... this is awesome. Very impressive work.

@mroeschke added the Non-Nano (datetime64/timedelta64 with non-nanosecond resolution) and Timestamp (pd.Timestamp and associated methods) labels on Oct 11, 2022
@jbrockmendel
Member Author

update the get_long_attr function

I tried adding

PyObject *nano_obj = PyObject_CallMethod(o, "_as_unit", "(s)", "ns");

and am getting a segfault. Is there a better way to do this?

@WillAyd
Member

WillAyd commented Oct 11, 2022

I think you want

PyObject *nano_obj = PyObject_CallMethod(o, "_as_unit", "s", "ns");

The (s) would mean you are building a tuple.

If you get stuck from there, I would suggest running Python with gdb attached to it via gdb python, then doing something like run -m pytest --lf (or running whatever script gives you the segfault).

Assuming you built the extensions with debugging symbols enabled, your debugger will drop you right at the point where the segfault occurs and let you inspect the state of things. If the above doesn't do it alone, my guess is that the function deals with other types of objects than just Timestamps. Ideally we would do a type check... but we aren't doing one at the moment, so calling that method on anything but a Timestamp could also cause a segfault.
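
A minimal sketch of that kind of guard (illustrative only; the error message and the NULL-return convention here are assumptions, not code from this PR):

// Sketch: refuse non-Timestamp-like inputs before calling _as_unit,
// so a bad object raises a Python error instead of segfaulting
if (!PyObject_HasAttrString(o, "_as_unit")) {
    PyErr_SetString(PyExc_TypeError, "expected a Timestamp-like object");
    return NULL;
}
PyObject *nano_obj = PyObject_CallMethod(o, "_as_unit", "s", "ns");
if (nano_obj == NULL) {
    return NULL;  // propagate the exception raised by the call
}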

Lastly you might find this useful:

https://pandas.pydata.org/docs/dev/development/debugging_extensions.html#improve-debugger-printing

Good luck. If you are still having trouble, ping me again and I can look deeper.

@jbrockmendel
Member Author

I ended up effectively re-implementing _as_unit. No more segfault, but I'm just getting wrong answers.

static npy_int64 get_long_attr(PyObject *o, const char *attr) {
    // NB we are implicitly assuming that o is a Timedelta or Timestamp, or NaT

    npy_int64 long_val;
    PyObject *value = PyObject_GetAttrString(o, attr);
    long_val =
        (PyLong_Check(value) ? PyLong_AsLongLong(value) : PyLong_AsLong(value));

    Py_DECREF(value);

    if (long_val == NPY_MIN_INT64) {
        // i.e. o is NaT
        return long_val;
    }

    // ensure we are in nanoseconds, similar to Timestamp._as_reso or _as_unit
    NPY_DATETIMEUNIT reso = (NPY_DATETIMEUNIT)PyObject_GetAttrString(o, "_reso");

    if (reso == NPY_FR_us) {
        long_val = long_val * 1000L;
    } else if (reso == NPY_FR_ms) {
        long_val = long_val * 1000000L;
    } else if (reso == NPY_FR_s) {
        long_val = long_val * 1000000000L;
    }

    return long_val;
}

The test case that's failing should go through the NPY_FR_us branch, but apparently none of these conditions is evaluating as truthy.

@WillAyd
Member

WillAyd commented Oct 11, 2022

Ah, I gotcha. The problem there is that you are essentially getting back a PyObject; casting it to an NPY_DATETIMEUNIT doesn't change anything about the object itself or the bytes that comprise it. In this case the cast just errantly suppresses the compiler warnings.

If you want the primitive integer value for comparison against the NPY_DATETIMEUNIT enum you'd likely want to do something like:

if (!PyLong_Check(reso)) {
  // TODO: we should have error handling here, but one step at a time...
}

long cReso = PyLong_AsLong(reso);
if (cReso == -1 && PyErr_Occurred()) {
  // TODO: we should have error handling here, but one step at a time...
}

And then use cReso for your comparisons

Also be sure to Py_DECREF(reso) when you are done with it
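
Putting those pieces together, the corrected function might look roughly like this (a sketch only: it mirrors your snippet above, keeps the _reso attribute name from this thread, and still elides real error handling):

static npy_int64 get_long_attr(PyObject *o, const char *attr) {
    // NB we are implicitly assuming that o is a Timedelta or Timestamp, or NaT

    PyObject *value = PyObject_GetAttrString(o, attr);
    npy_int64 long_val =
        (PyLong_Check(value) ? PyLong_AsLongLong(value) : PyLong_AsLong(value));
    Py_DECREF(value);

    if (long_val == NPY_MIN_INT64) {
        // i.e. o is NaT
        return long_val;
    }

    // unbox the Python int behind _reso into a C long before comparing
    PyObject *reso = PyObject_GetAttrString(o, "_reso");
    if (!PyLong_Check(reso)) {
        // TODO: error handling, one step at a time...
    }
    long cReso = PyLong_AsLong(reso);
    Py_DECREF(reso);
    if (cReso == -1 && PyErr_Occurred()) {
        // TODO: error handling, one step at a time...
    }

    // scale to nanoseconds, similar to Timestamp._as_unit
    if (cReso == NPY_FR_us) {
        long_val = long_val * 1000L;
    } else if (cReso == NPY_FR_ms) {
        long_val = long_val * 1000000L;
    } else if (cReso == NPY_FR_s) {
        long_val = long_val * 1000000000L;
    }

    return long_val;
}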

@WillAyd
Member

WillAyd commented Oct 12, 2022

@jbrockmendel just to drive home the issue with the cast, I think it is helpful to think about the bytes that are getting moved around. When comparing against the NPY_DATETIMEUNIT enum to see if something is in milliseconds, you are essentially comparing against the integer 8. For a 32-bit integer, the binary representation of that value is 00000000 00000000 00000000 00001000.

By contrast, what PyObject_GetAttrString hands you is a PyObject *, i.e. a pointer to a struct somewhere on the heap. Casting that pointer to NPY_DATETIMEUNIT does not read the integer stored inside the object; it reinterprets the pointer value itself (a memory address) as an integer, truncated to however many bits the enum occupies. A heap address is for all practical purposes arbitrary, so the comparison against the 8 you are looking for (or any other enum value) essentially never comes out true, which is why none of your branches were taken.
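
In code terms (illustrative only, reusing the _reso attribute from the snippet above):

PyObject *reso = PyObject_GetAttrString(o, "_reso");

// WRONG: reinterprets the pointer itself (a heap address) as an integer,
// so a comparison against NPY_FR_ms (the integer 8) essentially never succeeds
NPY_DATETIMEUNIT wrong = (NPY_DATETIMEUNIT)reso;

// RIGHT: unbox the Python int stored in the object; yields 8 for milliseconds
long cReso = PyLong_AsLong(reso);

Py_DECREF(reso);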

@mroeschke added this to the 2.0 milestone on Oct 18, 2022
 elif PyDate_Check(ts):
     # Keep the converter same as PyDateTime's
     ts = datetime.combine(ts, time())
-    return convert_datetime_to_tsobject(ts, tz)
+    return convert_datetime_to_tsobject(ts, tz, nanos=0, reso=NPY_FR_us)  # TODO: or lower?
Member

IMO I would cast dates to the lowest resolution (and document it)

Member

@WillAyd left a comment

Can you add a whatsnew for this too?

if isinstance(ts, ABCTimestamp):
    reso = abbrev_to_npy_unit(ts.unit)  # TODO: faster way to do this?
else:
    # TODO: what if user explicitly passes nanos=0?
Member

Is it possible to hit this? Maybe we should raise instead?

Member Author

I think it could happen with pd.Timestamp(pydatetime_obj, nanosecond=0)

Member

@WillAyd left a comment

I think this generally LGTM. A few comments that aren't blockers.

@@ -508,7 +508,10 @@ def _unbox_scalar(self, value) -> np.datetime64:
     if not isinstance(value, self._scalar_type) and value is not NaT:
         raise ValueError("'value' should be a Timestamp.")
     self._check_compatible_with(value)
-    return value.asm8
+    if value is NaT:
Member

Possibly I've missed this conversation, but do we need to give consideration to a generic NaT type that can hold different precisions? Or are we always going to use numpy's value?

Member Author

do we need to give consideration to a generic NaT type that can hold different precisions

The closest I've seen to this has been a discussion of having a separate NaT-like for timedelta. I'm not aware of any discussion of a resolution-specific NaT.

@@ -360,7 +360,14 @@ def wrapper(
     if out_dtype is not None:
         out = out.view(out_dtype)
     if fill_wrap is not None:
+        # FIXME: if we get here with dt64/td64 we need to be sure we have
+        #  matching resos
+        if fill_value.dtype.kind == "m":
Member

Might be worth making a reusable function for this? I could see it being useful in other areas. Something like reso_for_type

Member Author

I'd like to hold off on that as out of scope.

elif PyDate_Check(ts):
    # Keep the converter same as PyDateTime's
    # For date objects we give the lowest supported resolution, i.e. "s"
Member

Do we have a test where we construct a Timestamp from a datetime.date?

Member Author

dedicated test added + green

Member

@mroeschke left a comment

LGTM can merge on green

@mroeschke merged commit 41db572 into pandas-dev:main on Nov 30, 2022
@mroeschke
Member

Thanks @jbrockmendel

@jbrockmendel deleted the nano-tstamp-pydatetime branch on November 30, 2022 at 23:37
@jbrockmendel
Member Author

Thanks for the reviews @mroeschke and @WillAyd. We're getting close to having full non-nano support!
