API: Timestamp(pydatetime) microsecond reso #49034
Conversation
Also wanted to add...this is awesome. Very impressive work
I tried adding … and am getting a segfault. Is there a better way to do this?
I think you want:

PyObject *nano_obj = PyObject_CallMethod(o, "_as_unit", "s", "ns");

If you get stuck from there I would suggest running python with gdb attached to it. Assuming you built the extensions with debugging symbols enabled, your debugger will drop you right at the point where the segfault occurs and let you inspect the state of things.

If the above doesn't do it alone, my guess is there are other types of objects than just timestamps that we deal with in that function. Ideally we would do a type check...but we aren't at the moment, so dealing with anything but a Timestamp and trying to invoke that method could also cause a segfault.

Lastly you might find this useful: https://pandas.pydata.org/docs/dev/development/debugging_extensions.html#improve-debugger-printing

Good luck. If you are still having trouble, ping me again and I can look deeper.
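As a minimal sketch of the suggested call (the wrapper function and its error convention are illustrative assumptions, not the PR's actual code), the NULL guard is what prevents the segfault when the object has no _as_unit method:

#include <Python.h>

/* Hypothetical helper: convert a Timestamp-like object to nanosecond
 * resolution via its `_as_unit` method.  The "s" format string tells
 * PyObject_CallMethod to build the argument tuple from one C string. */
static int convert_to_nano(PyObject *o) {
    PyObject *nano_obj = PyObject_CallMethod(o, "_as_unit", "s", "ns");
    if (nano_obj == NULL) {
        /* e.g. `o` is not a Timestamp and has no `_as_unit`; a Python
         * exception is set, so propagate it instead of using NULL */
        return -1;
    }
    /* ... work with nano_obj here ... */
    Py_DECREF(nano_obj);  /* CallMethod returned a new reference */
    return 0;
}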
I ended up effectively re-implementing _as_unit. No more segfault, but I'm just getting wrong answers.

The test case that's failing should go through the NPY_FR_us branch, but apparently none of these conditions are evaluating as truthy.
Ah I gotcha. The problem there is you essentially are getting back a PyObject, not a primitive integer. If you want the primitive integer value for comparison against the NPY_DATETIMEUNIT enum you'd likely want to do something like:

if (!PyLong_Check(reso)) {
// TODO: we should have error handling here, but one step at a time...
}
long cReso = PyLong_AsLong(reso);
if (cReso == -1 && PyErr_Occurred()) {
// TODO: we should have error handling here, but one step at a time...
}

And then use cReso for your comparisons against the enum. Also be sure to Py_DECREF any new object references once you are done with them.
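Assembled into one piece, a sketch under the assumption that reso is a Python int holding a numpy resolution code (the helper name and the error message are made up for illustration; NPY_FR_us comes from numpy's ndarraytypes.h):

#include <Python.h>
#include <numpy/ndarraytypes.h>  /* NPY_DATETIMEUNIT, NPY_FR_us */

/* Hypothetical helper: returns 1 if the Python int `reso` equals the
 * microsecond enum value, 0 if not, -1 with an exception set on error. */
static int is_micro_reso(PyObject *reso) {
    if (!PyLong_Check(reso)) {
        PyErr_SetString(PyExc_TypeError, "reso should be an int");
        return -1;
    }
    long cReso = PyLong_AsLong(reso);
    if (cReso == -1 && PyErr_Occurred()) {
        return -1;  /* overflow or other conversion failure */
    }
    /* compare the extracted C integer, never the PyObject itself */
    return cReso == NPY_FR_us;
}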
@jbrockmendel just to drive home the issue with the cast, I think it is helpful to think about the bytes that are getting moved around. When comparing against the NPY_DATETIMEUNIT enum to see if something is in milliseconds, you are essentially comparing against the integer 8. Assuming a 32 bit integer on a little endian platform, the binary structure of that is 00001000 00000000 00000000 00000000 (low byte first in memory).

By contrast, a PyObject is a struct that will occupy more memory than 32 bits and have a totally different binary representation. It's not important what exactly that is for now; just assume it is some arbitrary pattern of bits.

When you cast the PyObject to a NPY_DATETIMEUNIT you are basically just casting it to an integer, and therefore only interpreting an integer's worth of bytes. I don't know what the exact rules for that are or whether it's even well defined, but assume for simplicity it just takes the first 32 bits of the PyObject. Your previous comparisons would then just be comparing those arbitrary leading bits against the enum value, which will essentially never equal 8.
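To make the contrast concrete, a hedged sketch of the broken pattern next to the fixed one (both helper names are hypothetical; 8 is NPY_FR_ms as noted above):

#include <Python.h>

/* WRONG: the cast reinterprets the pointer/struct bytes as an integer;
 * the result has nothing to do with the Python int's value, so the
 * comparison against 8 essentially never matches. */
static int is_ms_cast(PyObject *reso) {
    return (long)reso == 8;
}

/* RIGHT: extract the value the PyObject represents, then compare. */
static int is_ms_value(PyObject *reso) {
    long cReso = PyLong_AsLong(reso);
    if (cReso == -1 && PyErr_Occurred()) {
        return -1;  /* conversion failed */
    }
    return cReso == 8;
}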
pandas/_libs/tslibs/conversion.pyx
Outdated
 elif PyDate_Check(ts):
     # Keep the converter same as PyDateTime's
     ts = datetime.combine(ts, time())
-    return convert_datetime_to_tsobject(ts, tz)
+    return convert_datetime_to_tsobject(ts, tz, nanos=0, reso=NPY_FR_us)  # TODO: or lower?
IMO I would cast dates to the lowest resolution (and document it)
Can you add a whatsnew for this too?
if isinstance(ts, ABCTimestamp):
    reso = abbrev_to_npy_unit(ts.unit)  # TODO: faster way to do this?
else:
    # TODO: what if user explicitly passes nanos=0?
Is it possible to hit this? Maybe we should raise instead?
I think it could happen with pd.Timestamp(pydatetime_obj, nanosecond=0)
I think generally lgtm. A few comments that aren't blockers
@@ -508,7 +508,10 @@ def _unbox_scalar(self, value) -> np.datetime64:
         if not isinstance(value, self._scalar_type) and value is not NaT:
             raise ValueError("'value' should be a Timestamp.")
         self._check_compatible_with(value)
-        return value.asm8
+        if value is NaT:
Possibly I've missed this conversation, but do we need to give consideration to a generic NaT type that can hold different precisions? Or are we always going to use numpy's value?
"do we need to give consideration to a generic NaT type that can hold different precisions?"
The closest I've seen to this has been a discussion of having a separate NaT-like for timedelta. I'm not aware of any discussion of a resolution-specific NaT.
@@ -360,7 +360,14 @@ def wrapper(
         if out_dtype is not None:
             out = out.view(out_dtype)
         if fill_wrap is not None:
+            # FIXME: if we get here with dt64/td64 we need to be sure we have
+            # matching resos
+            if fill_value.dtype.kind == "m":
Might be worth making a re-usable function for this? I could see this being useful in other areas. Something like reso_for_type.
I'd like to hold off on that as out of scope.
pandas/_libs/tslibs/conversion.pyx
Outdated
elif PyDate_Check(ts):
    # Keep the converter same as PyDateTime's
    # For date objects we give the lowest supported resolution, i.e. "s"
Do we have a test where we construct a Timestamp from a datetime.date?
dedicated test added + green
Co-authored-by: Matthew Roeschke <[email protected]>
…as into nano-tstamp-pydatetime
LGTM, can merge on green.
Thanks @jbrockmendel
Thanks for the reviews @mroeschke and @WillAyd. We're getting close to having full non-nano support!
I've got one json test still failing locally. cc @WillAyd, see the comment in the ujson file; suggestions?