-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Timedelta in to_json object array and ISO dates not handled properly #28256
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
What is the desired behavior here? That the first case using Timedelta look like the second (casting to object)? |
The time delta should be written out using iso format just like datetimes are (though using the duration ISO format) whether or not included in an object array or DTA |
Sounds good. I'll take a whack at it if no one else has professed a desire. |
Awesome! If it's any help I'm fairly certain the issue is that the PyDelta check in the extension module that serializes everything doesn't make consideration as to whether or not the date format should be iso. Here's where I think this is relevant: pandas/pandas/_libs/src/ujson/python/objToJSON.c Line 1919 in 2d65e38
If you did some kind of conditional branch as follows: pandas/pandas/_libs/src/ujson/python/objToJSON.c Line 1894 in 2d65e38
The true branch should dispatch to the |
Thanks Will! That definitely saved some time getting to the root of one of the issues and I think the fix is just as you say. This takes care of the case when you do In this case, we pass through the pandas/pandas/_libs/src/ujson/python/objToJSON.c Line 1919 in 9aa9db9
and instead get into the check here: pandas/pandas/_libs/src/ujson/python/objToJSON.c Line 2070 in 9aa9db9
then here:
which sets
then go back to pandas/pandas/_libs/src/ujson/python/objToJSON.c Line 1803 in 9aa9db9
and finally here:
So it appears that a simple check somewhere before that last line to differentiate a timedelta from a datetime could work, but the issue here is that now we're dealing with a NumPy timedelta64 rather than a pandas timedelta so we don't have the This seems to be pointing to a solution where we define something like a I suppose another possible solution is to convert the NP timedelta64 to a pandas TimeDelta and then proceed as we do for the other case, but aside for what effects that might have on performance, making such a conversion in C doesn't appear to be as straightforward as it sounds. |
Great thanks for diving into this. For now I think it would be OK to just create a pandas Timedelta object and call Generally I think we need to simplify our JSON serialization as it's been rather stale development-wise for a few years, so I've thought about doing as much logic in the Python layer as possible before sending to JSON. This would mean dispatching to vectorized methods in Python where applicable (i.e. a DatetimeArray is converted to an array of ISO strings before getting to the serializer) and for one-off items just converting to the equivalent pandas object during serialization. ^ just thinking out loud so as you look at it any ideas you have are certainly welcome |
Thanks Will. That's pretty much how I decided to do it. Looks like that's how it was done in the |
Intertwined with #15137 though a slight different issue.
date_format="iso"
has different behavior for Timedeltas depending on whether or not the Timedelta is in a DTA or an object array. To illustrate:By contrast the same issue does not appear with datetimes
The text was updated successfully, but these errors were encountered: