BUG: to_json incorrectly localizes tz-naive datetimes to UTC #46730

lithomas1 · 2022-04-10T16:43:29Z

closes #38760 (pandas.Series.to_json() incorrectly localizes tz-naive datetimes to UTC)
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.
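
For readers without the linked issue open, here is a minimal stdlib-only sketch of the behavior being asked for: a tz-naive datetime serializes to ISO 8601 with no UTC designator, while a tz-aware one is converted to UTC and marked with `Z`. The helper `to_iso_ms` is hypothetical and is not the pandas serializer; the millisecond precision is an assumption chosen for illustration.

```python
from datetime import datetime, timedelta, timezone


def to_iso_ms(dt: datetime) -> str:
    """Hypothetical helper: ISO 8601 string with millisecond precision."""
    if dt.tzinfo is None or dt.utcoffset() is None:
        # tz-naive: emit the wall-clock time as-is, with no "Z" suffix.
        return dt.isoformat(timespec="milliseconds")
    # tz-aware: convert to UTC first, then mark the result with "Z".
    utc = dt.astimezone(timezone.utc).replace(tzinfo=None)
    return utc.isoformat(timespec="milliseconds") + "Z"


naive = datetime(2020, 12, 28, 0, 0, 0)
aware = datetime(2020, 12, 28, 0, 0, 0, tzinfo=timezone(timedelta(hours=-5)))

print(to_iso_ms(naive))  # 2020-12-28T00:00:00.000
print(to_iso_ms(aware))  # 2020-12-28T05:00:00.000Z
```

The bug in the title corresponds to the naive value coming out with a trailing `Z`, as if it had been localized to UTC.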

WillAyd

Wow, great start. There are a few things I think we can improve upon, but this looks great.

pandas/_libs/src/ujson/python/date_conversions.c

pandas/_libs/src/ujson/python/objToJSON.c

pandas/_libs/src/ujson/python/date_conversions.c

lithomas1 · 2022-04-12T04:36:53Z

pandas/_libs/src/ujson/python/objToJSON.c

@@ -221,8 +221,18 @@ static PyObject *get_values(PyObject *obj) {
        // The special cases to worry about are dt64tz and category[dt64tz].
        //  In both cases we want the UTC-localized datetime64 ndarray,
        //  without going through and object array of Timestamps.


@WillAyd In my current (very rough) implementation of this, I've decided to force an array of Timestamps (not sure what the perf/behavior implications of this would be).

Alternatively, I could try to just use the dt64 ndarray and grab the tz of the DTA (as the tz should be the same). Then I can probably store this inside the JSONTypeContext and pass that through to the int64ToISO function. This muddles up the code a little, since numpy dt64 objects are supposed to be tz-naive and PyDateTimetoISO should really be handling this case.

Do you have a preference for one way or the other?
(also cc @jbrockmendel who may have an opinion)

I would avoid adding anything to JSONTypeContext as it’s already way too overloaded. Is this not something we can just manage in the serializer?

Working with an ndarray should be much faster. However, there are also cases where we need to handle datetimes inside an object array, so we need to be robust in how we manage them (you might want to add a test for the latter).

I added the test. Can you elaborate more about how to handle this in the serializer?
I don't see anywhere else where I could store the tz info other than in the TypeContext.

I guess the way it is handled now is fine, but it would be nice to avoid the performance hit from casting to object.

You might get away with storing this information in the NpyArrayContext, since that already has the type number from NumPy. My hesitation with TypeContext is that it is already pretty sparsely overloaded for different types, so it's difficult to debug and figure out the intent of it within various contexts.
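
The two options discussed in this thread can be sketched in plain Python, as an illustration only. `ArrayContext` is a hypothetical stand-in for a per-array context (it is not the real NpyArrayContext), and both functions are simplified models rather than the C serializer: one walks an object array where each element carries its own tzinfo, the other walks integer epoch values with the tz stored once alongside the array.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone, tzinfo
from typing import List, Optional

EPOCH = datetime(1970, 1, 1)
MS = timedelta(milliseconds=1)


@dataclass
class ArrayContext:
    """Hypothetical per-array context: integer buffer plus one tz for all values."""
    epoch_ms: List[int]          # UTC-based for tz-aware data, wall time otherwise
    tz: Optional[tzinfo] = None  # stored once, not per element


def iso_from_objects(values: List[datetime]) -> List[str]:
    """Object path: inspect tzinfo on every element."""
    out = []
    for dt in values:
        if dt.tzinfo is None:
            out.append(dt.isoformat(timespec="milliseconds"))
        else:
            utc = dt.astimezone(timezone.utc).replace(tzinfo=None)
            out.append(utc.isoformat(timespec="milliseconds") + "Z")
    return out


def iso_from_context(ctx: ArrayContext) -> List[str]:
    """Integer path: format raw values, consult the stored tz only for the suffix."""
    out = []
    for ms in ctx.epoch_ms:
        text = (EPOCH + ms * MS).isoformat(timespec="milliseconds")
        out.append(text + "Z" if ctx.tz is not None else text)
    return out


tz = timezone(timedelta(hours=2))
aware = [datetime(2022, 4, 12, 6, 0, tzinfo=tz)]
utc_ms = [(dt - datetime(1970, 1, 1, tzinfo=timezone.utc)) // MS for dt in aware]

naive = [datetime(2022, 4, 12, 6, 0)]
naive_ms = [(dt - EPOCH) // MS for dt in naive]

print(iso_from_objects(aware), iso_from_context(ArrayContext(utc_ms, tz)))
print(iso_from_objects(naive), iso_from_context(ArrayContext(naive_ms)))
```

Both paths produce the same strings here; the difference is where the tz lives and how much per-element work the serializer does.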

jreback

needs a big whatsnew note

pandas/_libs/src/ujson/python/date_conversions.c

pandas/tests/io/json/test_pandas.py

pandas/_libs/tslibs/src/datetime/np_datetime.c

jbrockmendel · 2022-04-20T16:14:15Z

pandas/core/internals/blocks.py

+        # force dt64tz to go through object dtype
+        # tz info will be lost when converting to
+        # dt64 which is naive
+        return self.values.astype(object)


With this change, behavior is equivalent to just removing both this and DatetimeLikeBlock.values_for_json. That might mean a perf hit for dt64naive and td64, though.

Sorry, I'm not sure I follow here about dt64naive and td64.
Don't those go through DatetimeLikeBlock instead of DatetimeTZBlock? (I didn't change any behavior there).

Right. What I'm saying is that I think NDArrayBackedBlock.values_for_json works, so it no longer needs to be overridden.

I'll take care of this in a follow up.
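
The diff comment in this thread ("tz info will be lost when converting to dt64 which is naive") can be illustrated with a small stdlib-only sketch. The helper `to_epoch_ns` is hypothetical; it only models the idea that a dt64-style value is a bare integer offset from the epoch, so two datetimes in different zones that denote the same instant become indistinguishable.

```python
from datetime import datetime, timedelta, timezone

UTC_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_ns(dt: datetime) -> int:
    """Hypothetical helper: nanoseconds since the UTC epoch, like a dt64[ns] value."""
    return ((dt - UTC_EPOCH) // timedelta(microseconds=1)) * 1000


# Same instant expressed with two different fixed UTC offsets.
plus_nine = datetime(2022, 4, 20, 9, 0, tzinfo=timezone(timedelta(hours=9)))
plus_one = datetime(2022, 4, 20, 1, 0, tzinfo=timezone(timedelta(hours=1)))

print(plus_nine.utcoffset() == plus_one.utcoffset())     # False: the offsets differ
print(to_epoch_ns(plus_nine) == to_epoch_ns(plus_one))   # True: the integers do not
```

Once only the integer is kept, the original offset cannot be recovered from it, which is why the tz has to travel either with each object or alongside the array.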

doc/source/whatsnew/v1.5.0.rst

pandas/_libs/tslibs/src/datetime/np_datetime.c

pandas/_libs/src/ujson/python/date_conversions.c

lithomas1 · 2022-04-30T04:17:11Z

Sorry for the low quality code. I was prototyping quickly and forgot to clean up afterwards.

Regarding #46730, I think I'll stick to the current approach for now. Existing perf benchmarks surprisingly seem unaffected by this change, so I'm not sure it's worth pursuing, at least for now (it might be worth doing for the tz-aware case later).

WillAyd

one last nit otherwise this lgtm

WillAyd · 2022-05-03T03:21:19Z

pandas/_libs/src/ujson/python/date_conversions.c

+    if (PyObject_HasAttrString(obj, "tzinfo")) {
+        PyObject *offset = extract_utc_offset(obj);
+        if (offset == NULL) {
+            return NULL;


Looks like we also need to PyObject_Free(result) before returning if things fail to avoid any leaks

WillAyd

Lgtm - impressive work

jreback · 2022-05-04T23:30:30Z

thanks @lithomas1 very nice!

I believe there is a de-duplication followup (values_for_json) if you can

BUG: to_json incorrectly localizes tz-naive datetimes to UTC

acdcb05

lithomas1 added the Bug and IO JSON (read_json, to_json, json_normalize) labels Apr 10, 2022

lithomas1 added 4 commits April 10, 2022 11:54

fix warnings?

9652002

actually fix warnings

16332fc

Update objToJSON.c

aa6444c

fix formatting(and test)

94ad92a

WillAyd reviewed Apr 11, 2022

lithomas1 commented Apr 12, 2022

Address code comments

ab895dc

lithomas1 marked this pull request as ready for review April 16, 2022 16:28

jreback requested changes Apr 17, 2022

pandas/_libs/src/ujson/python/date_conversions.c (outdated)

pandas/tests/io/json/test_pandas.py

address comments

075e7ca

WillAyd reviewed Apr 20, 2022

pandas/_libs/tslibs/src/datetime/np_datetime.c (outdated)

jbrockmendel reviewed Apr 20, 2022

jreback added this to the 1.5 milestone Apr 26, 2022

jreback requested changes Apr 26, 2022

doc/source/whatsnew/v1.5.0.rst

lithomas1 added 2 commits April 26, 2022 20:14

address more comments

ca3316d

Merge branch 'main' of https://github.com/pandas-dev/pandas into bug-tznaive-utc

9772fc3

WillAyd reviewed Apr 27, 2022

pandas/_libs/tslibs/src/datetime/np_datetime.c (outdated)

pandas/_libs/src/ujson/python/date_conversions.c (outdated)

lithomas1 added 2 commits April 29, 2022 20:47

address comments

fe90bb3

Merge branch 'main' into bug-tznaive-utc

f983dce

lithomas1 requested review from WillAyd, jbrockmendel and jreback May 2, 2022 17:18

WillAyd reviewed May 3, 2022

fix memleak

65f321b

WillAyd approved these changes May 3, 2022

Merge branch 'main' into bug-tznaive-utc

e314020

jreback approved these changes May 4, 2022

jreback merged commit 009c4c6 into pandas-dev:main May 4, 2022

lithomas1 deleted the bug-tznaive-utc branch May 5, 2022 00:03

lithomas1 mentioned this pull request Jun 6, 2022

CLN: DatetimeTZBlock don't override values_for_json #46993

Merged

PablocFonseca mentioned this pull request Jul 5, 2022

Feat: increase data serialization speed PablocFonseca/streamlit-aggrid#85

Open

yehoshuadimarsky pushed a commit to yehoshuadimarsky/pandas that referenced this pull request Jul 13, 2022

BUG: to_json incorrectly localizes tz-naive datetimes to UTC (pandas-dev#46730)

76da657
