
Maintain Timezone Awareness with to_json and date_format="iso" #28912


Closed
wants to merge 17 commits into from
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.0.0.rst
@@ -314,6 +314,7 @@ I/O
- Bug in :func:`read_hdf` closing stores that it didn't open when Exceptions are raised (:issue:`28699`)
- Bug in :meth:`DataFrame.read_json` where using ``orient="index"`` would not maintain the order (:issue:`28557`)
- Bug in :meth:`DataFrame.to_html` where the length of the ``formatters`` argument was not verified (:issue:`28469`)
+- Bug in :meth:`DataFrame.to_json` where timezone-aware dates were converted to UTC (:issue:`12997`)
Member

Should this specify that only fixed-offset timezones are now supported? Any other names, e.g. pd.to_json, to mention?

Member Author

I actually don't know what fixed versus (assumedly) non-fixed offsets are - can you clarify?

Member

"UTC-8" is a fixed offset. "US/Pacific" (or colloquially-but-inaccurately "PST/PDT") is not a fixed offset.

Member Author

So I don't think that distinction matters when writing, then(?). At least in the tests, the construction goes something like pd.date_range(..., tz="US/Eastern"), and it writes out in ISO format.

That may be a problem when reading, though. There needs to be a follow-up on the reading side to support this, since with "+0000" (rather than "Z") as the time zone designator, it now does the following:

In [5]: df = pd.DataFrame(pd.date_range("2013-01-01", periods=2, tz="US/Eastern"), columns=['date'])

In [6]: pd.read_json(df.to_json(date_format="iso")).dtypes
Out[6]:
date    datetime64[ns, pytz.FixedOffset(-300)]
dtype: object

Member Author

Ah, never mind... I get what you are saying now. I guess the ISO format is ambiguous as to which timezone it is, so sure, fixed offsets only. Can update that on the next iteration.

Member

I guess if you've specified date_format="iso", then the user knows what they're getting into. It's just annoying that we don't have a nice way to round-trip read/write non-fixed-offset timezones.


Plotting
^^^^^^^^
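To illustrate the fixed-offset point from the thread above, here is a minimal standalone C sketch. It is not the PR's `make_iso_8601_datetime` call: `simple_dts` and `format_iso_with_offset` are hypothetical names, and the `+HHMM`/`-HHMM` designator is an assumption chosen to match the `"+0000"` form mentioned in the thread. Only the numeric offset reaches the output, so a `US/Eastern` timestamp serializes as `-0500` in January and `-0400` in July, and a reader can recover nothing more than a fixed offset.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for npy_datetimestruct, reduced to the fields used here. */
typedef struct {
    int year, month, day, hour, min, sec;
} simple_dts;

/*
 * Hypothetical helper: writes "YYYY-MM-DDTHH:MM:SS" followed by a "+HHMM" or
 * "-HHMM" designator derived from offset_in_min (minutes east of UTC).
 * Returns the number of characters written (excluding the NUL), or -1 if the
 * buffer is too small.
 */
static int format_iso_with_offset(const simple_dts *dts, int offset_in_min,
                                  char *buf, size_t buflen) {
    assert(dts != NULL && buf != NULL);
    /* Offset 0 is written as "+0000"; this sketch never emits "Z". */
    char sign = offset_in_min < 0 ? '-' : '+';
    int abs_min = abs(offset_in_min);
    int n = snprintf(buf, buflen, "%04d-%02d-%02dT%02d:%02d:%02d%c%02d%02d",
                     dts->year, dts->month, dts->day, dts->hour, dts->min,
                     dts->sec, sign, abs_min / 60, abs_min % 60);
    if (n < 0 || (size_t)n >= buflen) {
        return -1; /* output was truncated */
    }
    return n;
}
```

For example, `format_iso_with_offset(&(simple_dts){2013, 1, 1, 0, 0, 0}, -300, buf, sizeof buf)` writes `2013-01-01T00:00:00-0500` into `buf`; nothing in that string says "US/Eastern", which is why reading it back yields a fixed offset of -300 minutes.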
91 changes: 84 additions & 7 deletions pandas/_libs/src/ujson/python/objToJSON.c
@@ -436,7 +436,7 @@ static void *PyFloatToDOUBLE(JSOBJ _obj, JSONTypeContext *tc, void *outValue,
}

static void *PyBytesToUTF8(JSOBJ _obj, JSONTypeContext *tc, void *outValue,
-size_t *_outLen) {
+size_t *_outLen) {
PyObject *obj = (PyObject *)_obj;
*_outLen = PyBytes_GET_SIZE(obj);
return PyBytes_AS_STRING(obj);
@@ -462,9 +462,33 @@ static void *PyUnicodeToUTF8(JSOBJ _obj, JSONTypeContext *tc, void *outValue,
return PyBytes_AS_STRING(newObj);
}

+/*
+Generic function to serialize date time structs to the appropriate JSON format.
+
+Parameters
+----------
+npy_datetimestruct *dts : Pointer to a struct holding datetime information
+(year, month, day, etc...)
+JSONTypeContext *tc : Pointer to the context for serialization
+void *outValue : Pointer to a JSON serializable value
+size_t *_outLen : For C-string output, the length of the string that needs to be
+accounted for.
+int offset_in_min : Number of minutes the npy_datetimestruct is offset from UTC
+
+Returns
+-------
+TODO : This returns a C String for ISO dates while also modifying the cStr for
+the type context. That seems buggy and/or unnecessary?
+
+Notes
+-----
+In an ideal world we wouldn't have to handle offset_in_min separately from
+npy_datetimestruct. Unfortunately npy_datetimestruct does not hold this info, so
+we pass it alongside the struct.
+*/
static void *PandasDateTimeStructToJSON(npy_datetimestruct *dts,
JSONTypeContext *tc, void *outValue,
-size_t *_outLen) {
+size_t *_outLen, int offset_in_min) {
NPY_DATETIMEUNIT base = ((PyObjectEncoder *)tc->encoder)->datetimeUnit;

if (((PyObjectEncoder *)tc->encoder)->datetimeIso) {
@@ -477,7 +501,8 @@ static void *PandasDateTimeStructToJSON(npy_datetimestruct *dts,
return NULL;
}

-if (!make_iso_8601_datetime(dts, GET_TC(tc)->cStr, *_outLen, base)) {
+if (!make_iso_8601_datetime(dts, GET_TC(tc)->cStr, *_outLen, 1, 0, base,
+offset_in_min, 0)) {
PRINTMARK();
*_outLen = strlen(GET_TC(tc)->cStr);
return GET_TC(tc)->cStr;
@@ -505,19 +530,68 @@ static void *NpyDateTimeScalarToJSON(JSOBJ _obj, JSONTypeContext *tc,

pandas_datetime_to_datetimestruct(obj->obval,
(NPY_DATETIMEUNIT)obj->obmeta.base, &dts);
-return PandasDateTimeStructToJSON(&dts, tc, outValue, _outLen);
+return PandasDateTimeStructToJSON(&dts, tc, outValue, _outLen, 0);
}

+/*
+Top level method for returning the conversion routine for serializing a
+datetimestruct to JSON.
+
+Parameters
+----------
+JSOBJ _obj : In all actuality, this is a PyObject* passed from the Object_ type
+context; should be a datetime
+JSONTypeContext *tc : Pointer to the Type Context at this point in serialization
+void *outValue : Pointer to the serializable object; in this scope, can be
+either an integer or C-string,
+depending on whether or not we are serializing dates to Unix epoch or ISO
+format
+size_t *_outLen : Pointer to the C-string length of the serializable object.
+Should be modified within function body.
+
+Returns
+-------
+Function pointer to appropriate serialization routine.
+
+Notes
+-----
+For iso_date formats, this passes a npy_datetimestruct to the appropriate
+conversion function. Unfortunately the npy_datetimestruct does not have timezone
+awareness, so the offset from UTC in minutes is passed instead.
+*/
static void *PyDateTimeToJSON(JSOBJ _obj, JSONTypeContext *tc, void *outValue,
size_t *_outLen) {
npy_datetimestruct dts;
-PyDateTime_Date *obj = (PyDateTime_Date *)_obj;
+PyDateTime_DateTime *obj = (PyDateTime_DateTime *)_obj;

PRINTMARK();

if (!convert_pydatetime_to_datetimestruct(obj, &dts)) {
PRINTMARK();
-return PandasDateTimeStructToJSON(&dts, tc, outValue, _outLen);
+
+long offset_in_min = 0;
+PyObject *utcoffset = PyObject_CallMethod(_obj, "utcoffset", NULL);
+
+if (utcoffset == NULL)
+return PyErr_NoMemory();
+
+else if (utcoffset != Py_None) {
+PyObject *tot_seconds =
+PyObject_CallMethod(utcoffset, "total_seconds", NULL);
+
+if (tot_seconds == NULL) {
+Py_DECREF(utcoffset);
+return PyErr_NoMemory();
+}
+
+offset_in_min = PyLong_AsLong(tot_seconds) / 60;
+Py_DECREF(tot_seconds);
+}
+
+Py_DECREF(utcoffset);
+
+return PandasDateTimeStructToJSON(&dts, tc, outValue, _outLen,
+offset_in_min);
} else {
if (!PyErr_Occurred()) {
PyErr_SetString(PyExc_ValueError,
@@ -535,7 +609,10 @@ static void *NpyDatetime64ToJSON(JSOBJ _obj, JSONTypeContext *tc,

pandas_datetime_to_datetimestruct((npy_datetime)GET_TC(tc)->longValue,
NPY_FR_ns, &dts);
-return PandasDateTimeStructToJSON(&dts, tc, outValue, _outLen);
+
+// Because this function is for numpy datetimes, which by nature are not
+// tz-aware, we can pass offset_in_min as 0
+return PandasDateTimeStructToJSON(&dts, tc, outValue, _outLen, 0);
}

static void *PyTimeToJSON(JSOBJ _obj, JSONTypeContext *tc, void *outValue,
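In PyDateTimeToJSON above, the offset comes from utcoffset().total_seconds(), and timedelta.total_seconds() returns a Python float. The sketch below covers only the pure-C arithmetic, assuming the seconds value has already been extracted as a `double` (for instance with `PyFloat_AsDouble`). `offset_seconds_to_minutes` is a hypothetical helper; unlike the integer division in the diff, which truncates toward zero, it rejects offsets that are not a whole number of minutes (Python 3.7+ permits such sub-minute offsets). That rejection is a design choice of this sketch, not the PR's behavior.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: converts a UTC offset in seconds (the float returned by
 * datetime.utcoffset().total_seconds()) into whole minutes.
 * Returns 0 on success, or -1 if the value is NaN, outside the open interval
 * (-24h, +24h) that Python's tzinfo permits, or not a whole number of minutes.
 */
static int offset_seconds_to_minutes(double total_seconds, int *offset_in_min) {
    assert(offset_in_min != NULL);
    /* The negated comparison also rejects NaN. */
    if (!(total_seconds > -86400.0 && total_seconds < 86400.0)) {
        return -1;
    }
    /* Within this range the cast to long cannot overflow. */
    long minutes = (long)(total_seconds / 60.0);
    if ((double)minutes * 60.0 != total_seconds) {
        return -1; /* sub-minute offset, e.g. 30 seconds */
    }
    *offset_in_min = (int)minutes;
    return 0;
}
```

With the US/Eastern winter offset, `offset_seconds_to_minutes(-18000.0, &m)` returns 0 and sets `m` to -300, matching the `FixedOffset(-300)` seen in the round-trip output in the review thread. Naive NumPy datetime64 values never reach this path; NpyDatetime64ToJSON passes 0 directly.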
20 changes: 19 additions & 1 deletion pandas/_libs/tslibs/src/datetime/np_datetime.c
@@ -28,6 +28,24 @@ This file is derived from NumPy 1.7. See NUMPY_LICENSE.txt
#include <numpy/ndarraytypes.h>
#include "np_datetime.h"

+char *_datetime_strings[NPY_DATETIME_NUMUNITS] = {
+"Y",
+"M",
+"W",
+"<invalid>",
+"D",
+"h",
+"m",
+"s",
+"ms",
+"us",
+"ns",
+"ps",
+"fs",
+"as",
+"generic"
+};
+
#if PY_MAJOR_VERSION >= 3
#define PyInt_AsLong PyLong_AsLong
#endif // PyInt_AsLong
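The _datetime_strings table added in this hunk is indexed directly by NPY_DATETIMEUNIT, which is why slot 3 holds "<invalid>": NumPy leaves that enum value unused. Below is a minimal sketch of bounds-checked lookups, assuming a hypothetical local enum `unit_t` that mirrors NumPy's numbering instead of including numpy/ndarraytypes.h; `unit_to_string` and `unit_from_string` are likewise hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of NumPy's NPY_DATETIMEUNIT values; 3 is an unused gap. */
typedef enum {
    UNIT_Y = 0, UNIT_M = 1, UNIT_W = 2,
    UNIT_D = 4, UNIT_h = 5, UNIT_m = 6, UNIT_s = 7,
    UNIT_ms = 8, UNIT_us = 9, UNIT_ns = 10, UNIT_ps = 11,
    UNIT_fs = 12, UNIT_as = 13, UNIT_GENERIC = 14
} unit_t;

#define NUM_UNITS (UNIT_GENERIC + 1)
#define UNIT_GAP 3 /* the enum value NumPy leaves unused */

/* Same order as _datetime_strings in the diff, but const-qualified. */
static const char *const unit_strings[NUM_UNITS] = {
    "Y", "M", "W", "<invalid>", "D", "h", "m", "s",
    "ms", "us", "ns", "ps", "fs", "as", "generic"
};

/* Returns the unit's string, or NULL if the value is out of range. */
static const char *unit_to_string(int unit) {
    if (unit < 0 || unit >= NUM_UNITS) {
        return NULL;
    }
    return unit_strings[unit];
}

/* Reverse lookup; returns -1 for unknown names and for the "<invalid>" gap. */
static int unit_from_string(const char *name) {
    assert(name != NULL);
    for (int i = 0; i < NUM_UNITS; i++) {
        if (i != UNIT_GAP && strcmp(unit_strings[i], name) == 0) {
            return i;
        }
    }
    return -1;
}
```

The `const` qualifiers are a choice of this sketch; the table in the diff is declared as plain `char *`.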
@@ -321,7 +339,7 @@ int cmp_npy_datetimestruct(const npy_datetimestruct *a,
* Returns -1 on error, 0 on success, and 1 (with no error set)
* if obj doesn't have the needed date or datetime attributes.
*/
-int convert_pydatetime_to_datetimestruct(PyDateTime_Date *dtobj,
+int convert_pydatetime_to_datetimestruct(PyDateTime_DateTime *dtobj,
Member Author

Types are pretty loosely checked in this module, so I changed these to the more exact PyDateTime_DateTime type for clarity.

npy_datetimestruct *out) {
// Assumes that obj is a valid datetime object
PyObject *tmp;
3 changes: 2 additions & 1 deletion pandas/_libs/tslibs/src/datetime/np_datetime.h
@@ -35,7 +35,7 @@ extern const npy_datetimestruct _NS_MAX_DTS;
// stuff pandas needs
// ----------------------------------------------------------------------------

-int convert_pydatetime_to_datetimestruct(PyDateTime_Date *dtobj,
+int convert_pydatetime_to_datetimestruct(PyDateTime_DateTime *dtobj,
npy_datetimestruct *out);

npy_datetime npy_datetimestruct_to_datetime(NPY_DATETIMEUNIT base,
@@ -48,6 +48,7 @@ void pandas_timedelta_to_timedeltastruct(npy_timedelta val,
NPY_DATETIMEUNIT fr,
pandas_timedeltastruct *result);

+extern char *_datetime_strings[NPY_DATETIME_NUMUNITS];
extern const int days_per_month_table[2][12];

// stuff numpy-derived code needs in header