Skip to content

Maintain Timezone Awareness with to_json and date_format="iso" #28912

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 17 commits into from

Conversation

WillAyd
Copy link
Member

@WillAyd WillAyd commented Oct 10, 2019

This vendors updates from numpy that allow for tz-aware ISO date formatting. Note this does slightly change the behavior of UTC dates. Previously they would write out as 2013-01-01T05:00:00.000Z but now are 2013-01-01T05:00:00.000+00:00. Both are valid ISO 8601

There is a follow up that needs to be addressed with reading these dates

"tz_range",
[
pd.date_range("2013-01-01 05:00:00Z", periods=2),
pd.date_range("2013-01-01 00:00:00", periods=2, tz="US/Eastern"),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests were unfortunately asserting the previous wrong behavior, so I split them up into tests looking for UTC versus those looking for offsets

@@ -321,7 +339,7 @@ int cmp_npy_datetimestruct(const npy_datetimestruct *a,
* Returns -1 on error, 0 on success, and 1 (with no error set)
* if obj doesn't have the needed date or datetime attributes.
*/
int convert_pydatetime_to_datetimestruct(PyDateTime_Date *dtobj,
int convert_pydatetime_to_datetimestruct(PyDateTime_DateTime *dtobj,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Types are pretty loosely checked in this module; changed these to a more exact date time for clarity

@WillAyd WillAyd added IO JSON read_json, to_json, json_normalize Timezones Timezone data dtype labels Oct 10, 2019
@@ -314,6 +314,7 @@ I/O
- Bug in :func:`read_hdf` closing stores that it didn't open when Exceptions are raised (:issue:`28699`)
- Bug in :meth:`DataFrame.read_json` where using ``orient="index"`` would not maintain the order (:issue:`28557`)
- Bug in :meth:`DataFrame.to_html` where the length of the ``formatters`` argument was not verified (:issue:`28469`)
- Bug in :meth:`DataFrame.to_json` where timezone-aware dates were converted to UTC (:issue:`12997`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this specify that it is only fixed-offset timezones that are now supported? any other names e.g. pd.to_json to mention?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually don't know what fixed versus (assumedly) non-fixed offsets are - can you clarify?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"UTC-8" is a fixed offset. "US/Pacific" (or colloquially-but-inaccurately "PST/PDT") is not a fixed offset.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I don't think that distinction matters when writing then (?). At least in tests the construction goes something like pd.date_range(..., tz="US/Eastern") and it writes out in ISO format

That may be a problem when reading. There needs to be a follow up on the reading side to support this as this now does the following when using "+0000" instead of "Z" as the time zone designator

In [5]: df = pd.DataFrame(pd.date_range("2013-01-01", periods=2, tz="US/Eastern"), columns=['date'])

In [6]: pd.read_json(df.to_json(date_format="iso")).dtypes
Out[6]:
date    datetime64[ns, pytz.FixedOffset(-300)]
dtype: object

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah never mind...I get what you are saying now. I guess the ISO format is ambiguous as to what timezone it is, so sure fixed only. Can update that on next iteration

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i guess if youve specified date_format="iso" then the user knows what they're getting into. its just annoying that we dont have a nice way to round-trip read/write non-fixed-offset timezones

@jbrockmendel
Copy link
Member

are these changes mostly copy/pasted from numpy? IIRC we used to use copy/pasted versions for some of these but i edited them to remove unused or always-0 args, woops.

"was too short, with length %d",
outlen);
"The string provided for NumPy ISO datetime formatting "
"was too short, with length %"NPY_INTP_FMT,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should there be a space or percent or something after the close-quote?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the leading spaces you deleted here, were those to make a linter happy? if not, they look wierd to me

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would have been a straight copy / paste from NumPy. I did use clang-format for objToJSON but not the vendored items

static int
convert_datetimestruct_utc_to_local(npy_datetimestruct *out_dts_local,
const npy_datetimestruct *dts_utc, int *out_timezone_offset)
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we dont already have something like this available? or we only have it in cython?

rawtime += dts_utc->min * 60;

/* localtime converts a 'time_t' into a local 'struct tm' */
if (get_localtime(&rawtime, &tm_) < 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm almost sure i removed get_localtime a while back. can we do without this check? then we could get rid of get_localtime and the ugly windows shim that goes with it

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure. This code does seem to segfault on local or ambiguous times that predate 1970 atm so maybe related

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. if you find an easy way to avoid the need that'd be neat, but not a priority

@WillAyd WillAyd mentioned this pull request Oct 11, 2019
5 tasks
@WillAyd
Copy link
Member Author

WillAyd commented Oct 30, 2019

Closing for now as I haven't been actively working this and want to keep the queue down, but plan on coming back to it one day

@WillAyd WillAyd closed this Oct 30, 2019
@WillAyd WillAyd deleted the fix-json-tz branch January 16, 2020 00:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO JSON read_json, to_json, json_normalize Timezones Timezone data dtype
Projects
None yet
Development

Successfully merging this pull request may close these issues.

to_json converts to UTC when encoding ISO formatted datetimes
2 participants