-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Maintain Timezone Awareness with to_json and date_format="iso" #28912
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
"tz_range", | ||
[ | ||
pd.date_range("2013-01-01 05:00:00Z", periods=2), | ||
pd.date_range("2013-01-01 00:00:00", periods=2, tz="US/Eastern"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests were unfortunately asserting the previous wrong behavior, so I split them up into tests looking for UTC versus those looking for offsets
@@ -321,7 +339,7 @@ int cmp_npy_datetimestruct(const npy_datetimestruct *a, | |||
* Returns -1 on error, 0 on success, and 1 (with no error set) | |||
* if obj doesn't have the needed date or datetime attributes. | |||
*/ | |||
int convert_pydatetime_to_datetimestruct(PyDateTime_Date *dtobj, | |||
int convert_pydatetime_to_datetimestruct(PyDateTime_DateTime *dtobj, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Types are pretty loosely checked in this module; changed these to a more exact date time for clarity
@@ -314,6 +314,7 @@ I/O | |||
- Bug in :func:`read_hdf` closing stores that it didn't open when Exceptions are raised (:issue:`28699`) | |||
- Bug in :meth:`DataFrame.read_json` where using ``orient="index"`` would not maintain the order (:issue:`28557`) | |||
- Bug in :meth:`DataFrame.to_html` where the length of the ``formatters`` argument was not verified (:issue:`28469`) | |||
- Bug in :meth:`DataFrame.to_json` where timezone-aware dates were converted to UTC (:issue:`12997`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this specify that it is only fixed-offset timezones that are now supported? any other names e.g. pd.to_json to mention?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually don't know what fixed versus (assumedly) non-fixed offsets are - can you clarify?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"UTC-8" is a fixed offset. "US/Pacific" (or colloquially-but-inaccurately "PST/PDT") is not a fixed offset.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I don't think that distinction matters when writing then (?). At least in tests the construction goes something like pd.date_range(..., tz="US/Eastern")
and it writes out in ISO format
That may be a problem when reading. There needs to be a follow up on the reading side to support this as this now does the following when using "+0000" instead of "Z" as the time zone designator
In [5]: df = pd.DataFrame(pd.date_range("2013-01-01", periods=2, tz="US/Eastern"), columns=['date'])
In [6]: pd.read_json(df.to_json(date_format="iso")).dtypes
Out[6]:
date datetime64[ns, pytz.FixedOffset(-300)]
dtype: object
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah never mind...I get what you are saying now. I guess the ISO format is ambiguous as to what timezone it is, so sure fixed only. Can update that on next iteration
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i guess if youve specified date_format="iso" then the user knows what they're getting into. its just annoying that we dont have a nice way to round-trip read/write non-fixed-offset timezones
are these changes mostly copy/pasted from numpy? IIRC we used to use copy/pasted versions for some of these but i edited them to remove unused or always-0 args, woops. |
"was too short, with length %d", | ||
outlen); | ||
"The string provided for NumPy ISO datetime formatting " | ||
"was too short, with length %"NPY_INTP_FMT, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should there be a space or percent or something after the close-quote?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the leading spaces you deleted here, were those to make a linter happy? if not, they look wierd to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would have been a straight copy / paste from NumPy. I did use clang-format for objToJSON but not the vendored items
static int | ||
convert_datetimestruct_utc_to_local(npy_datetimestruct *out_dts_local, | ||
const npy_datetimestruct *dts_utc, int *out_timezone_offset) | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we dont already have something like this available? or we only have it in cython?
rawtime += dts_utc->min * 60; | ||
|
||
/* localtime converts a 'time_t' into a local 'struct tm' */ | ||
if (get_localtime(&rawtime, &tm_) < 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm almost sure i removed get_localtime a while back. can we do without this check? then we could get rid of get_localtime and the ugly windows shim that goes with it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure. This code does seem to segfault on local or ambiguous times that predate 1970 atm so maybe related
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. if you find an easy way to avoid the need that'd be neat, but not a priority
Closing for now as I haven't been actively working this and want to keep the queue down, but plan on coming back to it one day |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
This vendors updates from numpy that allow for tz-aware ISO date formatting. Note this does slightly change the behavior of UTC dates. Previously they would write out as
2013-01-01T05:00:00.000Z
but now are2013-01-01T05:00:00.000+00:00
. Both are valid ISO 8601There is a follow up that needs to be addressed with reading these dates