-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Add support for date_unit="D" in read_json and to_json #51000
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
535336a
to
ba87eaa
Compare
@@ -1063,7 +1066,9 @@ def __init__( | |||
self.dtype = dtype | |||
|
|||
if date_unit is not None: | |||
date_unit = date_unit.lower() | |||
# avoid lowercasing "D" but ensure retrocompatibility for other units | |||
if date_unit != "D": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just put a lowercase D in _STAMP_UNITS
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to keep the exact same format passed to the underlying function
Does this work correctly when calling this during a leap year? |
What do you mean? This threshold is used only when detecting if a number is a suitable value for decoding a timestamp with date unit "D". |
Not an expert in this area but I'm also a little unsure of how accurate this would be. The code seems to assume there is always 24 hours in a day which isn't true when you think about things like DST Do you know how numpy internally manages going from nanoseconds to day precision? |
I guess DST might not apply here since we are talking about UTC, but yea still not sure about taking this on directly the JSON code. @jbrockmendel might have some insights since he has been working on non-ns array support |
I tried some manipulations and it seems to truncate the time part, at least from my experiments: In [105]: a = np.array(["2023-03-25T00:23:45.248755"], dtype="datetime64[ns]")
In [106]: b = a + np.array([np.timedelta64(i, 'h') for i in range(30)])
In [107]: b.astype('datetime64[D]')
Out[107]:
array(['2023-03-25', '2023-03-25', '2023-03-25', '2023-03-25',
'2023-03-25', '2023-03-25', '2023-03-25', '2023-03-25',
'2023-03-25', '2023-03-25', '2023-03-25', '2023-03-25',
'2023-03-25', '2023-03-25', '2023-03-25', '2023-03-25',
'2023-03-25', '2023-03-25', '2023-03-25', '2023-03-25',
'2023-03-25', '2023-03-25', '2023-03-25', '2023-03-25',
'2023-03-26', '2023-03-26', '2023-03-26', '2023-03-26',
'2023-03-26', '2023-03-26'], dtype='datetime64[D]')
In [108]: b - a
Out[108]:
array([ 0, 3600000000000, 7200000000000, 10800000000000,
14400000000000, 18000000000000, 21600000000000, 25200000000000,
28800000000000, 32400000000000, 36000000000000, 39600000000000,
43200000000000, 46800000000000, 50400000000000, 54000000000000,
57600000000000, 61200000000000, 64800000000000, 68400000000000,
72000000000000, 75600000000000, 79200000000000, 82800000000000,
86400000000000, 90000000000000, 93600000000000, 97200000000000,
100800000000000, 104400000000000], dtype='timedelta64[ns]')
In [109]: (b-a).astype('timedelta64[D]')
Out[109]:
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 1, 1, 1, 1, 1, 1], dtype='timedelta64[D]') I've chosen |
division/multiplication. numpy does everything with timezone-naive datetimes. |
Hi, is there any interest in merging this PR? If so, I can resolve the conflicts with the main branch |
Thanks for the pull request, but it appears to have gone stale. If interested in continuing, please merge in the main branch, address any review comments and/or failing tests, and we can reopen. |
I didn't update this on purpose, since I got no feedback and had no time to keep it updated. As I said before, if there is any interest in this feature, I'll update it |
I would have used this feature if it were available. My use case involves regulatory data that applies to a day of the year (no regard for timezone). Having an output option that does not include the time of day would significantly reduce the payload size. As previously discussed, this would still be an ISO 8601 conformant date format. I'm subscribed to this issue and happy to provide additional information/support if it will help. |
Hi @mousedownmike! The feature in this PR was working at the time I created the PR, I was waiting for the review by some maintainer, but it was developed on pandas 1.5.x, so this probably needs some work to adapt all the code to the new major. I'll probably look into it, just I don't know when, since in this period I'm quite busy, if you have some spare time feel free to work on it! |
@mroeschke I've updated the code, can you please reopen the PR? |
@mroeschke why almost all the checks were cancelled? |
Could you merge main and push to this branch again? |
How do we see this aligning with the efforts in #53757 (review) ? I think we are developing two different ways to deal with various levels of precision. I'm still inclined to go with the other approach as it is more native to the type system, but maybe that doesn't include all of the work here |
It makes sense, since this PR was opened before pandas 2, when things were slightly different. Said so, I read a bit the linked PR and the issue, and I agree it doesn't include all the support for Maybe it may make sense to either close this PR and have a fresh start when the other is merged, or rebase this onto the one you provided? |
Looks like #53757 has been merged. I agree it's probably best to have a fresh PR start relative to that PR so closing. |
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.This PR extends the
date_unit
parameter of thepd.read_json
and theto_json
method of Series and DataFrame methods by adding another allowed value:"D"
.This value can be used to serialize and deserialize dates into
YYYY-MM-DD
format.In the deserialization, I chose 365 as min value for the D unit since all the minimum values represent a year.
The differences with the other supported units in deserialization are:
"s"
to avoid breaking retrocompatibility.I've already updated unit tests and the documentation. I wasn't sure of where this enhancement should be put in the whatsnew, so feel free to suggest me and I will update it.
All suggestions are welcome, please let me know what I can improve.