-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
ENH: Add Timedelta Support to JSON Reader with orient=table (#21140) #21827
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice change - one comment to address
if 'timedelta64' in dtypes.values(): | ||
raise NotImplementedError('table="orient" can not yet read ' | ||
'ISO-formatted Timedelta data') | ||
for col, dtype in dtypes.items(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this block necessary? Assumed the subsequent astype
call would take care of this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, support for iso-format timedeltas should be added to astype
function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No just go with what jreback said for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, are you sure you need to do this? I believe .astype(dtypes)
should handle this, and if not we should fix that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now
delta = pd.Timedelta(1e9).isoformat()
pd.DataFrame([delta]).astype('timedelta64[ns]')
results in error.
Codecov Report
@@ Coverage Diff @@
## master #21827 +/- ##
==========================================
+ Coverage 92.05% 92.05% +<.01%
==========================================
Files 170 170
Lines 50708 50710 +2
==========================================
+ Hits 46677 46679 +2
Misses 4031 4031
Continue to review full report at Codecov.
|
pandas/io/json/table_schema.py
Outdated
'ISO-formatted Timedelta data') | ||
for col, dtype in dtypes.items(): | ||
if dtype == 'timedelta64[ns]': | ||
df[col] = df[col].apply(Timedelta) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you would not convert this way, use pd.to_timedelta
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can pd.to_timedelta
parse iso-formated timedeltas now? Should support for this format be added to this function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be able to change this now
@@ -306,10 +306,9 @@ def parse_table_schema(json, precise_float): | |||
raise NotImplementedError('table="orient" can not yet read timezone ' | |||
'data') | |||
|
|||
# No ISO constructor for Timedelta as of yet, so need to raise |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
was this not hit in tests prior to this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should have been covered by test_read_json_table_orient_raises
through parametrization
pandas/_libs/tslibs/timedeltas.pyx
Outdated
@@ -183,7 +183,10 @@ cpdef convert_to_timedelta64(object ts, object unit): | |||
ts = cast_from_unit(ts, unit) | |||
ts = np.timedelta64(ts) | |||
elif is_string_object(ts): | |||
ts = np.timedelta64(parse_timedelta_string(ts)) | |||
if len(ts) > 0 and ts[0] == 'P': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm what problem are you trying to solve with this addition? ISO format support should have been added back in #19191 as a precursor to this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pd.to_timedelta
was not converting iso-formated strings
In pd.Timedelta
constructor parse_iso_format_string
function is used:
if len(value) > 0 and value[0] == 'P':
value = parse_iso_format_string(value)
else:
value = parse_timedelta_string(value)
while in pd.to_timedelta
array_to_timedelta64
is used. Which is not using parse_iso_format_string
, actually there're only two occurrences of this function in timedeltas.pyx
, its definition and inside pd.Timedelta
.
Also, why have parse_iso_format_string
and parse_timedelta_string
simultaneously?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah OK I see. If that's the case then it would be better to do a separate PR as a pre-cursor to this one. That way we can communicate the enhancement and ensure appropriate testing.
Since you started on that, do you want to open an issue about pd.to_timedelta
not accepting the ISO duration and send a separate PR to fix accordingly? Can come back to this one after the fact
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I'll open an issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've opened an issue #21877
pandas/io/json/table_schema.py
Outdated
@@ -308,7 +308,7 @@ def parse_table_schema(json, precise_float): | |||
|
|||
for col, dtype in dtypes.items(): | |||
if dtype == 'timedelta64[ns]': | |||
df[col] = df[col].apply(Timedelta) | |||
df[col] = to_timedelta(df[col]) | |||
|
|||
df = df.astype(dtypes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any conflict between this and the loop above? Wondering if we should be removing any timedelta calls from the dtypes
dict since they get accounted for in the loop
pandas/io/json/table_schema.py
Outdated
'ISO-formatted Timedelta data') | ||
for col, dtype in dtypes.items(): | ||
if dtype == 'timedelta64[ns]': | ||
df[col] = df[col].apply(Timedelta) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be able to change this now
pandas/io/json/table_schema.py
Outdated
@@ -6,7 +6,7 @@ | |||
import warnings | |||
|
|||
import pandas._libs.json as json | |||
from pandas import DataFrame | |||
from pandas import DataFrame, Timedelta |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Import to_timedelta
instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, sorry, I've pushed this accidentally
Hello, what is wrong with Travis test? Is it a problem on the testing side or on mine? |
Looks like something on the Travis side in this particular instance - @TomAugspurger can you help restart that particular job? |
restarted.
…On Tue, Jul 24, 2018 at 10:23 AM William Ayd ***@***.***> wrote:
Looks like something on the Travis side in this particular instance -
@TomAugspurger <https://github.com/TomAugspurger> can you help restart
that particular job?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#21827 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABQHIpHh9lD3neA7Yf-jvKy2AexGTWt8ks5uJzwIgaJpZM4VHm9i>
.
|
@WillAyd is there anything else that I should change? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only one minor fix up but otherwise lgtm
if 'timedelta64' in dtypes.values(): | ||
raise NotImplementedError('table="orient" can not yet read ' | ||
'ISO-formatted Timedelta data') | ||
for col, dtype in dtypes.items(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, are you sure you need to do this? I believe .astype(dtypes)
should handle this, and if not we should fix that.
doc/source/whatsnew/v0.24.0.txt
Outdated
@@ -121,6 +121,7 @@ Other Enhancements | |||
- :class:`IntervalIndex` has gained the :meth:`~IntervalIndex.set_closed` method to change the existing ``closed`` value (:issue:`21670`) | |||
- :func:`~DataFrame.to_csv` and :func:`~DataFrame.to_json` now support ``compression='infer'`` to infer compression based on filename (:issue:`15008`) | |||
- :func:`to_timedelta` now supports iso-formated timedelta strings (:issue:`21877`) | |||
- :func:`read_json` now parse timedelta with `orient='table'` (:issue:`21140`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might have missed this on previous comment but there's a small typo here. parse
-> parses
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@WillAyd I've pushed a fix, but now there is a merge conflict. How should I deal with it on github?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better to deal locally and re-push. Assuming you have things set up as mentioned in the pandas contributing guide do this on your local branch
git fetch upstream
git merge upstream/master
You'll probably get a merge conflict there, so fix that up and do:
git merge --continue
Everything should be resolved then so re-push after that
Hello, @jreback |
i believe we handle astype for date times by calling to_datetme we should do the same for timedelta (which then would make this automatically work) |
For now such conversions throw error >>> s = pd.Series(['0:0:1'])
>>> s.astype('timedelta64[ns]')
WTF timedelta64[ns]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/sergey/projects/pandas/pandas/util/_decorators.py", line 178, in wrapper
return func(*args, **kwargs)
File "/home/sergey/projects/pandas/pandas/core/generic.py", line 5149, in astype
**kwargs)
File "/home/sergey/projects/pandas/pandas/core/internals/managers.py", line 555, in astype
return self.apply('astype', dtype=dtype, **kwargs)
File "/home/sergey/projects/pandas/pandas/core/internals/managers.py", line 422, in apply
applied = getattr(b, f)(**kwargs)
File "/home/sergey/projects/pandas/pandas/core/internals/blocks.py", line 564, in astype
**kwargs)
File "/home/sergey/projects/pandas/pandas/core/internals/blocks.py", line 655, in _astype
values = astype_nansafe(values.ravel(), dtype, copy=True)
File "/home/sergey/projects/pandas/pandas/core/dtypes/cast.py", line 717, in astype_nansafe
return lib.astype_intsafe(arr.ravel(), dtype).reshape(arr.shape)
File "pandas/_libs/lib.pyx", line 455, in pandas._libs.lib.astype_intsafe
util.set_value_at_unsafe(result, i, v)
File "pandas/_libs/src/util.pxd", line 144, in util.set_value_at_unsafe
ValueError: Could not convert object to NumPy timedelta I think the problem is with: pandas/pandas/core/dtypes/cast.py Lines 712 to 723 in 114f415
We're not reaching the line 721 as np.issubdtype(np.timedelta64, np.integer) is True .Should I create a new pull request? |
@fjdiod if you are interested it would be preferable to have a new issue and PR, similar to what you just did for the |
can you merge master and update |
can you merge master and update |
Closing as stale though would be great to have this. Please ping if you'd like to pick this back up |
git diff upstream/master -u -- "*.py" | flake8 --diff