Skip to content

ENH: Add Timedelta Support to JSON Reader with orient=table (#21140) #21827

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 6 commits into from

Conversation

fjdiod
Copy link
Contributor

@fjdiod fjdiod commented Jul 9, 2018

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice change - one comment to address

if 'timedelta64' in dtypes.values():
raise NotImplementedError('table="orient" can not yet read '
'ISO-formatted Timedelta data')
for col, dtype in dtypes.items():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this block necessary? Assumed the subsequent astype call would take care of this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, support for iso-format timedeltas should be added to astype function?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No just go with what jreback said for now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, are you sure you need to do this? I believe .astype(dtypes) should handle this, and if not we should fix that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now

delta = pd.Timedelta(1e9).isoformat()
pd.DataFrame([delta]).astype('timedelta64[ns]')

results in error.

@WillAyd WillAyd added IO JSON read_json, to_json, json_normalize Timedelta Timedelta data type labels Jul 9, 2018
@codecov
Copy link

codecov bot commented Jul 9, 2018

Codecov Report

Merging #21827 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #21827      +/-   ##
==========================================
+ Coverage   92.05%   92.05%   +<.01%     
==========================================
  Files         170      170              
  Lines       50708    50710       +2     
==========================================
+ Hits        46677    46679       +2     
  Misses       4031     4031
Flag Coverage Δ
#multiple 90.45% <100%> (ø) ⬆️
#single 42.36% <20%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/io/json/table_schema.py 98.31% <100%> (+0.02%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update be6ad72...0f28fd0. Read the comment docs.

'ISO-formatted Timedelta data')
for col, dtype in dtypes.items():
if dtype == 'timedelta64[ns]':
df[col] = df[col].apply(Timedelta)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you would not convert this way, use pd.to_timedelta

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can pd.to_timedelta parse iso-formated timedeltas now? Should support for this format be added to this function?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be able to change this now

@@ -306,10 +306,9 @@ def parse_table_schema(json, precise_float):
raise NotImplementedError('table="orient" can not yet read timezone '
'data')

# No ISO constructor for Timedelta as of yet, so need to raise
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was this not hit in tests prior to this PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should have been covered by test_read_json_table_orient_raises through parametrization

@@ -183,7 +183,10 @@ cpdef convert_to_timedelta64(object ts, object unit):
ts = cast_from_unit(ts, unit)
ts = np.timedelta64(ts)
elif is_string_object(ts):
ts = np.timedelta64(parse_timedelta_string(ts))
if len(ts) > 0 and ts[0] == 'P':
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm what problem are you trying to solve with this addition? ISO format support should have been added back in #19191 as a precursor to this PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pd.to_timedelta was not converting iso-formated strings
In pd.Timedelta constructor parse_iso_format_string function is used:

if len(value) > 0 and value[0] == 'P':
    value = parse_iso_format_string(value)
else:
    value = parse_timedelta_string(value)

while in pd.to_timedelta array_to_timedelta64 is used. Which is not using parse_iso_format_string, actually there're only two occurrences of this function in timedeltas.pyx, its definition and inside pd.Timedelta.
Also, why have parse_iso_format_string and parse_timedelta_string simultaneously?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah OK I see. If that's the case then it would be better to do a separate PR as a pre-cursor to this one. That way we can communicate the enhancement and ensure appropriate testing.

Since you started on that, do you want to open an issue about pd.to_timedelta not accepting the ISO duration and send a separate PR to fix accordingly? Can come back to this one after the fact

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I'll open an issue

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've opened an issue #21877

@@ -308,7 +308,7 @@ def parse_table_schema(json, precise_float):

for col, dtype in dtypes.items():
if dtype == 'timedelta64[ns]':
df[col] = df[col].apply(Timedelta)
df[col] = to_timedelta(df[col])

df = df.astype(dtypes)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any conflict between this and the loop above? Wondering if we should be removing any timedelta calls from the dtypes dict since they get accounted for in the loop

'ISO-formatted Timedelta data')
for col, dtype in dtypes.items():
if dtype == 'timedelta64[ns]':
df[col] = df[col].apply(Timedelta)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be able to change this now

@@ -6,7 +6,7 @@
import warnings

import pandas._libs.json as json
from pandas import DataFrame
from pandas import DataFrame, Timedelta
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import to_timedelta instead

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, sorry, I've pushed this accidentally

fjdiod added 2 commits July 20, 2018 21:25
…ev#21140)

add iso-format support to to_timedelta

Revert "add iso-format support to to_timedelta"

This reverts commit 3f5f176.
@fjdiod
Copy link
Contributor Author

fjdiod commented Jul 24, 2018

Hello, what is wrong with Travis test? Is it a problem on the testing side or on mine?

@WillAyd
Copy link
Member

WillAyd commented Jul 24, 2018

Looks like something on the Travis side in this particular instance - @TomAugspurger can you help restart that particular job?

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Jul 24, 2018 via email

@fjdiod
Copy link
Contributor Author

fjdiod commented Jul 25, 2018

@WillAyd is there anything else that I should change?

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only one minor fix up but otherwise lgtm

if 'timedelta64' in dtypes.values():
raise NotImplementedError('table="orient" can not yet read '
'ISO-formatted Timedelta data')
for col, dtype in dtypes.items():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, are you sure you need to do this? I believe .astype(dtypes) should handle this, and if not we should fix that.

@@ -121,6 +121,7 @@ Other Enhancements
- :class:`IntervalIndex` has gained the :meth:`~IntervalIndex.set_closed` method to change the existing ``closed`` value (:issue:`21670`)
- :func:`~DataFrame.to_csv` and :func:`~DataFrame.to_json` now support ``compression='infer'`` to infer compression based on filename (:issue:`15008`)
- :func:`to_timedelta` now supports iso-formated timedelta strings (:issue:`21877`)
- :func:`read_json` now parse timedelta with `orient='table'` (:issue:`21140`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might have missed this on previous comment but there's a small typo here. parse -> parses

Copy link
Contributor Author

@fjdiod fjdiod Jul 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@WillAyd I've pushed a fix, but now there is a merge conflict. How should I deal with it on github?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to deal locally and re-push. Assuming you have things set up as mentioned in the pandas contributing guide do this on your local branch

git fetch upstream
git merge upstream/master

You'll probably get a merge conflict there, so fix that up and do:

git merge --continue

Everything should be resolved then so re-push after that

@fjdiod
Copy link
Contributor Author

fjdiod commented Jul 27, 2018

Hello, @jreback
How should I proceed with astype? Should support for iso-formatted timedeltas be added there?
Now astype could not parse timedeltas and datetimes from strings. So, wouldn't it be inconsistent to add support for iso-format?

@jreback
Copy link
Contributor

jreback commented Jul 27, 2018

i believe we handle astype for date times by calling to_datetme we should do the same for timedelta (which then would make this automatically work)

@fjdiod
Copy link
Contributor Author

fjdiod commented Jul 27, 2018

For now such conversions throw error

>>> s = pd.Series(['0:0:1'])
>>> s.astype('timedelta64[ns]')
WTF timedelta64[ns]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/sergey/projects/pandas/pandas/util/_decorators.py", line 178, in wrapper
    return func(*args, **kwargs)
  File "/home/sergey/projects/pandas/pandas/core/generic.py", line 5149, in astype
    **kwargs)
  File "/home/sergey/projects/pandas/pandas/core/internals/managers.py", line 555, in astype
    return self.apply('astype', dtype=dtype, **kwargs)
  File "/home/sergey/projects/pandas/pandas/core/internals/managers.py", line 422, in apply
    applied = getattr(b, f)(**kwargs)
  File "/home/sergey/projects/pandas/pandas/core/internals/blocks.py", line 564, in astype
    **kwargs)
  File "/home/sergey/projects/pandas/pandas/core/internals/blocks.py", line 655, in _astype
    values = astype_nansafe(values.ravel(), dtype, copy=True)
  File "/home/sergey/projects/pandas/pandas/core/dtypes/cast.py", line 717, in astype_nansafe
    return lib.astype_intsafe(arr.ravel(), dtype).reshape(arr.shape)
  File "pandas/_libs/lib.pyx", line 455, in pandas._libs.lib.astype_intsafe
    util.set_value_at_unsafe(result, i, v)
  File "pandas/_libs/src/util.pxd", line 144, in util.set_value_at_unsafe
ValueError: Could not convert object to NumPy timedelta

I think the problem is with:

elif is_object_dtype(arr):
# work around NumPy brokenness, #1987
if np.issubdtype(dtype.type, np.integer):
return lib.astype_intsafe(arr.ravel(), dtype).reshape(arr.shape)
# if we have a datetime/timedelta array of objects
# then coerce to a proper dtype and recall astype_nansafe
elif is_datetime64_dtype(dtype):
from pandas import to_datetime
return astype_nansafe(to_datetime(arr).values, dtype, copy=copy)

We're not reaching the line 721 as np.issubdtype(np.timedelta64, np.integer) is True.
Should I create a new pull request?

@WillAyd
Copy link
Member

WillAyd commented Jul 27, 2018

@fjdiod if you are interested it would be preferable to have a new issue and PR, similar to what you just did for the to_timedelta change. Sorry for all of the pre-cursors!

@fjdiod
Copy link
Contributor Author

fjdiod commented Jul 28, 2018

Hello, @WillAyd. I've created an issue and a pull request #22100 #22107

@jreback
Copy link
Contributor

jreback commented Oct 11, 2018

can you merge master and update

@jreback
Copy link
Contributor

jreback commented Nov 1, 2018

can you merge master and update

@WillAyd
Copy link
Member

WillAyd commented Nov 23, 2018

Closing as stale though would be great to have this. Please ping if you'd like to pick this back up

@WillAyd WillAyd closed this Nov 23, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO JSON read_json, to_json, json_normalize Timedelta Timedelta data type
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add Timedelta Support to JSON Reader with orient="table"
4 participants