Skip to content

read_json support for orient="table" #19039

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jan 6, 2018

Conversation

WillAyd
Copy link
Member

@WillAyd WillAyd commented Jan 2, 2018

This is a starting point for read_json supporting orient='table' for basic types. I'm explicitly raising in the instances that don't work, namely timezone aware datetimes andTimedelta. I marked floats with no decimal values as an xfail (these are being converted to int64 by the _try_convert_data method).

Timedelta would be best fixed in a separate change providing a constructor equivalent to the isoformat method (see #19040).

The new tests could also all be parametrized, but I've kept as is before going too far with this change in case of feedback

@@ -103,6 +110,28 @@ def make_field(arr, dtype=None):
return field


def revert_field(field):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a docstring

def parse_table_schema(json, precise_float):
"""
Builds a DataFrame from a given schema
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Expand this docstring to discuss parameters and return value.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, document the NotImplementedError cases.


@pytest.mark.parametrize("index_names", [[None, None], ['foo', 'bar']])
def test_multiindex(self, index_names):
# GH 18912
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you adding the issue reference here only?

Copy link
Member Author

@WillAyd WillAyd Jan 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only because the issue was where I pulled the code sample from. I was hesitant to include in the first place because the issue ends up touching on something larger than originally intended, namely that there is no read_json support for orient="table". Would you rather remove or do you think it's worth moving that reference to all of the test cases?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you said, given that your PR adds an entire new class of tests, I actually think what you've done is fine (since the issue directly points to the test). I was looking for some clarification.

@gfyoung gfyoung added IO JSON read_json, to_json, json_normalize Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Jan 2, 2018
@@ -839,6 +839,9 @@ def _parse_no_numpy(self):
elif orient == "index":
self.obj = DataFrame(
loads(json, precise_float=self.precise_float), dtype=None).T
elif orient == 'table':
Copy link
Member

@gfyoung gfyoung Jan 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add to docstring that supported was added for this option in 0.23.0.

@WillAyd WillAyd force-pushed the read-table-orient branch from 4319335 to 00a44b1 Compare January 3, 2018 06:42
@codecov
Copy link

codecov bot commented Jan 3, 2018

Codecov Report

Merging #19039 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #19039      +/-   ##
==========================================
+ Coverage   91.53%   91.54%   +<.01%     
==========================================
  Files         148      148              
  Lines       48688    48729      +41     
==========================================
+ Hits        44566    44607      +41     
  Misses       4122     4122
Flag Coverage Δ
#multiple 89.91% <100%> (ø) ⬆️
#single 41.6% <17.02%> (-0.03%) ⬇️
Impacted Files Coverage Δ
pandas/io/json/table_schema.py 98.19% <100%> (+0.97%) ⬆️
pandas/io/json/json.py 92.6% <100%> (+0.03%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f2d8db1...36cab57. Read the comment docs.

@@ -103,6 +108,40 @@ def make_field(arr, dtype=None):
return field


def revert_field(field):
'''
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use triple-double quotes rather than single

@@ -103,6 +108,40 @@ def make_field(arr, dtype=None):
return field


def revert_field(field):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we rename make_field and this to something more descriptive. make
convert_json_typename_to_pandas (and the reverse)

Returns
-------
dtype
'''
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add an example of tz and categorical here


class TestTableOrientReader(object):

def test_integer(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah would be best to paramaterize these tests

@WillAyd WillAyd force-pushed the read-table-orient branch from 00a44b1 to 81f1bd2 Compare January 3, 2018 14:17
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small comments, otherwise lgtm.

data = [1., 2., 3.]
kinds = [pd.Series(pd.to_datetime(data), name='values'),
pd.to_datetime(data)]
for kind in kinds:
result = make_field(kind)
result = convert_pandas_type_to_json_field(kind)
expected = {"name": "values", "type": 'datetime'}
assert result == expected

kinds = [pd.Series(pd.to_datetime(data, utc=True), name='values'),
pd.to_datetime(data, utc=True)]
for kind in kinds:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should come back later and parametrize these tests

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened #19070 to handle this


class TestTableOrientReader(object):

@pytest.mark.parametrize("inp,exp", [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these first 2 can just be functions, don't need to be in this class (should put next to the tests for the convert_pandas_type_to_json)

@jreback
Copy link
Contributor

jreback commented Jan 4, 2018

cc @TomAugspurger

Copy link
Contributor

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really good on a quick skim @WillAyd, thanks.

@@ -145,6 +145,7 @@ Other Enhancements
- ``Resampler`` objects now have a functioning :attr:`~pandas.core.resample.Resampler.pipe` method.
Previously, calls to ``pipe`` were diverted to the ``mean`` method (:issue:`17905`).
- :func:`~pandas.api.types.is_scalar` now returns ``True`` for ``DateOffset`` objects (:issue:`18943`).
- :func:`read_json` now supports ``table`` as a value to the ``orient`` argument (:issue:`18912`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could make this into a full example here if you want. People have wanted a roundtrippable JSON format for a while.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure I'll add that and re-push

@WillAyd WillAyd force-pushed the read-table-orient branch 2 times, most recently from 326ba31 to 32ae80c Compare January 5, 2018 00:02
}, index=pd.Index(range(4), name='idx'))
df

Previous Behavior:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't use this as a previous example, yes its the same 'idea'

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What don't you like about it as an example?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need the Previous Behavior at all, just show the orient='table' part. you can say that we currently can;t round trip but you don't really need to show this I think.


.. code-block:: ipython

In [29]: df.to_json("test.json", orient='table')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be an ipython block

1 b 2018-01-02 2 b
2 c 2018-01-03 3 c
3 d 2018-01-04 4 c

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

show the df dtypes as well

2 c 2018-01-03 3 c
3 d 2018-01-04 4 c

Please note that the string `index` is not supported with the round trip format, as it is used by default in ``write_json`` to indicate a missing index name.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be in the doc-string / io.rst doc section as well (and add this example there too)

@WillAyd WillAyd force-pushed the read-table-orient branch from 32ae80c to 071f26c Compare January 5, 2018 19:34
@jreback jreback merged commit e3251da into pandas-dev:master Jan 6, 2018
@jreback
Copy link
Contributor

jreback commented Jan 6, 2018

thanks @WillAyd very nice! keep em coming!

@WillAyd WillAyd deleted the read-table-orient branch January 6, 2018 21:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO JSON read_json, to_json, json_normalize Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

read_json can't import own exported data with orient=table Roundtrip orient for JSON
4 participants