BUG:#29928 Fix to_json output 'table' orient for single level MultiIndex. #34375

LucasG0 · 2020-05-25T23:23:16Z

With orient='table', DataFrame.to_json() wrote an incorrect index field name for a single-level MultiIndex, so reading the output back with read_json produced NaN index values.
DataFrame.to_json() now converts a single-level MultiIndex into a plain Index before encoding.

closes #29928 (Using to_json/read_json with orient='table' on a DataFrame with a single level MultiIndex does not work)
tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry
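To make the "index field name" point concrete, here is a minimal sketch that applies the described conversion by hand and inspects the resulting Table Schema. It is only an illustration under stated assumptions: the frame, the name "idx", and the column "a" are made up, and the conversion is done outside pandas rather than in the PR's own code path.

```python
import pandas as pd
from pandas.io.json import build_table_schema

# Hypothetical data: a MultiIndex that has only one level, named "idx".
mi = pd.MultiIndex.from_arrays([[10, 20, 30]], names=["idx"])

# The conversion described above, done by hand: take level 0 as a plain Index.
flat = pd.DataFrame({"a": [1, 2, 3]}, index=mi.get_level_values(0))

# Inspect the schema that orient='table' would write for the flattened frame.
schema = build_table_schema(flat)
field_names = [field["name"] for field in schema["fields"]]
primary_key = schema["primaryKey"]

print(field_names)   # index field first, then the data column
print(primary_key)
```

With a plain Index the index field and the primary key both carry the index name, which is what read_json relies on to rebuild the index.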

jreback · 2020-05-26T13:27:17Z

pandas/io/json/_json.py

@@ -286,6 +286,9 @@ def __init__(
            )
            raise ValueError(msg)

+        if obj.index.nlevels == 1 and isinstance(obj.index, MultiIndex):


can you do this in build_table_schema instead

jreback · 2020-05-26T13:28:23Z

is there an associated issue?

LucasG0 · 2020-05-26T14:52:39Z

> is there an associated issue?

Yes, #29928

WillAyd · 2020-05-27T04:38:30Z

pandas/io/json/_table_schema.py

@@ -230,6 +230,10 @@ def build_table_schema(data, index=True, primary_key=None, version=True):
    'pandas_version': '0.20.0',
    'primaryKey': ['idx']}
    """
+
+    if data.index.nlevels == 1 and isinstance(data.index, MultiIndex):
+        data.index = data.index.get_level_values(0)


Is data copied somewhere or is this mutating the frame that the user supplies?

data is indeed modified, thanks.
The simplest solution is to make a copy of data in this particular case.
However, a copy of data is already made after the call to build_table_schema.
If we definitely want to avoid the extra copy, we can handle the single-level case inside build_table_schema and replace the index in the JSONTableWriter constructor once that copy has been created.

Let's try that for now. We usually want to avoid extra copies
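For readers following the mutation concern, a minimal sketch of one way to leave the caller's frame untouched: take a shallow copy (which does not copy the underlying data) and reassign the index on that copy only. The helper name with_flat_index is hypothetical, and this is not the approach the PR settled on; it only shows that the index can be swapped without touching the original object.

```python
import pandas as pd


def with_flat_index(frame):
    """Return a frame whose single-level MultiIndex is replaced by a plain Index.

    Hypothetical helper for illustration; the input frame is never modified.
    """
    if isinstance(frame.index, pd.MultiIndex) and frame.index.nlevels == 1:
        # Shallow copy: a new DataFrame object that references the same data.
        frame = frame.copy(deep=False)
        # Reassigning the index only affects the new object.
        frame.index = frame.index.get_level_values(0)
    return frame


original = pd.DataFrame(
    {"a": [1, 2, 3]},
    index=pd.MultiIndex.from_arrays([[10, 20, 30]], names=["idx"]),
)
result = with_flat_index(original)
```

Here original keeps its MultiIndex, while result carries a plain Index with the same name and values.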

jreback · 2020-05-27T12:33:19Z

pandas/tests/io/json/test_json_table_schema.py

@@ -435,6 +435,23 @@ def test_to_json_categorical_index(self):

        assert result == expected

+    @pytest.mark.parametrize("name", [None, "foo"])


can you replicate all of the tests in the OP (there are 4 cases)?

I provided tests for cases 1 and 2 in the OP.
Case 3 is the standard behavior of orient='table' on MultiIndex with several levels, which is tested in test_read_json_table_orient.
Case 4 is a kind of workaround for single level MultiIndex.

jreback · 2020-05-27T12:34:28Z

pandas/tests/io/json/test_json_table_schema.py

+        )
+        result = df.to_json(orient="table")
+
+        assert result == expected


do these fully round-trip? can you test that

They should; I will complete the tests.
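A round-trip check along these lines could look like the sketch below. It is an assumption-laden illustration, not the test added in the PR: the helper roundtrip_table and the sample data are hypothetical, and the frame is flattened by hand first so the sketch does not depend on whether a given pandas version contains this fix.

```python
import io

import pandas as pd


def roundtrip_table(frame):
    """Write a frame with orient='table' and read it straight back."""
    payload = frame.to_json(orient="table")
    return pd.read_json(io.StringIO(payload), orient="table")


# Same two index names as the parametrized test in the diff: None and "foo".
results = {}
for name in [None, "foo"]:
    mi = pd.MultiIndex.from_arrays([[1, 2, 3]], names=[name])
    # Flatten by hand to a plain Index before serialising.
    expected = pd.DataFrame({"a": [4, 5, 6]}, index=mi.get_level_values(0))
    results[name] = (roundtrip_table(expected), expected)
```

Each pair in results can then be compared with pd.testing.assert_frame_equal to confirm that index values and the index name survive the round trip.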

WillAyd · 2020-06-07T21:22:00Z

doc/source/whatsnew/v1.1.0.rst

@@ -951,9 +951,12 @@ I/O
 - :func:`pandas.read_hdf` has a more explicit error message when loading an
  unsupported HDF file (:issue:`9539`)
 - Bug in :meth:`~DataFrame.read_feather` was raising an `ArrowIOError` when reading an s3 or http file path (:issue:`29055`)
+- Bug in :meth:`read_parquet` was raising a ``FileNotFoundError`` when passed an s3 directory path. (:issue:`26388`)


I don't think you meant to include this line or the next

Indeed, I will remove it.

jreback · 2020-06-14T15:05:29Z

doc/source/whatsnew/v1.1.0.rst

@@ -956,6 +956,7 @@ I/O
 - Bug in :meth:`~DataFrame.to_excel` could not handle the column name `render` and was raising an ``KeyError`` (:issue:`34331`)
 - Bug in :meth:`~SQLDatabase.execute` was raising a ``ProgrammingError`` for some DB-API drivers when the SQL statement contained the `%` character and no parameters were present (:issue:`34211`)
 - Bug in :meth:`~pandas.io.stata.StataReader` which resulted in categorical variables with difference dtypes when reading data using an iterator. (:issue:`31544`)
+- Bug in :meth:`~DataFrame.to_json` with 'table' orient was writting wrong index field name for MultiIndex Dataframe with a single level. (:issue:`29928`)


writting -> writing

LucasG0 · 2020-06-27T10:29:10Z

I have not heard anything on this PR for a while. Is it suitable as it stands, or is there anything I can do to improve it?

WillAyd · 2020-09-10T18:47:23Z

@LucasG0 can you fix merge conflicts and move note to 1.2? Someone will take a look again after that

LucasG0 · 2020-09-10T19:59:47Z

@WillAyd Done. I can restore the test for timedeltas if needed.

WillAyd

lgtm, though I think you should merge master and re-push to fix CI. @jreback

WillAyd · 2020-09-21T21:52:16Z

doc/source/whatsnew/v1.2.0.rst

@@ -284,6 +284,7 @@ MultiIndex
 ^^^^^^^^^^

 - Bug in :meth:`DataFrame.xs` when used with :class:`IndexSlice` raises ``TypeError`` with message `Expected label or tuple of labels` (:issue:`35301`)
+- Bug in :meth:`~DataFrame.to_json` with 'table' orient was writting wrong index field name for MultiIndex Dataframe with a single level (:issue:`29928`)


Actually can you move this to the I/O section?

WillAyd · 2020-09-25T02:34:07Z

lgtm @jreback

…e level MultiIndex. Index field name in written json was incorrect, so applying read_json resulted in NaN index values. Dataframe to_json with 'table' orient now treats single level MultiIndex like single Index.

LucasG0 · 2020-11-07T20:07:47Z

@jreback maybe you have an opinion on this? :)

LucasG0 · 2020-11-08T00:36:50Z

Thanks, I will check it.

github-actions · 2020-12-15T00:14:40Z

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

jreback · 2021-02-11T01:41:36Z

closing as stale. if you want to continue, pls ping and can re-open.

jreback requested changes May 26, 2020

jreback added the IO JSON read_json, to_json, json_normalize label May 26, 2020

LucasG0 force-pushed the single_level branch from 3197472 to 499bf77 Compare May 26, 2020 14:49

WillAyd reviewed May 27, 2020

jreback requested changes May 27, 2020

LucasG0 force-pushed the single_level branch from 499bf77 to 23d5aa6 Compare May 27, 2020 18:11

jreback requested changes May 31, 2020

jreback added the MultiIndex label May 31, 2020

LucasG0 force-pushed the single_level branch from 2555ba4 to 7e7cb93 Compare June 3, 2020 17:54

WillAyd reviewed Jun 7, 2020

LucasG0 force-pushed the single_level branch 2 times, most recently from 98df8b7 to 689d50c Compare June 8, 2020 23:54

jreback requested changes Jun 14, 2020

LucasG0 force-pushed the single_level branch from b079e0e to d9ac451 Compare September 10, 2020 19:56

WillAyd approved these changes Sep 21, 2020

WillAyd requested changes Sep 21, 2020

LucasG0 force-pushed the single_level branch 2 times, most recently from d8bbabd to 5d43564 Compare September 24, 2020 22:52

LucasG0 force-pushed the single_level branch 2 times, most recently from cbbe8de to 350be02 Compare September 25, 2020 08:18

LucasG0 force-pushed the single_level branch from b27113c to 9707149 Compare November 7, 2020 10:35

LucasG0 force-pushed the single_level branch from 9707149 to e85dafe Compare November 7, 2020 16:48

jreback requested changes Nov 7, 2020

github-actions bot added the Stale label Dec 15, 2020

jreback closed this Feb 11, 2021
