Skip to content

BUG:#29928 Fix to_json output 'table' orient for single level MultiIndex. #34375

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

LucasG0
Copy link
Contributor

@LucasG0 LucasG0 commented May 25, 2020

dataframe.to_json() was writing incorrect index field name, so applying read_json resulted in NaN index values.
dataframe.to_json() now converts single level MultiIndex into single Index before encoding.

@@ -286,6 +286,9 @@ def __init__(
)
raise ValueError(msg)

if obj.index.nlevels == 1 and isinstance(obj.index, MultiIndex):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you do this in build_table_schema instead

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

@jreback jreback added the IO JSON read_json, to_json, json_normalize label May 26, 2020
@jreback
Copy link
Contributor

jreback commented May 26, 2020

is there an associated issue?

@LucasG0
Copy link
Contributor Author

LucasG0 commented May 26, 2020

is there an associated issue?

Yes, #29928

@@ -230,6 +230,10 @@ def build_table_schema(data, index=True, primary_key=None, version=True):
'pandas_version': '0.20.0',
'primaryKey': ['idx']}
"""

if data.index.nlevels == 1 and isinstance(data.index, MultiIndex):
data.index = data.index.get_level_values(0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is data copied somewhere or is this mutating the frame that the user supplies?

Copy link
Contributor Author

@LucasG0 LucasG0 May 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

data is indeed modified, thanks.
The simplest solution is to make a copy of data in this particular case.
However, a copy of data is made after the call of build_table_schema.
If we definitely want to avoid the copy, it is possible to make it by adding modifications in build_table_schema, and replacing index in constructor of JSONTableWriter once the copy is created.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's try that for now. We usually want to avoid extra copies

@@ -435,6 +435,23 @@ def test_to_json_categorical_index(self):

assert result == expected

@pytest.mark.parametrize("name", [None, "foo"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you replicate all of the tests in the OP (there are 4 cases)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I provided tests for cases 1 and 2 in the OP.
Case 3 is the standard behavior of orient='table' on MultiIndex with several levels, which is tested in test_read_json_table_orient.
Case 4 is a kind of workaround for single level MultiIndex.

)
result = df.to_json(orient="table")

assert result == expected
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do these fully round-trip? can you test that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They should, I will complete tests.

@@ -951,9 +951,12 @@ I/O
- :func:`pandas.read_hdf` has a more explicit error message when loading an
unsupported HDF file (:issue:`9539`)
- Bug in :meth:`~DataFrame.read_feather` was raising an `ArrowIOError` when reading an s3 or http file path (:issue:`29055`)
- Bug in :meth:`read_parquet` was raising a ``FileNotFoundError`` when passed an s3 directory path. (:issue:`26388`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t think mean to include this line or the next

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In deed, I will remove it.

@LucasG0 LucasG0 force-pushed the single_level branch 2 times, most recently from 98df8b7 to 689d50c Compare June 8, 2020 23:54
@@ -956,6 +956,7 @@ I/O
- Bug in :meth:`~DataFrame.to_excel` could not handle the column name `render` and was raising an ``KeyError`` (:issue:`34331`)
- Bug in :meth:`~SQLDatabase.execute` was raising a ``ProgrammingError`` for some DB-API drivers when the SQL statement contained the `%` character and no parameters were present (:issue:`34211`)
- Bug in :meth:`~pandas.io.stata.StataReader` which resulted in categorical variables with difference dtypes when reading data using an iterator. (:issue:`31544`)
- Bug in :meth:`~DataFrame.to_json` with 'table' orient was writting wrong index field name for MultiIndex Dataframe with a single level. (:issue:`29928`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

writting -> writing

@LucasG0
Copy link
Contributor Author

LucasG0 commented Jun 27, 2020

Considering I have no news on this PR, is it suitable, or is there anything I can do to improve it ?

@WillAyd
Copy link
Member

WillAyd commented Sep 10, 2020

@LucasG0 can you fix merge conflicts and move note to 1.2? Someone will take a look again after that

@LucasG0
Copy link
Contributor Author

LucasG0 commented Sep 10, 2020

@WillAyd Done, I can restore test for timedeltas if needed.

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm though I think should merge master and repush to fix CI. @jreback

@@ -284,6 +284,7 @@ MultiIndex
^^^^^^^^^^

- Bug in :meth:`DataFrame.xs` when used with :class:`IndexSlice` raises ``TypeError`` with message `Expected label or tuple of labels` (:issue:`35301`)
- Bug in :meth:`~DataFrame.to_json` with 'table' orient was writting wrong index field name for MultiIndex Dataframe with a single level (:issue:`29928`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually can you move this to the I/O section?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, done

@LucasG0 LucasG0 force-pushed the single_level branch 2 times, most recently from d8bbabd to 5d43564 Compare September 24, 2020 22:52
@WillAyd
Copy link
Member

WillAyd commented Sep 25, 2020

lgtm @jreback

@LucasG0 LucasG0 force-pushed the single_level branch 2 times, most recently from cbbe8de to 350be02 Compare September 25, 2020 08:18
…e level MultiIndex.

Index field name in written json was incorrect, so applying read_json resulted in NaN index values. Dataframe to_json with 'table' orient now treats single level MultiIndex like single Index.
@LucasG0
Copy link
Contributor Author

LucasG0 commented Nov 7, 2020

@jreback maybe you have an opinion on this ? :)

@LucasG0
Copy link
Contributor Author

LucasG0 commented Nov 8, 2020

Thanks I will check it.

@github-actions
Copy link
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Dec 15, 2020
@jreback
Copy link
Contributor

jreback commented Feb 11, 2021

closing as stale. if you want to continue, pls ping and can re-open.

@jreback jreback closed this Feb 11, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO JSON read_json, to_json, json_normalize MultiIndex Stale
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Using to_json/read_json with orient='table' on a DataFrame with a single level MultiIndex does not work
3 participants