Skip to content

Using to_json/read_json with orient='table' on a DataFrame with a single level MultiIndex does not work #29928

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
larrymouse opened this issue Nov 29, 2019 · 4 comments
Labels
Bug IO JSON read_json, to_json, json_normalize MultiIndex

Comments

@larrymouse
Copy link

larrymouse commented Nov 29, 2019

Code Sample, a copy-pastable example if possible

import pandas

#Scenario1
df=pandas.DataFrame(data=[[1,1],[2,2],[3,3],[4,4]],columns=['A','B'], index=pandas.MultiIndex.from_tuples([(1,),(2,),(3,),(4,)]))
js=df.to_json()
pandas.read_json(js) #ok
js=df.to_json(orient='table')
new_df=pandas.read_json(js, orient='table') # this runs but creates NaN indexes

#Scenario2
df=pandas.DataFrame(data=[[1,1],[2,2],[3,3],[4,4]],columns=['A','B'], index=pandas.MultiIndex.from_tuples([(1,),(2,),(3,),(4,)], names=['ind1']))
js=df.to_json()
pandas.read_json(js) #ok
js=df.to_json(orient='table')
try:
    new_df==pandas.read_json(js, orient='table')  # this fails with KeyError: 'None of [None] are in the columns'
except KeyError as e:
    pass

#Scenario3
df=pandas.DataFrame(data=[[1,1],[2,2],[3,3],[4,4]],columns=['A','B'], index=pandas.MultiIndex.from_tuples([(1,1),(1,2),(2,1),(2,2)], names=['ind1', 'ind2']))
js=df.to_json()
pandas.read_json(js) #ok
js=df.to_json(orient='table')
new_df=pandas.read_json(js, orient='table')  # this works as is is multi level multiindex

#Scenario4
df=pandas.DataFrame(data=[[1,1],[2,2],[3,3],[4,4]],columns=['A','B'], index=pandas.MultiIndex.from_tuples([(1,),(2,),(3,),(4,)]))
df.index=df.index.to_flat_index() # a workaround?
js=df.to_json()
pandas.read_json(js) #ok
js=df.to_json(orient='table')
new_df=pandas.read_json(js, orient='table') # This is a workaround that does produce sensible results

Problem description

I have a DataFrame that has a MultiIndex, even though that index only has one level and could easily be a normal index, I am receiving this table from ipyaggrid, so am not in control of that.

Saving this table to json with df.to_json(orient='table) and then loading from pandas.read_json(js, orient='table) leads to either an exception or the index being changed to NaN values.

Neither of these outcomes seems desirable or correct.

I have a workaround to intercept DataFrames with this feature and convert their index so a flat Index, but it would be good if the table serialisation just worked.

I was drawn to orient='Table' by the comments on this similar issue, my main requirement here is to save/load the DataFrame preserving its row and column order. You don't get this with the standard to_json/from_json modes
#4889

Expected Output

new_df for scenario1 and scenario2 to should have a MultiIndex, in the first scenario with a level that has no name and in the second scenario with level called 'ind1'

Output of pd.show_versions()

pandas.show_versions()
INSTALLED VERSIONS

commit : None
python : 3.6.7.final.0
python-bits : 64
OS : Windows
OS-release : 7
machine : AMD64
processor : Intel64 Family 6 Model 62 Stepping 4, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None
pandas : 0.25.0
numpy : 1.17.0
pytz : 2019.2
dateutil : 2.8.0
pip : 10.0.1
setuptools : 39.1.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.9.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.1.1
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.3.1
sqlalchemy : None
tables : None
xarray : None
xlrd : 1.2.0
xlwt : None
xlsxwriter : None

@jbrockmendel jbrockmendel added the IO JSON read_json, to_json, json_normalize label Nov 30, 2019
@janheindejong
Copy link

I guess I have the same problem.

Creating two DFs, one with multiindex, and one without:

import pandas as pd
import numpy as np
from datetime import datetime 

index = [datetime(2000, 1, 1, 0, 0), datetime(2000, 1, 1, 0, 1), datetime(2000, 1, 1, 0, 2)]
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]

tuples = list(zip(*arrays))
columns_mi = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
columns = ["A", "B", "C", "D", "E", "F", "G", "H"]

df = pd.DataFrame(data=np.random.randn(3, 8), index=index, columns=columns)
df_mi = pd.DataFrame(data=np.random.randn(3, 8), index=index, columns=columns_mi)

Converting the 'normal' DataFrame to json and back works fine (both in split and table orientation):

s = df.to_json(orient="split")
pd.read_json(s, orient="split")
s = df.to_json(orient="table")
pd.read_json(s, orient="table")

But doing the same with the multiindex one doesn't.

s = df_mi.to_json(orient="split")
pd.read_json(s, orient="split")

Raises: TypeError: <class 'tuple'> is not convertible to datetime

s = df_mi.to_json(orient="table")
pd.read_json(s, orient="table")

Raises: NotImplementedError: orient='table' is not supported for MultiIndex.

I'm trying to send multiindexed dataframes over HTTP; any suggestions on how I can do that?

LucasG0 added a commit to LucasG0/pandas that referenced this issue May 25, 2020
…e level MultiIndex.

Index field name in written json was incorrect, so applying read_json resulted in NaN index values. Dataframe to_json with 'table' orient now treats single level MultiIndex like single index.
LucasG0 added a commit to LucasG0/pandas that referenced this issue May 25, 2020
…e level MultiIndex.

Index field name in written json was incorrect, so applying read_json resulted in NaN index values. Dataframe to_json with 'table' orient now treats single level MultiIndex like single Index.
LucasG0 added a commit to LucasG0/pandas that referenced this issue May 26, 2020
…e level MultiIndex.

Index field name in written json was incorrect, so applying read_json resulted in NaN index values. Dataframe to_json with 'table' orient now treats single level MultiIndex like single Index.
@LucasG0
Copy link
Contributor

LucasG0 commented May 26, 2020

I submitted a PR for the author issue.

About janheindejong issues,

Raises: TypeError: <class 'tuple'> is not convertible to datetime

This may be a dupe of #4889, MultiIndex Dataframes and Series do not seem to be handled by read_json(orient='split').

NotImplementedError: orient='table' is not supported for MultiIndex

orient='table' does not support MultiIndex columns on Dataframes.

LucasG0 added a commit to LucasG0/pandas that referenced this issue May 27, 2020
…e level MultiIndex.

Index field name in written json was incorrect, so applying read_json resulted in NaN index values. Dataframe to_json with 'table' orient now treats single level MultiIndex like single Index.
LucasG0 added a commit to LucasG0/pandas that referenced this issue Sep 10, 2020
…e level MultiIndex.

Index field name in written json was incorrect, so applying read_json resulted in NaN index values. Dataframe to_json with 'table' orient now treats single level MultiIndex like single Index.
LucasG0 added a commit to LucasG0/pandas that referenced this issue Sep 22, 2020
…e level MultiIndex.

Index field name in written json was incorrect, so applying read_json resulted in NaN index values. Dataframe to_json with 'table' orient now treats single level MultiIndex like single Index.
LucasG0 added a commit to LucasG0/pandas that referenced this issue Sep 24, 2020
…e level MultiIndex.

Index field name in written json was incorrect, so applying read_json resulted in NaN index values. Dataframe to_json with 'table' orient now treats single level MultiIndex like single Index.
LucasG0 added a commit to LucasG0/pandas that referenced this issue Sep 25, 2020
…e level MultiIndex.

Index field name in written json was incorrect, so applying read_json resulted in NaN index values. Dataframe to_json with 'table' orient now treats single level MultiIndex like single Index.
LucasG0 added a commit to LucasG0/pandas that referenced this issue Sep 25, 2020
…e level MultiIndex.

Index field name in written json was incorrect, so applying read_json resulted in NaN index values. Dataframe to_json with 'table' orient now treats single level MultiIndex like single Index.
LucasG0 added a commit to LucasG0/pandas that referenced this issue Nov 7, 2020
…e level MultiIndex.

Index field name in written json was incorrect, so applying read_json resulted in NaN index values. Dataframe to_json with 'table' orient now treats single level MultiIndex like single Index.
LucasG0 added a commit to LucasG0/pandas that referenced this issue Nov 7, 2020
…e level MultiIndex.

Index field name in written json was incorrect, so applying read_json resulted in NaN index values. Dataframe to_json with 'table' orient now treats single level MultiIndex like single Index.
@Interesting6
Copy link

Interesting6 commented Nov 20, 2023

Same question, I'm trying to send multiindexed dataframes over HTTP.

When i serialise multi-index dataframe to an json by split or table orient, tuple inside the index would be converted to list. But when i try to deserialise it, it raise TypeError: unhashable type: 'list' ;

Any suggestions on how I can do that?

@developer992
Copy link

developer992 commented Mar 13, 2024

it's year 2024 and there is still no generic support to dump dataframe to json and get it back ?

NotImplementedError: orient='table' is not supported for MultiIndex columns

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug IO JSON read_json, to_json, json_normalize MultiIndex
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants