DataFrame.to_json silently ignores index parameter for most orients. #25513

dargueta · 2019-03-01T19:43:00Z

Code Sample, a copy-pastable example if possible

from __future__ import print_function

import pandas as pd

df = pd.DataFrame.from_records(
    [
        {'id': 'abc', 'data': 'qwerty'},
        {'id': 'def', 'data': 'uiop'}
    ],
    index='id'
)

print(df.to_json(orient='records', index=True))

# Prints out:
[{"data":"qwerty"},{"data":"uiop"}]

series = df.squeeze()
print(series.to_json(orient='records', index=True))

# Prints out:
["qwerty","uiop"]

Problem description

When creating a DataFrame that has two columns, one to be used as an index and another for the data, if you call .to_json(orient='records') the index is omitted. I know that in theory I should be using a Series for this, but I'm using it to convert a CSV file into JSONL and I don't know what the CSV file is going to look like ahead of time.

In any case, squeezing the DataFrame into a Series doesn't work either. In fact, the bug in Series.to_json is even worse, as it produces an array of strings instead of an array of dictionaries.

This bug is present in master.

Expected Output

Expected output is:

[{"id":"abc", "data":"qwerty"},{"id":"def","data":"uiop"}]

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.7.2.final.0
python-bits: 64
OS: Darwin
OS-release: 17.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.24.1
pytest: 4.3.0
pip: 19.0.3
setuptools: 40.8.0
Cython: 0.29.6
numpy: 1.16.2
scipy: None
pyarrow: 0.12.1
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.8.0
pytz: 2018.9
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: 2.7.7 (dt dec pq3 ext lo64)
jinja2: None
s3fs: 0.2.0
fastparquet: 0.2.1
pandas_gbq: None
pandas_datareader: None
gcsfs: None

The text was updated successfully, but these errors were encountered:

coderop2 · 2019-03-04T10:20:20Z

@mroeschke in the function description its clearly mentioned that by using orient = "records" the index is not preserved... so what to here..?
@dargueta u could use df.reset_index(inplace=True) to make the index as a column before using to_json()

TomAugspurger · 2019-03-04T14:07:49Z

Right, so the index= parameter only works for split and table orients.

I think we should change the default split to None, and

Make the default index be True when orient is split or table.
Raise if index is specified for any other orient.

We should deprecate the current behavior (ignoring index for other orients) first.

coderop2 · 2019-03-04T14:16:10Z

I think ignoring index for other orients might not be good because sometimes index like dates is used....so in that case the index might get ignored completely

TomAugspurger · 2019-03-04T14:19:44Z

@coderop2 can you clarify? My proposal is to stop ignoring the index parameter when it's not valid for the given orient.

dargueta · 2019-03-04T18:06:27Z

in the function description its clearly mentioned that by using orient = "records" the index is not preserved.

@coderop2 I disagree about "clearly". The documentation of the parameter states:

Whether to include the index values in the JSON string. Not including the index (index=False) is only supported when orient is ‘split’ or ‘table’.

I took this to mean the index is always included when the orientation is "records". We should at the very least rephrase this to be more explicit.

@TomAugspurger I agree with most of your proposed solution, though I don't understand why the index should be automatically dropped for orient="records". Dropping it would make sense if the index is the default sequential series, but if I explicitly specify a column as an index, I'd expect it to be included by default.

TomAugspurger · 2019-03-04T18:10:31Z

We wouldn't want data-dependent behavior for whether the index is included.

…

On Mon, Mar 4, 2019 at 12:06 PM Diego Argueta ***@***.***> wrote: in the function description its clearly mentioned that by using orient = "records" the index is not preserved. @coderop2 <https://github.com/coderop2> I disagree about "clearly". The documentation of the parameter states: Whether to include the index values in the JSON string. Not including the index (index=False) is only supported when orient is ‘split’ or ‘table’. I took this to mean the index is *always* included when the orientation is "records". @TomAugspurger <https://github.com/TomAugspurger> I agree with most of your proposed solution, though I don't understand why the index should be automatically dropped for orient="records". Dropping it would make sense *if* the index is the default sequential series, but if I explicitly specify a column as an index, I'd expect it to be included by default. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#25513 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ABQHIl-dMOI4AmRz-6EbJWrOCMhDbBwEks5vTWCpgaJpZM4bZsDw> .

dargueta · 2019-03-04T18:11:57Z

Good point.

coderop2 · 2019-03-05T08:18:22Z

@TomAugspurger I am sorry I misunderstood for what you were explaining...I understand now
@dargueta I was talking about the description given in the code...it goes like this
Encoding/decoding a Dataframe using ``'records'`` formatted JSON.
Note that index labels are not preserved with this encoding.
@TomAugspurger since I am new to open source can I work up a solution and present it here...if that solution is acceptable then I will create a pull req

dargueta · 2019-03-05T15:47:53Z

I still think dropping the index is a bit of a nasty surprise, especially if my code gets handed a DataFrame that someone else's code created. I don't want to break their code if I change my serialization format on the back end. Do you think there should be a way to override the default behavior, at least for orient="records"?

TomAugspurger · 2019-03-05T15:56:20Z

@dargueta we need to consider backwards compatibility. Making df.to_json(orient='records') include the index would be an API breaking change.

We'll need a period where providing index=True with the records orient raises. After that time period we can reconsider allowing it.

In the meantime, you'll need to .reset_index() before calling .to_json with the records orient.

TomAugspurger · 2019-03-05T15:56:39Z

@TomAugspurger since I am new to open source can I work up a solution and present it here...if that solution is acceptable then I will create a pull req

That'd be great! Let us know if you need help getting started.

dargueta · 2019-03-05T16:03:20Z

Oh I didn't mean making it the default, just providing a way to override the default behavior so that we can include it if desired without breaking backwards compatibility.

TomAugspurger · 2019-03-05T16:07:14Z

I don't think there's a way though, since the default index is currently True, right?

…

On Tue, Mar 5, 2019 at 10:04 AM Diego Argueta ***@***.***> wrote: Oh I didn't mean *making* it the default, just providing a way to override the default behavior so that we can include it if desired without breaking backwards compatibility. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#25513 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ABQHIlK3ra6VLIiqn-ybVtscU20H-Pefks5vTpWDgaJpZM4bZsDw> .

dargueta · 2019-03-05T16:39:30Z

Well if we change the default of index to be None like you said earlier, then passing True explicitly would indicate the intent, no?

WillAyd · 2019-03-05T16:45:32Z

Can you use the table orient in this case? It at least provides the data that you are looking for, albeit with some extra metadata attached.

We already have quite a few JSON serialization formats so I'd be hesitant to add more, especially given this functionality overlaps already with one of the other formats

TomAugspurger · 2019-03-05T16:50:49Z

Hmm, yes. I worry a bit about people who were previously passing `index=True`... But perhaps that's not a valid concern.

…

On Tue, Mar 5, 2019 at 10:40 AM Diego Argueta ***@***.***> wrote: Well if we change the default of index to be None like you said earlier, then passing True explicitly would indicate the intent, no? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#25513 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ABQHIjPmjxRyEGUel0aZkBWwGTJCD63tks5vTp4LgaJpZM4bZsDw> .

dargueta · 2019-03-05T18:40:41Z

Can you use the table orient in this case? It at least provides the data that you are looking for, albeit with some extra metadata attached.

Sadly no, I'm dumping it using lines=True so that it can be read by a database. I didn't put that in my example to minimize the number of arguments that could be interacting.

We already have quite a few JSON serialization formats so I'd be hesitant to add more

I wasn't suggesting adding another format, only changing the behavior of index.

Hmm, yes. I worry a bit about people who were previously passing index=True

I wouldn't say that's an invalid concern, but it does make me question why someone would deliberately pass index=True and orient='records' if they weren't trying to get the behavior I wanted, based on what the documentation states. I imagine the change be an issue for users who're dynamically constructing the arguments and are relying on index being ignored. At the risk of coming off as a pedantic jerk, I'd say relying on that behavior is unwise to begin with.

coderop2 · 2019-03-06T12:49:24Z

@TomAugspurger if we are ignoring the index for the records orient then why cannot we just set it to false. But if we do a error is thrown accrding to:
if not index and orient not in ['split', 'table']:
raise ValueError("'index=False' is only valid when 'orient' is "" 'split' or 'table' ")
Because if you include records in the list and pass index as false no error is thrown

ARDivekar · 2020-01-16T11:38:20Z

FYI. For those looking for a solution, I got around this by doing a simple: df['index_name'] = df.index before calling df.to_json(...)

tommyjcarpenter · 2021-09-10T11:39:50Z

I just want to chime in and say, the docs are very subtle w.r.t. this issue: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_json.html

It is very easy to miss the line Note that index labels are not preserved with this encoding.

The fact that orient=records, index=True is a LEGAL combination of parameters, yet it ...doesnt do that, is the unexpected behavior. Pandas basically says "we heard you but nah". I would prefer orient=records, index=True be an illegal combination and explicitly tell the user, "you cant do that" [for some reason] rather than just silently ignoring the intent.

Perhaps the solution above df['index_name'] = df.index before calling df.to_json(...) should be mentioned in those docs to help the reader understand what to do, other than googling and eventually finding this issue.

It would also be nice if index=True just did this for you, so that it "did" work.

- split and table allow index=True/False - records and values only allow index=False - index and columns only allow index=True - raise for contradictions in the latter two - see pandas-dev#25513

…2143) * API/BUG: Make to_json index= consistent with orient - split and table allow index=True/False - records and values only allow index=False - index and columns only allow index=True - raise for contradictions in the latter two - see #25513 * style: lint * style: make mypy happy * review: simplify * review: clarify and consolidate branches * style: add explainer comment * doc: change error message in _json * docs: update whatsnew 2.1.0 * docs: sort whatsnew

…ndas-dev#52143) * API/BUG: Make to_json index= consistent with orient - split and table allow index=True/False - records and values only allow index=False - index and columns only allow index=True - raise for contradictions in the latter two - see pandas-dev#25513 * style: lint * style: make mypy happy * review: simplify * review: clarify and consolidate branches * style: add explainer comment * doc: change error message in _json * docs: update whatsnew 2.1.0 * docs: sort whatsnew

mroeschke added Bug IO JSON read_json, to_json, json_normalize labels Mar 4, 2019

TomAugspurger added the API Design label Mar 4, 2019

TomAugspurger added this to the Contributions Welcome milestone Mar 4, 2019

TomAugspurger changed the title ~~to_json(orient='records') omits index if df has one column~~ DataFrame.to_json ignores index parameter for most orients. Mar 4, 2019

TomAugspurger changed the title ~~DataFrame.to_json ignores index parameter for most orients.~~ DataFrame.to_json silently ignores index parameter for most orients. Mar 4, 2019

jbrockmendel removed the API Design label Dec 18, 2019

juanmigueltiscar mentioned this issue Nov 3, 2020

BUG: DataFrame.to_json ignores index if it is repeated in the DataFrame. #37600

Closed

1 task

kinghuang mentioned this issue Nov 23, 2020

Missing index column in dataframe.to_json output dask/dask#6875

Closed

mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022

dshemetov mentioned this issue Mar 23, 2023

API/BUG: Make to_json index= arg consistent with orient arg #52143

Merged

5 tasks

mroeschke closed this as completed in #52143 May 11, 2023

cwharris mentioned this issue Sep 18, 2024

[BUG]: Incompatible to_json call w/Pandas 2 nv-morpheus/Morpheus#1895

Open

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DataFrame.to_json silently ignores index parameter for most orients. #25513

DataFrame.to_json silently ignores index parameter for most orients. #25513

dargueta commented Mar 1, 2019 •

edited by TomAugspurger

Loading

INSTALLED VERSIONS

coderop2 commented Mar 4, 2019 •

edited

Loading

TomAugspurger commented Mar 4, 2019

coderop2 commented Mar 4, 2019

TomAugspurger commented Mar 4, 2019

dargueta commented Mar 4, 2019 •

edited

Loading

TomAugspurger commented Mar 4, 2019 via email

dargueta commented Mar 4, 2019

coderop2 commented Mar 5, 2019 •

edited

Loading

dargueta commented Mar 5, 2019

TomAugspurger commented Mar 5, 2019

TomAugspurger commented Mar 5, 2019

dargueta commented Mar 5, 2019

TomAugspurger commented Mar 5, 2019 via email

dargueta commented Mar 5, 2019 •

edited

Loading

WillAyd commented Mar 5, 2019

TomAugspurger commented Mar 5, 2019 via email

dargueta commented Mar 5, 2019 •

edited

Loading

coderop2 commented Mar 6, 2019 •

edited

Loading

ARDivekar commented Jan 16, 2020

tommyjcarpenter commented Sep 10, 2021 •

edited

Loading

DataFrame.to_json silently ignores index parameter for most orients. #25513

DataFrame.to_json silently ignores index parameter for most orients. #25513

Comments

dargueta commented Mar 1, 2019 • edited by TomAugspurger Loading

Code Sample, a copy-pastable example if possible

Problem description

Expected Output

Output of pd.show_versions()

INSTALLED VERSIONS

coderop2 commented Mar 4, 2019 • edited Loading

TomAugspurger commented Mar 4, 2019

coderop2 commented Mar 4, 2019

TomAugspurger commented Mar 4, 2019

dargueta commented Mar 4, 2019 • edited Loading

TomAugspurger commented Mar 4, 2019 via email

dargueta commented Mar 4, 2019

coderop2 commented Mar 5, 2019 • edited Loading

dargueta commented Mar 5, 2019

TomAugspurger commented Mar 5, 2019

TomAugspurger commented Mar 5, 2019

dargueta commented Mar 5, 2019

TomAugspurger commented Mar 5, 2019 via email

dargueta commented Mar 5, 2019 • edited Loading

WillAyd commented Mar 5, 2019

TomAugspurger commented Mar 5, 2019 via email

dargueta commented Mar 5, 2019 • edited Loading

coderop2 commented Mar 6, 2019 • edited Loading

ARDivekar commented Jan 16, 2020

tommyjcarpenter commented Sep 10, 2021 • edited Loading

dargueta commented Mar 1, 2019 •

edited by TomAugspurger

Loading

Output of `pd.show_versions()`

coderop2 commented Mar 4, 2019 •

edited

Loading

dargueta commented Mar 4, 2019 •

edited

Loading

coderop2 commented Mar 5, 2019 •

edited

Loading

dargueta commented Mar 5, 2019 •

edited

Loading

dargueta commented Mar 5, 2019 •

edited

Loading

coderop2 commented Mar 6, 2019 •

edited

Loading

tommyjcarpenter commented Sep 10, 2021 •

edited

Loading