DataFrame.to_json silently ignores index parameter for most orients. #25513
Comments
@mroeschke in the function description it's clearly mentioned that with orient="records" the index is not preserved... so what should we do here?
Right, so I think we should change the default
We should deprecate the current behavior (ignoring
I think ignoring the index for the other orients might not be good, because sometimes the index holds meaningful values such as dates, and in that case it would be dropped completely.
@coderop2 can you clarify? My proposal is to stop ignoring the
@coderop2 I disagree about "clearly". The documentation of the parameter states: "Whether to include the index values in the JSON string. Not including the index (index=False) is only supported when orient is 'split' or 'table'."
I took this to mean the index is always included when the orientation is "records". We should at the very least rephrase this to be more explicit. @TomAugspurger I agree with most of your proposed solution, though I don't understand why the index should be automatically dropped for orient="records". Dropping it would make sense if the index is the default sequential series, but if I explicitly specify a column as an index, I'd expect it to be included by default.
We wouldn't want data-dependent behavior for whether the index is included.
Good point.
@TomAugspurger I'm sorry, I misunderstood what you were explaining. I understand now.
I still think dropping the index is a bit of a nasty surprise, especially if my code gets handed a DataFrame that someone else's code created. I don't want to break their code if I change my serialization format on the back end. Do you think there should be a way to override the default behavior, at least for
@dargueta we need to consider backwards compatibility. Making … We'll need a period where providing … In the meantime, you'll need to …
That'd be great! Let us know if you need help getting started.
Oh, I didn't mean making it the default, just providing a way to override the default behavior so that we can include it if desired without breaking backwards compatibility.
I don't think there's a way though, since the default index is currently True, right?
Well, if we change the default of index to be None like you said earlier, then passing True explicitly would indicate the intent, no?
Can you use the table orient in this case? It at least provides the data that you are looking for, albeit with some extra metadata attached. We already have quite a few JSON serialization formats, so I'd be hesitant to add more, especially given that this functionality already overlaps with one of the other formats.
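A minimal sketch of the `orient="table"` suggestion, assuming a reasonably recent pandas; the column names `key` and `value` are illustrative, not from the original report. The index values end up in every row of the `"data"` array, next to a `"schema"` entry that carries the extra metadata:

```python
import json

import pandas as pd

# Illustrative frame: "key" is promoted to the index, "value" is the data column.
df = pd.DataFrame({"key": ["a", "b"], "value": [1, 2]}).set_index("key")

# orient="table" emits {"schema": {...}, "data": [...]}; each row in "data"
# includes the index value under the index's name.
payload = json.loads(df.to_json(orient="table"))
rows = payload["data"]
print(rows)
```

If only the plain rows are wanted, the `"schema"` part can simply be discarded after parsing. Note that this does not help for JSON Lines output, since `lines=True` is only accepted together with `orient="records"`.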
Hmm, yes. I worry a bit about people who were previously passing `index=True`... But perhaps that's not a valid concern.
Sadly no, I'm dumping it using
I wasn't suggesting adding another format, only changing the behavior of
I wouldn't say that's an invalid concern, but it does make me question why someone would deliberately pass
@TomAugspurger if we are ignoring the index for the records orient, then why can't we just set it to False? But if we do, an error is thrown according to:
FYI: for those looking for a solution, I got around this by doing a simple:
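The snippet from that comment did not survive. A common workaround of this kind (an assumption, not necessarily the code that was posted) is to move the index back into the columns with `reset_index()` before serializing, so the "records" orient picks it up as an ordinary field. The names `key` and `value` are illustrative:

```python
import json

import pandas as pd

df = pd.DataFrame({"key": ["a", "b"], "value": [1, 2]}).set_index("key")

# reset_index() turns the index back into a regular column, so the
# "records" orient (which never writes the index) includes it as a field.
records = df.reset_index().to_json(orient="records")

# The same trick works for JSON Lines output (one JSON object per line).
jsonl = df.reset_index().to_json(orient="records", lines=True)

print(records)
print(jsonl)
```

If the index has no name, `reset_index()` names the new column `index`; call `df.rename_axis("some_name")` first if a different field name is wanted.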
I just want to chime in and say the docs are very subtle with regard to this issue: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_json.html It is very easy to miss the line … The fact that … Perhaps the solution above … It would also be nice if …
- split and table allow index=True/False
- records and values only allow index=False
- index and columns only allow index=True
- raise for contradictions in the latter two
- see pandas-dev#25513
…2143)
* API/BUG: Make to_json index= consistent with orient
  - split and table allow index=True/False
  - records and values only allow index=False
  - index and columns only allow index=True
  - raise for contradictions in the latter two
  - see #25513
* style: lint
* style: make mypy happy
* review: simplify
* review: clarify and consolidate branches
* style: add explainer comment
* doc: change error message in _json
* docs: update whatsnew 2.1.0
* docs: sort whatsnew
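The rules in that commit boil down to a small per-orient lookup. The sketch below uses a hypothetical `resolve_index` helper (not a pandas function, and not the actual patch) and assumes that leaving `index` unset (`None`) means "do whatever the orient naturally does", while an explicit value that contradicts the orient raises instead of being silently ignored:

```python
from typing import Optional

# Hypothetical grouping of orients, following the commit message above.
ALLOWS_BOTH = {"split", "table"}     # index=True or index=False both allowed
INDEX_NEVER = {"records", "values"}  # only index=False makes sense
INDEX_ALWAYS = {"index", "columns"}  # only index=True makes sense


def resolve_index(orient: str, index: Optional[bool] = None) -> bool:
    """Return whether the index would be written for ``orient``.

    ``index=None`` means "use the orient's natural behavior"; an explicit
    value that contradicts the orient raises ValueError.
    """
    if orient in INDEX_NEVER:
        if index is True:
            raise ValueError(f"index=True is not supported with orient={orient!r}")
        return False
    if orient in INDEX_ALWAYS:
        if index is False:
            raise ValueError(f"index=False is not supported with orient={orient!r}")
        return True
    if orient in ALLOWS_BOTH:
        # Default to writing the index unless the caller opts out.
        return True if index is None else index
    raise ValueError(f"unknown orient: {orient!r}")


print(resolve_index("records"))       # index is never written
print(resolve_index("split", False))  # caller opted out
```

The design choice is that contradictions fail loudly rather than being dropped, which is exactly the "silently ignores" complaint in the issue title.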
see #25513 (comment)
Code Sample, a copy-pastable example if possible
Problem description
When creating a DataFrame that has two columns, one to be used as an index and another for the data, if you call .to_json(orient='records') the index is omitted. I know that in theory I should be using a Series for this, but I'm using it to convert a CSV file into JSONL and I don't know what the CSV file is going to look like ahead of time.
In any case, squeezing the DataFrame into a Series doesn't work either. In fact, the bug in Series.to_json is even worse, as it produces an array of strings instead of an array of dictionaries.
This bug is present in master.
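The original code sample was not preserved. A minimal reproduction consistent with the description, assuming a two-column frame with illustrative names `key` (used as the index) and `value` (string data, as read from a CSV):

```python
import json

import pandas as pd

# Two columns: one used as the index, one holding the data.
df = pd.DataFrame({"key": ["a", "b"], "value": ["x", "y"]}).set_index("key")

# orient="records" writes one object per row but leaves out the index.
frame_records = json.loads(df.to_json(orient="records"))
print(frame_records)  # the "key" values are missing

# Squeezing to a Series is no better: "records" yields a bare array of values.
series = df.squeeze(axis="columns")
series_records = json.loads(series.to_json(orient="records"))
print(series_records)
```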
Expected Output
Expected output is:
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.7.2.final.0
python-bits: 64
OS: Darwin
OS-release: 17.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.24.1
pytest: 4.3.0
pip: 19.0.3
setuptools: 40.8.0
Cython: 0.29.6
numpy: 1.16.2
scipy: None
pyarrow: 0.12.1
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.8.0
pytz: 2018.9
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: 2.7.7 (dt dec pq3 ext lo64)
jinja2: None
s3fs: 0.2.0
fastparquet: 0.2.1
pandas_gbq: None
pandas_datareader: None
gcsfs: None