BUG: DataFrame.to_json ignores index if it is repeated in the DataFrame. #37600
Comments
I am not sure if this is a bug per se, since JSON should not have duplicate keys and some formats don't allow them (see here). We could maybe warn the user here, or document it properly.
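To make the duplicate-key concern concrete, here is a small sketch, assuming a recent pandas. Python's standard `json` parser accepts a repeated key but silently keeps only the last value. pandas itself refuses to write a non-unique index under `orient="index"`, where index labels would become object keys. The helper `reject_duplicate_keys` is hypothetical, shown only as one way a consumer could detect the collision.

```python
import json

import pandas as pd

# The stdlib parser accepts duplicate keys but keeps only the last value.
parsed = json.loads('{"2": "first", "2": "second"}')
print(parsed)  # {'2': 'second'}


def reject_duplicate_keys(pairs):
    """Hypothetical object_pairs_hook that fails loudly on repeated keys."""
    keys = [key for key, _ in pairs]
    duplicates = {key for key in keys if keys.count(key) > 1}
    if duplicates:
        raise ValueError(f"duplicate JSON keys: {sorted(duplicates)}")
    return dict(pairs)


try:
    json.loads('{"2": "first", "2": "second"}', object_pairs_hook=reject_duplicate_keys)
except ValueError as exc:
    print(exc)  # duplicate JSON keys: ['2']

# With orient="index" the index labels become object keys, so pandas
# raises instead of emitting a document with repeated keys.
df = pd.DataFrame({"column": [0, 1, 2, 3, 4]}, index=[0, 1, 2, 2, 3])
try:
    df.to_json(orient="index")
    index_orient_ok = True
except ValueError as exc:
    index_orient_ok = False
    print(exc)
```

Layouts that do not turn labels into keys (such as `table` or `split`) can represent a repeated index without producing duplicate keys, which is why the `table` case in this issue is a judgment call rather than a hard format violation.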
I agree, but it does not happen with to_pickle(), for example. I didn't know how to categorize this issue. Notice that only the 'primaryKey': [] entry needs to be included in the resulting JSON.
cc @WillAyd
Comparing it to to_pickle isn't a worthwhile endeavor, as pickle is an entirely different format/protocol. As mentioned above, JSON should have unique keys, though it isn't explicitly forbidden.
Ok, I get your point of view. If you think it is okay, I will close the ticket. My concern went away when I changed the way I stored my DataFrames. Everything started to go wrong, until I discovered this "problem".
I think it is fine to keep open. It would need a community PR, though. It does technically go against suggested JSON standards, so it is unlikely that a core developer would prioritize a patch.
Okay, you have convinced me. As @erfannariman also suggested, at least a warning could be a good idea.
…SV output; due to a known issue in cudf & pandas (rapidsai/cudf#11317 & pandas-dev/pandas#37600), this option has no effect on JSON output.
* Instructions for manually testing Morpheus using Kafka. Adds a Kafka version for each of the four validation scripts in `scripts/validation`.
* csv & json serializers now support an `include_index_col` flag to control exporting the DataFrame's index column. Note: due to a limitation of cudf & pandas, this has no impact on JSON:
  + pandas-dev/pandas#37600
  + rapidsai/cudf#11317
* `morpheus.utils.logging` renamed to `morpheus.utils.logger` so that other modules in `morpheus.utils` can import the standard lib logging module.
* Comparison logic in the `ValidationStage` has been moved to its own module `morpheus.utils.compare_df` so that the functionality can be used outside of the stage.
fixes #265
Authors:
  - David Gardner (https://github.com/dagardner-nv)
Approvers:
  - Pete MacKinnon (https://github.com/pdmack)
  - Michael Demoret (https://github.com/mdemoret-nv)
URL: #290
Don't mean to necro, and maybe this is a separate issue, but the issue I'm having seems more general than the title here suggests. The index is ignored, period:
import numpy as np
import pandas as pd
# %%
df = pd.DataFrame({"a": [1, 2, 3, 4, 5, 6], "b": [1, 2, 3.0, None, np.nan, pd.NA]}).set_index("a")
df.to_json(orient="records")
# '[{"b":1},{"b":2},{"b":3.0},{"b":null},{"b":null},{"b":null}]'
# %%
df = pd.DataFrame({"a": [1, 2, 3, 4, 5, 6], "b": [1, 2, 3.0, None, np.nan, pd.NA]}).set_index("a")
df.to_json(orient="records", index=True)
# '[{"b":1},{"b":2},{"b":3.0},{"b":null},{"b":null},{"b":null}]'
# %%
df = pd.DataFrame({"b": [1, 2, 3.0, None, np.nan, pd.NA]}, index=range(5, 11))
df.to_json(orient="records", index=True)
# '[{"b":1},{"b":2},{"b":3.0},{"b":null},{"b":null},{"b":null}]'
My workaround for this is
EDIT: Looks like my comment is a duplicate of #25513.
EDIT 2: Found elsewhere, but
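The commenter's own workaround did not survive in this copy of the thread. One common way to keep the index with `orient="records"` (a sketch assuming a recent pandas, not necessarily what the commenter used) is to move the index into an ordinary column with `reset_index()` before serializing:

```python
import json

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]}).set_index("a")

# orient="records" builds each object from the columns only, so the index "a" is dropped.
without_index = json.loads(df.to_json(orient="records"))
print(without_index)  # [{'b': 10}, {'b': 20}, {'b': 30}]

# Moving the index into an ordinary column keeps it in every record.
with_index = json.loads(df.reset_index().to_json(orient="records"))
print(with_index)  # [{'a': 1, 'b': 10}, {'a': 2, 'b': 20}, {'a': 3, 'b': 30}]
```

Because the index becomes a regular column, nothing here depends on the index being unique.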
@dshemetov the
Sure, I'll need a little time to get to it.
@mroeschke Pretty sure that #52143 fixed a completely separate issue. It is still the case that the index is silently not written when the index is duplicated, and there really should be at least a warning, if not an error as in #52143.
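Until pandas itself warns, a caller could add the check suggested in this thread. The sketch below assumes a recent pandas; `to_json_warn_on_duplicate_index` and `run_and_collect` are hypothetical helpers, not pandas APIs. The wrapper emits a `UserWarning` when `orient="table"` is requested for a frame whose index is not unique, the case where the schema's `primaryKey` is left out.

```python
import warnings

import pandas as pd


def to_json_warn_on_duplicate_index(df, **kwargs):
    """Hypothetical wrapper around DataFrame.to_json (not a pandas API).

    Warns when orient="table" is requested for a frame whose index is not
    unique, since the resulting table schema will not list it as primaryKey.
    """
    if kwargs.get("orient") == "table" and not df.index.is_unique:
        warnings.warn(
            "DataFrame index is not unique; it will not be declared as primaryKey",
            UserWarning,
            stacklevel=2,
        )
    return df.to_json(**kwargs)


def run_and_collect(df, **kwargs):
    """Call the wrapper and return (json_text, warnings raised by the wrapper)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        text = to_json_warn_on_duplicate_index(df, **kwargs)
    ours = [w for w in caught if "primaryKey" in str(w.message)]
    return text, ours


dup = pd.DataFrame({"column": [0, 1, 2, 3, 4], "probe": [0, 1, 2, 2, 3]}).set_index("probe")
uniq = pd.DataFrame({"column": [0, 1, 2], "probe": [0, 1, 2]}).set_index("probe")

out, dup_warnings = run_and_collect(dup, orient="table")
unique_out, unique_warnings = run_and_collect(uniq, orient="table")
print(len(dup_warnings), len(unique_warnings))  # 1 0
```

Warning rather than raising keeps existing callers working, which matches the "at least a warning" suggestion above; an error would be the stricter alternative.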
[x] I have checked that this issue has not already been reported.
Similar to #25513: DataFrame.to_json silently ignores index parameter for most orients.
[x] I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
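import pandas as pd
# 'probe' contains a repeated value (2), so the resulting index is not unique
df = pd.DataFrame({'column': [0, 1, 2, 3, 4], 'probe': [0, 1, 2, 2, 3]})
df.set_index('probe', inplace=True)
out = df.to_json(orient='table', index=True)
assert 'primaryKey' in out  # fails: the table schema omits primaryKey for the duplicated index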
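If the goal is only to get the duplicated index back after a round trip, two layouts avoid the missing `primaryKey`. This is a sketch assuming a recent pandas, not a fix for the issue itself: `orient="split"` stores the index values in their own array, and `orient="table"` with the index reset stores `probe` as an ordinary column that can be turned back into the index after reading.

```python
from io import StringIO

import pandas as pd

df = pd.DataFrame({"column": [0, 1, 2, 3, 4], "probe": [0, 1, 2, 2, 3]})
df.set_index("probe", inplace=True)

# Option 1: orient="split" writes the index values as their own "index"
# array, so repeated labels are kept and read back.
split_json = df.to_json(orient="split")
from_split = pd.read_json(StringIO(split_json), orient="split")
print(from_split.index.tolist())  # [0, 1, 2, 2, 3]

# Option 2: keep orient="table", but store "probe" as an ordinary column
# (index=False) and rebuild the index after reading.
table_json = df.reset_index().to_json(orient="table", index=False)
from_table = pd.read_json(StringIO(table_json), orient="table").set_index("probe")
print(from_table.index.tolist())  # [0, 1, 2, 2, 3]
```

Wrapping the JSON text in `StringIO` avoids passing a literal string to `read_json`, which newer pandas versions deprecate.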