DataFrame.to_json() produces malformed JSON when DataFrame contains tuples as column #20500
Comments
I'm not sure about the expected behavior here. How do you represent multiple-part keys in JSON?
I don't know what the best solution is that would satisfy everyone. According to the JSON specification, an object's key must be a string; no other kind of value is allowed. To avoid this problem in my work, I usually eliminate all tuples and convert them to a string representation in some way before writing JSON:

import json
import pandas


def eliminate_column_tuple(df, glue):
    """Replace tuple column labels with their parts joined by `glue` (modifies df in place)."""
    new_columns = []
    for column in df.columns:
        if isinstance(column, tuple):
            joined = glue.join(column)
            # Drop a trailing glue left behind by an empty last level.
            if joined.endswith(glue):
                joined = joined[:-len(glue)]
            new_columns.append(joined)
        else:
            new_columns.append(column)
    df.columns = new_columns
    return df


test = pandas.DataFrame(data={
    'key': ['a', 'a', 'b', 'b', 'a'],
    'value': [1, 2, 3, 4, 5]
})
# Aggregating with a list of functions yields tuple labels such as ('value', 'sum').
stat = test.groupby('key').agg(['sum', 'mean', 'count'])
simple_stat = eliminate_column_tuple(stat, '|')
stat_json = simple_stat.to_json()
print(stat_json)
json.loads(stat_json)

However, this kind of solution depends strongly on the purpose. If this problem won't be fixed, I recommend at least stating in the API documentation that to_json() is useless for a DataFrame with tuple columns.
I'd prefer to raise an exception (like Python does) rather than document
that we produce invalid JSON.
On Tue, Mar 27, 2018 at 8:37 PM, k-yaegashi wrote:
> I don't know what is the best solution that can satisfy everyone.
> According to the JSON specification
> <http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf>,
> including value other than string as object's key is impossible.
> In order to avoid this problem in my work, I usually eliminate all tuples
> and manage to get string expression in some way, before writing JSON:
>
> import json, pandas
>
> def eliminate_column_tuple(df, glue):
>     columns = df.columns
>     new_columns = []
>     for column in columns:
>         if type(column) is tuple:
>             joined = glue.join(column)
>             if joined.endswith(glue):
>                 joined = joined[:-len(glue)]
>             new_columns.append(joined)
>         else:
>             new_columns.append(column)
>     df.columns = new_columns
>     return df
>
> test = pandas.DataFrame(data = {
>     'key': ['a', 'a', 'b', 'b', 'a'],
>     'value': [1, 2, 3, 4, 5]
> })
> stat = test.groupby('key').agg(['sum', 'mean', 'count'])
> simple_stat = eliminate_column_tuple(stat, '|')
> stat_json = simple_stat.to_json()
> print(stat_json)
> json.loads(stat_json)
>
> However, such kind of solution strongly depends on the purpose.
> If this problem won't be fixed, at least it is recommended to write into
> the API document that to_json() is useless for DataFrame with tuple
> columns.
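For reference, the "like Python does" remark above can be checked with the standard-library json module alone. The snippet below is a minimal sketch, not pandas code: the dictionary `data` and the flag `raised` are illustrative names, and the value 8 is just a placeholder.

```python
import json

# A dict keyed by a tuple, similar in shape to what tuple column labels produce.
data = {("value", "sum"): 8}

try:
    json.dumps(data)
    raised = False
except TypeError as exc:
    # The standard library refuses tuple keys instead of emitting invalid JSON.
    raised = True
    print(f"json.dumps refused the tuple key: {exc}")
```

Passing `skipkeys=True` to `json.dumps` would silently drop such keys instead of raising, which is a different trade-off from the one proposed in this thread.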
yeah I agree, can put a check here that if an index is
prob allow [10] but not [11]
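The kind of check suggested above might look roughly like the following. This is only a sketch under stated assumptions: `ensure_json_safe_labels` is a hypothetical helper, not a pandas function and not the fix that was eventually applied. It works on any iterable of labels, so it could be given `df.columns` or `df.index` before calling `to_json()`.

```python
def ensure_json_safe_labels(labels):
    """Return the labels as a list, or raise if any label is a tuple.

    Hypothetical helper for illustration only: JSON object keys must be
    strings, so tuple labels cannot be written as keys without some encoding.
    """
    labels = list(labels)
    bad = [label for label in labels if isinstance(label, tuple)]
    if bad:
        raise ValueError(
            f"cannot serialize tuple labels as JSON object keys: {bad!r}"
        )
    return labels


# Plain string labels pass through unchanged.
print(ensure_json_safe_labels(["key", "value"]))

# Tuple labels (what iterating over multi-level columns yields) are rejected.
try:
    ensure_json_safe_labels([("value", "sum"), ("value", "mean")])
except ValueError as exc:
    print(f"rejected: {exc}")
```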
Code Sample, a copy-pastable example if possible
Result by Python 2.7
Result by Python 3
Problem description
DataFrame.to_json() returns malformed JSON when its columns contain tuple objects, such as ('value', 'sum'), ('value', 'mean'), etc. in this case.
Expected Output
I think that, at least in this case, double quotes inside strings should be escaped in the output, like this:
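As a hedged illustration of that kind of escaping, using only the standard-library json module rather than pandas: a tuple label can first be encoded as a JSON array string, and that string then used as an object key, at which point its inner double quotes are escaped. The names `column`, `key`, and `doc`, and the sample values, are assumptions made for this sketch, not output taken from the report.

```python
import json

column = ("value", "sum")

# Tuples serialize as JSON arrays, so this gives the string '["value", "sum"]'.
key = json.dumps(column)

# Using that string as an object key forces its inner double quotes to be escaped.
doc = json.dumps({key: {"a": 8, "b": 7}})
print(doc)

# The result is valid JSON, and the original tuple can be recovered from the key.
parsed = json.loads(doc)
recovered = tuple(json.loads(next(iter(parsed))))
print(recovered)
```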
Output of pd.show_versions()
Python2.7
Python3
I have checked past issues briefly, but I apologize if the same issue has already been filed.