ENH: read_json behaviour with bytes object #45935

Closed
abdulniyaspm opened this issue Feb 11, 2022 · 4 comments
Labels
Enhancement Needs Triage Issue that has not been reviewed by a pandas team member

Comments

@abdulniyaspm

abdulniyaspm commented Feb 11, 2022

Is your feature request related to a problem?

The latest version of pandas.read_json does not operate directly on a bytes object.

pandas version 1.1.4

>>> import pandas as pd
>>> pd.__version__
'1.1.4'
>>> df = pd.read_json(b'{"col": [1, 2, 3]}')
>>> df
   col
0    1
1    2
2    3

pandas version 1.4.0

>>> import pandas as pd
>>> pd.__version__
'1.4.0'
>>>
>>> df = pd.read_json(b'{"col": [1, 2, 3]}')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
..... # Skipped traceback for brevity
raise TypeError(
TypeError: Expected file path name or file-like object, got <class 'bytes'> type

Describe the solution you'd like

I don't know the discussion that led to the removal of this behavior from the latest releases, but I would like the latest version to behave exactly the same as version 1.1.4.

API breaking implications

I don't think that this will affect the API, but we may need to update the documentation.

Describe alternatives you've considered

We need to wrap the bytes in an in-memory bytes buffer (BytesIO) to get the expected behavior.

>>> import pandas as pd
>>>
>>> pd.__version__
'1.4.0'
>>>
>>> from io import BytesIO
>>> df = pd.read_json(BytesIO(b'{"col": [1, 2, 3]}'))
>>> df
   col
0    1
1    2
2    3

But I would like pandas to do this work implicitly like in this case.
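
Until then, the wrapping can live in a small user-side helper. The following is only a minimal sketch: `as_json_source` is a hypothetical name, not part of pandas, and the `pd.read_json` call is left as a comment so the snippet runs without pandas installed.

```python
from io import BytesIO


def as_json_source(obj):
    """Wrap raw bytes in a BytesIO buffer; return anything else unchanged.

    Hypothetical helper, not a pandas API.
    """
    if isinstance(obj, (bytes, bytearray)):
        return BytesIO(bytes(obj))
    return obj


source = as_json_source(b'{"col": [1, 2, 3]}')
# With pandas installed, the buffer can be passed on as in the session above:
#   df = pd.read_json(source)
```

Paths and existing file-like objects pass through untouched, so the helper only changes the bytes case.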

Additional context

pandas read_json behavior with bytes object

@abdulniyaspm abdulniyaspm added Enhancement Needs Triage Issue that has not been reviewed by a pandas team member labels Feb 11, 2022
@jreback
Contributor

jreback commented Feb 11, 2022

cc @twoertwein

@twoertwein
Member

> I don't think that this will affect the API, but we may need to update the documentation.

read_json is not documented to support bytes. If it is intended that read_json should accept bytes, you probably need to add bytes to these isinstance checks, and likely an if-statement here to convert it to StringIO(filepath_or_buffer.decode(encoding, errors=encoding_errors)).
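
A rough sketch of that conversion step, written as standalone code rather than as a patch to pandas: the function name `bytes_to_text_buffer` and the default values are assumptions for illustration, and the standard-library `json` module stands in for the parser so the snippet is self-contained.

```python
import json
from io import StringIO


def bytes_to_text_buffer(filepath_or_buffer, encoding="utf-8", encoding_errors="strict"):
    """Decode bytes into a StringIO; leave every other input as it is.

    Illustrative only; this is not the actual pandas implementation.
    """
    if isinstance(filepath_or_buffer, bytes):
        return StringIO(filepath_or_buffer.decode(encoding, errors=encoding_errors))
    return filepath_or_buffer


buf = bytes_to_text_buffer(b'{"col": [1, 2, 3]}')
parsed = json.load(buf)  # stand-in for the downstream JSON reader
```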

Personally, I'm not a fan of accepting both content and file handles (but read_json unfortunately already accepts json-strings).

@jreback
Contributor

jreback commented Feb 11, 2022

yeah i believe we do not accept bytes here as json is by definition a string format (all that said it can surely come over the wire as bytes, but that is a boundary that the user should cross)
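
One way to read that point: only the caller knows how the bytes were encoded, so the decode belongs in user code. A small sketch, assuming a UTF-8 payload (the payload and variable names are made up for illustration):

```python
import json

# Bytes as they might arrive over the wire; UTF-8 is an assumption here.
payload = '{"name": "café"}'.encode("utf-8")

# Decoding with the right encoding recovers the original text.
text = payload.decode("utf-8")
record = json.loads(text)

# Decoding with the wrong encoding still parses, but the value is garbled.
garbled = json.loads(payload.decode("latin-1"))
```

The decoded `text` can then be wrapped in `io.StringIO` and handed to read_json.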

@mroeschke
Member

Thanks for the request, but agreed, I think not accepting bytes was an intentional decision due to json's definition. Closing as it appears there's not a lot of support to change this.
