-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
ENH: Deprecate literal json string input to read_json #53330
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
dd86d39
84e1a35
610fd81
ab8392a
1406d90
8d86a4c
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -51,13 +51,8 @@ | |
from pandas.io.common import ( | ||
IOHandles, | ||
dedup_names, | ||
extension_to_compression, | ||
file_exists, | ||
get_handle, | ||
is_fsspec_url, | ||
is_potential_multi_index, | ||
is_url, | ||
stringify_path, | ||
) | ||
from pandas.io.json._normalize import convert_to_line_delimits | ||
from pandas.io.json._table_schema import ( | ||
|
@@ -498,7 +493,7 @@ def read_json( | |
decompression_options=_shared_docs["decompression_options"] % "path_or_buf", | ||
) | ||
def read_json( | ||
path_or_buf: FilePath | ReadBuffer[str] | ReadBuffer[bytes], | ||
path_or_buf: ReadBuffer[str] | ReadBuffer[bytes], | ||
*, | ||
orient: str | None = None, | ||
typ: Literal["frame", "series"] = "frame", | ||
|
@@ -523,12 +518,7 @@ def read_json( | |
|
||
Parameters | ||
---------- | ||
path_or_buf : a valid JSON str, path object or file-like object | ||
Any valid string path is acceptable. The string could be a URL. Valid | ||
URL schemes include http, ftp, s3, and file. For file URLs, a host is | ||
expected. A local file could be: | ||
``file://localhost/path/to/table.json``. | ||
|
||
path_or_buf : a path object or file-like object | ||
If you want to pass in a path object, pandas accepts any | ||
``os.PathLike``. | ||
|
||
|
@@ -750,6 +740,8 @@ def read_json( | |
}}\ | ||
' | ||
""" | ||
if isinstance(path_or_buf, str): | ||
raise TypeError("cannot pass literal json to pandas.read_json") | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think we can only add a warning for now and then with 3.0 (or 2.2?) remove support for json literals. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Oh wow, that would be drastically different compared to the original ask in the GitHub issue. Though, that would follow along the same lines of what we have seen with There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Although I wrote the bug as There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @wence- That makes perfect sense. I may hit pause on this until for a day or two until we hear from There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah @twoertwein and @wence- are correct here. We will need to add a warning to deprecate the current behavior of parsing string json in this PR before we can outright raise an error There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @mroeschke Awesome! Thanks! |
||
if orient == "table" and dtype: | ||
raise ValueError("cannot pass both dtype and orient='table'") | ||
if orient == "table" and convert_axes: | ||
|
@@ -868,8 +860,9 @@ def __init__( | |
) | ||
self.data = filepath_or_buffer | ||
elif self.engine == "ujson": | ||
data = self._get_data_from_filepath(filepath_or_buffer) | ||
self.data = self._preprocess_data(data) | ||
self.handles = get_handle(filepath_or_buffer, "r") | ||
self.data = self._preprocess_data(filepath_or_buffer) | ||
self.close() | ||
|
||
def _preprocess_data(self, data): | ||
""" | ||
|
@@ -887,47 +880,6 @@ def _preprocess_data(self, data): | |
|
||
return data | ||
|
||
def _get_data_from_filepath(self, filepath_or_buffer): | ||
""" | ||
The function read_json accepts three input types: | ||
1. filepath (string-like) | ||
2. file-like object (e.g. open file object, StringIO) | ||
3. JSON string | ||
|
||
This method turns (1) into (2) to simplify the rest of the processing. | ||
It returns input types (2) and (3) unchanged. | ||
|
||
It raises FileNotFoundError if the input is a string ending in | ||
one of .json, .json.gz, .json.bz2, etc. but no such file exists. | ||
""" | ||
# if it is a string but the file does not exist, it might be a JSON string | ||
filepath_or_buffer = stringify_path(filepath_or_buffer) | ||
if ( | ||
not isinstance(filepath_or_buffer, str) | ||
or is_url(filepath_or_buffer) | ||
or is_fsspec_url(filepath_or_buffer) | ||
or file_exists(filepath_or_buffer) | ||
): | ||
self.handles = get_handle( | ||
filepath_or_buffer, | ||
"r", | ||
encoding=self.encoding, | ||
compression=self.compression, | ||
storage_options=self.storage_options, | ||
errors=self.encoding_errors, | ||
) | ||
filepath_or_buffer = self.handles.handle | ||
elif ( | ||
isinstance(filepath_or_buffer, str) | ||
and filepath_or_buffer.lower().endswith( | ||
(".json",) + tuple(f".json{c}" for c in extension_to_compression) | ||
) | ||
and not file_exists(filepath_or_buffer) | ||
): | ||
raise FileNotFoundError(f"File {filepath_or_buffer} does not exist") | ||
|
||
return filepath_or_buffer | ||
|
||
def _combine_lines(self, lines) -> str: | ||
""" | ||
Combines a list of JSON objects into one JSON object. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this check is probably too restrictive. I think we want to allow
Going forward. We just don't want
So rather than checking here. The deprecation probably wants enforced in
JsonReader
, perhaps like this?WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see the value in doing that. Though, that doesn't seem to be what our repo maintainers are going after
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm quite sure @mroeschke wants to allow
os.PathLike
andstr
that represent file paths (that would be consistent with read/to_csv). My understanding of the issue is resolving the ambiguity of whether a string is a path or a "JSON-string" (that has caused issues in the past). When making changes that break current behavior (disallowing JSON-strings), we have to have at least one release with a deprecation warning.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh okay! I was thinking that they would want to turn strings into StringIO objects! And yeah I definitely understand that we would want to have at least one release with a deprecation warning! 💯