Skip to content

ENH: Deprecate literal json string input to read_json #53330

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 6 commits into from
Closed

ENH: Deprecate literal json string input to read_json #53330

wants to merge 6 commits into from

Conversation

rmhowe425
Copy link
Contributor

@rmhowe425 rmhowe425 marked this pull request as draft May 21, 2023 22:52
@rmhowe425
Copy link
Contributor Author

rmhowe425 commented May 21, 2023

@mroeschke This looks like its going to be a pretty bulky PR after all of the unit tests are updated.

Was hoping that someone could make sure that I'm on the right track before I finish updating the remainder of the unit tests?

@@ -750,6 +740,8 @@ def read_json(
}}\
'
"""
if isinstance(path_or_buf, str):
raise TypeError("cannot pass literal json to pandas.read_json")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can only add a warning for now and then with 3.0 (or 2.2?) remove support for json literals.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wow, that would be drastically different compared to the original ask in the GitHub issue. Though, that would follow along the same lines of what we have seen with DataFrame.concat and DataFrame.append

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although I wrote the bug as ENH, my plan (although thanks for picking it up since I didn't get enough roundtoits until now), was to introduce a deprecation warning and then remove in an upcoming release. I think it's too user- facing to just remove immediately.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wence- That makes perfect sense.

I may hit pause on this until for a day or two until we hear from mroeschke or another reviewer for official direction. I don't want to steer too far away from the original acceptance criteria / guidance in the Feature Description, although I think that they'll probably echo what you're suggesting 😄

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah @twoertwein and @wence- are correct here. We will need to add a warning to deprecate the current behavior of parsing string json in this PR before we can outright raise an error

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mroeschke Awesome! Thanks!

@@ -750,6 +740,8 @@ def read_json(
}}\
'
"""
if isinstance(path_or_buf, str):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this check is probably too restrictive. I think we want to allow

pd.read_json("path/to/file.json")

Going forward. We just don't want

pd.read_json("{json: object}") # or however it's spelt

So rather than checking here. The deprecation probably wants enforced in JsonReader, perhaps like this?

diff --git a/pandas/io/json/_json.py b/pandas/io/json/_json.py
index 8775c65f14..78f541d3df 100644
--- a/pandas/io/json/_json.py
+++ b/pandas/io/json/_json.py
@@ -18,6 +18,7 @@ from typing import (
     TypeVar,
     overload,
 )
+import warnings
 
 import numpy as np
 
@@ -30,6 +31,7 @@ from pandas._libs.tslibs import iNaT
 from pandas.compat._optional import import_optional_dependency
 from pandas.errors import AbstractMethodError
 from pandas.util._decorators import doc
+from pandas.util._exceptions import find_stack_level
 from pandas.util._validators import check_dtype_backend
 
 from pandas.core.dtypes.common import ensure_str
@@ -916,7 +918,14 @@ class JsonReader(abc.Iterator, Generic[FrameSeriesStrT]):
             and not file_exists(filepath_or_buffer)
         ):
             raise FileNotFoundError(f"File {filepath_or_buffer} does not exist")
-
+        else:
+            warnings.warn(
+                "Passing literal json to 'read_json' is deprecated and "
+                "will be removed in a future version. To read from a "
+                "literal string, wrap it in a 'StringIO' object.",
+                FutureWarning,
+                stacklevel=find_stack_level(),
+            )
         return filepath_or_buffer
 
     def _combine_lines(self, lines) -> str:

WDYT?

Copy link
Contributor Author

@rmhowe425 rmhowe425 May 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the value in doing that. Though, that doesn't seem to be what our repo maintainers are going after

Copy link
Member

@twoertwein twoertwein May 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the value in doing that. Though, that doesn't seem to be what our repo maintainers are going after

I'm quite sure @mroeschke wants to allow os.PathLike and str that represent file paths (that would be consistent with read/to_csv). My understanding of the issue is resolving the ambiguity of whether a string is a path or a "JSON-string" (that has caused issues in the past). When making changes that break current behavior (disallowing JSON-strings), we have to have at least one release with a deprecation warning.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh okay! I was thinking that they would want to turn strings into StringIO objects! And yeah I definitely understand that we would want to have at least one release with a deprecation warning! 💯

@mroeschke mroeschke added the IO JSON read_json, to_json, json_normalize label May 22, 2023
@asishm
Copy link
Contributor

asishm commented May 23, 2023

Do we want to achieve the consistency as mentioned in #52271 (comment) in this PR? If not, maybe open a follow-up issue with the xref.

@rmhowe425 rmhowe425 closed this May 26, 2023
@rmhowe425 rmhowe425 deleted the dev/read_json branch May 26, 2023 20:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO JSON read_json, to_json, json_normalize
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ENH: Deprecate literal json string input to read_json
5 participants