DEPR: Deprecate literal json string input to read_json #53409

Merged
merged 36 commits into from
Jun 20, 2023
Changes from 9 commits
Commits
763bd7a
Adding logic to throw a deprecation warning when a literal json strin…
rmhowe425 May 27, 2023
ffc47a6
Adding logic to throw a deprecation warning when a literal json strin…
rmhowe425 May 27, 2023
b23e548
Updating documentation and adding PR num to unit test
rmhowe425 May 27, 2023
639a212
Adding a deprecation warning to the user guide
rmhowe425 May 27, 2023
57c96df
Updating unit tests to check for FutureWarning
rmhowe425 May 27, 2023
93c47ea
Fixing unit tests
rmhowe425 May 27, 2023
f45c0f2
Fixing unit tests
rmhowe425 May 27, 2023
d62aa6d
Fixing unit tests
rmhowe425 May 27, 2023
5938434
Fixing unit tests
rmhowe425 May 27, 2023
e5e3b09
Fixing documentation errors in PR feedback
rmhowe425 May 30, 2023
097b3f2
Fixing documentation errors in PR feedback
rmhowe425 May 30, 2023
c9a1a1a
Updating unit tests to use StringIO rather than catch FutureWarning
rmhowe425 May 31, 2023
df64bf1
Finishing updating unit tests to use StringIO rather than catch Futur…
rmhowe425 May 31, 2023
ee427c9
Fixing indentation errors in unit tests. Moved one unit test to anoth…
rmhowe425 Jun 2, 2023
7fb9d7c
Updating unit test name
rmhowe425 Jun 2, 2023
5999342
Merge branch 'main' into dev/read_json
rmhowe425 Jun 3, 2023
89a8c62
Adding additional checks to unit tests
rmhowe425 Jun 5, 2023
7e81563
Merge branch 'dev/read_json' of github.com:rmhowe425/pandas into dev/…
rmhowe425 Jun 5, 2023
83d94e6
Fixing unit tests
rmhowe425 Jun 6, 2023
ef25857
Merge branch 'main' into dev/read_json
rmhowe425 Jun 7, 2023
81b7ab2
Fixing unit tests
rmhowe425 Jun 7, 2023
8213148
Merge branch 'main' into dev/read_json
rmhowe425 Jun 7, 2023
7d2a80a
Updating whatsnew documentation per reviewer recommendations.
rmhowe425 Jun 12, 2023
12c937d
Merge branch 'main' into dev/read_json
rmhowe425 Jun 12, 2023
90ebf54
Merge branch 'main' into dev/read_json
rmhowe425 Jun 13, 2023
bf2e686
Fixing failing code tests
rmhowe425 Jun 13, 2023
09524a4
Merge branch 'dev/read_json' of github.com:rmhowe425/pandas into dev/…
rmhowe425 Jun 13, 2023
c55ea18
Fixing failing code tests
rmhowe425 Jun 13, 2023
85ce639
Adding import to doc string example
rmhowe425 Jun 13, 2023
0773ef0
Fixing documentation formatting error
rmhowe425 Jun 13, 2023
89180d3
Fixing documentation formatting error
rmhowe425 Jun 13, 2023
62b3a79
Merge branch 'main' into dev/read_json
rmhowe425 Jun 14, 2023
543b725
Fixing documentation error after fixing merge conflict
rmhowe425 Jun 14, 2023
b11a80c
Fixing formatting errors in whatsnew file
rmhowe425 Jun 14, 2023
4c29f5f
Updating formatting errors in documentation
rmhowe425 Jun 14, 2023
eee3b3d
Updating formatting errors in documentation
rmhowe425 Jun 14, 2023
4 changes: 4 additions & 0 deletions doc/source/user_guide/io.rst
@@ -2072,6 +2072,10 @@ is ``None``. To explicitly force ``Series`` parsing, pass ``typ=series``
 * ``engine``: Either ``"ujson"``, the built-in JSON parser, or ``"pyarrow"`` which dispatches to pyarrow's ``pyarrow.json.read_json``.
   The ``"pyarrow"`` is only available when ``lines=True``

+.. warning::
+
+   Passing json literal strings will be deprecated in a future release of pandas.
+
 The parser will raise one of ``ValueError/TypeError/AssertionError`` if the JSON is not parseable.

 If a non-default ``orient`` was used when encoding to JSON be sure to pass the same
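The warning added to the user guide names the deprecation but not the replacement. A minimal sketch of the migration, assuming the JSON is already held in memory as a `str` (the payload below is illustrative and mirrors the one used in the tests):

```python
from io import StringIO

import pandas as pd

# Illustrative payload; any JSON document held in a str works the same way.
json_text = '{"a": [1, 2, 3], "b": [4, 5, 6]}'

# Wrapping the string in StringIO hands read_json a file-like object,
# so the deprecated literal-string input is never used.
df = pd.read_json(StringIO(json_text))

print(df)
```

Paths, URLs, and open file handles are unaffected by the deprecation; only a bare JSON string needs the wrapper.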
2 changes: 2 additions & 0 deletions doc/source/whatsnew/v2.1.0.rst
@@ -268,9 +268,11 @@ Deprecations
- Deprecated allowing arbitrary ``fill_value`` in :class:`SparseDtype`, in a future version the ``fill_value`` will need to be compatible with the ``dtype.subtype``, either a scalar that can be held by that subtype or ``NaN`` for integer or bool subtypes (:issue:`23124`)
- Deprecated behavior of :func:`assert_series_equal` and :func:`assert_frame_equal` considering NA-like values (e.g. ``NaN`` vs ``None`` as equivalent) (:issue:`52081`)
- Deprecated constructing :class:`SparseArray` from scalar data, pass a sequence instead (:issue:`53039`)
- Deprecated literal json input to :func:`read_json`. Moving forward the method only accepts file-like objects (:issue:`53409`)
- Deprecated positional indexing on :class:`Series` with :meth:`Series.__getitem__` and :meth:`Series.__setitem__`, in a future version ``ser[item]`` will *always* interpret ``item`` as a label, not a position (:issue:`50617`)
-


.. ---------------------------------------------------------------------------
.. _whatsnew_210.performance:

11 changes: 10 additions & 1 deletion pandas/io/json/_json.py
@@ -18,6 +18,7 @@
     TypeVar,
     overload,
 )
+import warnings

 import numpy as np

@@ -30,6 +31,7 @@
 from pandas.compat._optional import import_optional_dependency
 from pandas.errors import AbstractMethodError
 from pandas.util._decorators import doc
+from pandas.util._exceptions import find_stack_level
 from pandas.util._validators import check_dtype_backend

 from pandas.core.dtypes.common import ensure_str
@@ -925,7 +927,14 @@ def _get_data_from_filepath(self, filepath_or_buffer):
             and not file_exists(filepath_or_buffer)
         ):
             raise FileNotFoundError(f"File {filepath_or_buffer} does not exist")
-
+        else:
+            warnings.warn(
+                "Passing literal json to 'read_json' is deprecated and "
+                "will be removed in a future version. To read from a "
+                "literal string, wrap it in a 'StringIO' object.",
+                FutureWarning,
+                stacklevel=find_stack_level(),
+            )
         return filepath_or_buffer

     def _combine_lines(self, lines) -> str:
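The new `else` branch fires only when the input was not recognised as a readable handle or an existing path. The control flow can be sketched with the standard library alone. `get_data` below is a hypothetical, heavily simplified stand-in, not the pandas method: it ignores URLs, compression, and the `FileNotFoundError` case, and it uses a fixed `stacklevel` where pandas uses `find_stack_level()`.

```python
import os
import warnings
from io import StringIO

DEPRECATION_MSG = (
    "Passing literal json to 'read_json' is deprecated and "
    "will be removed in a future version. To read from a "
    "literal string, wrap it in a 'StringIO' object."
)


def get_data(path_or_buffer):
    """Hypothetical, simplified stand-in for the reader's input handling."""
    if hasattr(path_or_buffer, "read"):
        # File-like object: read it, no warning.
        return path_or_buffer.read()
    if isinstance(path_or_buffer, str) and os.path.exists(path_or_buffer):
        # Existing path: open and read it, no warning.
        with open(path_or_buffer, encoding="utf-8") as handle:
            return handle.read()
    # Anything else is treated as literal JSON, which is the deprecated case.
    warnings.warn(DEPRECATION_MSG, FutureWarning, stacklevel=2)
    return path_or_buffer


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    literal = get_data('{"a": [1, 2, 3]}')            # takes the warning branch
    wrapped = get_data(StringIO('{"a": [1, 2, 3]}'))  # file-like, no warning

print(len(caught), caught[0].category.__name__)  # prints: 1 FutureWarning
```

Both calls return the same text; only the literal string triggers the `FutureWarning`.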
24 changes: 18 additions & 6 deletions pandas/tests/io/json/test_compression.py
@@ -9,6 +9,14 @@
 from pandas.tests.io.test_compression import _compression_to_extension


+def generateDepMsg():
+    return (
+        "Passing literal json to 'read_json' is deprecated and "
+        "will be removed in a future version. To read from a "
+        "literal string, wrap it in a 'StringIO' object."
+    )
+
+
 def test_compression_roundtrip(compression):
     df = pd.DataFrame(
         [[0.123456, 0.234567, 0.567567], [12.32112, 123123.2, 321321.2]],
@@ -23,7 +31,8 @@ def test_compression_roundtrip(compression):
         # explicitly ensure file was compressed.
         with tm.decompress_file(path, compression) as fh:
             result = fh.read().decode("utf8")
-        tm.assert_frame_equal(df, pd.read_json(result))
+        with tm.assert_produces_warning(FutureWarning, match=generateDepMsg()):
+            tm.assert_frame_equal(df, pd.read_json(result))


 def test_read_zipped_json(datapath):
@@ -40,8 +49,8 @@ def test_read_zipped_json(datapath):
 @pytest.mark.single_cpu
 def test_with_s3_url(compression, s3_resource, s3so):
     # Bucket "pandas-test" created in tests/io/conftest.py
-
-    df = pd.read_json('{"a": [1, 2, 3], "b": [4, 5, 6]}')
+    with tm.assert_produces_warning(FutureWarning, match=generateDepMsg()):
+        df = pd.read_json('{"a": [1, 2, 3], "b": [4, 5, 6]}')

     with tm.ensure_clean() as path:
         df.to_json(path, compression=compression)
@@ -56,15 +65,17 @@ def test_with_s3_url(compression, s3_resource, s3so):

 def test_lines_with_compression(compression):
     with tm.ensure_clean() as path:
-        df = pd.read_json('{"a": [1, 2, 3], "b": [4, 5, 6]}')
+        with tm.assert_produces_warning(FutureWarning, match=generateDepMsg()):
+            df = pd.read_json('{"a": [1, 2, 3], "b": [4, 5, 6]}')
         df.to_json(path, orient="records", lines=True, compression=compression)
         roundtripped_df = pd.read_json(path, lines=True, compression=compression)
         tm.assert_frame_equal(df, roundtripped_df)


 def test_chunksize_with_compression(compression):
     with tm.ensure_clean() as path:
-        df = pd.read_json('{"a": ["foo", "bar", "baz"], "b": [4, 5, 6]}')
+        with tm.assert_produces_warning(FutureWarning, match=generateDepMsg()):
+            df = pd.read_json('{"a": ["foo", "bar", "baz"], "b": [4, 5, 6]}')
         df.to_json(path, orient="records", lines=True, compression=compression)

         with pd.read_json(
@@ -75,7 +86,8 @@ def test_chunksize_with_compression(compression):


 def test_write_unsupported_compression_type():
-    df = pd.read_json('{"a": [1, 2, 3], "b": [4, 5, 6]}')
+    with tm.assert_produces_warning(FutureWarning, match=generateDepMsg()):
+        df = pd.read_json('{"a": [1, 2, 3], "b": [4, 5, 6]}')
     with tm.ensure_clean() as path:
         msg = "Unrecognized compression type: unsupported"
         with pytest.raises(ValueError, match=msg):
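Later commits in this PR (c9a1a1a, df64bf1) move the tests to `StringIO` instead of catching the `FutureWarning`. A sketch of what the compression roundtrip could look like in that style. This is not the merged test: it fixes the compression to gzip and uses the standard library's `gzip` and `tempfile` in place of the pandas-internal `tm.ensure_clean` and `tm.decompress_file` helpers, and the frame contents are illustrative.

```python
import gzip
import os
import tempfile
from io import StringIO

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "frame.json.gz")
    df.to_json(path, compression="gzip")

    # Decompress by hand to confirm the file really was gzip-compressed.
    with gzip.open(path, "rb") as fh:
        text = fh.read().decode("utf8")

# Wrap the decoded text so read_json receives a file-like object
# rather than a literal JSON string.
result = pd.read_json(StringIO(text))

pd.testing.assert_frame_equal(df, result)
```

Because the string is wrapped before it reaches `read_json`, the test body needs no warning assertion at all.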
10 changes: 9 additions & 1 deletion pandas/tests/io/json/test_deprecated_kwargs.py
@@ -8,9 +8,17 @@
 from pandas.io.json import read_json


+def generateDepMsg():
+    return (
+        "Passing literal json to 'read_json' is deprecated and "
+        "will be removed in a future version. To read from a "
+        "literal string, wrap it in a 'StringIO' object."
+    )
+
+
 def test_good_kwargs():
     df = pd.DataFrame({"A": [2, 4, 6], "B": [3, 6, 9]}, index=[0, 1, 2])
-    with tm.assert_produces_warning(None):
+    with tm.assert_produces_warning(FutureWarning, match=generateDepMsg()):
         tm.assert_frame_equal(df, read_json(df.to_json(orient="split"), orient="split"))
         tm.assert_frame_equal(
             df, read_json(df.to_json(orient="columns"), orient="columns")
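Each test module in this diff carries its own copy of `generateDepMsg()` and passes the full message as `match`. A sketch of the same check written against the standard library only, with the message held in a single constant. `legacy_call` and `assert_warns` are hypothetical names introduced for illustration. The message goes through `re.escape` on the assumption that the matcher treats `match` as a regular expression, as `pytest.warns(match=...)` does, so the `.` characters in the message cannot match arbitrary text.

```python
import re
import warnings

DEPRECATION_MSG = (
    "Passing literal json to 'read_json' is deprecated and "
    "will be removed in a future version. To read from a "
    "literal string, wrap it in a 'StringIO' object."
)


def legacy_call():
    """Hypothetical function standing in for a call that emits the warning."""
    warnings.warn(DEPRECATION_MSG, FutureWarning, stacklevel=2)
    return "ok"


def assert_warns(func, category, message):
    """Call func and require at least one matching warning; return func's result."""
    pattern = re.compile(re.escape(message))
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        result = func()
    matches = [
        w
        for w in caught
        if issubclass(w.category, category) and pattern.search(str(w.message))
    ]
    assert matches, f"no {category.__name__} matching {message!r} was raised"
    return result


value = assert_warns(legacy_call, FutureWarning, DEPRECATION_MSG)
```

A function that emits no warning makes `assert_warns` fail with an `AssertionError`, which is the behavior a deprecation test relies on.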
23 changes: 18 additions & 5 deletions pandas/tests/io/json/test_json_table_schema.py
@@ -24,6 +24,14 @@
 )


+def generateDepMsg():
+    return (
+        "Passing literal json to 'read_json' is deprecated and "
+        "will be removed in a future version. To read from a "
+        "literal string, wrap it in a 'StringIO' object."
+    )
+
+
 @pytest.fixture
 def df_schema():
     return DataFrame(
@@ -254,7 +262,8 @@ def test_read_json_from_to_json_results(self):
                 "name_en": {"row_0": "Hakata Dolls Matsuo"},
             }
         )
-        result1 = pd.read_json(df.to_json())
+        with tm.assert_produces_warning(FutureWarning, match=generateDepMsg()):
+            result1 = pd.read_json(df.to_json())
         result2 = DataFrame.from_dict(json.loads(df.to_json()))
         tm.assert_frame_equal(result1, df)
         tm.assert_frame_equal(result2, df)
@@ -795,7 +804,8 @@ def test_comprehensive(self):
         )

         out = df.to_json(orient="table")
-        result = pd.read_json(out, orient="table")
+        with tm.assert_produces_warning(FutureWarning, match=generateDepMsg()):
+            result = pd.read_json(out, orient="table")
         tm.assert_frame_equal(df, result)

     @pytest.mark.parametrize(
@@ -811,15 +821,17 @@ def test_multiindex(self, index_names):
         )
         df.index.names = index_names
         out = df.to_json(orient="table")
-        result = pd.read_json(out, orient="table")
+        with tm.assert_produces_warning(FutureWarning, match=generateDepMsg()):
+            result = pd.read_json(out, orient="table")
         tm.assert_frame_equal(df, result)

     def test_empty_frame_roundtrip(self):
         # GH 21287
         df = DataFrame(columns=["a", "b", "c"])
         expected = df.copy()
         out = df.to_json(orient="table")
-        result = pd.read_json(out, orient="table")
+        with tm.assert_produces_warning(FutureWarning, match=generateDepMsg()):
+            result = pd.read_json(out, orient="table")
         tm.assert_frame_equal(expected, result)

     def test_read_json_orient_table_old_schema_version(self):
@@ -841,5 +853,6 @@ def test_read_json_orient_table_old_schema_version(self):
         }
         """
         expected = DataFrame({"a": [1, 2.0, "s"]})
-        result = pd.read_json(df_json, orient="table")
+        with tm.assert_produces_warning(FutureWarning, match=generateDepMsg()):
+            result = pd.read_json(df_json, orient="table")
         tm.assert_frame_equal(expected, result)
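The `orient="table"` roundtrips above all feed the output of `to_json` straight back into `read_json`, which is exactly the literal-string case being deprecated. A minimal sketch of the same roundtrip without the warning, assuming a small numeric frame (the column names and values are illustrative, not taken from the test suite):

```python
from io import StringIO

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [0.5, 1.5, 2.5]})

# to_json with no path returns the Table Schema document as a str.
out = df.to_json(orient="table")

# Wrapping that str in StringIO keeps the roundtrip off the deprecated path.
result = pd.read_json(StringIO(out), orient="table")

pd.testing.assert_frame_equal(df, result)
```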
14 changes: 12 additions & 2 deletions pandas/tests/io/json/test_json_table_schema_ext_dtype.py
@@ -33,6 +33,14 @@
 )


+def generateDepMsg():
+    return (
+        "Passing literal json to 'read_json' is deprecated and "
+        "will be removed in a future version. To read from a "
+        "literal string, wrap it in a 'StringIO' object."
+    )
+
+
 class TestBuildSchema:
     def test_build_table_schema(self):
         df = DataFrame(
@@ -287,7 +295,8 @@ def test_json_ext_dtype_reading_roundtrip(self):
         )
         expected = df.copy()
         data_json = df.to_json(orient="table", indent=4)
-        result = read_json(data_json, orient="table")
+        with tm.assert_produces_warning(FutureWarning, match=generateDepMsg()):
+            result = read_json(data_json, orient="table")
         tm.assert_frame_equal(result, expected)

     def test_json_ext_dtype_reading(self):
@@ -311,6 +320,7 @@ def test_json_ext_dtype_reading(self):
                 }
             ]
         }"""
-        result = read_json(data_json, orient="table")
+        with tm.assert_produces_warning(FutureWarning, match=generateDepMsg()):
+            result = read_json(data_json, orient="table")
         expected = DataFrame({"a": Series([2, NA], dtype="Int64")})
         tm.assert_frame_equal(result, expected)