
Commit 1efeb2b

rmhowe425 authored and im-vinicius committed
DEPR: Deprecate literal json string input to read_json (pandas-dev#53409)
* Adding logic to throw a deprecation warning when a literal json string is passed to read_json
* Adding logic to throw a deprecation warning when a literal json string is passed to read_json
* Updating documentation and adding PR num to unit test
* Adding a deprecation warning to the user guide
* Updating unit tests to check for FutureWarning
* Fixing unit tests
* Fixing unit tests
* Fixing unit tests
* Fixing unit tests
* Fixing documentation errors in PR feedback
* Fixing documentation errors in PR feedback
* Updating unit tests to use StringIO rather than catch FutureWarning
* Finishing updating unit tests to use StringIO rather than catch FutureWarning
* Fixing indentation errors in unit tests. Moved one unit test to another file.
* Updating unit test name
* Adding additional checks to unit tests
* Fixing unit tests
* Fixing unit tests
* Updating whatsnew documentation per reviewer recommendations.
* Fixing failing code tests
* Fixing failing code tests
* Adding import to doc string example
* Fixing documentation formatting error
* Fixing documentation formatting error
* Fixing documentation error after fixing merge conflict
* Fixing formatting errors in whatsnew file
* Updating formatting errors in documentation
* Updating formatting errors in documentation
1 parent 36c2a48 commit 1efeb2b
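
In practice the deprecation means callers wrap literal JSON strings in io.StringIO before handing them to read_json; file paths, URLs, and file-like objects are unaffected. A minimal before/after sketch (illustrative only, not part of the diff; the sample JSON string is invented):

    from io import StringIO

    import pandas as pd

    data = '{"col 1": {"row 1": "a", "row 2": "c"}, "col 2": {"row 1": "b", "row 2": "d"}}'

    # Before (now emits a FutureWarning):
    #     df = pd.read_json(data)
    # After: wrap the literal string in StringIO
    df = pd.read_json(StringIO(data))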

10 files changed: +248 -125 lines


doc/source/user_guide/io.rst (+10 -6)
@@ -2111,7 +2111,8 @@ Reading from a JSON string:
 
 .. ipython:: python
 
-    pd.read_json(json)
+    from io import StringIO
+    pd.read_json(StringIO(json))
 
 Reading from a file:
 
@@ -2135,6 +2136,7 @@ Preserve string indices:
 
 .. ipython:: python
 
+    from io import StringIO
     si = pd.DataFrame(
         np.zeros((4, 4)), columns=list(range(4)), index=[str(i) for i in range(4)]
     )
@@ -2143,7 +2145,7 @@ Preserve string indices:
     si.columns
     json = si.to_json()
 
-    sij = pd.read_json(json, convert_axes=False)
+    sij = pd.read_json(StringIO(json), convert_axes=False)
     sij
     sij.index
     sij.columns
@@ -2152,18 +2154,19 @@ Dates written in nanoseconds need to be read back in nanoseconds:
 
 .. ipython:: python
 
+    from io import StringIO
     json = dfj2.to_json(date_unit="ns")
 
     # Try to parse timestamps as milliseconds -> Won't Work
-    dfju = pd.read_json(json, date_unit="ms")
+    dfju = pd.read_json(StringIO(json), date_unit="ms")
     dfju
 
     # Let pandas detect the correct precision
-    dfju = pd.read_json(json)
+    dfju = pd.read_json(StringIO(json))
     dfju
 
     # Or specify that all timestamps are in nanoseconds
-    dfju = pd.read_json(json, date_unit="ns")
+    dfju = pd.read_json(StringIO(json), date_unit="ns")
    dfju
 
 By setting the ``dtype_backend`` argument you can control the default dtypes used for the resulting DataFrame.
@@ -2251,11 +2254,12 @@ For line-delimited json files, pandas can also return an iterator which reads in
 
 .. ipython:: python
 
+    from io import StringIO
     jsonl = """
         {"a": 1, "b": 2}
         {"a": 3, "b": 4}
     """
-    df = pd.read_json(jsonl, lines=True)
+    df = pd.read_json(StringIO(jsonl), lines=True)
     df
     df.to_json(orient="records", lines=True)

doc/source/whatsnew/v1.5.0.rst (+8 -5)
@@ -474,19 +474,22 @@ upon serialization. (Related issue :issue:`12997`)
 
 .. code-block:: ipython
 
-    In [4]: a.to_json(date_format='iso')
-    Out[4]: '{"2020-12-28T00:00:00.000Z":0,"2020-12-28T01:00:00.000Z":1,"2020-12-28T02:00:00.000Z":2}'
+    In [4]: from io import StringIO
 
-    In [5]: pd.read_json(a.to_json(date_format='iso'), typ="series").index == a.index
-    Out[5]: array([False, False, False])
+    In [5]: a.to_json(date_format='iso')
+    Out[5]: '{"2020-12-28T00:00:00.000Z":0,"2020-12-28T01:00:00.000Z":1,"2020-12-28T02:00:00.000Z":2}'
+
+    In [6]: pd.read_json(StringIO(a.to_json(date_format='iso')), typ="series").index == a.index
+    Out[6]: array([False, False, False])
 
 *New Behavior*
 
 .. ipython:: python
 
+    from io import StringIO
     a.to_json(date_format='iso')
     # Roundtripping now works
-    pd.read_json(a.to_json(date_format='iso'), typ="series").index == a.index
+    pd.read_json(StringIO(a.to_json(date_format='iso')), typ="series").index == a.index
 
 .. _whatsnew_150.notable_bug_fixes.groupby_value_counts_categorical:

doc/source/whatsnew/v2.1.0.rst (+1 -0)
@@ -293,6 +293,7 @@ Deprecations
 - Deprecated behavior of :func:`assert_series_equal` and :func:`assert_frame_equal` considering NA-like values (e.g. ``NaN`` vs ``None`` as equivalent) (:issue:`52081`)
 - Deprecated constructing :class:`SparseArray` from scalar data, pass a sequence instead (:issue:`53039`)
 - Deprecated falling back to filling when ``value`` is not specified in :meth:`DataFrame.replace` and :meth:`Series.replace` with non-dict-like ``to_replace`` (:issue:`33302`)
+- Deprecated literal json input to :func:`read_json`. Wrap literal json string input in ``io.StringIO`` instead. (:issue:`53409`)
 - Deprecated option "mode.use_inf_as_na", convert inf entries to ``NaN`` before instead (:issue:`51684`)
 - Deprecated parameter ``obj`` in :meth:`GroupBy.get_group` (:issue:`53545`)
 - Deprecated positional indexing on :class:`Series` with :meth:`Series.__getitem__` and :meth:`Series.__setitem__`, in a future version ``ser[item]`` will *always* interpret ``item`` as a label, not a position (:issue:`50617`)

pandas/io/json/_json.py (+30 -4)
@@ -18,6 +18,7 @@
     TypeVar,
     overload,
 )
+import warnings
 
 import numpy as np
 
@@ -30,6 +31,7 @@
 from pandas.compat._optional import import_optional_dependency
 from pandas.errors import AbstractMethodError
 from pandas.util._decorators import doc
+from pandas.util._exceptions import find_stack_level
 from pandas.util._validators import check_dtype_backend
 
 from pandas.core.dtypes.common import ensure_str
@@ -535,6 +537,10 @@ def read_json(
         By file-like object, we refer to objects with a ``read()`` method,
         such as a file handle (e.g. via builtin ``open`` function)
         or ``StringIO``.
+
+        .. deprecated:: 2.1.0
+            Passing json literal strings is deprecated.
+
     orient : str, optional
         Indication of expected JSON string format.
         Compatible JSON strings can be produced by ``to_json()`` with a
@@ -695,6 +701,7 @@ def read_json(
 
     Examples
     --------
+    >>> from io import StringIO
     >>> df = pd.DataFrame([['a', 'b'], ['c', 'd']],
     ...                   index=['row 1', 'row 2'],
     ...                   columns=['col 1', 'col 2'])
@@ -709,7 +716,7 @@ def read_json(
     "data":[["a","b"],["c","d"]]\
     }}\
     '
-    >>> pd.read_json(_, orient='split')
+    >>> pd.read_json(StringIO(_), orient='split')
           col 1 col 2
     row 1     a     b
     row 2     c     d
@@ -719,7 +726,7 @@ def read_json(
 
     >>> df.to_json(orient='index')
     '{{"row 1":{{"col 1":"a","col 2":"b"}},"row 2":{{"col 1":"c","col 2":"d"}}}}'
 
-    >>> pd.read_json(_, orient='index')
+    >>> pd.read_json(StringIO(_), orient='index')
           col 1 col 2
     row 1     a     b
     row 2     c     d
@@ -729,7 +736,7 @@ def read_json(
 
     >>> df.to_json(orient='records')
     '[{{"col 1":"a","col 2":"b"}},{{"col 1":"c","col 2":"d"}}]'
-    >>> pd.read_json(_, orient='records')
+    >>> pd.read_json(StringIO(_), orient='records')
       col 1 col 2
     0     a     b
     1     c     d
@@ -860,6 +867,18 @@ def __init__(
             self.nrows = validate_integer("nrows", self.nrows, 0)
             if not self.lines:
                 raise ValueError("nrows can only be passed if lines=True")
+        if (
+            isinstance(filepath_or_buffer, str)
+            and not self.lines
+            and "\n" in filepath_or_buffer
+        ):
+            warnings.warn(
+                "Passing literal json to 'read_json' is deprecated and "
+                "will be removed in a future version. To read from a "
+                "literal string, wrap it in a 'StringIO' object.",
+                FutureWarning,
+                stacklevel=find_stack_level(),
+            )
         if self.engine == "pyarrow":
             if not self.lines:
                 raise ValueError(
@@ -925,7 +944,14 @@ def _get_data_from_filepath(self, filepath_or_buffer):
                 and not file_exists(filepath_or_buffer)
             ):
                 raise FileNotFoundError(f"File {filepath_or_buffer} does not exist")
-
+            else:
+                warnings.warn(
+                    "Passing literal json to 'read_json' is deprecated and "
+                    "will be removed in a future version. To read from a "
+                    "literal string, wrap it in a 'StringIO' object.",
+                    FutureWarning,
+                    stacklevel=find_stack_level(),
+                )
         return filepath_or_buffer
 
     def _combine_lines(self, lines) -> str:
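
With these changes, a string that is neither a path nor an existing file now triggers the FutureWarning before being parsed, while file-like input stays warning-free. A small sketch of the resulting behavior (assumes a pandas build with this commit applied; the sample data is invented):

    import warnings
    from io import StringIO

    import pandas as pd

    payload = '{"a": [1, 2, 3], "b": [4, 5, 6]}'

    # Literal string input still parses, but now warns.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        df = pd.read_json(payload)
    assert any(issubclass(w.category, FutureWarning) for w in caught)

    # Wrapping the same string in StringIO is the forward-compatible spelling.
    df = pd.read_json(StringIO(payload))
    print(df)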

pandas/tests/io/json/test_compression.py (+10 -7)
@@ -1,4 +1,7 @@
-from io import BytesIO
+from io import (
+    BytesIO,
+    StringIO,
+)
 
 import pytest
 
@@ -22,7 +25,8 @@ def test_compression_roundtrip(compression):
     # explicitly ensure file was compressed.
     with tm.decompress_file(path, compression) as fh:
         result = fh.read().decode("utf8")
-        tm.assert_frame_equal(df, pd.read_json(result))
+        data = StringIO(result)
+        tm.assert_frame_equal(df, pd.read_json(data))
 
 
 def test_read_zipped_json(datapath):
@@ -39,8 +43,7 @@ def test_read_zipped_json(datapath):
 @pytest.mark.single_cpu
 def test_with_s3_url(compression, s3_resource, s3so):
     # Bucket "pandas-test" created in tests/io/conftest.py
-
-    df = pd.read_json('{"a": [1, 2, 3], "b": [4, 5, 6]}')
+    df = pd.read_json(StringIO('{"a": [1, 2, 3], "b": [4, 5, 6]}'))
 
     with tm.ensure_clean() as path:
         df.to_json(path, compression=compression)
@@ -55,15 +58,15 @@ def test_with_s3_url(compression, s3_resource, s3so):
 
 def test_lines_with_compression(compression):
     with tm.ensure_clean() as path:
-        df = pd.read_json('{"a": [1, 2, 3], "b": [4, 5, 6]}')
+        df = pd.read_json(StringIO('{"a": [1, 2, 3], "b": [4, 5, 6]}'))
         df.to_json(path, orient="records", lines=True, compression=compression)
         roundtripped_df = pd.read_json(path, lines=True, compression=compression)
         tm.assert_frame_equal(df, roundtripped_df)
 
 
 def test_chunksize_with_compression(compression):
     with tm.ensure_clean() as path:
-        df = pd.read_json('{"a": ["foo", "bar", "baz"], "b": [4, 5, 6]}')
+        df = pd.read_json(StringIO('{"a": ["foo", "bar", "baz"], "b": [4, 5, 6]}'))
         df.to_json(path, orient="records", lines=True, compression=compression)
 
         with pd.read_json(
@@ -74,7 +77,7 @@ def test_chunksize_with_compression(compression):
 
 
 def test_write_unsupported_compression_type():
-    df = pd.read_json('{"a": [1, 2, 3], "b": [4, 5, 6]}')
+    df = pd.read_json(StringIO('{"a": [1, 2, 3], "b": [4, 5, 6]}'))
     with tm.ensure_clean() as path:
         msg = "Unrecognized compression type: unsupported"
         with pytest.raises(ValueError, match=msg):
pandas/tests/io/json/test_deprecated_kwargs.py
@@ -1,6 +1,7 @@
 """
 Tests for the deprecated keyword arguments for `read_json`.
 """
+from io import StringIO
 
 import pandas as pd
 import pandas._testing as tm
@@ -10,9 +11,11 @@
 
 def test_good_kwargs():
     df = pd.DataFrame({"A": [2, 4, 6], "B": [3, 6, 9]}, index=[0, 1, 2])
+
     with tm.assert_produces_warning(None):
-        tm.assert_frame_equal(df, read_json(df.to_json(orient="split"), orient="split"))
-        tm.assert_frame_equal(
-            df, read_json(df.to_json(orient="columns"), orient="columns")
-        )
-        tm.assert_frame_equal(df, read_json(df.to_json(orient="index"), orient="index"))
+        data1 = StringIO(df.to_json(orient="split"))
+        tm.assert_frame_equal(df, read_json(data1, orient="split"))
+        data2 = StringIO(df.to_json(orient="columns"))
+        tm.assert_frame_equal(df, read_json(data2, orient="columns"))
+        data3 = StringIO(df.to_json(orient="index"))
+        tm.assert_frame_equal(df, read_json(data3, orient="index"))
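
The updated test avoids the new warning by design. If one instead wanted to assert that the warning fires for literal-string input, a hypothetical test along these lines would do it (sketch only; the test name and data are invented, not part of this commit):

    from io import StringIO

    import pandas as pd
    import pandas._testing as tm


    def test_literal_json_string_warns():
        data = '{"a": [1, 2, 3]}'
        # Literal string input should emit the new FutureWarning.
        with tm.assert_produces_warning(FutureWarning, match="Passing literal json"):
            pd.read_json(data)
        # The StringIO-wrapped equivalent stays warning-free.
        with tm.assert_produces_warning(None):
            pd.read_json(StringIO(data))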

pandas/tests/io/json/test_json_table_schema.py (+7 -5)
@@ -1,5 +1,6 @@
 """Tests for Table Schema integration."""
 from collections import OrderedDict
+from io import StringIO
 import json
 
 import numpy as np
@@ -254,7 +255,8 @@ def test_read_json_from_to_json_results(self):
                 "name_en": {"row_0": "Hakata Dolls Matsuo"},
             }
         )
-        result1 = pd.read_json(df.to_json())
+
+        result1 = pd.read_json(StringIO(df.to_json()))
         result2 = DataFrame.from_dict(json.loads(df.to_json()))
         tm.assert_frame_equal(result1, df)
         tm.assert_frame_equal(result2, df)
@@ -794,7 +796,7 @@ def test_comprehensive(self):
             index=pd.Index(range(4), name="idx"),
         )
 
-        out = df.to_json(orient="table")
+        out = StringIO(df.to_json(orient="table"))
         result = pd.read_json(out, orient="table")
         tm.assert_frame_equal(df, result)
 
@@ -810,15 +812,15 @@ def test_multiindex(self, index_names):
             columns=["Aussprache", "Griechisch", "Args"],
         )
         df.index.names = index_names
-        out = df.to_json(orient="table")
+        out = StringIO(df.to_json(orient="table"))
         result = pd.read_json(out, orient="table")
         tm.assert_frame_equal(df, result)
 
     def test_empty_frame_roundtrip(self):
         # GH 21287
         df = DataFrame(columns=["a", "b", "c"])
         expected = df.copy()
-        out = df.to_json(orient="table")
+        out = StringIO(df.to_json(orient="table"))
         result = pd.read_json(out, orient="table")
         tm.assert_frame_equal(expected, result)
 
@@ -841,5 +843,5 @@ def test_read_json_orient_table_old_schema_version(self):
         }
         """
         expected = DataFrame({"a": [1, 2.0, "s"]})
-        result = pd.read_json(df_json, orient="table")
+        result = pd.read_json(StringIO(df_json), orient="table")
         tm.assert_frame_equal(expected, result)

pandas/tests/io/json/test_json_table_schema_ext_dtype.py (+3 -2)
@@ -3,6 +3,7 @@
 from collections import OrderedDict
 import datetime as dt
 import decimal
+from io import StringIO
 import json
 
 import pytest
@@ -287,7 +288,7 @@ def test_json_ext_dtype_reading_roundtrip(self):
         )
         expected = df.copy()
         data_json = df.to_json(orient="table", indent=4)
-        result = read_json(data_json, orient="table")
+        result = read_json(StringIO(data_json), orient="table")
         tm.assert_frame_equal(result, expected)
 
     def test_json_ext_dtype_reading(self):
@@ -311,6 +312,6 @@ def test_json_ext_dtype_reading(self):
                 }
             ]
         }"""
-        result = read_json(data_json, orient="table")
+        result = read_json(StringIO(data_json), orient="table")
         expected = DataFrame({"a": Series([2, NA], dtype="Int64")})
         tm.assert_frame_equal(result, expected)
