-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: pd.read_json May Not Maintain Numeric String Index #38727
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: pd.read_json May Not Maintain Numeric String Index #38727
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @theoniko for the PR!
Some comments. We'll also need a whatsnew for this (1.3)
pandas/io/json/_json.py
Outdated
and obj._get_axis(axis_name).dtype == "object" | ||
and obj.index.inferred_type == "string" | ||
): | ||
index_list = list(obj._get_axis(axis_name).values) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we have to convert to a list here? We should be able to inspect values
directly I think
pandas/io/json/_json.py
Outdated
and obj.index.inferred_type == "string" | ||
): | ||
index_list = list(obj._get_axis(axis_name).values) | ||
are_all_strings = all(type(element) == str for element in index_list) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we have helpers got these kinds of checks in pandas/core/dtypes/common.py
, may as well use those
pandas/io/json/_json.py
Outdated
are_all_int_or_floats = all( | ||
are_int_or_floats is True for element in are_int_or_floats | ||
) | ||
if are_all_int_or_floats and are_all_strings: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this ever hit? only one of those can be true at any time, no?
@@ -913,7 +933,6 @@ def _try_convert_data(self, name, data, use_dtypes=True, convert_dates=True): | |||
result = True | |||
except (TypeError, ValueError): | |||
pass | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls revert this to minimize the diff
pandas/tests/io/json/test_pandas.py
Outdated
@@ -192,11 +192,14 @@ def test_roundtrip_str_axes(self, orient, convert_axes, numpy, dtype): | |||
# to disambiguate whether those keys actually were strings or numeric | |||
# beforehand and numeric wins out. | |||
# TODO: Split should be able to support this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we delete this TODO now?
pandas/io/json/_json.py
Outdated
@@ -850,6 +850,26 @@ def _convert_axes(self): | |||
obj = self.obj | |||
assert obj is not None # for mypy | |||
for axis_name in obj._AXIS_ORDERS: | |||
if ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This if
condition seems too specific. Is there a more generic way to handle this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as i understood the minimum preconditions to keep string index should in read_json orient equal to split or index and the index of json format to be a list of int or float strings. Am i right? Should i move the checks in _try_convert_data?
Do you have a more conrete and robust suggestion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried a slightly different way. Could you please re-review?
dd46945
to
42f912e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needs to be less specialized (as indicated by comments)
28b31fb
to
bb21639
Compare
01488e4
to
bfd8e89
Compare
pandas/tests/io/json/test_pandas.py
Outdated
expected.columns = expected.columns.astype(np.int64) | ||
expected.index = expected.index.astype(np.int64) | ||
elif orient == "records" and convert_axes: | ||
expected.columns = expected.columns.astype(np.int64) | ||
elif convert_axes and orient == "split": | ||
expected.columns = expected.columns.astype(np.int64) | ||
expected.index = expected.index.astype(str) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this conversion? I think the problem is that we are converting when we don't need to
bfd8e89
to
db10b73
Compare
All requested changes were made. Please re-review the pr. |
doc/source/whatsnew/v1.3.0.rst
Outdated
@@ -267,7 +267,7 @@ I/O | |||
- Bug in :func:`to_hdf` raising ``KeyError`` when trying to apply | |||
for subclasses of ``DataFrame`` or ``Series`` (:issue:`33748`). | |||
- Bug in :func:`json_normalize` resulting in the first element of a generator object not being included in the returned ``DataFrame`` (:issue:`35923`) | |||
|
|||
- Bug in :func:`read_json` does not maintan numeric string index (:issue:`28556`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not specific enough to the actual case, pls update
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
pandas/io/json/_json.py
Outdated
and self.orient == "split" | ||
and len(data) | ||
): | ||
result = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of using result at all, just return on a conversion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved the condition to the end and removed the result variable.
pandas/io/json/_json.py
Outdated
|
||
if data.dtype == "object": | ||
|
||
if ( | ||
isinstance(data, Index) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't think you need to test if an Index
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
merge master as well |
3ba5e89
to
0762111
Compare
pandas/io/json/_json.py
Outdated
except (TypeError, ValueError): | ||
pass | ||
|
||
return data, result | ||
if name == "index" and self.orient == "split" and len(data): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok looks a lot better. can you add a comment here and structure this more like
# if we have an index, we want to preserve dtypes
if name == "index" and len(data):
if self.orient == "split":
return data, False
return data, True
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
9234406
to
2e5e743
Compare
doc/source/whatsnew/v1.3.0.rst
Outdated
@@ -272,6 +272,7 @@ I/O | |||
- Bug in :func:`json_normalize` resulting in the first element of a generator object not being included in the returned ``DataFrame`` (:issue:`35923`) | |||
- Bug in :func:`read_excel` forward filling :class:`MultiIndex` names with multiple header and index columns specified (:issue:`34673`) | |||
- :func:`pandas.read_excel` now respects :func:``pandas.set_option`` (:issue:`34252`) | |||
- Bug in :func:`read_json` when ``orient="split"`` does not maintan numeric string index (:issue:`28556`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug in :func:read_json
when orient="split"
does not maintain numeric string index (:issue:28556
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
2e5e743
to
f2a2785
Compare
lgtm pushed a small doc change and can merge on green |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff