BUG: #31464 Fix error when parsing JSON list of bool into Series #33373
…eries Add a missing exception type to the except clause, to cover the TypeError that is thrown by the Cythonized array_to_datetime function when trying to convert bool to nanoseconds.
Thanks for the PR! I think this looks pretty good
@@ -982,7 +982,7 @@ def _try_convert_to_date(self, data):
         for date_unit in date_units:
             try:
                 new_data = to_datetime(new_data, errors="raise", unit=date_unit)
-            except (ValueError, OverflowError):
+            except (ValueError, OverflowError, TypeError):
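For readers following the thread, here is a minimal standard-library sketch of the pattern this hunk changes: try each date unit in turn and treat a TypeError like the other conversion failures. The names `convert_epochs` and `try_convert_to_date` are hypothetical stand-ins, not the pandas implementation; in particular, `convert_epochs` only imitates the behavior described in the PR, where bool input is rejected with a TypeError.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
# Number of units per second, for the date units tried in order.
UNITS_PER_SECOND = {"s": 1, "ms": 10**3, "us": 10**6, "ns": 10**9}


def convert_epochs(values, unit):
    """Hypothetical stand-in for to_datetime(..., errors="raise", unit=unit)."""
    converted = []
    for value in values:
        if isinstance(value, bool):
            # Imitates the failure described in the PR: bools raise TypeError.
            raise TypeError("bool is not convertible to datetime")
        if not isinstance(value, (int, float)):
            raise ValueError(f"cannot interpret {value!r} as an epoch offset")
        # timedelta arithmetic raises OverflowError when the offset is out of range.
        converted.append(EPOCH + timedelta(seconds=value / UNITS_PER_SECOND[unit]))
    return converted


def try_convert_to_date(values, date_units=("s", "ms", "us", "ns")):
    """Return (converted, True) on success, or (values, False) if no unit works."""
    for unit in date_units:
        try:
            converted = convert_epochs(values, unit)
        except (ValueError, OverflowError, TypeError):
            # Without TypeError in this tuple, bool input would propagate an error.
            continue
        return converted, True
    return values, False
```

With TypeError in the tuple, a list of bools falls through every unit and comes back unchanged with a `False` flag, which lets the caller keep the original data.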
@jbrockmendel any thoughts here?
the usual: if we can handle this case before the try/except that would be preferable
@jbrockmendel I’ll give that approach a try.
@jbrockmendel Pushed an alternate approach that escapes from the _try_convert_to_date() method if the dtype is bool, avoiding the try/except but adding a conditional to the method (which I don't love, but at least the method already has dtype-based conditionals).
Yes, this is OK. An alternative approach is to call infer_dtype on this and then dispatch. If someone wants to try that (in a later PR), that would be OK.
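The "infer, then dispatch" idea can be sketched roughly as follows. This is only an illustration under stated assumptions: `infer_kind` is a hypothetical, much simplified stand-in for a dtype-inference helper such as the infer_dtype mentioned above, and `should_attempt_date_conversion` is an invented name. Neither is pandas code.

```python
def infer_kind(values):
    """Hypothetical, simplified stand-in for a dtype-inference helper."""
    kinds = set()
    for value in values:
        if isinstance(value, bool):  # check bool first: bool is a subclass of int
            kinds.add("boolean")
        elif isinstance(value, int):
            kinds.add("integer")
        elif isinstance(value, float):
            kinds.add("floating")
        else:
            kinds.add("other")
    if len(kinds) == 1:
        return kinds.pop()
    return "mixed" if kinds else "empty"


def should_attempt_date_conversion(values):
    # In this sketch only purely numeric data is a candidate epoch offset,
    # so bool input is filtered out before any conversion is tried.
    return infer_kind(values) in {"integer", "floating"}
```

The design trade-off is that the decision is made once, up front, from the inferred kind, instead of relying on the converter to raise the right exception.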
pandas/io/json/_json.py
Outdated
@@ -978,6 +978,10 @@ def _try_convert_to_date(self, data):
             if not in_range.all():
                 return data, False

+        # ignore bool
+        if new_data.dtype == "bool":
+            return data, False
can you add a comment inside the if # GH#33373 ignore bool
Done. Test failures appear to be unrelated; the FDIC site that the html tests use is returning intermittent 404s.
Hmm, actually not sure about doing it this way; we've done this type of checking in groupby and it's not a great precedent. We do already catch the three errors a few lines above this - shouldn't we just do the same here?
The first approach seems a bit simpler to me—this seems to be a case where whoever put the TypeError into the other except clauses accidentally omitted it from the one that I originally added it to. Let me know what you and @jbrockmendel decide and I’ll be happy to update accordingly.
I'm happy to defer to @WillAyd in this part of the code. Catching TypeError is definitely more idiomatic Python, but we've definitely seen how catching too much has let bugs remain un-surfaced for a long time elsewhere
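To make the concern about over-broad catching concrete, here is a small hypothetical sketch, unrelated to the actual pandas code: `buggy_parse` contains a deliberate programming error, and because the caller catches TypeError, the bug never surfaces as an exception.

```python
def buggy_parse(value):
    # Deliberate bug for illustration: "str + int" raises TypeError.
    return "ts-" + value


def convert_all(values):
    """Return (converted, True) on success, or (values, False) on failure."""
    try:
        return [buggy_parse(v) for v in values], True
    except (ValueError, TypeError):
        # The TypeError caused by the bug above is swallowed here, so the
        # caller only ever sees "conversion did not apply".
        return values, False
```

Calling `convert_all([1, 2])` quietly returns the input with `False`, which looks the same as a legitimate "not convertible" result.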
Yeah, I think we should catch the TypeError as done originally; that would match what we have on L968 anyway
Sounds good -- pushed the revert back to the original change. Thanks for the quick turnaround -- let me know if anything else needed.
minor nit otherwise lgtm
@@ -179,3 +179,9 @@ def test_readjson_unicode(monkeypatch):
         result = read_json(path)
         expected = pd.DataFrame({"£©µÀÆÖÞßéöÿ": ["АБВГДабвгд가"]})
         tm.assert_frame_equal(result, expected)
+
+
+def test_readjson_bool_series():
Can you move this to test_pandas and add GH 31464 as a comment in the function body?
Done
lgtm
thanks @jessefarnham very nice.
Add a missing exception type to the except clause, to cover
the TypeError that is thrown by the Cythonized array_to_datetime
function when trying to convert bool to nanoseconds.
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff