BUG: #31464 Fix error when parsing JSON list of bool into Series #33373

Merged: 5 commits, Apr 8, 2020

Conversation

Contributor

@jessefarnham commented Apr 7, 2020

Add a missing exception type to the except clause, to cover
the TypeError that is raised by the Cythonized array_to_datetime
function when trying to convert bool to nanoseconds.
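As a rough illustration of the pattern this change touches, the sketch below tries each date unit in turn and falls back to the original data when conversion fails. It uses only the standard library: `convert_epochs` and `try_convert_to_date` are hypothetical stand-ins, not pandas' `to_datetime` or `_try_convert_to_date`, and the stand-in rejects bools with a TypeError purely to mimic the behaviour described above.

```python
from datetime import datetime, timezone

# Hypothetical stand-in for the converter; this is NOT pandas' to_datetime.
_UNIT_DIVISORS = {"s": 1, "ms": 10**3, "us": 10**6, "ns": 10**9}


def convert_epochs(values, unit):
    """Convert epoch numbers to UTC datetimes, rejecting bools with TypeError."""
    out = []
    for v in values:
        if isinstance(v, bool):
            # Assumption: mimic the TypeError described for bool input.
            raise TypeError(f"{type(v).__name__} is not convertible to datetime")
        if not isinstance(v, (int, float)):
            raise ValueError(f"cannot parse {v!r}")
        out.append(datetime.fromtimestamp(v / _UNIT_DIVISORS[unit], tz=timezone.utc))
    return out


def try_convert_to_date(values, units=("s", "ms", "us", "ns")):
    """Return (converted, True) on success, else (values, False)."""
    for unit in units:
        try:
            converted = convert_epochs(values, unit)
        except (ValueError, OverflowError, TypeError):
            # Without TypeError here, bool input would escape the loop
            # as an unhandled exception instead of falling through.
            continue
        return converted, True
    return values, False
```

With this sketch, a list of bools comes back unchanged with a `False` flag, while a list of epoch seconds is converted on the first unit tried.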

Member

@WillAyd left a comment

Thanks for the PR! I think this looks pretty good.

@@ -982,7 +982,7 @@ def _try_convert_to_date(self, data):
         for date_unit in date_units:
             try:
                 new_data = to_datetime(new_data, errors="raise", unit=date_unit)
-            except (ValueError, OverflowError):
+            except (ValueError, OverflowError, TypeError):
Member

@jbrockmendel any thoughts here?

Member

the usual: if we can handle this case before the try/except that would be preferable

Contributor Author

@jbrockmendel I’ll give that approach a try.

Contributor Author

@jbrockmendel Pushed an alternate approach that escapes from the _try_convert_to_date() method if the dtype is bool, avoiding the try/except but adding a conditional to the method (which I don't love, but at least the method already has dtype-based conditionals).

Contributor

Yes, this is OK. An alternative approach is to call infer_dtype on this and then dispatch. If someone wants to try that in a later PR, that would be OK.
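The "infer, then dispatch" alternative could look roughly like the sketch below. It is only an illustration under stated assumptions: `infer_kind` is a hypothetical, much simplified stand-in for a dtype-inference helper (it is not pandas' `infer_dtype`), and `maybe_convert_to_date` is an invented name. The point is that the kind of data is decided before any conversion is attempted, so no exception handling is needed for bools.

```python
def infer_kind(values):
    """Hypothetical, simplified stand-in for a dtype-inference helper."""
    if values and all(isinstance(v, bool) for v in values):
        return "boolean"
    if values and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    ):
        return "numeric"
    return "mixed"


def maybe_convert_to_date(values, convert):
    """Dispatch on the inferred kind; only numeric data is passed to convert."""
    if infer_kind(values) != "numeric":
        # Bools (and anything else non-numeric) are returned untouched.
        return values, False
    return convert(values), True
```

The trade-off discussed in this thread still applies: the explicit check avoids a broad except clause, at the cost of one more type-based conditional in the method.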

@WillAyd added the "IO JSON" (read_json, to_json, json_normalize) label Apr 7, 2020
@@ -978,6 +978,10 @@ def _try_convert_to_date(self, data):
             if not in_range.all():
                 return data, False

+        # ignore bool
+        if new_data.dtype == "bool":
+            return data, False
Member

Can you add a comment inside the if: "# GH#33373 ignore bool"?

Contributor Author

Done. Test failures appear to be unrelated; the FDIC site that the html tests use is returning intermittent 404s.

Member

Hmm, actually I'm not sure about doing it this way; we've done this type of checking in groupby and it's not a great precedent. We do already catch the three errors a few lines above this - shouldn't we just do the same here?

Contributor Author

The first approach seems a bit simpler to me. This looks like a case where whoever added TypeError to the other except clauses accidentally omitted it from the one I originally changed. Let me know what you and @jbrockmendel decide and I’ll be happy to update accordingly.

Member

I'm happy to defer to @WillAyd on this part of the code. Catching TypeError is definitely more idiomatic Python, but we've definitely seen how catching too much has let bugs remain un-surfaced for a long time elsewhere.
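The concern about catching too much can be shown with a small, self-contained sketch. Everything here is hypothetical (`buggy_convert`, `convert_broad`, `convert_narrow` are invented names, not pandas code): a genuine programming error that happens to raise TypeError is silently swallowed by the wider except clause, but surfaces under the narrower one.

```python
def buggy_convert(values):
    # Deliberate programming error for illustration: len(None) raises TypeError.
    scale = len(None)
    return [v * scale for v in values]


def convert_broad(values):
    """Treats TypeError as 'not convertible', so the bug above goes unnoticed."""
    try:
        return buggy_convert(values), True
    except (ValueError, OverflowError, TypeError):
        return values, False


def convert_narrow(values):
    """Lets TypeError propagate, so the bug above is reported to the caller."""
    try:
        return buggy_convert(values), True
    except (ValueError, OverflowError):
        return values, False
```

Calling `convert_broad([1, 2])` quietly returns the input with a `False` flag, whereas `convert_narrow([1, 2])` raises the TypeError. That is the trade-off being weighed here against the simplicity of widening the except clause.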

Member

Yeah, I think we should catch the TypeError as done originally; that would match what we have on L968 anyway.

Contributor Author

Sounds good -- pushed the revert back to the original change. Thanks for the quick turnaround -- let me know if anything else is needed.

Member

@WillAyd left a comment

Minor nit, otherwise lgtm.

@@ -179,3 +179,9 @@ def test_readjson_unicode(monkeypatch):
         result = read_json(path)
         expected = pd.DataFrame({"£©µÀÆÖÞßéöÿ": ["АБВГДабвгд가"]})
         tm.assert_frame_equal(result, expected)
+
+
+def test_readjson_bool_series():
Member

Can you move this to test_pandas and add GH 31464 as a comment in the function body?

Contributor Author

Done

Member

@WillAyd left a comment

lgtm

@WillAyd added this to the 1.1 milestone Apr 8, 2020
@jreback merged commit d857cd1 into pandas-dev:master Apr 8, 2020

Contributor

@jreback commented Apr 8, 2020

Thanks @jessefarnham, very nice.

@jessefarnham deleted the fix-31464 branch April 8, 2020 19:04
Labels: IO JSON (read_json, to_json, json_normalize)
Linked issue: read_json with typ="series" of json list of bools results in timestamps/Exception
4 participants