Skip to content

BUG: Parse missing values using read_json with dtype=False to NaN instead of None (GH28501) #37834

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Nov 20, 2020

Conversation

avinashpancham
Copy link
Contributor

@jreback jreback added the IO JSON read_json, to_json, json_normalize label Nov 14, 2020
@jreback jreback added this to the 1.2 milestone Nov 14, 2020
@jreback jreback added the Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate label Nov 14, 2020
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep this is way more consistent, thanks @avinashpancham
cc @WillAyd merge when ready

@WillAyd
Copy link
Member

WillAyd commented Nov 15, 2020

Nice PR! Looks like there is a merge conflict in the whatsnew - if you can fix that up we can get this merged

@avinashpancham
Copy link
Contributor Author

Merged master, but CI fails due to unrelated reasons. Will retry with new updates from master later today

@jreback
Copy link
Contributor

jreback commented Nov 15, 2020

@avinashpancham can you merge master once again

@@ -624,6 +624,7 @@ I/O
- Bug in :func:`read_html` was raising a ``TypeError`` when supplying a ``pathlib.Path`` argument to the ``io`` parameter (:issue:`37705`)
- :meth:`to_excel` and :meth:`to_markdown` support writing to fsspec URLs such as S3 and Google Cloud Storage (:issue:`33987`)
- Bug in :meth:`read_fw` was not skipping blank lines (even with ``skip_blank_lines=True``) (:issue:`37758`)
- Parse missing values using :func:`read_json` with ``dtype=False`` to NaN instead of None (:issue:`28501`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use double backticks around NaN and None

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@avinashpancham
Copy link
Contributor Author

@jreback merged master and updated whatsnew entry

@jreback
Copy link
Contributor

jreback commented Nov 18, 2020

hmm precommit checks are failing, can you merge master and ping on green

@avinashpancham
Copy link
Contributor Author

Still some issues, will try again tomorrow

@jreback jreback merged commit 5c35871 into pandas-dev:master Nov 20, 2020
@jreback
Copy link
Contributor

jreback commented Nov 20, 2020

thanks @avinashpancham

@simonjayhawkins
Copy link
Member

I'm not sure that there was consensus in the issue discussion on the correct behaviour or whether it is a bug. Maybe worth considering moving the release note into the breaking changes section in case anyone is depending on this behaviour.

@jreback
Copy link
Contributor

jreback commented Nov 20, 2020

I'm not sure that there was consensus in the issue discussion on the correct behaviour or whether it is a bug. Maybe worth considering moving the release note into the breaking changes section in case anyone is depending on this behaviour.

type dtype option is actually pretty odd anyhow (and should deprecate)

@jorisvandenbossche
Copy link
Member

I think most voices in the issue actually argued that the current behaviour was correct, and so IMO we should not have changed it to start with.

@jreback
Copy link
Contributor

jreback commented Nov 20, 2020

there was exactly 1 comment

@jorisvandenbossche
Copy link
Member

Will opened the issue, and 2 people commented on it arguing not to change it (and a third one commented as well, but off-topic), no one else argued in favor of the change.

Anyway, I don't think counting the numbers is that important, but I agree with @simonjayhawkins that there was certainly no clear consensus

@jorisvandenbossche
Copy link
Member

On the None vs NaN, I am personally quite ambivalent, but when disabling parsing I think we should not return float dtype for this case.

@jreback
Copy link
Contributor

jreback commented Nov 20, 2020

you can revert if u want
not that important

this was removing a special case though - it was very odd

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO JSON read_json, to_json, json_normalize Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

read_json with dtype=False infers Missing Values as None
5 participants