DOC: clarify pitfalls of NA vs NaN in nullable floats #51383
Asking @jorisvandenbossche if interested in reviewing, as suggested
@a-reich thanks for taking a look at this and trying to clarify the docstring!
Personally, I would limit the addition more to the "facts", i.e. simply that for the nullable floating dtypes, only `pd.NA` is recognized as missing.
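To make the "fact" under discussion concrete, here is a minimal sketch of how a NaN can end up stored next to `pd.NA` in a `Float64` Series. It assumes a pandas version with nullable floating dtypes; the variable names are illustrative only, and no output is shown because whether `isna()` flags the `0/0` entry is exactly the version-dependent behaviour this thread is about.

```python
import numpy as np
import pandas as pd

# Start from a nullable Float64 Series with one genuinely missing entry.
s = pd.Series([0.0, 1.0, None], dtype="Float64")

# Dividing by zero yields 0/0 (NaN), 1/0 (inf) and NA/0 (NA).
result = s / 0

# What pandas itself reports as missing. Whether the 0/0 entry is
# flagged here depends on the pandas version in use.
na_mask = result.isna()

# A plain float64 view, with NA filled by NaN, lets NumPy see every
# NaN-like slot regardless of how pandas classifies it.
as_float = result.to_numpy(dtype="float64", na_value=np.nan)

print(na_mask.tolist())
print(as_float)
```

Comparing the two printed results on a given installation shows whether the stored NaN is being treated as missing there.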
Hi @jorisvandenbossche, I've tried to slim down the language to address your feedback - thanks. However, I did realize a slight issue - I wanted to give guidance on using both
This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.
I am still interested in completing this! I think it needs a little bit of pandas maintainer feedback - at least, to clarify whether there’s a supported way to detect NaN values in an arrow-dtype array.
I think the topic of #32265 will be discussed and decided on soon, which should address this, so closing for now.
stored but will **not** get mapped to True (these values can be detected
by :func:`numpy.isnan`).
Maybe `s.map(pd.isna)` or `df.map(pd.isna)` is a better suggestion?
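For readers who want a check that does not depend on how a given pandas version classifies NaN, the two ideas in this review thread (pandas' own `isna` and `numpy.isnan`) can be combined. The sketch below is only an illustration: `nan_or_na_mask` is a hypothetical helper name, not a pandas API, and it assumes a Series with a nullable floating dtype such as `Float64`.

```python
import numpy as np
import pandas as pd


def nan_or_na_mask(s: pd.Series) -> np.ndarray:
    """Hypothetical helper: True where an entry is pd.NA or a stored NaN."""
    # Entries pandas itself considers missing.
    na = s.isna().to_numpy(dtype=bool)
    # Plain float64 view; NA slots are filled with NaN so np.isnan sees them too.
    values = s.to_numpy(dtype="float64", na_value=np.nan)
    return na | np.isnan(values)


# 0/0 gives NaN, 1/0 gives inf, NA/0 stays NA.
s = pd.Series([0.0, 1.0, None], dtype="Float64") / 0
mask = nan_or_na_mask(s)
print(mask.tolist())
```

The helper returns a plain NumPy boolean array, so it can be used directly for boolean indexing on the original Series.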
@mroeschke 10 months later it is still being decided. How about re-opening this PR? I am under the impression that some sort of consensus has started to crystallise around options 2, 3, or 4 with the inclination towards the current behaviour ( I propose to also add a clarification to the "User Guide". Nothing in the documentation says explicitly that Placing the link to the aforementioned discussion in the documentation seems like a good idea to me as a user.
`hasnans` not accounting for `np.nan` in `FloatingArray` #49818 and such until the discussion in API: distinguish NA vs NaN in floating dtypes #32265 is resolved

`doc/source/whatsnew/vX.X.X.rst` file if fixing a bug or adding a new feature.

Following up on my comment: since the current behavior easily leads to mistakes when trying to detect missing/NaN data and does not match the docs' described semantics, this PR is trying to update the docs to clarify this edge case so users can avoid the pitfalls.
Notes/questions on PR work:
Thanks!