DOC: clarify pitfalls of NA vs NaN in nullable floats #51383

a-reich · 2023-02-14T16:16:53Z

Stopgap solution addressing "BUG: hasnans not accounting for np.nan in FloatingArray" (#49818) and similar issues until the discussion in "API: distinguish NA vs NaN in floating dtypes" (#32265) is resolved
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

Following up on my comment: the current behavior easily leads to mistakes when trying to detect missing/NaN data and does not match the semantics described in the docs, so this PR updates the docs to clarify this edge case and help users avoid the pitfalls.
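A minimal sketch of the pitfall described above, assuming a pandas version with the nullable `Float64` dtype. The Series contents and variable names are illustrative only. Whether `0.0 / 0.0` stays a stored NaN or is converted to `pd.NA` depends on the pandas version and options, so the sketch also shows a check that flags either kind of value.

```python
import numpy as np
import pandas as pd

# A nullable Float64 Series with one genuinely missing value (pd.NA).
s = pd.Series([0.0, 1.0, None], dtype="Float64")

# 0.0 / 0.0 is a common way for a NaN to end up in the data buffer.
# Assumption: whether it is kept as NaN or turned into NA is
# version-dependent, which is exactly the ambiguity under discussion.
ratio = s / s

# isna() reflects the missing-value mask; the pd.NA entry is flagged.
na_mask = ratio.isna()

# A check that does not depend on that choice: convert to a plain
# float64 NumPy array (NA becomes NaN) and test with numpy.isnan.
nan_or_na = np.isnan(ratio.to_numpy(dtype="float64", na_value=np.nan))

print(na_mask.tolist())
print(nan_or_na.tolist())
```

Comparing the two printed masks on a given installation shows whether a stored NaN is being reported as missing there.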

Notes/questions on PR work:

There may be other places to update, but I figure once the language is approved it's easy to add it elsewhere
Should we link to the issue "API: distinguish NA vs NaN in floating dtypes" (#32265), which discusses what the behavior should be? That would enable users to weigh in with their preferences but might be too detailed/confusing.
Should the docstring describe this scenario in terms of Arrays or the corresponding dtypes?

Thanks!

a-reich · 2023-02-14T16:20:04Z

Asking @jorisvandenbossche if interested in reviewing, as suggested

jorisvandenbossche

@a-reich thanks for taking a look at this and trying to clarify the docstring!
Personally, I would limit the addition more to the "facts", i.e. simply that for nullable floating dtypes, only pd.NA is recognized.

pandas/core/generic.py

a-reich · 2023-02-22T03:10:14Z

Hi @jorisvandenbossche, I've tried to slim down the language to address your feedback - thanks.

However, I did realize a slight issue: I wanted to give guidance on using both Float<> and float<>[pyarrow] dtypes, since pandas is starting to support both as "nullable dtype backends". While users can detect NaN with np.isnan for the former, there doesn't seem to be a documented way for the latter (pyarrow has pc.is_nan/is_null, but getting the underlying Arrow array seems to require private attributes). I wanted to bring that up in case it makes sense for pandas to offer something more there. For this PR, I guess we can cover only the Float case?

github-actions · 2023-03-25T00:05:16Z

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

a-reich · 2023-04-01T00:35:33Z

> This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

I am still interested in completing this! I think it needs a little bit of pandas maintainer feedback - at least, to clarify whether there’s a supported way to detect NaN values in an arrow-dtype array.

mroeschke · 2023-08-01T17:10:51Z

I think the topic of #32265 will be discussed and decided on soon which should address this so closing for now

avm19 · 2024-06-06T13:41:10Z

pandas/core/generic.py

+        stored but will **not** get mapped to True (these values can be detected
+        by :func:`numpy.isnan`).


Maybe s.map(pd.isna) or df.map(pd.isna) is a better suggestion?
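A rough sketch of what the `s.map(pd.isna)` suggestion would look like on a nullable `Float64` Series. The data is illustrative, and this is an elementwise Python-level call rather than a vectorized one, so it is shown only to make the suggestion concrete.

```python
import pandas as pd

# Nullable Float64 Series with one pd.NA; 0.0 / 0.0 may leave a NaN
# (or an NA, depending on the pandas version) in the first position.
s = pd.Series([0.0, 1.0, None], dtype="Float64")
ratio = s / s

# pd.isna is applied to each element, and it returns True for both
# a float NaN and pd.NA, so both kinds of value are flagged.
elementwise = ratio.map(pd.isna)

print([bool(x) for x in elementwise])  # [True, False, True]
```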

avm19 · 2024-06-06T13:42:24Z

> I think the topic of #32265 will be discussed and decided on soon which should address this so closing for now

@mroeschke 10 months later it is still being decided. How about re-opening this PR?

I am under the impression that some sort of consensus has started to crystallise around options 2, 3, or 4, with an inclination towards the current behaviour (s.isna() returns False for NaN).

I propose to also add a clarification to the "User Guide". Nothing in the documentation says explicitly that NaN is no longer treated as missing in nullable float arrays. On the contrary, the examples with df.fillna(0), df.ffill() and df.bfill() clearly suggest that NaN is equivalent to NA in float64[pyarrow].

Placing the link to the aforementioned discussion in documentation seems like a good idea to me as a user.

add note to docstring re NA vs NaN

24008fa

mroeschke added the Docs and Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate) labels Feb 14, 2023

jorisvandenbossche reviewed Feb 14, 2023

pandas/core/generic.py (outdated, resolved)

pandas/core/generic.py (outdated, resolved)

revise docstring per feedback

ead6e86

github-actions bot added the Stale label Mar 25, 2023

mroeschke closed this Aug 1, 2023

avm19 reviewed Jun 6, 2024
