API/BUG: pd.array(float_ndarray_with_nans) casts np.nan to pd.NA #39039
Comments
setting |
Docs sound like this is intended? https://pandas.pydata.org/docs/reference/api/pandas.array.html
|
@jorisvandenbossche is the one to ask on this |
Yes, this is not a bug but intentional behaviour. See the 4th bullet point in the top post of the PR implementing it (#34307), although that only mentions it. The actual discussion about it is scattered through the issue on this topic: #32265.
The main reason is that any code that currently deals with numpy arrays and pandas (also internally in pandas) assumes that NaNs are treated as missing data. So for compatibility with this (preserving the "missing" semantics), we convert any NaN to NA by default when converting a numpy array to a masked/nullable array.
Now, this certainly is a topic that still needs more attention (e.g. adding an option to preserve NaNs on input instead of converting them to NA; we also still need to discuss how to treat NaNs that get introduced by computations, the default conversion to numpy, ...) |
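For readers following along, here is a minimal sketch of the behaviour described above. It assumes a pandas version (1.2 or later) in which `pd.array` infers the nullable `Float64` dtype for a float ndarray, with default options; the variable names are illustrative only. The last step, passing a NumPy dtype explicitly, is just one way to get a NumPy-backed array that keeps NaN, not the input option mentioned in the comment.

```python
import numpy as np
import pandas as pd

# A plain float64 ndarray containing a NaN.
values = np.array([1.0, np.nan, 3.0])

# With no dtype given, pd.array infers the nullable Float64 dtype
# and (by default) converts the NaN to pd.NA.
arr = pd.array(values)
print(arr.dtype)          # the inferred extension dtype
print(arr[1] is pd.NA)    # the NaN position now holds pd.NA

# pd.isna reports the converted position as missing.
na_mask = pd.isna(arr)

# The original ndarray is untouched and still holds NaN.
print(np.isnan(values[1]))

# Assumption: requesting a NumPy dtype explicitly yields a NumPy-backed
# extension array, which stores the NaN as-is instead of using a mask.
np_backed = pd.array(values, dtype="float64")
print(np_backed[1])
```

The masked array records missingness in a separate boolean mask, which is why a NaN on input ends up as `pd.NA` rather than as a float value.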
Thanks @jorisvandenbossche. Is there (supposed to be?) a way to actually set |
duplicate of/covered by #32265 |
Does look similar to #32265, continued discussion can happen on that issue |
I was under the impression we meant to keep these as distinct concepts