Implement pd.NA checking function #33490
isna already does this
What I was thinking of here was a way to get a mask for only pd.NA (that is, excluding np.NaN, None, and the other missing-value indicators that isna also flags). The main motivation was finding a way to fix #31990 without writing the equivalent of a Python for loop to do this check.
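To make the distinction concrete, here is a minimal pure-Python sketch of the semantics being described. The helper name `is_pd_na_mask` is hypothetical and is not part of pandas or of this PR; it uses exactly the kind of Python-level loop the PR wants to avoid, so it illustrates the intended result rather than the fast implementation.

```python
import numpy as np
import pandas as pd


def is_pd_na_mask(values):
    """Hypothetical helper: True only where an element is the pd.NA singleton."""
    arr = np.asarray(values, dtype=object)
    # Identity check against the singleton; NaN and None do not match.
    flat = np.fromiter((x is pd.NA for x in arr.ravel()), dtype=bool, count=arr.size)
    return flat.reshape(arr.shape)


values = np.array(["a", pd.NA, np.nan, None], dtype=object)

na_only = is_pd_na_mask(values)   # flags only the pd.NA entry
any_missing = pd.isna(values)     # flags pd.NA, NaN, and None alike
```

Comparing `na_only` with `any_missing` shows the difference from `isna`: only the second element is flagged by the pd.NA-specific mask, while `pd.isna` flags the last three.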
Py_ssize_t i, n
ndarray[uint8_t] result

assert arr.ndim == 1, "'arr' must be 1-D."
not a big deal, but is this restriction necessary?
Maybe not, though the main application I had in mind was operating on single columns.
related helper function I've wanted is
Can't you use the mask that the StringDtype has to check this?
Closing since it looks like others aren't in favor of the idea. I do think some kind of "dtype agnostic" way of checking for NA could be helpful later.
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
This may already exist in some form, but if not I think it could be useful to have a fast function for checking specifically for the presence of pd.NA (e.g. if we're looking at an arbitrary array and don't know if we'll have access to a _mask attribute, or if we're handling some other data type that uses pd.NA for missing values but doesn't have this attribute).
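As a rough illustration of the "arbitrary array" case, the sketch below checks for pd.NA without touching a `_mask` attribute, by converting the input to an object array through the public `np.asarray` path. The name `contains_pd_na` is hypothetical and does not exist in pandas; this version is a plain Python loop, so it shows the intended behaviour only and would not be fast.

```python
import numpy as np
import pandas as pd


def contains_pd_na(values):
    """Hypothetical helper: does `values` contain the pd.NA singleton anywhere?"""
    # Converting to object dtype exposes masked entries of nullable arrays as pd.NA.
    arr = np.asarray(values, dtype=object)
    # any() stops at the first match.
    return any(x is pd.NA for x in arr.ravel())


nullable_ints = pd.array([1, None, 3], dtype="Int64")      # missing value stored as pd.NA
plain_floats = np.array([1.0, np.nan])                     # NaN only, no pd.NA
mixed_objects = np.array(["a", None, np.nan], dtype=object)

has_na_nullable = contains_pd_na(nullable_ints)
has_na_floats = contains_pd_na(plain_floats)
has_na_objects = contains_pd_na(mixed_objects)
```

Under these assumptions only the nullable integer array reports pd.NA; the float and object arrays hold NaN or None, which this check deliberately ignores.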