-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
PERF: avoid copies in lib.infer_dtype #45057
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Bad news @mroeschke:
|
:( This looks like it occurred on Windows? The check I added doesn't work there since idk the equivalent |
wow, looks pretty good. |
pandas/_libs/lib.pyx
Outdated
@@ -1882,7 +1889,7 @@ cdef class StringValidator(Validator): | |||
|
|||
cdef bint is_valid_null(self, object value) except -1: | |||
# We deliberately exclude None / NaN here since StringArray uses NA | |||
return value is C_NA | |||
return value is C_NA or value is None or util.is_nan(value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quick question:
If I do pd.arrays.StringArray(np.array(["a", np.nan], dtype=object))
, won't this result in a StringArray with np.nan
in it? (something like this? #30966 (comment))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
won't this result in a StringArray with np.nan in it?
yep. probably ought to do something about that. have something min mind?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to fix this in #41412. This PR is probably stuck in the meantime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lithomas1 is this un-blocked?
this WIP? |
@lithomas1 says this needs to wait on #41412 |
nice! |
In the
skipna=True
case, instead of doingvalues = values[~isnaobj(values)]
, just passskipna
to the relevant validator functions. This causes trouble because those validator functions have different rules about what NA values they allow (analogous to is_valid_na_for_dtype).I'm down to 1 test case failing locally, fixing it will require changing the StringArray constructor to allow None and np.nan (xref #40839) in addition to just pd.NA (AFAICT pd.array and pd.Series(... dtype="string") will not be affected). I'm fine with that, but someone more involved with the string code might want to weigh in cc @simonjayhawkins