-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: Add nullable keyword to read_sql #50048
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ts_nullable_boolean # Conflicts: # pandas/_libs/lib.pyx
I think makes sense and like your vision on the interaction with Cython. Is there any performance difference here moving away from the |
Also discussed this with @mroeschke in another pr, he leaned more towards keeping maybe_convert_objects as is. Maybe a bit faster? The previous implementation was running through the same function that we are calling directly now. Definitively no slowdown |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm OK with this as is - @mroeschke
I think in a future PR it might make sense to do this; I didn't realized this was the only location where this is used. I guess the only slight downside is run time importing the Arrays in the Cython code? |
Thanks @phofl |
|
||
if dtype is None: | ||
if arr.dtype == np.dtype("O"): | ||
# i.e. maybe_convert_objects didn't convert | ||
arr = maybe_infer_to_datetimelike(arr) | ||
if use_nullable_dtypes and arr.dtype == np.dtype("O"): | ||
arr = StringDtype().construct_array_type()._from_sequence(arr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems weird to me. why are we casting potentially non-strings to strings?
arr = IntegerArray(arr, np.zeros(arr.shape, dtype=np.bool_)) | ||
elif is_bool_dtype(arr.dtype): | ||
arr = BooleanArray(arr, np.zeros(arr.shape, dtype=np.bool_)) | ||
elif is_float_dtype(arr.dtype): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could re-use pd.array for L1011-L1016?
dtype=None, | ||
coerce_float=coerce_float, | ||
use_nullable_dtypes=use_nullable_dtypes, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like this has gotten some Arrow-specific logic added here in the interim. can we move the backend-specific stuff from core.internals here?
Sits on top of #50047
Functionality wise, this should work now.
More broadly, I am not sure that this is the best approach we could take here. Since the
convert_to_nullable_type
inlib.maybe_convert_objects
is not used right now except here, we could also make this strict and return the appropriate Array from the Cython code, not only when nulls are present. This would avoid the re-cast in the non-cython code part.