-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
String dtype: still return nullable NA-variant in object inference (maybe_converts_object
) if requested
#59487
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
String dtype: still return nullable NA-variant in object inference (maybe_converts_object
) if requested
#59487
Conversation
if using_string_dtype() and is_string_array(objects, skipna=True): | ||
if convert_to_nullable_dtype and is_string_array(objects, skipna=True): | ||
from pandas.core.arrays.string_ import StringDtype | ||
|
||
dtype = StringDtype(na_value=np.nan) | ||
dtype = StringDtype() | ||
return dtype.construct_array_type()._from_sequence(objects, dtype=dtype) | ||
|
||
elif convert_to_nullable_dtype and is_string_array(objects, skipna=True): | ||
elif using_string_dtype() and is_string_array(objects, skipna=True): | ||
from pandas.core.arrays.string_ import StringDtype | ||
|
||
dtype = StringDtype() | ||
dtype = StringDtype(na_value=np.nan) | ||
return dtype.construct_array_type()._from_sequence(objects, dtype=dtype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Essentially this diff is just switching the order of the if
/elif
blocks, to first check if we want a nullable dtype and only then check if we want the future default string dtype.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
pandas/core/arrays/datetimelike.py
Outdated
value = sanitize_array(value, index=None) | ||
value = ensure_wrapped_if_datetimelike(value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is essentially the default input sanitation/inference we do for the Series constructor (and other places), so switched to use that here instead of pd.array(..)
(which doesn't use the default dtypes for all types)
(this mostly impacts the name of the type in the error message)
maybe_converts_object
) if requested
I think generally should just stick with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Thanks @jorisvandenbossche |
…maybe_converts_object`) if requested (pandas-dev#59487) * String dtype: maybe_converts_object give precedence to nullable dtype * update datetimelike input validation * update tests and remove xfails * explicitly test pd.array() behaviour (remove xfail) * fixup allow_2d * undo changes related to datetimelike input validation * fix test for str on current main --------- Co-authored-by: Matthew Roeschke <[email protected]> (cherry picked from commit 851639d)
…maybe_converts_object`) if requested (pandas-dev#59487) * String dtype: maybe_converts_object give precedence to nullable dtype * update datetimelike input validation * update tests and remove xfails * explicitly test pd.array() behaviour (remove xfail) * fixup allow_2d * undo changes related to datetimelike input validation * fix test for str on current main --------- Co-authored-by: Matthew Roeschke <[email protected]>
…maybe_converts_object`) if requested (pandas-dev#59487) * String dtype: maybe_converts_object give precedence to nullable dtype * update datetimelike input validation * update tests and remove xfails * explicitly test pd.array() behaviour (remove xfail) * fixup allow_2d * undo changes related to datetimelike input validation * fix test for str on current main --------- Co-authored-by: Matthew Roeschke <[email protected]>
…maybe_converts_object`) if requested (pandas-dev#59487) * String dtype: maybe_converts_object give precedence to nullable dtype * update datetimelike input validation * update tests and remove xfails * explicitly test pd.array() behaviour (remove xfail) * fixup allow_2d * undo changes related to datetimelike input validation * fix test for str on current main --------- Co-authored-by: Matthew Roeschke <[email protected]>
…maybe_converts_object`) if requested (pandas-dev#59487) * String dtype: maybe_converts_object give precedence to nullable dtype * update datetimelike input validation * update tests and remove xfails * explicitly test pd.array() behaviour (remove xfail) * fixup allow_2d * undo changes related to datetimelike input validation * fix test for str on current main --------- Co-authored-by: Matthew Roeschke <[email protected]>
…maybe_converts_object`) if requested (pandas-dev#59487) * String dtype: maybe_converts_object give precedence to nullable dtype * update datetimelike input validation * update tests and remove xfails * explicitly test pd.array() behaviour (remove xfail) * fixup allow_2d * undo changes related to datetimelike input validation * fix test for str on current main --------- Co-authored-by: Matthew Roeschke <[email protected]>
…maybe_converts_object`) if requested (#59487) * String dtype: maybe_converts_object give precedence to nullable dtype * update datetimelike input validation * update tests and remove xfails * explicitly test pd.array() behaviour (remove xfail) * fixup allow_2d * undo changes related to datetimelike input validation * fix test for str on current main --------- Co-authored-by: Matthew Roeschke <[email protected]>
The
lib.maybe_converts_object()
function (which is used for object dtype inference in various places) has aconvert_to_nullable_dtype=Flase/True
keyword to control whether to return dtypes using NA or not. Currently we were giving the futureinfer_string
option priority. But also when the future dtype is enabled, we should still return an NA-variant of the dtype if explicitly asked for. So this PR switches the order of those two checks to let theconvert_to_nullable_dtype
keyword take precedence over the option.This came up in SQL tests, but actually this also impacts the
pd.array(..)
behaviour. Currently on main,pd.array(["some", "strings"])
will infer the nullableStringDtype()
, consistent with the other dtypes that have such a nullable variant (i.e. returning our Int64 instead of np.int64 for integers, etc).This is tested behaviour, but we were still xfailing those tests when
future.infer_string
was enabled. With the change in this PR, also in the futurepd.array(..)
will infer the NA-variant.(sidenote, we should update this simple
convert_to_nullable_dtype
True/False flag to something that allows passing whether we want a numpy_nullable or pyarrow array, but that's for a separate issue)