-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: Allow convert_dtypes to convert to pd.ArrowDtype #50094
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Allow convert_dtypes to convert to pd.ArrowDtype #50094
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice @mroeschke!
I definitely think the io.nullable_backend
-> mode.nullable_backend
change makes sense given it's being generalized outside of just read_*
I/O functions.
def test_pyarrow_nullable_backend_already_pyarrow(self): | ||
pytest.importorskip("pyarrow") | ||
expected = pd.DataFrame([1, 2, 3], dtype="int64[pyarrow]") | ||
with pd.option_context("mode.nullable_backend", "pyarrow"): | ||
result = expected.convert_dtypes() | ||
tm.assert_frame_equal(result, expected) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does setting mode.nullable_backend
to "pyarrow"
work when the original dtypes are already numpy-backed extension dtypes? For example, something like
expected = pd.DataFrame([1, 2, 3], dtype="Int64")
with pd.option_context("mode.nullable_backend", "pyarrow"):
result = expected.convert_dtypes()
tm.assert_frame_equal(result, expected)
or adding a df.convert_dtypes()
line before with pd.option_context("mode.nullable_backend", "pyarrow"):
in test_pyarrow_nullable_backend
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup this should convert the numpy-backed types to pyarrow-backed types. Will add a unit test for it
@@ -979,6 +983,11 @@ def convert_dtypes( | |||
infer_objects : bool, defaults False | |||
Whether to also infer objects to float/int if possible. Is only hit if the | |||
object array contains pd.NA. | |||
nullable_backend : str, default "pandas" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this interact with the mode.nullable_backend
option? My guess is folks would expect nullable_backend
to take precedent if it's specified, but for mode.nullable_backend
to take precedent if nullable_backend
isn't given.
Here's a tiny example of the use case where I'm asking what happens:
with pd.option_context("mode.nullable_backend", "pyarrow"):
result = expected.convert_dtypes(nullable_backend="pandas")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line specifically is a private helper function that backs the public Series/DataFrame.convert_dtypes
.
The public Series/DataFrame.convert_dtypes
will not accept a keyword argument nullable_backend
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see -- thanks for clarifying. So the nullable_backend
config option is how public APIs toggle here. Good to know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
convert_dtypes
to convert topd.ArrowDtype
#49997 (Replace xxxx with the GitHub issue number)doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.Additionally changed
io.nullable_backend
tomode.nullable_backend
to make it more general