Skip to content

ENH: Allow convert_dtypes to convert to pd.ArrowDtype #50094

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Dec 14, 2022

Conversation

mroeschke
Copy link
Member

@mroeschke mroeschke commented Dec 6, 2022

Additionally changed io.nullable_backend to mode.nullable_backend to make it more general

@mroeschke mroeschke added Enhancement Dtype Conversions Unexpected or buggy dtype conversions Arrow pyarrow functionality labels Dec 6, 2022
Copy link
Contributor

@jrbourbeau jrbourbeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice @mroeschke!

I definitely think the io.nullable_backend -> mode.nullable_backend change makes sense given it's being generalized outside of just read_* I/O functions.

Comment on lines +93 to +98
def test_pyarrow_nullable_backend_already_pyarrow(self):
pytest.importorskip("pyarrow")
expected = pd.DataFrame([1, 2, 3], dtype="int64[pyarrow]")
with pd.option_context("mode.nullable_backend", "pyarrow"):
result = expected.convert_dtypes()
tm.assert_frame_equal(result, expected)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does setting mode.nullable_backend to "pyarrow" work when the original dtypes are already numpy-backed extension dtypes? For example, something like

        expected = pd.DataFrame([1, 2, 3], dtype="Int64")
        with pd.option_context("mode.nullable_backend", "pyarrow"):
            result = expected.convert_dtypes()
        tm.assert_frame_equal(result, expected)

or adding a df.convert_dtypes() line before with pd.option_context("mode.nullable_backend", "pyarrow"): in test_pyarrow_nullable_backend

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup this should convert the numpy-backed types to pyarrow-backed types. Will add a unit test for it

@@ -979,6 +983,11 @@ def convert_dtypes(
infer_objects : bool, defaults False
Whether to also infer objects to float/int if possible. Is only hit if the
object array contains pd.NA.
nullable_backend : str, default "pandas"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this interact with the mode.nullable_backend option? My guess is folks would expect nullable_backend to take precedent if it's specified, but for mode.nullable_backend to take precedent if nullable_backend isn't given.

Here's a tiny example of the use case where I'm asking what happens:

with pd.option_context("mode.nullable_backend", "pyarrow"):
    result = expected.convert_dtypes(nullable_backend="pandas")

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line specifically is a private helper function that backs the public Series/DataFrame.convert_dtypes.

The public Series/DataFrame.convert_dtypes will not accept a keyword argument nullable_backend

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see -- thanks for clarifying. So the nullable_backend config option is how public APIs toggle here. Good to know.

Copy link
Member

@phofl phofl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@mroeschke mroeschke added this to the 2.0 milestone Dec 14, 2022
@mroeschke mroeschke merged commit 7b0d4dd into pandas-dev:main Dec 14, 2022
@mroeschke mroeschke deleted the enh/convert_dtypes/pyarrow branch December 14, 2022 16:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Arrow pyarrow functionality Dtype Conversions Unexpected or buggy dtype conversions Enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ENH: Allow convert_dtypes to convert to pd.ArrowDtype
3 participants