ENH: Implement io.nullable_backend config for read_parquet #49039

mroeschke · 2022-10-11T00:42:33Z

xref ENH: Add global option io.nullable_type="pandas"|"pyarrow" to control IO reader use_nullable_dtype #48957
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.


phofl · 2022-10-11T20:35:14Z

doc/source/whatsnew/v1.6.0.rst

@@ -33,6 +33,7 @@ Other enhancements
 - :meth:`Series.add_suffix`, :meth:`DataFrame.add_suffix`, :meth:`Series.add_prefix` and :meth:`DataFrame.add_prefix` support an ``axis`` argument. If ``axis`` is set, the default behaviour of which axis to consider can be overwritten (:issue:`47819`)
 - :func:`assert_frame_equal` now shows the first element where the DataFrames differ, analogously to ``pytest``'s output (:issue:`47910`)
 - Added new argument ``use_nullable_dtypes`` to :func:`read_csv` to enable automatic conversion to nullable dtypes (:issue:`36712`)
+- Added new global configuration, ``io.nullable_backend`` to allow ``use_nullable_dtypes=True`` to return pyarrow-backed dtypes when set to ``"pyarrow"`` in :func:`read_parquet` (:issue:`48957`)


Should engine default to arrow if the option is set to arrow?

I'd be a little hesitant to let this option force arrow when a user has explicitly specified pd.read_*(..., engine="not-arrow").

Agreed, but we could make engine=NoDefault and only override if it is not given.

But I agree that we should consider this carefully.

Fair. Another consideration is that not all IO methods will have associated arrow parsers (immediately), e.g. read_html.


phofl · 2022-10-13T22:23:44Z

pandas/tests/io/test_parquet.py

@@ -1021,6 +1021,43 @@ def test_read_parquet_manager(self, pa, using_array_manager):
        else:
            assert isinstance(result._mgr, pd.core.internals.BlockManager)

+    def test_read_use_nullable_types_pyarrow_config(self, pa, df_full):


This is failing in the min versions build

Thanks, fixed


mroeschke · 2022-10-20T18:12:00Z

@phofl any follow-ups here?

phofl · 2022-10-21T21:46:55Z

pandas/tests/io/test_parquet.py

@@ -1021,6 +1021,46 @@ def test_read_parquet_manager(self, pa, using_array_manager):
        else:
            assert isinstance(result._mgr, pd.core.internals.BlockManager)

+    @pytest.mark.xfail(
+        pa_version_under2p0, reason="Timezone conversion incorrect for pyarrow < 2.0"


Can remove this


phofl · 2022-10-22T00:39:16Z

thx @mroeschke


ENH: Implement io.nullable_backend config for read_parquet

caf68cf

mroeschke added the labels IO Data (IO issues that don't fit into a more specific label), IO Parquet (parquet, feather), and Arrow (pyarrow functionality) Oct 11, 2022

mroeschke added this to the 1.6 milestone Oct 11, 2022

mroeschke added 2 commits October 11, 2022 10:23

Merge remote-tracking branch 'upstream/main' into enh/io/nullable_backend

e02411c

add whatsnew

368ef09

phofl reviewed Oct 11, 2022


mroeschke added 2 commits October 11, 2022 16:00

Merge remote-tracking branch 'upstream/main' into enh/io/nullable_backend

7a89bdb

Merge remote-tracking branch 'upstream/main' into enh/io/nullable_backend

9f49203

mroeschke modified the milestones: 1.6, 2.0 Oct 13, 2022

Merge remote-tracking branch 'upstream/main' into enh/io/nullable_backend

2a1e8ba

phofl reviewed Oct 13, 2022


mroeschke added 2 commits October 13, 2022 17:19

Merge remote-tracking branch 'upstream/main' into enh/io/nullable_backend

7695d49

xfail for min version

df31bb1

phofl reviewed Oct 21, 2022


mroeschke added 2 commits October 21, 2022 15:05

Merge remote-tracking branch 'upstream/main' into enh/io/nullable_backend

593446a

Remove unnecessary xfails

a4cd9f5

phofl approved these changes Oct 22, 2022


phofl merged commit 0dce285 into pandas-dev:main Oct 22, 2022

mroeschke deleted the enh/io/nullable_backend branch October 24, 2022 17:20

noatamir pushed a commit to noatamir/pandas that referenced this pull request Nov 9, 2022

ENH: Implement io.nullable_backend config for read_parquet (pandas-dev#49039)

bbaf091
