ENH: Implement io.nullable_backend config for read_parquet #49039
doc/source/whatsnew/v1.6.0.rst
@@ -33,6 +33,7 @@ Other enhancements
- :meth:`Series.add_suffix`, :meth:`DataFrame.add_suffix`, :meth:`Series.add_prefix` and :meth:`DataFrame.add_prefix` support an ``axis`` argument. If ``axis`` is set, the default behaviour of which axis to consider can be overwritten (:issue:`47819`)
- :func:`assert_frame_equal` now shows the first element where the DataFrames differ, analogously to ``pytest``'s output (:issue:`47910`)
- Added new argument ``use_nullable_dtypes`` to :func:`read_csv` to enable automatic conversion to nullable dtypes (:issue:`36712`)
- Added new global configuration, ``io.nullable_backend`` to allow ``use_nullable_dtypes=True`` to return pyarrow-backed dtypes when set to ``"pyarrow"`` in :func:`read_parquet` (:issue:`48957`)
Should engine default to arrow if the option is set to arrow?
I'd be a little hesitant to let this option override a user specifying pd.read_*(..., engine="not-arrow") and force arrow to be used instead.
Agreed, but we could make engine=NoDefault and only override it if not given. Still, we should consider this carefully.
Fair. Another consideration is that not all IO methods will have associated arrow parsers (at least not immediately), e.g. read_html.
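The engine=NoDefault idea from this thread can be sketched roughly as follows. This is a hypothetical helper, not code from the PR: resolve_engine, its default_engine and has_arrow_parser parameters, and the local no_default sentinel are all illustrative assumptions (pandas has its own internal sentinel, pandas._libs.lib.no_default). An explicit engine always wins; the backend option only selects "pyarrow" when the caller passed no engine and the reader actually has an arrow parser, which also covers the read_html case.

```python
# Hypothetical sketch of "engine=NoDefault, only override if not given".
# None of these names are pandas APIs; they only illustrate the resolution order.


class _NoDefault:
    """Sentinel type meaning 'the caller did not pass engine'."""

    def __repr__(self) -> str:
        return "<no_default>"


no_default = _NoDefault()


def resolve_engine(
    engine=no_default,
    *,
    nullable_backend: str = "pandas",
    default_engine: str = "c",
    has_arrow_parser: bool = True,
) -> str:
    """Pick the parser engine for a hypothetical reader."""
    if engine is not no_default:
        # An explicitly requested engine is never overridden by the option.
        return engine
    if nullable_backend == "pyarrow" and has_arrow_parser:
        # No engine given and the option asks for pyarrow: use the arrow parser.
        return "pyarrow"
    # No engine given, and either the option is "pandas" or this reader
    # has no arrow parser (e.g. read_html): fall back to the reader's default.
    return default_engine
```

For example, resolve_engine("python", nullable_backend="pyarrow") still returns "python", while resolve_engine(nullable_backend="pyarrow") returns "pyarrow". Using a sentinel instead of None keeps "not passed" distinguishable from any value a caller might pass explicitly.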
@@ -1021,6 +1021,43 @@ def test_read_parquet_manager(self, pa, using_array_manager):
        else:
            assert isinstance(result._mgr, pd.core.internals.BlockManager)

    def test_read_use_nullable_types_pyarrow_config(self, pa, df_full):
This is failing in the min versions build.
Thanks, fixed
@phofl any follow-ups here?
pandas/tests/io/test_parquet.py
@@ -1021,6 +1021,46 @@ def test_read_parquet_manager(self, pa, using_array_manager):
        else:
            assert isinstance(result._mgr, pd.core.internals.BlockManager)

    @pytest.mark.xfail(
        pa_version_under2p0, reason="Timezone conversion incorrect for pyarrow < 2.0"
Can remove this
thx @mroeschke
io.nullable_type="pandas"|"pyarrow" to control IO readers' use_nullable_dtypes.

- #48957
- doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.
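To make the "pandas"|"pyarrow" choice described above more concrete, here is a minimal, hypothetical sketch of the dtype decision a reader could make when use_nullable_dtypes=True. The function nullable_dtype_for and its small Arrow-type table are illustrative assumptions, not pandas internals, and the sketch returns dtype strings only: "Int64", "Float64" and "boolean" are pandas' masked nullable dtypes, and "<arrow type>[pyarrow]" is the string alias for pyarrow-backed dtypes. With this PR, a user would presumably switch backends via pd.set_option("io.nullable_backend", "pyarrow").

```python
# Hypothetical sketch: map an Arrow type name to the pandas dtype a reader
# would return with use_nullable_dtypes=True, depending on the backend setting.

_VALID_BACKENDS = ("pandas", "pyarrow")

# Masked (NumPy-backed) pandas nullable dtypes for a few Arrow type names.
_PANDAS_NULLABLE = {
    "int64": "Int64",
    "double": "Float64",
    "bool": "boolean",
}


def nullable_dtype_for(arrow_type: str, backend: str = "pandas") -> str:
    """Return the pandas dtype string for an Arrow type under a given backend."""
    if backend not in _VALID_BACKENDS:
        # Mirrors an option that only accepts "pandas" or "pyarrow".
        raise ValueError(f"backend must be one of {_VALID_BACKENDS}, got {backend!r}")
    if backend == "pyarrow":
        # pyarrow-backed dtypes are spelled "<arrow type>[pyarrow]".
        return f"{arrow_type}[pyarrow]"
    # Default backend: use pandas' masked nullable extension dtypes.
    return _PANDAS_NULLABLE[arrow_type]
```

For instance, nullable_dtype_for("double", "pyarrow") gives "double[pyarrow]", while the default "pandas" backend gives "Float64". Strings are left out of the table on purpose, because in pandas the alias "string[pyarrow]" resolves to StringDtype("pyarrow") rather than to an Arrow-typed dtype.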