ENH: Add io.nullable_backend=pyarrow support to read_excel #49965

mroeschke · 2022-11-30T00:16:46Z

xref ENH: Add global option io.nullable_type="pandas"|"pyarrow" to control IO reader use_nullable_dtype #48957
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

mroeschke · 2022-11-30T00:17:40Z

doc/source/whatsnew/v2.0.0.rst


 Configuration option, ``io.nullable_backend``, to return pyarrow-backed dtypes from IO functions
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-A new global configuration, ``io.nullable_backend`` can now be used in conjunction with the parameter ``use_nullable_dtypes=True`` in :func:`read_parquet`, :func:`read_orc` and :func:`read_csv` (with ``engine="pyarrow"``)
-to return pyarrow-backed dtypes when set to ``"pyarrow"`` (:issue:`48957`).
+The ``use_nullable_dtypes`` keyword argument has been expanded to :func:`read_csv` and :func:`read_excel` to enable automatic conversion to nullable dtypes (:issue:`36712`)


cc @phofl: this is the section where we can expand on use_nullable_dtypes. Let me know if I should expand on anything in this PR.

I was thinking about adding a really short example with only one or two columns for read_csv, to show a bit better how to use it.

Should I expand on the example below (it's hidden in the diff) to show pd.option_context("io.nullable_backend", "pandas") as well?

Ah sorry, missed that. Yes a pandas example would be great.

Another thing: Maybe make the functions bullet points, e.g.

* read_csv
* read_excel

So that they become a bit more prominent?

Good idea with the bullet points. I also added an example using the pandas backend.
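For readers following the thread, here is a minimal sketch of the kind of one-or-two-column read_csv example discussed above. It rests on a loud assumption: the ``io.nullable_backend`` option and the ``use_nullable_dtypes`` keyword are the development-cycle spellings used in this PR, while the sketch uses the ``dtype_backend`` keyword that, as far as I know, released pandas 2.0 and later expose on read_csv for the same choice. The CSV contents and variable names are made up for illustration.

```python
import io

import pandas as pd

# Two tiny columns; the second row has a missing value in column "a".
csv_text = "a,b\n1,x\n,y\n"

# Assumption: pandas >= 2.0, where dtype_backend="numpy_nullable" requests
# pandas' own nullable (masked) dtypes instead of float64/object.
df_pandas = pd.read_csv(io.StringIO(csv_text), dtype_backend="numpy_nullable")

# The integer column keeps an integer dtype and marks the gap as missing
# rather than falling back to float64 with NaN.
print(df_pandas.dtypes)
print(df_pandas)
```

Passing ``dtype_backend="pyarrow"`` instead should return pyarrow-backed dtypes, provided pyarrow is installed.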

mroeschke · 2022-11-30T00:19:08Z

pandas/tests/io/excel/test_readers.py

+                expected["i"].array._data.cast(pa.timestamp(unit="us"))
+            )
+            # pyarrow supports a null type, so don't have to default to Int64
+            expected["j"] = ArrowExtensionArray(pa.array([None, None]))


@phofl I noticed that in _infer_types, result_mask.all() (all nulls, I think) defaults the dtype to Int64. Any specific reason why Int64 was chosen?

Yes, if you do:

df = pd.DataFrame([[np.nan, np.nan]])
df.convert_dtypes()

you get back Int64 for all columns. Not saying this is perfect, but it made sense to model it after the existing behavior.

Ah okay, that makes sense then.


phofl · 2022-12-02T11:13:25Z

thx @mroeschke for addressing the doc comments!

ENH: Add io.nullable_backend=pyarrow support to read_excel

d5c82b6

mroeschke added the IO Excel (read_excel, to_excel) and Arrow (pyarrow functionality) labels Nov 30, 2022

mroeschke added this to the 2.0 milestone Nov 30, 2022

mroeschke commented Nov 30, 2022


mroeschke added 4 commits December 1, 2022 13:14

Merge remote-tracking branch 'upstream/main' into enh/io/excel_pyarrow_nullable

b9cb462

Address review for whatsnew

b5d87aa

Merge remote-tracking branch 'upstream/main' into enh/io/excel_pyarrow_nullable

434418a

Seek StringIO

a106949

phofl approved these changes Dec 2, 2022


phofl merged commit 7e5a95c into pandas-dev:main Dec 2, 2022

mroeschke deleted the enh/io/excel_pyarrow_nullable branch December 2, 2022 17:56
