ENH: Add use_nullable_dtypes and nullable_backend global option to read_orc #49827

Merged: 5 commits into pandas-dev:main, Nov 23, 2022

mroeschke (Member)

Additionally

  • Show the dtypes in the whatsnew for clarity
  • Note in the docs that read_csv also supports the global nullable_backend option

@mroeschke mroeschke added Enhancement IO Data IO issues that don't fit into a more specific label Arrow pyarrow functionality labels Nov 22, 2022
@@ -33,7 +33,7 @@ sql-other, html, xml, plot, output_formatting, clipboard, compression, test]`` (
Configuration option, ``io.nullable_backend``, to return pyarrow-backed dtypes from IO functions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- A new global configuration, ``io.nullable_backend`` can now be used in conjunction with the parameter ``use_nullable_dtypes=True`` in :func:`read_parquet` and :func:`read_csv` (with ``engine="pyarrow"``)
+ A new global configuration, ``io.nullable_backend`` can now be used in conjunction with the parameter ``use_nullable_dtypes=True`` in :func:`read_parquet`, :func:`read_orc` and :func:`read_csv` (with ``engine="pyarrow"``)
Member

Off-topic, but it seems read_excel supports use_nullable_dtypes but not io.nullable_backend. We should fix this.

Member Author

Good point. I'll add this in a follow-up PR.


.. note::

    Currently only ``io.nullable_backend`` set to ``"pyarrow"`` is supported.
Member

Do you intend to implement the flag for pandas as well?

Member Author

I would want to do this in a follow-up PR (unless you're interested :) )

Member

No, that's fine. I just wanted to understand whether this is intended at all.

I want to tackle JSON and SQL next.

"float": np.arange(4.0, 7.0, dtype="float64"),
"float_with_nan": [2.0, np.nan, 3.0],
"bool": [True, False, None],
"datetime": pd.date_range("20130101", periods=3),
Member

Can you add a bool column without NA values?

Member Author

Sure, added.

],
}
)
bytes_data = df.to_orc()
Member

Just to avoid something subtle: can you do df.copy().to… since you are using df below?

Member Author

Good idea. Added the copy.
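The reviewer does not spell out the subtle issue, but one plausible reading is that a writer could modify its input, which would silently change the object the test compares against later. The sketch below illustrates that general pattern with the standard library only; `to_bytes_mutating` is a hypothetical writer invented for illustration and says nothing about how `to_orc` actually behaves.

```python
import copy


def to_bytes_mutating(table):
    """Hypothetical writer with a side effect: replaces None with "" in place."""
    for column in table.values():
        for i, value in enumerate(column):
            if value is None:
                column[i] = ""
    return repr(table).encode()


table = {"bool": [True, False, None]}
expected = copy.deepcopy(table)

# Serialize a copy so that `table` is still the original when compared later.
payload = to_bytes_mutating(copy.deepcopy(table))
unchanged = table == expected

# Without the copy, the same call alters its input.
unprotected = {"bool": [None]}
to_bytes_mutating(unprotected)
```

Passing a copy keeps the later comparison independent of whatever the writer does to its argument, at the cost of one extra copy in the test.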

@mroeschke mroeschke added this to the 2.0 milestone Nov 23, 2022
@mroeschke mroeschke merged commit d8cfbd2 into pandas-dev:main Nov 23, 2022
@mroeschke mroeschke deleted the enh/pyarrow_types/orc branch November 23, 2022 00:00
mliu08 pushed a commit to mliu08/pandas that referenced this pull request Nov 27, 2022
…ad_orc (pandas-dev#49827)

* ENH: Add use_nullable_dtypes and nullable_backend to read_orc

* Skip if not required pa version

* Address review
3 participants