DIS: Should the keyword use_nullable_dtypes use nullable dtypes in the absence of nulls #42973

Closed
lithomas1 opened this issue Aug 10, 2021 · 5 comments
Labels: Closing Candidate, IO Data, NA - MaskedArrays, Needs Discussion

Comments

@lithomas1
Member

Currently, the keyword use_nullable_dtypes (only implemented for read_parquet for now) uses nullable dtypes for a column even when that column contains no nulls.

Should this behavior change?

xref discussion: #40687 (comment), #42588 (comment), #42588 (comment)

(Note that always using nullable dtypes (the current behavior) is a choice made on our end, https://github.com/pandas-dev/pandas/blob/master/pandas/io/parquet.py#L215-L227, not by the pyarrow engine.)
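To make the current behavior concrete, here is a minimal pure-pandas sketch of an "always nullable" rule. It is not the linked parquet.py code (which works on the pyarrow side); `always_nullable` is a hypothetical helper, and the choice to convert only integer and boolean columns is an assumption made for illustration, since those are NumPy dtypes that cannot hold missing values.

```python
import numpy as np
import pandas as pd


def always_nullable(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: cast integer and bool columns to pandas nullable
    dtypes whether or not the column actually contains missing values."""
    mapping = {}
    for col, dt in df.dtypes.items():
        if not isinstance(dt, np.dtype):
            continue  # already an extension dtype; leave it unchanged
        if dt.kind == "i":
            mapping[col] = f"Int{dt.itemsize * 8}"   # int32 -> "Int32", int64 -> "Int64"
        elif dt.kind == "u":
            mapping[col] = f"UInt{dt.itemsize * 8}"  # uint8 -> "UInt8", ...
        elif dt.kind == "b":
            mapping[col] = "boolean"
    return df.astype(mapping)


df = pd.DataFrame({"a": np.array([1, 2, 3], dtype="int32"), "b": [True, False, True]})
print(always_nullable(df).dtypes)  # both columns become nullable, no nulls required
```

The key property is that the output dtype depends only on the input dtype, never on whether any value is actually missing.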

cc @pandas-dev/pandas-core

lithomas1 added the IO Data, Needs Discussion, and NA - MaskedArrays labels on Aug 10, 2021
@bashtage
Contributor

I would vote to change it to use nullable dtypes only when necessary, and to use standard dtypes when there are no nulls. Users can always upcast to nullable dtypes should they want to use nulls.
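A rough sketch of the proposed "only when necessary" rule, assuming the frame starts out with pandas masked (nullable) dtypes and that columns without nulls should fall back to their NumPy equivalents. `nullable_only_if_needed` is a hypothetical name, and the `numpy_dtype` attribute lookup is used here only as a way to detect masked dtypes such as `Int64` and `boolean`.

```python
import numpy as np
import pandas as pd


def nullable_only_if_needed(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: keep a nullable dtype only for columns that
    actually contain missing values; otherwise fall back to NumPy."""
    out = df.copy()
    for col in out.columns:
        # Masked dtypes (Int64, boolean, ...) expose their NumPy counterpart.
        numpy_dtype = getattr(out[col].dtype, "numpy_dtype", None)
        if numpy_dtype is not None and not out[col].isna().any():
            out[col] = out[col].astype(numpy_dtype)
    return out


df = pd.DataFrame({
    "a": pd.array([1, 2, 3], dtype="Int64"),     # no nulls -> int64
    "b": pd.array([1, None, 3], dtype="Int64"),  # has a null -> stays Int64
})
out = nullable_only_if_needed(df)

# Upcasting later, as suggested, is a single astype call.
a_nullable = out["a"].astype("Int64")
```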

@phofl
Member

phofl commented Dec 17, 2022

I think we can close this. We implemented this for other functions the same way as for read_parquet, which makes the most sense imo.

phofl added the Closing Candidate label on Dec 17, 2022
@MarcoGorelli
Member

Agreed - personally I'm not keen on value-dependent behaviour such as

only use when necessary, and to use standard dtypes when there are no nulls

so I'd vote for keeping as-is (use_nullable_dtypes uses nullable types - which is clear and simple)
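One way to see the concern about value-dependent behaviour: under a "nullable only if needed" rule, two chunks of the same logical integer column can come back with different dtypes depending on their contents, while the always-nullable rule yields one stable dtype. Both functions below are hypothetical illustrations on plain Python lists, not pandas APIs.

```python
import pandas as pd


def value_dependent(values):
    """Hypothetical rule: nullable Int64 only if a null is present, else NumPy int64."""
    dtype = "Int64" if any(v is None for v in values) else "int64"
    return pd.Series(values, dtype=dtype)


def always_nullable(values):
    """The rule kept in this issue: nullable Int64 regardless of the data."""
    return pd.Series(values, dtype="Int64")


chunks = [[1, 2, 3], [4, None, 6]]
print([str(value_dependent(c).dtype) for c in chunks])  # ['int64', 'Int64']
print([str(always_nullable(c).dtype) for c in chunks])  # ['Int64', 'Int64']
```

With the value-dependent rule, the result dtype cannot be predicted from the schema alone, which is the unpredictability the comment argues against.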

@lithomas1
Member Author

We'll have to tell fastparquet then. I think they still do it the other way.

@phofl
Member

phofl commented Dec 17, 2022

Could you open an issue there?

phofl closed this as completed on Dec 17, 2022