ENH: New Name for "numpy_nullable" dtype_backend #59032
Comments
@jorisvandenbossche maybe a good follow-up to the discussion we had as part of PDEP-14
@pandas-dev/pandas-core this wasn't major enough to include as part of PDEP-14, but I think it is a logical follow-up to clean up the semantics. Curious what others think.
I think it should be
Hey! I’d like to work on this
Take
Any other team feedback on this? I think it would be good to use the new name starting with 3.0.
We have Given that the term “pandas dtype” already has a precedent, using
I also have a slight preference for pandas because it is shorter, and I don't see us ever introducing a non-nullable type system, so "_nullable" is superfluous.
On the other hand, when specifying So I don't think
masked
That's true as a matter of implementation, but I don't think end users are going to know that.
I did suggest
That's a fair point, though I'm not sure that adding _nullable prevents that. I think that would only prevent an issue if we decided to offer non-nullable types.
Or offer something else that we can't foresee today.
PyArrow types indeed are pandas extension types, enhancing the functionality of the base PyArrow library to suit our use case of backing DataFrames or Series. We don't always rigidly adhere to the behavior of NumPy arrays for a Series with a NumPy dtype. We allow expansion, upcasting, and other conversions that may diverge from NumPy behavior, even though we return a NumPy type as the dtype. But I see no problems when we use the terms "pyarrow" or "numpy" when we talk about the backend. So it would seem reasonable to me to use the term "pandas" to describe the pandas nullable extension types.
The Presently, the available options for If we aim to allow users to continue using legacy types even when nullable types become the default, introducing an additional argument makes sense. Considering package names, options like
I'm on board with what @simonjayhawkins is suggesting: pyarrow, pandas, and numpy as arguments reflect the core of the type system evolution, even if they may not be 100% technically accurate.
If we do decide on those terms, I also wonder if we should change the default value of
Feature Type
Adding new functionality to pandas
Changing existing functionality in pandas
Removing existing functionality in pandas
Problem Description
Many I/O methods today accept "numpy_nullable" as a value for the dtype_backend= parameter. While historically our extension arrays exclusively used NumPy, this is no longer true with the string dtype, so the name "numpy_nullable" is a misnomer.
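A minimal sketch of the current behavior, assuming pandas >= 2.0 (where `dtype_backend` was added to the readers). It reads a tiny CSV with `dtype_backend="numpy_nullable"`: the integer column comes back as the masked `Int64` dtype (NumPy values plus a boolean mask), but the text column comes back as `StringDtype`, whose storage is controlled by the `mode.string_storage` option and may be `"pyarrow"` rather than anything NumPy-based. The exact storage shown depends on the pandas version and that option, so no output is asserted for it here.

```python
import io

import pandas as pd  # assumes pandas >= 2.0

# A tiny CSV with a missing value in the integer column "a".
csv = io.StringIO("a,b\n1,x\n,y\n")

df = pd.read_csv(csv, dtype_backend="numpy_nullable")

# The integer column uses the masked Int64 extension dtype (NumPy-backed).
print(df["a"].dtype)

# The text column uses StringDtype; its storage ("python" or "pyarrow")
# comes from pd.options.mode.string_storage, not from NumPy.
print(df["b"].dtype, df["b"].dtype.storage)
```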
Feature Description
To make for a less confusing API, I would suggest adding "pandas_nullable", or maybe even just "pandas", as an accepted value. This can have exactly the same behavior as "numpy_nullable" today but abstracts and corrects the semantics. "numpy_nullable" can then be slowly deprecated over time.
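One way the alias-then-deprecate idea could be wired up, sketched in plain Python. The helper `normalize_dtype_backend` and the two lookup tables are hypothetical and are not pandas internals; the sketch picks `"pandas"` as the new spelling (swap in `"pandas_nullable"` if that name wins) and omits pandas' handling of the unset default. Every reader would call the helper once, so `"numpy_nullable"` keeps working but emits a `FutureWarning`, which Python shows to end users by default.

```python
import warnings

# Hypothetical tables, not pandas internals.
# Canonical spellings accepted without any warning.
_CANONICAL = {"pandas": "pandas", "pyarrow": "pyarrow"}
# Deprecated spellings mapped to their replacement.
_DEPRECATED_ALIASES = {"numpy_nullable": "pandas"}


def normalize_dtype_backend(dtype_backend):
    """Return the canonical dtype_backend value (hypothetical helper).

    Deprecated aliases still work but emit a FutureWarning;
    unknown values raise ValueError.
    """
    if dtype_backend in _CANONICAL:
        return _CANONICAL[dtype_backend]
    if dtype_backend in _DEPRECATED_ALIASES:
        new = _DEPRECATED_ALIASES[dtype_backend]
        warnings.warn(
            f"dtype_backend={dtype_backend!r} is deprecated; use {new!r} instead.",
            FutureWarning,
            stacklevel=2,
        )
        return new
    valid = sorted({*_CANONICAL, *_DEPRECATED_ALIASES})
    raise ValueError(f"dtype_backend must be one of {valid}, got {dtype_backend!r}")
```

Keeping the deprecated spellings in their own table means behavior is identical during the transition, and dropping "numpy_nullable" at the end of the deprecation period is a one-line change.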
Alternative Solutions
n/a
Additional Context
dtype_backend="pandas" would also make for a smoother transition into the logical type system proposed as part of PDEP-13 #58455
...but even if that PDEP is not accepted, I still see value in changing the value "numpy_nullable" to something else