Skip to content

Convert ExtensionDtype class objects to instances of that class #47108

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 8 commits into from

Conversation

patrickmckenna
Copy link

This enables behavior that would previously raise (and often confuse), e.g :

pd.Series(["a", "b"], dtype=pd.StringDtype)
pd.api.types.is_bool_dtype(pd.BooleanDtype) == True

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm we deliberately make this an error now

i am ok with having a better error message but i don't think changing this is a good idea

@patrickmckenna
Copy link
Author

Ahh, thanks for the clarification. I must have misread #45417 (comment) to mean that the changes here were desirable.

Just to confirm (please correct as needed):

@jreback
Copy link
Contributor

jreback commented May 25, 2022

yep that sounds right

@@ -67,6 +68,12 @@
_is_scipy_sparse = None

ensure_float64 = algos.ensure_float64
ensure_int64 = algos.ensure_int64
Copy link
Author

@patrickmckenna patrickmckenna May 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changed line count in this module look gnarly, but theres's only one functional change (pointed out below). Everything is else is just rearranging lines so that like functions are grouped together, for easier visual grepping and maintenance.

For example, this module defines 5 is_datetime*_dtype(arr_or_dtype) functions. They used be in 2 different sections 500 lines apart, but are now grouped together.

@@ -1760,6 +1599,10 @@ def pandas_dtype(dtype) -> DtypeObj:
------
TypeError if not a dtype
"""
if inspect.isclass(dtype) and issubclass(dtype, (np.dtype, ExtensionDtype)):
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The one substantive change in this module is these 3 lines.

@@ -24,6 +29,28 @@
from pandas.arrays import SparseArray


@pytest.fixture(name="ea_dtype", params=_registry.dtypes, scope="module")
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seemed like the most efficient way to retrieve all ExtensionDtype subclasses, but please LMK if there's a different, preferred method. I didn't see any existing fixture, or constant in _test/__init__.py, that enumerated these.

params=(f for f in dir(com) if re.fullmatch(r"^is_\w+_dtype$", f)),
scope="module",
)
def fixture_is_dtype_func(request):
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move a pre-existing function (get_is_dtype_funcs) into a fixture, since its output is now used by multiple tests.

@@ -107,6 +147,37 @@ def test_period_dtype(self, dtype):
assert com.pandas_dtype(dtype) == dtype


@pytest.mark.xfail(reason="not yet implemented")
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test currently fails for all parameterizations.

It sounds like the is_*_dtype functions should be changed so that this tests passes, but I left that out of this PR for 2 reasons:

  • Modifying only pandas_dtype is enough to get a (hopefully) more useful error message, which was the request in AssertionError creating array with StringDtype instead of StringDtype(), can we have a friendlier error message?  #31356.

    >>> pd.Series(dtype=pd.Int64Dtype)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/pandas/pandas/core/series.py", line 395, in __init__
        dtype = self._validate_dtype(dtype)
      File "/home/pandas/pandas/core/generic.py", line 436, in _validate_dtype
        dtype = pandas_dtype(dtype)
      File "/home/pandas/pandas/core/dtypes/common.py", line 1604, in pandas_dtype
        raise TypeError(msg)
    TypeError: Must pass dtype instance, not dtype class
    
  • Want to keep this PR narrowly scoped.

@patrickmckenna patrickmckenna marked this pull request as ready for review May 26, 2022 01:10
@github-actions
Copy link
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Jun 26, 2022
@mroeschke
Copy link
Member

Thanks for the pull request, but it appears to have gone stale. If interested in continuing, please merge in the main branch, address any review comments and/or failing tests, and we can reopen.

@mroeschke mroeschke closed this Jul 22, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
3 participants