ENH: Select numeric ExtensionDtypes with DataFrame.select_dtypes #35341

Closed
wants to merge 17 commits into from
8 changes: 8 additions & 0 deletions pandas/core/frame.py
@@ -3547,6 +3547,7 @@ def select_dtypes(self, include=None, exclude=None) -> "DataFrame":
        4   True  1.0
        5  False  2.0
        """

        if not is_list_like(include):
            include = (include,) if include is not None else ()
        if not is_list_like(exclude):
@@ -3578,7 +3579,14 @@ def extract_unique_dtypes_from_dtypes_set(
                unique_dtype
                for unique_dtype in unique_dtypes
                if issubclass(unique_dtype.type, tuple(dtypes_set))  # type: ignore
                or (
                    np.number in dtypes_set
                    and is_extension_array_dtype(unique_dtype)
                    and unique_dtype._is_numeric
                )
            ]
            if np.number in dtypes_set:
                extracted_dtypes.extend([])
Member

I have not looked at this in detail, but judging from the function's name, I assume this is not the right place for this special casing.

There is another issue regarding extension arrays and dtypes, #9581. Maybe a compat function for np.issubdtype could help, where we could add the special cases for EAs.
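
For illustration only, a compat wrapper along those lines might look like the sketch below. The name `issubdtype_compat` is hypothetical and is not the function that ended up in the PR. It assumes that extension dtypes expose the private `_is_numeric` flag used in the diff, and it special-cases only `np.number`.

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import ExtensionDtype


def issubdtype_compat(dtype, parent) -> bool:
    """Hypothetical np.issubdtype wrapper that also understands extension dtypes."""
    if isinstance(dtype, ExtensionDtype):
        # Special case: a numeric extension dtype counts as a np.number subtype.
        if parent is np.number and dtype._is_numeric:
            return True
        # Otherwise fall back to the scalar type, as select_dtypes does.
        return isinstance(dtype.type, type) and issubclass(dtype.type, parent)
    # Plain NumPy dtypes are delegated to NumPy unchanged.
    return np.issubdtype(dtype, parent)


print(issubdtype_compat(np.dtype("float64"), np.number))
print(issubdtype_compat(pd.Int64Dtype(), np.number))
print(issubdtype_compat(pd.CategoricalDtype(), np.number))
```

Keeping the extension-dtype branch in one helper would let callers such as `select_dtypes` stay free of EA-specific conditions.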

Author

OK, I've added a function to pandas\compat\numpy\__init__.py.

            return extracted_dtypes

        unique_dtypes = self.dtypes.unique()
50 changes: 50 additions & 0 deletions pandas/tests/extension/test_select_dtypes_numeric.py
@@ -0,0 +1,50 @@
import pytest
import numpy as np

import pandas as pd
from pandas.core.arrays import ExtensionArray
from pandas.core.dtypes.dtypes import ExtensionDtype


class DummyDtype(ExtensionDtype):
    type = int
    _numeric = False

    @property
    def name(self):
        return "Dummy"

    @property
    def _is_numeric(self):
        return self._numeric


class DummyArray(ExtensionArray):
    _dtype = DummyDtype()

    def __init__(self, data):
        self.data = data

    def __array__(self, dtype):
        return self.data

    @property
    def dtype(self):
        return self._dtype

    def __len__(self) -> int:
        return len(self.data)

    def __getitem__(self, item):
        pass


@pytest.mark.parametrize("numeric", [True, False])
def test_select_dtypes_numeric(numeric):
    da = DummyArray([1, 2])
    da._dtype._numeric = numeric
    df = pd.DataFrame(da)
    if numeric:
        assert df.select_dtypes(np.number).shape == df.shape
    else:
        assert df.select_dtypes(np.number).shape != df.shape
Member

The extension tests predominantly test specific EAs. Could you add a test to pandas\tests\extension\base\dtype.py instead and override it as necessary for the different EAs?

Author

My intention isn't to test the different EAs, but to test a dummy EA that emulates an external EA, like PintArray.

The most similar tests I could find were in pandas\tests\extension\test_common.py. Perhaps I should add the tests there?
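
One detail of the dummy-dtype approach, wherever the test ends up: `DummyArray._dtype` is a single instance shared at class level, so `da._dtype._numeric = numeric` changes state that every `DummyArray` sees. The sketch below is a hypothetical variant, not code from the PR. It assumes the flag is passed per instance and listed in `_metadata`, so that two dummy dtypes with different flags compare as different.

```python
from pandas.api.extensions import ExtensionDtype


class DummyDtype(ExtensionDtype):
    """Hypothetical stand-in for an external dtype, with a per-instance flag."""

    type = int
    name = "Dummy"
    # Attributes listed here take part in ExtensionDtype equality and hashing.
    _metadata = ("numeric",)

    def __init__(self, numeric: bool = False) -> None:
        self.numeric = numeric

    @property
    def _is_numeric(self) -> bool:
        return self.numeric


numeric_dtype = DummyDtype(numeric=True)
plain_dtype = DummyDtype(numeric=False)

# Each instance carries its own flag, so one test case cannot leak into another.
print(numeric_dtype._is_numeric, plain_dtype._is_numeric)
print(numeric_dtype == plain_dtype)
```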