Skip to content

Provide ExtensionDtype.construct_from_string by default #26562

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Jun 6, 2019
11 changes: 0 additions & 11 deletions pandas/core/arrays/integer.py
Original file line number Diff line number Diff line change
Expand Up @@ -78,17 +78,6 @@ def construct_array_type(cls):
"""
return IntegerArray

@classmethod
def construct_from_string(cls, string):
"""
Construction from a string, raise a TypeError if not
possible
"""
if string == cls.name:
return cls()
raise TypeError("Cannot construct a '{}' from "
"'{}'".format(cls, string))


def integer_array(values, dtype=None, copy=False):
"""
Expand Down
39 changes: 27 additions & 12 deletions pandas/core/dtypes/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -172,17 +172,27 @@ def construct_array_type(cls):
raise NotImplementedError

@classmethod
def construct_from_string(cls, string):
"""
Attempt to construct this type from a string.
def construct_from_string(cls, string: str):
r"""
Construct this type from a string.

This is useful mainly for data types that accept parameters.
For example, a period dtype accepts a frequency parameter that
can be set as ``period[H]`` (where H means hourly frequency).

By default, in the abstract class, just the name of the type is
expected. But subclasses can overwrite this method to accept
parameters.

Parameters
----------
string : str
The name of the type, for example ``category``.

Returns
-------
self : instance of 'cls'
ExtensionDtype
Instance of the dtype.

Raises
------
Expand All @@ -191,21 +201,26 @@ def construct_from_string(cls, string):

Examples
--------
If the extension dtype can be constructed without any arguments,
the following may be an adequate implementation.
For extension dtypes with arguments the following may be an
adequate implementation.

>>> @classmethod
... def construct_from_string(cls, string)
... if string == cls.name:
... return cls()
... def construct_from_string(cls, string):
... pattern = re.compile(r"^my_type\[(?P<arg_name>.+)\]$")
... match = pattern.match(string)
... if match:
... return cls(**match.groupdict())
... else:
... raise TypeError("Cannot construct a '{}' from "
... "'{}'".format(cls, string))
... "'{}'".format(cls.__name__, string))
"""
raise AbstractMethodError(cls)
if string != cls.name:
raise TypeError("Cannot construct a '{}' from '{}'".format(
cls.__name__, string))
return cls()

@classmethod
def is_dtype(cls, dtype):
def is_dtype(cls, dtype) -> bool:
"""Check if we match 'dtype'.

Parameters
Expand Down
13 changes: 0 additions & 13 deletions pandas/core/dtypes/dtypes.py
Original file line number Diff line number Diff line change
Expand Up @@ -440,19 +440,6 @@ def construct_array_type(cls):
from pandas import Categorical
return Categorical

@classmethod
def construct_from_string(cls, string):
"""
attempt to construct this type from a string, raise a TypeError if
it's not possible """
try:
if string == 'category':
return cls()
else:
raise TypeError("cannot construct a CategoricalDtype")
except AttributeError:
pass
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does anybody know why we had this "except AttributeError: pass" here?


@staticmethod
def validate_ordered(ordered):
"""
Expand Down
32 changes: 31 additions & 1 deletion pandas/tests/dtypes/test_dtypes.py
Original file line number Diff line number Diff line change
Expand Up @@ -51,6 +51,36 @@ def test_pickle(self):
assert result == self.dtype


class TestBaseDtype:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Such tests are already in the base extension tests (tests/extension/base/dtype), or can be expanded there

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had the feeling that those were only base tests for implementing tests for custom extensions, couldn't find a place where I thought it'd make sense to have the abstract class test. Where exactly would you have them?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

those were only base tests for implementing tests for custom extensions

Yes, but we have custom extensions ourselves, so we use those tests ourselves (and in addition we also have a few extra test EAs).

For example in tests/extension/decimal/array.py you can remove the custom implementation and let it inherit, and then it is already tested in that way.

where I thought it'd make sense to have the abstract class test

What do you mean with abstract class test?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I understand what you want to test.

@jreback wanted a test for the existence of the method construct_from_string, which makes sense to me. Ideally, we'd already have a test, testing that the method raises an exception when called directly from ExtensionDtype (what I called the abstract class before). Then I'd just replace that by checking that now it's implemented, and it can construct an instance given the string of the name of the dtype.

I saw that in extension/base/dtype.py we already have a test for construct_from_string, but that's meant for custom extensions (ours or from a third-party) to test that their implementation of construct_from_string follows the expected behavior. The way of using that test is to create a subclass of BaseDtypeTests.

I don't think we want to test that the method is implemented in the ExtensionDtype (the base/abstract class) by checking a subclass of it. So, I see two options:

  • Create a subclass of BaseDtypeTests defining that the dtype to test is not a subclass but ExtensionDtype itself (it will fail for the abstract methods that still exist in `ExtensionDtype).
  • Create an independent test for ExtensionDtype and not their subclasses, that doesn't subclass BaseDtypeTests (this is what I've done here, not testing only construct_from_string but all the methods that I could tests, since there weren't tests for the the "default"/base implementations).

Am I missing something here? Sorry I don't understand exactly what you propose.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see @jorisvandenbossche comments on where tests should go, we have a very general framework for these already.

def setup_method(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we really don't want to use setup_method generally at all.

class DummyDtype(pd.core.dtypes.base.ExtensionDtype):
name = 'dummy'
self.dtype = DummyDtype()

def test_str(self):
assert str(self.dtype) == self.dtype.name

def test_eq(self):
assert self.dtype == 'dummy'
assert self.dtype != 'anonther_type'

def test_default_kind(self):
assert self.dtype.kind == 'O'

def test_construct_from_string(self):
dtype_instance = self.dtype.__class__.construct_from_string('dummy')
assert isinstance(dtype_instance, self.dtype.__class__)
with pytest.raises(TypeError, match="Cannot construct a 'DummyDtype' "
"from 'another_type'"):
self.dtype.__class__.construct_from_string('another_type')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would be a useful test to add to the base extension tests, and the test_eq as well. The defaults tests are maybe harder though.


def test_default_is_numeric(self):
assert not self.dtype._is_numeric

def test_default_is_boolean(self):
assert not self.dtype._is_numeric


class TestCategoricalDtype(Base):

def create(self):
Expand Down Expand Up @@ -82,7 +112,7 @@ def test_equality(self):
def test_construction_from_string(self):
result = CategoricalDtype.construct_from_string('category')
assert is_dtype_equal(self.dtype, result)
msg = "cannot construct a CategoricalDtype"
msg = "Cannot construct a 'CategoricalDtype' from 'foo'"
with pytest.raises(TypeError, match=msg):
CategoricalDtype.construct_from_string('foo')

Expand Down