Skip to content

CategoricalDtype.update_dtype fails with Categorical #27338

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
WillAyd opened this issue Jul 11, 2019 · 8 comments · Fixed by #38873
Closed

CategoricalDtype.update_dtype fails with Categorical #27338

WillAyd opened this issue Jul 11, 2019 · 8 comments · Fixed by #38873
Assignees
Labels
Categorical Categorical Data Type Dtype Conversions Unexpected or buggy dtype conversions good first issue Needs Tests Unit test(s) needed to prevent regressions
Milestone

Comments

@WillAyd
Copy link
Member

WillAyd commented Jul 11, 2019

ref #27318 (comment) it appears that CategoricalDtype might want to accept a Categorical as an argument but fails to handle that properly. Not sure intent here but we either want to fix to allow Categorical or alternately simplify and only allow str/CategoricalDtype input

@WillAyd WillAyd added Categorical Categorical Data Type ExtensionArray Extending pandas with custom dtypes or arrays. API Design labels Jul 11, 2019
@jorisvandenbossche jorisvandenbossche added this to the 0.25.0 milestone Jul 11, 2019
@jorisvandenbossche
Copy link
Member

Looking at the base implementation:

def is_dtype(cls, dtype) -> bool:
"""Check if we match 'dtype'.
Parameters
----------
dtype : object
The object to check.
Returns
-------
is_dtype : bool
Notes
-----
The default implementation is True if
1. ``cls.construct_from_string(dtype)`` is an instance
of ``cls``.
2. ``dtype`` is an object and is an instance of ``cls``
3. ``dtype`` has a ``dtype`` attribute, and any of the above
conditions is true for ``dtype.dtype``.
"""
dtype = getattr(dtype, "dtype", dtype)
if isinstance(dtype, (ABCSeries, ABCIndexClass, ABCDataFrame, np.dtype)):
# https://github.com/pandas-dev/pandas/issues/22960
# avoid passing data to `construct_from_string`. This could
# cause a FutureWarning from numpy about failing elementwise
# comparison from, e.g., comparing DataFrame == 'category'.
return False
elif dtype is None:
return False
elif isinstance(dtype, cls):
return True
try:
return cls.construct_from_string(dtype) is not None
except TypeError:
return False

there is a dtype = getattr(dtype, "dtype", dtype), so it seems rather on purpose that it accepts not only dtype objects itself, but also containers that have a dtype.

(personally I don't really like this polymorphism, I think it would be cleaner to just allow dtype objects and have the called to obj.dtype if necessary)

@jorisvandenbossche
Copy link
Member

cc @jschendel

@jschendel jschendel changed the title CategoricalDtype.update_type fails with Categorical CategoricalDtype.update_dtype fails with Categorical Jul 13, 2019
@jschendel
Copy link
Member

I think only accepting CategoricalDtype is fine. The docstring states that the input should be a CategoricalDtype, with no mention of Categorical/CategoricalIndex, and the behavior is only tested for CategoricalDtype. If Categorical worked in the past it was only due to happenstance, likely due to is_dtype being a bit confusing in that it returns True for things that aren't actually dtypes (though it appears this is the intended behavior for is_dtype). Is there a tangible benefit to explicitly supporting Categorical here?

@jreback jreback modified the milestones: 0.25.0, 1.0 Jul 17, 2019
@TomAugspurger
Copy link
Contributor

TomAugspurger commented Nov 12, 2019

Anything to do here for 1.0, or OK to push?

@TomAugspurger
Copy link
Contributor

FWIW, on master, this does succeed

In [31]: pd.CategoricalDtype(['a']).update_dtype(pd.Categorical(['b'], ordered=True))
Out[31]: CategoricalDtype(categories=['b'], ordered=True)

@TomAugspurger TomAugspurger modified the milestones: 1.0, Contributions Welcome Dec 30, 2019
@mroeschke
Copy link
Member

Could use a test for the behavior above.

@mroeschke mroeschke added good first issue Needs Tests Unit test(s) needed to prevent regressions and removed API Design Categorical Categorical Data Type ExtensionArray Extending pandas with custom dtypes or arrays. labels Jun 28, 2020
@pratyushprakash
Copy link

@mroeschke Is this up for grabs. If it is I'd like to work on it

@pratyushprakash
Copy link

take

@jreback jreback added Categorical Categorical Data Type Dtype Conversions Unexpected or buggy dtype conversions labels Jan 1, 2021
@jreback jreback modified the milestones: Contributions Welcome, 1.3 Jan 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Categorical Categorical Data Type Dtype Conversions Unexpected or buggy dtype conversions good first issue Needs Tests Unit test(s) needed to prevent regressions
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants