Skip to content

Deprecated fastpath parameter in Categorial constructor #17889

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

kaichogami
Copy link

Hello everyone!
I am new to Pandas developer community and your feedback would help me a lot. Thank you!

@pep8speaks
Copy link

pep8speaks commented Oct 16, 2017

Hello @kaichogami! Thanks for updating the PR.

Line 655:1: W293 blank line contains whitespace
Line 657:13: E303 too many blank lines (2)
Line 667:1: W293 blank line contains whitespace
Line 668:9: E303 too many blank lines (2)
Line 669:13: E129 visually indented line with same indent as next logical line

Comment last updated on November 13, 2017 at 12:27 Hours UTC

@@ -563,7 +566,7 @@ def _from_inferred_categories(cls, inferred_categories, inferred_codes,
dtype = CategoricalDtype(cats, ordered=False)
codes = inferred_codes

return cls(codes, dtype=dtype, fastpath=True)
return cls(codes, dtype=dtype)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am quite confused on what should be done for the fastpath parameters used in class method's to create a new Categorial. Should I replace them with dtype or let the __init__ handle CategorialDtype construction?

@kaichogami
Copy link
Author

@TomAugspurger

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Oct 18, 2017

Thanks @kaichogami!

I didn't quite understand what fastpath was doing earlier. It seems to only be valid when values is actually codes, so I think we should focus the effort on fixing from_codes (but the warning you put in place looks good and should stay).

Ok, so, we'll want to modify Categorical.from_codes to also take a dtype argument.

def from_codes(cls, codes, categories, ordered=None, dtype=None):
   ...

We'll do the validation of categories, ordered, and dtype, basically refactor the first section of __init__ to something like

def validate_dtype(values=None, categories=None, ordered=None, dtype=None):

        # Ways of specifying the dtype (prioritized ordered)
        # 1. dtype is a CategoricalDtype
        #    a.) with known categories, use dtype.categories
        #    b.) else with Categorical values, use values.dtype
        #    c.) else, infer from values
        #    d.) specifying dtype=CategoricalDtype and categories is an error
        # 2. dtype is a string 'category'
        #    a.) use categories, ordered
        #    b.) use values.dtype
        #    c.) infer from values
        # 3. dtype is None
        #    a.) use categories, ordered
        #    b.) use values.dtype
        #    c.) infer from values

        if dtype is not None:
            if isinstance(dtype, compat.string_types):
                if dtype == 'category':
                    dtype = CategoricalDtype(categories, ordered)
                else:
                    msg = "Unknown `dtype` {dtype}"
                    raise ValueError(msg.format(dtype=dtype))
            elif categories is not None or ordered is not None:
                raise ValueError("Cannot specify both `dtype` and `categories`"
                                 " or `ordered`.")

            categories = dtype.categories
            ordered = dtype.ordered

        elif is_categorical(values):
            dtype = values.dtype._from_categorical_dtype(values.dtype,
                                                         categories, ordered)
        else:
            dtype = CategoricalDtype(categories, ordered)
        return categories, ordered, dtype

And use that in both __init__ and from_codes.

Then, in from_codes, instead of return cls(codes, categories...), you'll do

self = cls.__new__(cls)
self._codes = coerce_indexer_dtype(codes, dtype.categories)
self._dtype = dtype
return self

I think that'll work. Then, anywhere you see a fastpath=True, you should be able to change it to use from_codes with a dtype, and it'll be as fast as possible. Make sense?

@kaichogami
Copy link
Author

Yes, it does. I will go change it as soon as I can. Thanks for the feedback! 😄

@gfyoung gfyoung added Categorical Categorical Data Type Deprecate Functionality to remove in pandas labels Oct 20, 2017
@kaichogami
Copy link
Author

Hi, I am a bit busy so if its something important somebody else can take it over, otherwise I will do it by next week. Thank you.

@TomAugspurger
Copy link
Contributor

No rush on this.

@jreback
Copy link
Contributor

jreback commented Nov 12, 2017

can you rebase

@jreback
Copy link
Contributor

jreback commented Dec 28, 2017

closing as stale, if you want to work on this, pls ping.

@jreback jreback closed this Dec 28, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Categorical Categorical Data Type Deprecate Functionality to remove in pandas
Projects
None yet
Development

Successfully merging this pull request may close these issues.

API: Deprecate Categorical fastpath argument
5 participants