Skip to content

CLN: make CategoricalIndex._create_categorical a classmethod #21618

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

topper-123
Copy link
Contributor

@topper-123 topper-123 commented Jun 24, 2018

Currently, CategoricalIndex._create_categorical is a staticmethod, and is being called internally using either instances or classes as its first argument, e.g.:

  • in _is_dtype_compat an instance is supplied as the first argument,
  • in __new__ a class is supplied as the first argument.

This is confusing and makes the code paths different depending on how the method is called. It makes it difficult to reason about the precise output of the method.

This PR cleans this up by making _create_categorical a classmethod. This simplifies stuff, and we can also remove the method's first parameter when calling it.

Calling _create_categorical unneccesarily is one reason for the slowness of #20395. After this cleanup PR I will do another that should get #20395 come down to 1.6 ms as well as give some other related performance improvements.

@@ -1130,7 +1130,8 @@ def to_frame(self, index=True):
"""

from pandas import DataFrame
result = DataFrame(self._shallow_copy(), columns=[self.name or 0])
Copy link
Contributor Author

@topper-123 topper-123 Jun 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_shallow_copy does not guarantee that _data is copied, so a test fails. to_frame should get a copy, else smething like this could happen:

>>> ci = pd.CategoricalIndex(['a', 'b', 'c'])
>>> df = ci.to_frame()
>>> df.iloc[0] = 'c'
>>> df
   0
c  c  # index label changed!
b  b
c  c

By using self.values.copy() rather than _shallow_copy, this issue is avoided.

- Improved performance of membership checks in :class:`CategoricalIndex`
(i.e. ``x in ci``-style checks are much faster). :meth:`CategoricalIndex.contains`
- Improved performance of membership checks in :class:`CategoricalIndex` and :class:`Categorical`
(i.e. ``x in cat``-style checks are much faster). :meth:`CategoricalIndex.contains`
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just a cleanup of the whatsnew for #21508, not related to this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put this in a separate PR then.

@topper-123 topper-123 force-pushed the CategoricalIndex._create_categorical branch from df8a333 to c32844f Compare June 24, 2018 21:56
@codecov
Copy link

codecov bot commented Jun 24, 2018

Codecov Report

Merging #21618 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #21618      +/-   ##
==========================================
+ Coverage    91.9%    91.9%   +<.01%     
==========================================
  Files         153      153              
  Lines       49547    49548       +1     
==========================================
+ Hits        45537    45538       +1     
  Misses       4010     4010
Flag Coverage Δ
#multiple 90.3% <100%> (ø) ⬆️
#single 41.77% <66.66%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/base.py 96.63% <100%> (ø) ⬆️
pandas/core/indexes/category.py 97.28% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4922c0e...fa378e2. Read the comment docs.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is fine, pls do the 0.23.2 change separately though.

- Improved performance of membership checks in :class:`CategoricalIndex`
(i.e. ``x in ci``-style checks are much faster). :meth:`CategoricalIndex.contains`
- Improved performance of membership checks in :class:`CategoricalIndex` and :class:`Categorical`
(i.e. ``x in cat``-style checks are much faster). :meth:`CategoricalIndex.contains`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put this in a separate PR then.

@jreback jreback added this to the 0.24.0 milestone Jun 25, 2018
@jreback jreback added Categorical Categorical Data Type Clean labels Jun 25, 2018
@topper-123 topper-123 force-pushed the CategoricalIndex._create_categorical branch from c32844f to fa378e2 Compare June 25, 2018 12:37
@topper-123
Copy link
Contributor Author

Comments addressed & green

@jreback jreback merged commit 44f7b14 into pandas-dev:master Jun 25, 2018
@jreback
Copy link
Contributor

jreback commented Jun 25, 2018

thanks @topper-123

Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018
@topper-123 topper-123 deleted the CategoricalIndex._create_categorical branch October 27, 2018 08:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Categorical Categorical Data Type Clean
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants