-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: Compile Factorizer class for all numeric dtypes #49624
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
cc @jbrockmendel would you mind having a look? |
Adding factorize to the base class simplifies the follow up |
i guess the follow-up will change how |
@@ -101,6 +101,20 @@ from pandas._libs.khash cimport ( | |||
from pandas._libs.tslibs.util cimport get_c_string | |||
from pandas._libs.missing cimport C_NA | |||
|
|||
|
|||
cdef class Factorizer: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there any particular reason for this to be in the pxi.in instead of the pyx?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had some failures initially, but looks like this was caused by something else. Moved it back
a few questions, generally looks good |
Yeah and also avoids having a million typing errors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM merge when ready @jbrockmendel
thanks @phofl |
* ENH: Compile Factorizer class for all numeric dtypes * Fix test * Fix test * Add factorize to base class * Remove ignores * Move factorizer
* ENH: Compile Factorizer class for all numeric dtypes * Fix test * Fix test * Add factorize to base class * Remove ignores * Move factorizer
This patch may have induced a potential regression. Please check the links below. If any ASVs are parameterized, the combinations of parameters that a regression has been detected appear as subbullets. This is a partially automated message.
|
I think this is a false positive, looks to me as this is impacted by the klib quadratic probing and seems to go back to normal when this pr was reverted |
This patch may have induced a potential regression. Please check the links below. If any ASVs are parameterized, the combinations of parameters that a regression has been detected appear as subbullets. This is a partially automated message. Subsequent benchmarks may have skipped some commits. See the link below to see which commits are between the two benchmark runs where the regression was identified.
|
The links don't work. But this pr did not touch the code paths that the isin case takes |
Reworked my scripts and updated the links, the subbullets now contain links to single parameterization of the benchmark where my code is identifying a regression. But certainly could be a false positive. For example, the first subbullet of algos.isin.IsInLongSeriesLookUpDominates.time_isin is an interesting case: The first step up points to this PR whereas the second step up is the revert of quadratic programming. Both look like "regressions" but the revert of quadratic programming puts us back to where we started. I'm going to run asvs between certain commits; I'm just curious what's going on here. |
The current implementation has a couple of ugly side effects in merge:
This does not enable the classes yet. Will do as a follow up.