PERF: add support for NaT in hashtable factorizers, improving Categorical construction with NaT #12128

Closed
wants to merge 2 commits into from

@jreback (Contributor) commented Jan 25, 2016

closes #12077

before       after        ratio
[1330b9fe]   [404911c6]
17.25ms      1.21ms       0.07   categoricals.categorical_constructor_with_datetimes.time_datetimes_with_nat

@jreback added the labels Datetime (Datetime data dtype), Performance (Memory or execution speed performance), and Categorical (Categorical Data Type) Jan 25, 2016
@jreback added this to the 0.18.0 milestone Jan 25, 2016

         return reverse, labels

     @cython.boundscheck(False)
     def get_labels(self, int64_t[:] values, Int64Vector uniques,
-                   Py_ssize_t count_prior, Py_ssize_t na_sentinel):
+                   Py_ssize_t count_prior, Py_ssize_t na_sentinel,
+                   bint check_null=True):

Member

In what situation would check_null=False be passed?

Contributor Author

In theory I should not be checking this for a pure int array (as opposed to a view of a datetimelike), or for an integer array that is not int64. Neither condition is really supported now.

Member

I see. Well, this is fine then.
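
To make the `check_null` discussion concrete, here is a minimal pure-Python sketch of a factorizing `get_labels` that treats the NaT sentinel as missing. It is an illustration only, not the Cython code in this PR: the dict stands in for the int64 hash table, the list stands in for `Int64Vector`, and `INT64_MIN` is the name chosen here for the int64 minimum that datetimelike views use to represent NaT.

```python
# Hypothetical sketch of the idea in the diff above; not the pandas implementation.
INT64_MIN = -2**63  # int64 value that represents NaT in a datetimelike view


def get_labels(values, uniques, count_prior=0, na_sentinel=-1, check_null=True):
    """Return one integer label per value, appending new uniques in first-seen order."""
    table = {}           # value -> label; stands in for the int64 hash table
    count = count_prior  # labels for new uniques start after any prior uniques
    labels = []
    for val in values:
        # With check_null=True, the NaT sentinel is labeled as missing
        # and never becomes a unique (that is, never becomes a category).
        if check_null and val == INT64_MIN:
            labels.append(na_sentinel)
            continue
        if val not in table:
            table[val] = count
            uniques.append(val)
            count += 1
        labels.append(table[val])
    return labels


uniques = []
labels = get_labels([10, INT64_MIN, 20, 10], uniques)
print(labels)   # [0, -1, 1, 0]
print(uniques)  # [10, 20]
```

Passing `check_null=False` in this sketch would treat `INT64_MIN` as an ordinary value, which is the behavior described above as appropriate in theory for a pure integer array.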

@wesm (Member) commented Jan 25, 2016

Cool, kind of a no brainer.

@jreback (Contributor, Author) commented Jan 25, 2016

Perfect example of why we need common NA handling in arrays. Had to fix up factorize and rank for int64!

@wesm (Member) commented Jan 25, 2016

Yep. I'd be all for 100% bitmasks if it wouldn't cause legacy issues; maybe we'll get there someday
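
As a rough illustration of the sentinel-versus-bitmask distinction mentioned here, the sketch below compares the two ways of marking missing int64 values. It is an assumption-laden toy, not pandas code: the function names are hypothetical, and the validity bitmask is modeled as a plain Python integer with bit `i` set when element `i` is valid.

```python
# Hypothetical toy comparison; names and representation are illustrative only.
INT64_MIN = -2**63  # sentinel that a datetimelike int64 view uses for NaT


def nulls_by_sentinel(values):
    """Missing means 'equals the sentinel'; a genuine INT64_MIN is indistinguishable."""
    return [v == INT64_MIN for v in values]


def nulls_by_mask(valid_bits, n):
    """Missing means 'validity bit i is 0'; the stored value itself is irrelevant."""
    return [not ((valid_bits >> i) & 1) for i in range(n)]


values = [INT64_MIN, 5, 7]

# Sentinel approach: the first element is flagged as missing, whether these
# are datetimes or plain integers that happen to contain INT64_MIN.
print(nulls_by_sentinel(values))    # [True, False, False]

# Mask approach, plain integers: all three bits set, so nothing is missing.
print(nulls_by_mask(0b111, 3))      # [False, False, False]

# Mask approach, first element missing: bit 0 cleared.
print(nulls_by_mask(0b110, 3))      # [True, False, False]
```

With a mask, the null check no longer depends on the dtype or on a per-call flag such as `check_null`, which is one way to read the "common NA handling" point above.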

@jreback closed this in 81bb972 Jan 25, 2016
Linked issue that merging this pull request may close: Categorical(vals, cats) bad performance with NaNs