PERF: BoolHashTable #15751
Labels: Dtype Conversions, Internals, Performance
xref #15738 (comment)
Currently, bool data is passed to the generic Python object hashtable in the following methods, which makes them all very slow:
value_counts
rank
unique
The linked PR casts to int for factorize, duplicated, and drop_duplicates.
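A minimal sketch of the cast-to-int idea, assuming the input is a 1-D NumPy bool array. The helper names `factorize_bool_via_int` and `duplicated_bool_via_int` are hypothetical and are not part of pandas. They only illustrate reinterpreting the bool buffer as `uint8`, so that pandas should dispatch to an integer hashtable rather than the object one.

```python
import numpy as np
import pandas as pd


def factorize_bool_via_int(values: np.ndarray):
    """Hypothetical helper: factorize a bool array via an integer view."""
    # bool and uint8 share an itemsize, so view() reinterprets without copying
    as_int = values.view(np.uint8)
    codes, uniques = pd.factorize(as_int)
    # Only the (at most two) uniques need converting back to bool
    return codes, uniques.astype(bool)


def duplicated_bool_via_int(values: np.ndarray, keep="first") -> np.ndarray:
    """Hypothetical helper: duplicated() computed on the uint8 view."""
    return pd.Series(values.view(np.uint8)).duplicated(keep=keep).to_numpy()
```

Since the view shares memory with the original array, the only extra work compared with a native bool path is the tiny `astype(bool)` on the uniques.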
We could skip hashing altogether and take advantage of the fixed set of possible values, e.g. with a fastpath for unique.
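A minimal sketch of what such a hash-free fastpath might look like. This is not the snippet from the issue itself. It assumes a 1-D NumPy bool array with no missing values (object-dtype data holding NaN would need separate handling). `unique_bool` and `value_counts_bool` are hypothetical names, not pandas APIs.

```python
import numpy as np


def unique_bool(values: np.ndarray) -> np.ndarray:
    """Hypothetical fastpath: unique values of a 1-D bool array, in order
    of first appearance (the ordering pd.unique uses), without hashing."""
    if values.size == 0:
        return np.empty(0, dtype=bool)
    first = bool(values[0])
    # Only two values are possible, so the other one is present
    # iff any element differs from the first element.
    if (values != first).any():
        return np.array([first, not first], dtype=bool)
    return np.array([first], dtype=bool)


def value_counts_bool(values: np.ndarray) -> dict:
    """Hypothetical fastpath: counts of True and False, omitting zero
    counts as value_counts does, without hashing."""
    n_true = int(np.count_nonzero(values))
    n_false = int(values.size) - n_true
    return {k: n for k, n in ((True, n_true), (False, n_false)) if n}
```

Each function makes at most one vectorized pass over the data and never builds a hashtable. Because only two distinct values exist, the second unique value, if present, must be `not values[0]`.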