It seems like it'd be more performant from a computational and memory standpoint to bypass the intermediate construction of an IntervalIndex via take_nd and instead directly construct the Categorical via Categorical.from_codes.
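A minimal sketch of the proposed direct construction, for illustration only. It assumes the integer bin positions are obtained from IntervalIndex.get_indexer (an assumption about how the codes would be computed, not a quote of the code in tile.py) and uses a smaller input than the measurements so that it runs quickly. The names codes, direct, and via_cut are hypothetical.

```python
import numpy as np
import pandas as pd

ii = pd.interval_range(0, 20)
values = np.linspace(0, 20, 100).repeat(10)

# Assumption: look up the position of the interval containing each value.
# Values that fall in no interval (here 0.0, since the bins are
# right-closed) get position -1.
codes = ii.get_indexer(values)

# Build the Categorical straight from the integer codes, without first
# materializing an intermediate IntervalIndex of results.
# Categorical.from_codes treats -1 as a missing value.
direct = pd.Categorical.from_codes(codes, categories=ii, ordered=True)

# Reference result from the public API, for comparison.
via_cut = pd.cut(values, ii)
```

Comparing direct.codes with via_cut.codes is a quick way to check that the two routes agree on a given input before measuring them.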
Some ad hoc measurements on master:
In [3]: ii = pd.interval_range(0, 20)
In [4]: values = np.linspace(0, 20, 100).repeat(10**4)
In [5]: %timeit pd.cut(values, ii)
7.69 s ± 43.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [6]: %memit pd.cut(values, ii)
peak memory: 278.39 MiB, increment: 130.76 MiB
And the same measurements with the Categorical.from_codes fix:
In [3]: ii = pd.interval_range(0, 20)
In [4]: values = np.linspace(0, 20, 100).repeat(10**4)
In [5]: %timeit pd.cut(values, ii)
1.02 s ± 18.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [6]: %memit pd.cut(values, ii)
peak memory: 145.81 MiB, increment: 15.98 MiB
For context: when using cut with an IntervalIndex for bins, the result of the cut is first materialized as an IntervalIndex and then converted to a Categorical (pandas/pandas/core/reshape/tile.py, lines 373 to 378 in 143bc34).