PERF: use ndarray.take instead of algos.take #40852

Merged: jreback merged 1 commit into pandas-dev:master from the re-take-1 branch on Apr 13, 2021

Conversation

jbrockmendel
Member

Un-revert part of #40510 in the hopes of tracking down where the perf impact was.

@jreback jreback added the Performance (memory or execution speed performance) label Apr 9, 2021
@jreback
Contributor

jreback commented Apr 9, 2021

ok is this WIP?

@jbrockmendel
Member Author

> ok is this WIP?

no, this is ready

@jorisvandenbossche
Member

Can you run some benchmarks to verify this time?

@jorisvandenbossche
Member

jorisvandenbossche commented Apr 9, 2021

Eg the one I mentioned in #40818 (comment). I didn't list the exact impacted benchmarks in #40510 (comment), but eg the select_dtypes ones were also affected.

@jbrockmendel
Member Author

> Eg the one I mentioned in #40818 (comment). I didn't list the exact impacted benchmarks in #40510 (comment), but eg the select_dtypes ones were also affected.

Just ran the time_frame_agg benchmark: no change. The non-asv motivator remains:

In [1]: from pandas.core.algorithms import *
In [2]: arr = np.arange(10000)
In [3]: taker = np.random.randint(0, 10000, 1000)
In [4]: res = arr.take(taker)
In [5]: res2 = take_nd(arr, taker, allow_fill=False)

In [6]: %timeit res = arr.take(taker)
   ...: 
4.21 µs ± 93.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [7]: %timeit res2 = take_nd(arr, taker, allow_fill=False)
   ...: 
8.9 µs ± 670 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
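
For readers wondering where the extra time can go, here is a rough sketch (not the pandas implementation) of why a fill-aware take does more work than a bare ndarray.take. The helper name take_with_fill and its -1 sentinel handling are assumptions made up for illustration; only the NumPy calls are real.

```python
import numpy as np


def take_with_fill(arr, indexer, fill_value=np.nan):
    """Hypothetical fill-aware take: -1 in `indexer` marks a missing position.

    This is an illustrative stand-in, not pandas' take_nd.
    """
    indexer = np.asarray(indexer, dtype=np.intp)
    mask = indexer == -1
    if not mask.any():
        # Nothing to fill, so a plain ndarray.take gives the same answer.
        return arr.take(indexer)
    # Promote the dtype so that the fill value can be stored in the result.
    out_dtype = np.result_type(arr.dtype, np.asarray(fill_value).dtype)
    out = arr.take(indexer).astype(out_dtype)
    out[mask] = fill_value
    return out


rng = np.random.default_rng(0)
arr = np.arange(10_000)
taker = rng.integers(0, 10_000, 1_000)  # all indices valid, no -1

fast = arr.take(taker)
general = take_with_fill(arr, taker)
# With no missing positions, both paths agree.
assert np.array_equal(fast, general)

# With a -1 sentinel the results differ: ndarray.take wraps -1 to the
# last element, while the fill-aware helper inserts the fill value.
wrapped = arr.take(np.array([0, -1, 2]))
filled = take_with_fill(arr, np.array([0, -1, 2]))
```

When the caller already knows the indexer has no -1 entries (the allow_fill=False case above), the mask check and dtype promotion are pure overhead, which is the motivation for calling arr.take directly.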

@jbrockmendel
Member Author

Ran asv with -b select_dtypes: also no change

@jreback jreback added this to the 1.3 milestone Apr 9, 2021
@jbrockmendel
Member Author

gentle ping; this should be an unambiguous perf win

@jreback jreback merged commit eb0978f into pandas-dev:master Apr 13, 2021
@jbrockmendel jbrockmendel deleted the re-take-1 branch April 13, 2021 16:48
@jorisvandenbossche
Member

And this time https://pandas.pydata.org/speed/pandas/#/ doesn't show any regressions!

JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Jul 3, 2021