
PERF: use numpy.random.Generator #28440

Closed
thrasibule opened this issue Sep 14, 2019 · 5 comments
Labels
Performance Memory or execution speed performance

Comments

@thrasibule
Contributor

NumPy 1.17 introduced a new random module with faster PRNGs. It drops the guarantee of strictly reproducible random streams, which allows some algorithmic improvements. In particular, the choice method is now a lot faster in the replace=False case. Would it make sense for random_state to return a np.random.Generator instead of a np.random.RandomState when the NumPy version is >= 1.17? The function is here: https://github.com/pandas-dev/pandas/blob/master/pandas/core/common.py#L408. This would automatically speed up the DataFrame.sample method, for instance.
I can write a PR, but I'm not sure how to handle the different NumPy versions. Should I just do the version check inside random_state, or is this something that needs to go inside numpy.compat?
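As a rough illustration of the proposal, here is a minimal sketch of a helper that returns a Generator when NumPy provides one and falls back to RandomState otherwise. The name `random_state_sketch` and the `_HAS_GENERATOR` flag are hypothetical and this is not the actual code in pandas/core/common.py; it only assumes the public NumPy calls `np.random.default_rng` and `np.random.RandomState`.

```python
import numpy as np

# Hypothetical feature flag: True when this NumPy ships the new Generator API.
_HAS_GENERATOR = hasattr(np.random, "Generator")


def random_state_sketch(state=None):
    """Return a random number generator object for `state` (illustrative only)."""
    # Pass through objects that are already generators of either kind.
    if _HAS_GENERATOR and isinstance(state, np.random.Generator):
        return state
    if isinstance(state, np.random.RandomState):
        return state
    # Seeds (or None) become a Generator when available, else a RandomState.
    if state is None or isinstance(state, (int, np.integer)):
        if _HAS_GENERATOR:
            return np.random.default_rng(state)
        return np.random.RandomState(state)
    raise ValueError("state must be None, an integer seed, or a NumPy generator object")


# Both Generator and RandomState expose choice(..., replace=False).
rng = random_state_sketch(0)
picked = rng.choice(10, size=5, replace=False)
```

Note that the same integer seed would yield different streams depending on which class is returned, which is the reproducibility trade-off raised in this issue.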

@thrasibule thrasibule changed the title PERF: unse numpy.random.Generator PERF: use numpy.random.Generator Sep 14, 2019
@jbrockmendel jbrockmendel added the Performance Memory or execution speed performance label Sep 16, 2019
@WillAyd
Member

WillAyd commented Sep 16, 2019

There should already be _np_version_under1p17 in the NumPy compat module, which you can use internally.
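For readers unfamiliar with this kind of compat flag, a minimal sketch of how such a boolean could be derived and used follows. The names `np_version_under1p17_sketch` and `make_rng` are hypothetical and are computed here directly from `np.__version__`; this is not the pandas compat code itself.

```python
import numpy as np

# Hypothetical stand-in for a compat flag: compare (major, minor) against 1.17.
major, minor = (int(part) for part in np.__version__.split(".")[:2])
np_version_under1p17_sketch = (major, minor) < (1, 17)


def make_rng(seed=None):
    """Pick the generator class based on the version flag (illustrative only)."""
    if np_version_under1p17_sketch:
        return np.random.RandomState(seed)
    return np.random.default_rng(seed)


rng = make_rng(1)
value = rng.choice(5)
```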

@TomAugspurger
Contributor

TomAugspurger commented Sep 16, 2019 via email


@jamesmyatt
Contributor

jamesmyatt commented Nov 26, 2020

Guarantee-wise, it would probably be best to match NEP 19.

If you just replace pandas.core.common.random_state with numpy.random.default_rng then you probably match NumPy's compatibility guarantee exactly. But you can only do that when upgrading the rest of the project to use Generators.
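A small sketch of why `numpy.random.default_rng` is a convenient drop-in for this role: it accepts `None`, an integer seed, or an existing Generator, and an existing Generator is returned unaltered. The variable names are illustrative only.

```python
import numpy as np

# An integer seed gives a freshly seeded Generator.
rng = np.random.default_rng(42)

# Passing an existing Generator returns that same object unaltered,
# so callers can hand in either a seed or a ready-made Generator.
same = np.random.default_rng(rng)

# The legacy class is a separate type; with the same seed it produces
# a different stream, so results are not interchangeable between the two.
legacy = np.random.RandomState(42)
new_draw = rng.choice(100, size=3, replace=False)
old_draw = legacy.choice(100, size=3, replace=False)
```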

@mroeschke
Member

Our sample APIs take NumPy random generators now, so closing.
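For reference, a minimal usage sketch, assuming a pandas version recent enough for `random_state` to accept a `np.random.Generator` (to my understanding the pandas documentation notes this as of 1.4.0). The DataFrame contents are made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative data only.
df = pd.DataFrame({"x": range(10)})

# Pass a Generator directly instead of an integer seed or RandomState.
rng = np.random.default_rng(0)
sampled = df.sample(n=3, random_state=rng)
```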
