PERF: use numpy.random.Generator #28440

thrasibule · 2019-09-14T02:23:12Z

NumPy 1.17 introduced a new random module with faster PRNGs. It also drops the strict guarantee that random streams stay reproducible across versions, which allows some algorithmic improvements. In particular, the choice method is now much faster in the replace=False case. Would it make sense for random_state to return a np.random.Generator instead of a np.random.RandomState when the NumPy version is >= 1.17, here: https://github.com/pandas-dev/pandas/blob/master/pandas/core/common.py#L408? This would, for instance, automatically speed up the DataFrame.sample method.
I can write a PR, but I'm not sure how to handle the different NumPy versions. Should I just do the version check inside random_state, or is this something that needs to go in numpy.compat?
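A minimal sketch of the two calls being compared, assuming NumPy >= 1.17 is installed. As I understand it, the legacy RandomState.choice with replace=False permutes the whole population before slicing, while Generator.choice can avoid that work; no timings are claimed here, so measure with timeit on your own data. The population and sample sizes are arbitrary.

```python
import numpy as np

n, k = 1_000_000, 5  # arbitrary population size and sample size

# Legacy API: what the pandas random_state helper returned when this issue was opened
legacy = np.random.RandomState(0)
legacy_idx = legacy.choice(n, size=k, replace=False)

# New API (NumPy >= 1.17): a Generator backed by the PCG64 bit generator
rng = np.random.default_rng(0)
modern_idx = rng.choice(n, size=k, replace=False)

# Both return k distinct indices in [0, n); the actual values differ between APIs
print(legacy_idx, modern_idx)
```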

The text was updated successfully, but these errors were encountered:

WillAyd · 2019-09-16T02:41:04Z

There should already be _np_version_under1p17 in the NumPy compat module, which you can use internally.
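A minimal sketch of how a version-gated helper could look. The function names below are hypothetical, and the flag is computed locally as a stand-in for pandas' internal _np_version_under1p17 rather than imported, since internal compat names are not a stable API. Note that on NumPy >= 1.17 an integer seed would then feed a different bit generator, so seeded outputs change.

```python
import numpy as np


def _numpy_at_least(major, minor):
    # Compare only major.minor; assumes version strings like "1.17.2" or "2.0.0rc1"
    parts = np.__version__.split(".")
    return (int(parts[0]), int(parts[1])) >= (major, minor)


# Hypothetical local stand-in for pandas' internal _np_version_under1p17 flag
_np_version_under1p17 = not _numpy_at_least(1, 17)


def random_state_sketch(state=None):
    """Hypothetical helper: prefer np.random.Generator when NumPy >= 1.17."""
    if isinstance(state, np.random.RandomState):
        return state  # pass an existing legacy RNG through unchanged
    if _np_version_under1p17:
        return np.random.RandomState(state)
    # default_rng returns a Generator argument unaltered and seeds PCG64 otherwise
    return np.random.default_rng(state)
```

Passing an existing RandomState or Generator through unchanged keeps the caller in control of their stream; only None and seeds get a fresh object.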

TomAugspurger · 2019-09-16T11:55:56Z

We’ll need to discuss what API guarantees we make about the seeds. I don’t think the output should change when a user provides an integer seed, for example. What happens now if the user passes a new random Generator instance?
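A small check of the concern about integer seeds, assuming NumPy >= 1.17. Each API is reproducible on its own for a fixed seed, but RandomState uses MT19937 while default_rng uses PCG64, so switching the returned object would change results for the same integer seed. The seed value is arbitrary.

```python
import numpy as np

seed = 12345  # an integer seed, as a user might pass via random_state=

# Each API is reproducible on its own for a fixed integer seed ...
legacy_a = np.random.RandomState(seed).random(5)
legacy_b = np.random.RandomState(seed).random(5)
modern_a = np.random.default_rng(seed).random(5)
modern_b = np.random.default_rng(seed).random(5)

# ... but the two APIs use different bit generators (MT19937 vs PCG64),
# so swapping one for the other changes the numbers for the same seed.
print(np.array_equal(legacy_a, legacy_b), np.array_equal(modern_a, modern_b))
print(np.array_equal(legacy_a, modern_a))
```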

…

On Sep 15, 2019, at 21:41, William Ayd ***@***.***> wrote: There should already be _np_version_under1p17 from NumPy compat which you can use internally — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or mute the thread.

jamesmyatt · 2020-11-26T21:42:27Z

Passing a Generator currently raises a ValueError!

jamesmyatt · 2020-11-26T21:47:07Z

Guarantee-wise, it would probably be best to match NEP 19.

If you just replace pandas.core.common.random_state with numpy.random.default_rng, then you would probably match NumPy's compatibility guarantee exactly. But you can only do that once the rest of the project is upgraded to use Generators.
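A quick illustration of what a default_rng-based random_state would accept, assuming NumPy >= 1.17. Per the NumPy docs, default_rng takes None, an int seed, a SeedSequence, a BitGenerator or a Generator, and a Generator passed in is returned unaltered. The seed values are arbitrary.

```python
import numpy as np

# default_rng(seed) wraps a PCG64 bit generator seeded with that value
from_seed = np.random.default_rng(2020)
from_bitgen = np.random.default_rng(np.random.PCG64(2020))

# A Generator passed in is returned unaltered, so callers keep control of their stream
existing = np.random.default_rng(7)
same = np.random.default_rng(existing)
print(same is existing)
```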

mroeschke · 2024-01-28T00:22:55Z

Our sample APIs accept NumPy random Generators now, so closing.
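A minimal usage sketch, assuming a pandas release that includes #42243 (1.4 or later, per the DataFrame.sample docs) and NumPy >= 1.17. The frame and seed are arbitrary; two fresh Generators with the same seed give the same sample.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": range(10)})

# Pass a Generator directly as random_state; each call consumes the Generator's stream
first = df.sample(n=3, random_state=np.random.default_rng(0))
second = df.sample(n=3, random_state=np.random.default_rng(0))
print(first.equals(second))
```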

thrasibule changed the title ~~PERF: unse numpy.random.Generator~~ PERF: use numpy.random.Generator Sep 14, 2019

jbrockmendel added the Performance Memory or execution speed performance label Sep 16, 2019

jamesmyatt mentioned this issue Nov 26, 2020

BUG: random_state should permit numpy.random.Generator #38100 (Closed)

mzeitlin11 mentioned this issue Jun 25, 2021

PERF/ENH: allow Generator in sampling methods #42243 (Merged)

gmaratos mentioned this issue Nov 16, 2021

ENH: DataFrame.sample should accept numpy.random.Generator #44486 (Closed)

mroeschke closed this as completed Jan 28, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PERF: use numpy.random.Generator #28440

PERF: use numpy.random.Generator #28440

thrasibule commented Sep 14, 2019

WillAyd commented Sep 16, 2019

TomAugspurger commented Sep 16, 2019 via email

jamesmyatt commented Nov 26, 2020

jamesmyatt commented Nov 26, 2020 •

edited

Loading

mroeschke commented Jan 28, 2024

PERF: use numpy.random.Generator #28440

PERF: use numpy.random.Generator #28440

Comments

thrasibule commented Sep 14, 2019

WillAyd commented Sep 16, 2019

TomAugspurger commented Sep 16, 2019 via email

jamesmyatt commented Nov 26, 2020

jamesmyatt commented Nov 26, 2020 • edited Loading

mroeschke commented Jan 28, 2024

jamesmyatt commented Nov 26, 2020 •

edited

Loading