sample not using numpy's random state #13143

Closed
ariddell opened this issue May 11, 2016 · 4 comments
Labels: Compat (pandas objects compatibility with NumPy or Python functions), Numeric Operations (Arithmetic, Comparison, and Logical operations)

ariddell commented May 11, 2016

After fixing the random seed with numpy.random.seed, I expect sample to return the same rows on every call.

I expected the same behavior as numpy.random.choice but found something different. Here is pandas:

In [1]: import pandas as pd

In [2]: import numpy as np

In [3]: df = pd.DataFrame(np.arange(1000))
In [12]: np.random.seed(5); df.sample(2)
Out[12]: 
       0
824  824
225  225

In [13]: np.random.seed(5); df.sample(2)
Out[13]: 
       0
182  182
586  586

Whereas numpy.random.choice is consistent:

In [6]: np.random.seed(5); np.random.choice(1000)
Out[6]: 867

In [7]: np.random.seed(5); np.random.choice(1000)
Out[7]: 867

output of pd.show_versions()

In [8]: pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.4.3.final.0
python-bits: 64
OS: Linux
OS-release: 3.16.0-67-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.1
nose: 1.3.1
pip: 8.1.1
setuptools: 18.4
Cython: 0.23.4
numpy: 1.11.0
scipy: 0.16.1
statsmodels: 0.6.1
xarray: None
IPython: 4.0.1
sphinx: 1.3.1
patsy: 0.4.0
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: None
tables: 3.2.2
numexpr: 2.4.4
matplotlib: 1.5.0
openpyxl: None
xlrd: 0.9.4
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.2.1
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None

jreback commented May 11, 2016

You have to pass the state in. It was designed this way on purpose, IIRC.

In [10]: df = pd.DataFrame(np.arange(1000))

In [12]: df.sample(2, random_state=2)
Out[12]: 
       0
37    37
726  726

In [13]: df.sample(2, random_state=2)
Out[13]: 
       0
37    37
726  726
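
As a minimal sketch of the point above (assuming a pandas version where `random_state` accepts either an integer or a `numpy.random.RandomState`), passing the state explicitly makes `sample` repeatable regardless of the global seed. The variable names are only illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(1000))

# An integer seed yields the same rows on every call.
first = df.sample(2, random_state=2)
second = df.sample(2, random_state=2)
assert first.equals(second)

# A RandomState instance also works, but it advances as it is used,
# so build a fresh one with the same seed to repeat a draw.
third = df.sample(2, random_state=np.random.RandomState(2))
fourth = df.sample(2, random_state=np.random.RandomState(2))
assert third.equals(fourth)
```

Reusing a single `RandomState` object across calls would give different rows each time, because every draw consumes part of its stream.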

@nickeubank @jorisvandenbossche

jreback closed this as completed May 11, 2016
jreback added the Numeric Operations (Arithmetic, Comparison, and Logical operations) and Compat (pandas objects compatibility with NumPy or Python functions) labels May 11, 2016
jorisvandenbossche commented May 11, 2016

I think we should provide the proposed behaviour (besides numpy, this is also how e.g. sklearn's train_test_split behaves).

It would be a change to this line: https://github.com/pydata/pandas/blob/master/pandas/core/common.py#L2075. Looking at sklearn, we should return np.random.mtrand._rand instead (https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/utils/validation.py#L573).

@ariddell interested in doing a PR?
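
To illustrate the idea, here is a rough sketch of a state-resolving helper in the spirit of the sklearn function linked above. The name `resolve_random_state` and its exact branches are hypothetical, not the actual pandas or sklearn code; the key assumption is that a missing state should fall back to `np.random.mtrand._rand`, the private global `RandomState` that `np.random.seed` reseeds:

```python
import numbers

import numpy as np


def resolve_random_state(state=None):
    """Hypothetical helper: turn `state` into a numpy RandomState."""
    if state is None:
        # Fall back to NumPy's global RandomState (a private attribute),
        # so np.random.seed() controls the result.
        return np.random.mtrand._rand
    if isinstance(state, numbers.Integral):
        # An integer seeds a fresh, independent generator.
        return np.random.RandomState(state)
    if isinstance(state, np.random.RandomState):
        return state
    raise ValueError("state must be None, an int, or a numpy RandomState")


# With the global fallback, seeding NumPy makes the draw repeatable.
np.random.seed(5)
a = resolve_random_state().choice(1000)
np.random.seed(5)
b = resolve_random_state().choice(1000)
assert a == b
```

By contrast, returning a fresh `np.random.RandomState()` when no state is given would ignore `np.random.seed`, which matches the behaviour reported at the top of this issue.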

jreback added this to the 0.18.2 milestone May 11, 2016

jreback commented May 11, 2016

Ahh, I see. So that will then use the global state; makes sense.

ariddell commented

Yes, I'll do the PR. Thanks for the pointer to the relevant line.

