Issues with numeric_only for DataFrame.std() #9201

Closed
mortada opened this issue Jan 6, 2015 · 4 comments · Fixed by #9209
Labels
Bug, Numeric Operations (Arithmetic, Comparison, and Logical operations)
Comments

@mortada
Contributor

mortada commented Jan 6, 2015

The docstring shows a numeric_only option for DataFrame.std() but it does not seem to actually be implemented. I'm happy to take a crack at fixing it but I'm not sure whether it's the doc or the implementation that needs fixing.

To see this, consider a mixed-type DataFrame where I set one entry to the str '100' while all other entries are float. For std() it does not matter whether numeric_only is True or False, but for max() it clearly makes a difference.

In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: df = pd.DataFrame(np.random.randn(5, 2), columns=['foo', 'bar'])
In [4]: df.ix[0, 'foo'] = '100'

In [5]: df
Out[5]:
         foo       bar
0        100 -1.958036
1   0.221049  0.309971
2   1.200093 -0.103244
3  -2.475388 -2.279483
4  0.1623936 -1.185682

In [6]: df.std(numeric_only=True)
Out[6]:
foo    44.841828
bar     1.129182
dtype: float64

In [7]: df.std(numeric_only=False)
Out[7]:
foo    44.841828
bar     1.129182
dtype: float64

In [8]: df.max(numeric_only=False)
Out[8]:
foo    100.000000
bar      0.309971
dtype: float64

In [9]: df.max(numeric_only=True)
Out[9]:
bar    0.309971
dtype: float64
@jreback
Contributor

jreback commented Jan 6, 2015

numeric_only matters if axis=1; otherwise the dtype controls what happens.
Your first column is an object dtype.
max and min will work on an object dtype (though I'm not sure what std should do here; you would have to trace it and see).
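
To make the dtype point concrete, here is a minimal sketch that rebuilds the frame from the values printed above and inspects its dtypes. It assumes a recent pandas release, where `.ix` is no longer available, so the frame is constructed directly instead of assigning through `.ix`. The variable names are illustrative only.

```python
import pandas as pd

# Same values as in Out[5]; the single str entry forces 'foo' to object dtype.
df = pd.DataFrame({
    "foo": ["100", 0.221049, 1.200093, -2.475388, 0.1623936],
    "bar": [-1.958036, 0.309971, -0.103244, -2.279483, -1.185682],
})

# 'foo' is object because it mixes a str with floats; 'bar' stays float64.
dtypes = df.dtypes
print(dtypes)

# select_dtypes keeps only columns whose dtype is numeric, so 'foo' is dropped.
numeric_columns = list(df.select_dtypes(include="number").columns)
print(numeric_columns)
```

Checking `df.dtypes` first shows which columns a numeric-only reduction would be expected to skip.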

@mortada
Contributor Author

mortada commented Jan 6, 2015

I see. But even for axis=1 it seems to be inconsistent. You can see below that numeric_only has an effect on max(), but for std() it makes no difference:

In [17]: df.max(axis=1, numeric_only=True)
Out[17]:
0   -1.958036
1    0.309971
2   -0.103244
3   -2.279483
4   -1.185682
dtype: float64

In [18]: df.max(axis=1, numeric_only=False)
Out[18]:
0    100.000000
1      0.309971
2      1.200093
3     -2.279483
4      0.162394
dtype: float64

In [19]: df.std(axis=1, numeric_only=True)
Out[19]:
0    72.095218
1     0.062878
2     0.921598
3     0.138526
4     0.953233
dtype: float64

In [20]: df.std(axis=1, numeric_only=False)
Out[20]:
0    72.095218
1     0.062878
2     0.921598
3     0.138526
4     0.953233
dtype: float64
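
Until the flag behaves consistently, one way to avoid depending on numeric_only at all is to decide explicitly what to do with the object column before reducing. The following is a sketch under that assumption, not the fix proposed for pandas itself. It uses the values from Out[5] and a recent pandas release; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "foo": ["100", 0.221049, 1.200093, -2.475388, 0.1623936],
    "bar": [-1.958036, 0.309971, -0.103244, -2.279483, -1.185682],
})

# Option 1: treat '100' as a number by coercing every column explicitly.
# This matches the std() results shown above, which include 'foo'.
coerced = df.apply(pd.to_numeric)
row_std_all = coerced.std(axis=1)
col_std_all = coerced.std()

# Option 2: drop non-numeric columns up front, which is what
# numeric_only=True did for max() in the sessions above.
numeric = df.select_dtypes(include="number")
col_std_numeric = numeric.std()
row_max_numeric = numeric.max(axis=1)

print(row_std_all)
print(col_std_numeric)
```

Either choice makes the intent visible in the code, so the result no longer hinges on how a particular reduction interprets the flag.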

@jreback
Contributor

jreback commented Jan 6, 2015

Hmm, could be a bug.

@mortada
Contributor Author

mortada commented Jan 7, 2015

@jreback I made a PR for this: #9209

@jreback added the Bug and Numeric Operations (Arithmetic, Comparison, and Logical operations) labels Jan 8, 2015
@jreback added this to the 0.16.0 milestone Jan 8, 2015
@jreback modified the milestones: 0.16.1, 0.16.0 Mar 5, 2015