describe() returns RuntimeWarning: Invalid value encountered in median RuntimeWarning #13146

Closed
msure opened this issue May 11, 2016 · 1 comment
Labels: Bug, Duplicate Report (duplicate issue or pull request), Numeric Operations (arithmetic, comparison, and logical operations)


msure commented May 11, 2016

I just upgraded to pandas 0.18.1 with conda. I started noticing this problem in some notebooks that I created before the upgrade and recently revisited for further analysis.

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd
df = pd.DataFrame({'task_complete':['success','success','fail','fail','success','fail','success'],
    'value':[np.nan,4.5,5.7,3.0,np.nan,6.7,3.78]})

df.value.describe() triggers a RuntimeWarning from numpy, and the quantiles in the result are unexpectedly NaN:

In [5]: df.value.describe()
/Users/adrianpalacios/anaconda/lib/python3.4/site-packages/numpy/lib/function_base.py:3403: RuntimeWarning: Invalid value encountered in median
  RuntimeWarning)
Out[5]:
count    5.000000
mean     4.736000
std      1.480703
min      3.000000
25%           NaN
50%           NaN
75%           NaN
max      6.700000
Name: value, dtype: float64
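
For context, here is a minimal sketch of how NaN handling differs at the numpy level. It assumes (this is not confirmed in the thread) that the NaN quantiles come from a non-NaN-aware numpy reduction being applied to the column. The variable names are illustrative only.

```python
import numpy as np

# Same values as the 'value' column in the example above.
values = np.array([np.nan, 4.5, 5.7, 3.0, np.nan, 6.7, 3.78])

# The plain median propagates NaN; some NumPy releases also emit a
# RuntimeWarning ("invalid value encountered in median") at this point.
plain_median = np.median(values)

# The NaN-aware variant skips missing entries before computing percentiles.
quartiles = np.nanpercentile(values, [25, 50, 75])

print(plain_median)
print(quartiles)
```

With the five non-missing values, the NaN-aware quartiles match the 25%/50%/75% rows of the expected output reported below.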

Expected Output

I got this using a different conda environment that has not been upgraded to the latest pandas version:

In [5]: df.value.describe()
Out[5]:
count    5.000000
mean     4.736000
std      1.480703
min      3.000000
25%      3.780000
50%      4.500000
75%      5.700000
max      6.700000
Name: value, dtype: float64

Dropping the NaNs first works in pandas 0.18.1:

In [9]: df.value.dropna().describe()
Out[9]:
count    5.000000
mean     4.736000
std      1.480703
min      3.000000
25%      3.780000
50%      4.500000
75%      5.700000
max      6.700000
Name: value, dtype: float64

However, this workaround is not a great option when multiple columns contain NaNs, because DataFrame.dropna() drops every row that has a NaN in any column:

df2 = pd.DataFrame({'task_complete':['success','success','fail','fail','success','fail','success'],
    'value':[np.nan,4.5,5.7,3.0,np.nan,6.7,3.78],
    'more_values':[8.2,np.nan,np.nan,np.nan,9.4,np.nan,np.nan]
})

In [17]: df2[['value','more_values']].dropna().describe()
Out[17]:
       value  more_values
count    0.0          0.0
mean     NaN          NaN
std      NaN          NaN
min      NaN          NaN
25%      NaN          NaN
50%      NaN          NaN
75%      NaN          NaN
max      NaN          NaN
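
A possible interim workaround, sketched here as an assumption rather than as anything the maintainers recommended, is to drop NaNs per column instead of per row, so that each column keeps its own non-missing values. The name `per_column` is hypothetical.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({
    'task_complete': ['success', 'success', 'fail', 'fail', 'success', 'fail', 'success'],
    'value': [np.nan, 4.5, 5.7, 3.0, np.nan, 6.7, 3.78],
    'more_values': [8.2, np.nan, np.nan, np.nan, 9.4, np.nan, np.nan],
})

# Apply dropna() to each column separately before describing it, so a NaN
# in one column does not discard valid values in the other.
per_column = df2[['value', 'more_values']].apply(lambda col: col.dropna().describe())

print(per_column)
```

Each column is summarized over its own non-missing entries (5 for value, 2 for more_values), which avoids the empty result shown above.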

output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.4.4.final.0
python-bits: 64
OS: Darwin
OS-release: 15.4.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.1
nose: 1.3.7
pip: 8.1.1
setuptools: 20.2.2
Cython: 0.22.1
numpy: 1.10.4
scipy: 0.17.0
statsmodels: 0.6.1
xarray: None
IPython: 4.2.0
sphinx: 1.3.1
patsy: 0.3.0
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: 1.0.0
tables: 3.2.0
numexpr: 2.5
matplotlib: 1.5.1
openpyxl: 1.8.5
xlrd: 0.9.3
xlwt: 1.0.0
xlsxwriter: 0.7.3
lxml: 3.4.4
bs4: 4.3.2
html5lib: None
httplib2: 0.9.1
apiclient: None
sqlalchemy: 1.0.5
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.38.0
pandas_datareader: None

Contributor

jreback commented May 11, 2016

This is a duplicate of #13098, closed by #13122 (not yet merged).

Thanks for the report.

jreback closed this as completed May 11, 2016
jreback added the Bug, Duplicate Report, and Numeric Operations labels May 11, 2016