Quantile fails when only NaNs on some rows/columns #15460

Closed
jluttine opened this issue Feb 20, 2017 · 1 comment
Labels: Bug, Duplicate Report (Duplicate issue or pull request), Numeric Operations (Arithmetic, Comparison, and Logical operations), Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)

Comments

jluttine commented Feb 20, 2017

The quantile method on DataFrame fails for certain patterns of NaNs, while other cases are handled properly. I suspect it is related to rows and/or columns that contain only NaNs.

First, some working examples:

>>> import numpy as np
>>> import pandas as pd
>>> pd.DataFrame({"a": [1, 2, 3], "b": [np.nan, np.nan, np.nan]}).quantile(axis=0)
a    2.0
b    NaN
Name: 0.5, dtype: float64
>>> pd.DataFrame({"a": [1, 2, 3], "b": [np.nan, np.nan, np.nan]}).quantile(axis=1)
0    1.0
1    2.0
2    3.0
Name: 0.5, dtype: float64

Code Sample

If I change the second element of the first column to NaN, it all breaks. Now there is one row and one column that contain only NaNs:

>>> pd.DataFrame({"a": [1, np.nan, 3], "b": [np.nan, np.nan, np.nan]}).quantile(axis=0)
a    1.0
b    3.0
Name: 0.5, dtype: float64
>>> pd.DataFrame({"a": [1, np.nan, 3], "b": [np.nan, np.nan, np.nan]}).quantile(axis=1)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-2-a62c307ef5d1> in <module>()
----> 1 pd.DataFrame({"a": [1, np.nan, 3], "b": [np.nan, np.nan, np.nan]}).quantile(axis=1)

/.../lib/python3.5/site-packages/pandas/core/frame.py in quantile(self, q, axis, numeric_only, interpolation)
   5153                                      axis=1,
   5154                                      interpolation=interpolation,
-> 5155                                      transposed=is_transposed)
   5156 
   5157         if result.ndim == 2:

/.../lib/python3.5/site-packages/pandas/core/internals.py in quantile(self, **kwargs)
   3142 
   3143     def quantile(self, **kwargs):
-> 3144         return self.reduction('quantile', **kwargs)
   3145 
   3146     def setitem(self, **kwargs):

/.../lib/python3.5/site-packages/pandas/core/internals.py in reduction(self, f, axis, consolidate, transposed, **kwargs)
   3071         for b in self.blocks:
   3072             kwargs['mgr'] = self
-> 3073             axe, block = getattr(b, f)(axis=axis, **kwargs)
   3074 
   3075             axes.append(axe)

/.../lib/python3.5/site-packages/pandas/core/internals.py in quantile(self, qs, interpolation, axis, mgr)
   1325             values = _block_shape(values[~mask], ndim=self.ndim)
   1326             if self.ndim > 1:
-> 1327                 values = values.reshape(result_shape)
   1328 
   1329         from pandas import Float64Index

ValueError: total size of new array must be unchanged
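
The last frame hints at the cause: the block drops NaN entries with a boolean mask (values[~mask]) and then reshapes the result back to result_shape. Boolean-mask indexing of a 2-D NumPy array returns a flat 1-D array holding only the selected elements, so once any NaN is removed the element count no longer matches the target shape. The sketch below is a simplified NumPy illustration of that mechanism, not the actual pandas code; the names values and mask mirror the traceback, but how they are built here is a hypothetical stand-in.

```python
import numpy as np

# Hypothetical stand-in for the block values: one row per DataFrame column
# ("a", then "b"), which is how pandas lays out a 2-D block.
values = np.array([[1.0, np.nan, 3.0],
                   [np.nan, np.nan, np.nan]])
mask = np.isnan(values)

# Boolean-mask indexing flattens the result and keeps only the non-NaN entries.
kept = values[~mask]
print(kept.shape)  # (2,)

# Reshaping 2 surviving values back to the 2 x 3 block shape cannot succeed.
try:
    kept.reshape(values.shape)
except ValueError as exc:
    print("reshape failed:", exc)
```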

Problem description

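In the first case (axis=0), the quantiles are incorrect: column a should give 2.0, the median of its non-NaN values 1 and 3, and the all-NaN column b should give NaN. In the second case (axis=1), the quantile computation should not raise an error.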

Expected Output

>>> pd.DataFrame({"a": [1, np.nan, 3], "b": [np.nan, np.nan, np.nan]}).quantile(axis=0)
a    2.0
b    NaN
Name: 0.5, dtype: float64
>>> pd.DataFrame({"a": [1, np.nan, 3], "b": [np.nan, np.nan, np.nan]}).quantile(axis=1)
0    1.0
1    NaN
2    3.0
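
On an affected version, one possible workaround is to compute the quantiles with NumPy's NaN-aware np.nanpercentile, which skips NaNs and returns NaN for an all-NaN slice (emitting a RuntimeWarning). This is a minimal sketch assuming only a scalar q and NumPy's default linear interpolation are needed; the helper name nan_quantile is hypothetical and not part of pandas.

```python
import warnings

import numpy as np
import pandas as pd


def nan_quantile(df, q=0.5, axis=0):
    """Hypothetical helper: per-column (axis=0) or per-row (axis=1) quantile
    that skips NaNs and returns NaN where a slice is entirely NaN."""
    with warnings.catch_warnings():
        # np.nanpercentile warns "All-NaN slice encountered" for column b / row 1.
        warnings.simplefilter("ignore", RuntimeWarning)
        result = np.nanpercentile(df.values.astype(float), q * 100, axis=axis)
    index = df.columns if axis == 0 else df.index
    return pd.Series(result, index=index, name=q)


df = pd.DataFrame({"a": [1, np.nan, 3], "b": [np.nan, np.nan, np.nan]})
print(nan_quantile(df, axis=0))  # values: a=2.0, b=NaN
print(nan_quantile(df, axis=1))  # values: 0=1.0, 1=NaN, 2=3.0
```

Using df.values rather than a newer accessor keeps the sketch usable on old pandas releases such as the 0.19.0 reported here.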

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.9-gnu-1
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.19.0
nose: None
pip: 8.1.2
setuptools: 27.2.0.post20161106
Cython: None
numpy: 1.11.1
scipy: 0.18.1
statsmodels: None
xarray: None
IPython: 5.1.0
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.7
blosc: None
bottleneck: None
tables: 3.2.3.1
numexpr: 2.6.1
matplotlib: 1.5.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.13
pymysql: None
psycopg2: 2.6.2 (dt dec pq3 ext lo64)
jinja2: 2.9.4
boto: None
pandas_datareader: None

jreback added the Bug, Duplicate Report, Numeric Operations, and Reshaping labels Feb 20, 2017
jreback added this to the No action milestone Feb 20, 2017

jreback (Contributor) commented Feb 20, 2017

You are using an older version; these are fixed in 0.19.2. This is a dupe of #14357, closed in #14536.

In [1]: pd.DataFrame({"a": [1, np.nan, 3], "b": [np.nan, np.nan, np.nan]}).quantile(axis=0)
   ...: 
Out[1]: 
a    2.0
b    NaN
Name: 0.5, dtype: float64

In [2]: pd.DataFrame({"a": [1, np.nan, 3], "b": [np.nan, np.nan, np.nan]}).quantile(axis=1)
Out[2]: 
0    1.0
1    NaN
2    3.0
Name: 0.5, dtype: float64
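
To confirm the fix on a newer install, the two calls above can be turned into assertions. This is a minimal sketch assuming pandas 0.19.2 or later; the expected values are the ones shown in Out[1] and Out[2].

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, np.nan, 3], "b": [np.nan, np.nan, np.nan]})

# Column-wise: NaNs in "a" are skipped and the all-NaN column "b" yields NaN.
by_col = df.quantile(axis=0)
assert by_col["a"] == 2.0
assert np.isnan(by_col["b"])

# Row-wise: the all-NaN row 1 yields NaN instead of raising ValueError.
by_row = df.quantile(axis=1)
assert by_row.iloc[0] == 1.0
assert np.isnan(by_row.iloc[1])
assert by_row.iloc[2] == 3.0
assert by_row.name == 0.5
print("quantile handles all-NaN rows and columns")
```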

jreback closed this as completed Feb 20, 2017