API: kdeplot fails with NaNs. #8182

Closed
TomAugspurger opened this issue Sep 5, 2014 · 2 comments

@TomAugspurger
Contributor

df.plot(kind='kde') raises a ValueError when the frame contains a NaN, which is inconsistent with the other plotting methods:

In [65]: df = pd.DataFrame(np.random.uniform(size=(100, 4)))

In [66]: df.loc[0, 0] = np.nan

In [67]: df.plot(kind='kde')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-67-9a78a7983298> in <module>()
----> 1 df.plot(kind='kde')

/Users/tom/Envs/py3/lib/python3.4/site-packages/pandas/pandas/tools/plotting.py in plot_frame(frame, x, y, subplots, sharex, sharey, use_index, figsize, grid, legend, rot, ax, style, title, xlim, ylim, logx, logy, xticks, yticks, kind, sort_columns, fontsize, secondary_y, layout, **kwds)
   2362                              secondary_y=secondary_y, layout=layout, **kwds)
   2363 
-> 2364     plot_obj.generate()
   2365     plot_obj.draw()
   2366     return plot_obj.result

/Users/tom/Envs/py3/lib/python3.4/site-packages/pandas/pandas/tools/plotting.py in generate(self)
    913         self._compute_plot_data()
    914         self._setup_subplots()
--> 915         self._make_plot()
    916         self._add_table()
    917         self._make_legend()

/Users/tom/Envs/py3/lib/python3.4/site-packages/pandas/pandas/tools/plotting.py in _make_plot(self)
   1915                 kwds['style'] = style
   1916 
-> 1917             artists = plotf(ax, y, column_num=i, **kwds)
   1918             self._add_legend_handle(artists[0], label)
   1919 

/Users/tom/Envs/py3/lib/python3.4/site-packages/pandas/pandas/tools/plotting.py in plotf(ax, y, style, column_num, **kwds)
   1960         def plotf(ax, y, style=None, column_num=None, **kwds):
   1961             if LooseVersion(spv) >= '0.11.0':
-> 1962                 gkde = gaussian_kde(y, bw_method=self.bw_method)
   1963             else:
   1964                 gkde = gaussian_kde(y)

/Users/tom/Envs/py3/lib/python3.4/site-packages/scipy/stats/kde.py in __init__(self, dataset, bw_method)
    186 
    187         self.d, self.n = self.dataset.shape
--> 188         self.set_bandwidth(bw_method=bw_method)
    189 
    190     def evaluate(self, points):

/Users/tom/Envs/py3/lib/python3.4/site-packages/scipy/stats/kde.py in set_bandwidth(self, bw_method)
    496             raise ValueError(msg)
    497 
--> 498         self._compute_covariance()
    499 
    500     def _compute_covariance(self):

/Users/tom/Envs/py3/lib/python3.4/site-packages/scipy/stats/kde.py in _compute_covariance(self)
    507             self._data_covariance = atleast_2d(np.cov(self.dataset, rowvar=1,
    508                                                bias=False))
--> 509             self._data_inv_cov = linalg.inv(self._data_covariance)
    510 
    511         self.covariance = self._data_covariance * self.factor**2

/Users/tom/Envs/py3/lib/python3.4/site-packages/scipy/linalg/basic.py in inv(a, overwrite_a, check_finite)
    352 
    353     if check_finite:
--> 354         a1 = np.asarray_chkfinite(a)
    355     else:
    356         a1 = np.asarray(a)

/Users/tom/Envs/py3/lib/python3.4/site-packages/numpy/lib/function_base.py in asarray_chkfinite(a, dtype, order)
    593     if a.dtype.char in typecodes['AllFloat'] and not np.isfinite(a).all():
    594         raise ValueError(
--> 595                 "array must not contain infs or NaNs")
    596     return a
    597 

ValueError: array must not contain infs or NaNs

The default should be to drop missing observations.
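
For reference, the last few frames of the traceback can be reproduced without pandas or scipy. The following is only a minimal sketch, assuming the covariance call shown in `_compute_covariance` and the finiteness check shown in `linalg.inv`; the variable names `cov_is_nan` and `error_message` are illustrative and not part of either library.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.uniform(size=100)
y[0] = np.nan  # one missing observation, as in the report above

# Covariance call modelled on the one in the traceback.
# A single NaN in the input makes the whole result NaN.
cov = np.atleast_2d(np.cov(y, rowvar=True, bias=False))
cov_is_nan = bool(np.isnan(cov).all())

# The traceback shows linalg.inv validating its input with
# np.asarray_chkfinite, which rejects NaN and inf values.
try:
    np.asarray_chkfinite(cov)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(cov_is_nan, error_message)
```

So the NaN is never filtered on the way down: it propagates into the covariance matrix and is only caught by the finiteness check before inversion.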

@onesandzeroes
Contributor

Missing values are ignored by all the DataFrame.plot() kinds, and there's no arg/switch to not ignore them, right? So we can just drop missing vals without checking any args. Seems pretty easy if so; I'll get a PR going.
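
The approach described above, dropping missing values per column before the density estimate is fitted, might look roughly like the sketch below. It is not the code from the eventual PR: `kde_per_column` is a hypothetical helper written for illustration, and it calls `scipy.stats.gaussian_kde` directly rather than going through the pandas plotting internals.

```python
import numpy as np
import pandas as pd
from scipy.stats import gaussian_kde

df = pd.DataFrame(np.random.uniform(size=(100, 4)))
df.loc[0, 0] = np.nan  # same setup as in the report


def kde_per_column(frame):
    """Hypothetical helper: fit one gaussian_kde per column, dropping NaNs first."""
    return {
        col: gaussian_kde(frame[col].dropna().to_numpy())
        for col in frame.columns
    }


kdes = kde_per_column(df)

# Number of observations each estimator was actually fitted on.
n_points = {col: kde.n for col, kde in kdes.items()}
print(n_points)
```

Dropping per column rather than per row keeps the complete columns at their full length; only the column containing the NaN loses an observation.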

@TomAugspurger
Contributor Author

Closed by #8196
