
Styler.background_gradient needs to handle NaN values #14260


Closed
stonebig opened this issue Sep 20, 2016 · 19 comments
Labels
IO HTML read_html, to_html, Styler.apply, Styler.applymap Output-Formatting __repr__ of pandas objects, to_string

Comments

@stonebig
Contributor

stonebig commented Sep 20, 2016

This code works fine under Pandas-0.18.1 (from a @jreback demo):

%matplotlib inline
# Pandas interactive
import pandas as pd
import numpy as np
import seaborn as sns

# create a df with random data
np.random.seed(24)
df = pd.DataFrame({'A': np.linspace(1, 10, 10)})
df = pd.concat([df, pd.DataFrame(np.random.randn(10, 4), columns=list('BCDE'))],
               axis=1)
df.iloc[0, 2] = np.nan

# interactive
from IPython.html import widgets
@widgets.interact
def f(h_neg=(0, 359, 1), h_pos=(0, 359), s=(0., 99.9), l=(0., 99.9)):
    return (df
             .style
             .background_gradient(
                cmap=sns.palettes.diverging_palette(
                     h_neg=h_neg, h_pos=h_pos, s=s, l=l, as_cmap=True)
             ).highlight_null()
           )

With Pandas-0.19rc1, I get a warning:

[image: matplotlib-20160918a]

see matplotlib/matplotlib#7129

@stonebig stonebig changed the title invalmid value transmitted to Matplotlib with pandas-0.19rc1 invalid value transmitted to Matplotlib with pandas-0.19rc1 Sep 20, 2016
@stonebig
Contributor Author


INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 23 Stepping 10, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.19.0rc1
nose: 1.3.7
pip: 8.1.2
setuptools: 27.2.0
Cython: 0.24.1
numpy: 1.11.1
scipy: 0.18.0
statsmodels: 0.8.0rc1
xarray: 0.8.2
IPython: 5.1.0
sphinx: 1.4.6
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.6.1
blosc: 1.4.4
bottleneck: 1.1.0
tables: 3.3.0
numexpr: 2.6.1
matplotlib: 1.5.2
openpyxl: None
xlrd: 1.0.0
xlwt: None
xlsxwriter: 0.9.3
lxml: None
bs4: 4.5.1
html5lib: 0.999999999
httplib2: None
apiclient: None
sqlalchemy: 1.0.15
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: 0.2.1

@jorisvandenbossche
Member

@stonebig Were you able to reproduce it with a simpler example? Or is the usage of the widgets, seaborn, etc. needed to reproduce it?

@stonebig
Contributor Author

stonebig commented Sep 20, 2016

The line df.iloc[0, 2] = np.nan is triggering the warning.

Sorry, I'm not expert enough in Pandas to reduce Jeff's example.

Maybe it's too weird a usage of Pandas/Seaborn/Matplotlib/Numpy?

@jorisvandenbossche
Member

Yes, but can you trigger the warning without using interact?

@stonebig
Contributor Author

Sorry, I don't know how to transform the code: if I remove interact, nothing happens

@jorisvandenbossche
Member

So I get the warning just by running df.style.background_gradient(), so it seems interact, seaborn, etc. have no influence on it; only the NaN value matters when the background colors are determined.

@jorisvandenbossche
Member

See matplotlib/matplotlib#7129 (comment) for a more detailed reproducible example

@stonebig
Contributor Author

The weird thing is that it doesn't fail with Pandas-0.18.1, or else the error message was intercepted.

@jorisvandenbossche
Member

Did your numpy version change when upgrading from 0.18.1 to 0.19.0rc1?

@stonebig
Contributor Author

stonebig commented Sep 20, 2016

Now I went back from Matplotlib-2.0.0b4 to 1.5.3, then 1.5.2, and since that changed nothing I started doing the same with Pandas. No other packages were harmed or touched in the process.

Maybe Jeff did a special hack for his demo, and the code has been normalized since that demo, or the ordering of operations has been changed.

@jorisvandenbossche jorisvandenbossche changed the title invalid value transmitted to Matplotlib with pandas-0.19rc1 Styler.background_gradient needs to handle NaN values Sep 20, 2016
@jorisvandenbossche jorisvandenbossche added Output-Formatting __repr__ of pandas objects, to_string IO HTML read_html, to_html, Styler.apply, Styler.applymap labels Sep 20, 2016
@jorisvandenbossche
Member

So, a summary of the discussion in matplotlib/matplotlib#7129: the warning is caused by a line of pure numpy code, so it is not related to pandas or matplotlib. But pandas no longer suppresses this kind of warning starting from 0.19.0 (#13145). This is why you only see the warning with the latest pandas.
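
To illustrate the suppression point: numpy's floating-point error state decides whether an "invalid value" RuntimeWarning surfaces at all. The following is only a minimal sketch, not code from pandas or matplotlib. `count_invalid_warnings` is a hypothetical helper, and a 0/0 division stands in for the NaN operation discussed in matplotlib/matplotlib#7129.

```python
import warnings

import numpy as np


def count_invalid_warnings(suppress):
    """Run a NaN-producing numpy operation and count the RuntimeWarnings raised.

    Hypothetical helper for illustration only.
    """
    zeros = np.zeros(3)
    # "ignore" silences the floating-point 'invalid' error, "warn" reports it.
    mode = "ignore" if suppress else "warn"
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        with np.errstate(invalid=mode):
            zeros / zeros  # 0/0 yields NaN and flags an 'invalid' operation
    return sum(issubclass(w.category, RuntimeWarning) for w in caught)


print(count_invalid_warnings(suppress=True))
print(count_invalid_warnings(suppress=False))
```

With the error state set to "ignore" the same operation runs silently, which matches the observation that the warning only became visible once it was no longer suppressed.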

But the conclusion for pandas is: matplotlib's cmap(vals) (which is called under the hood in background_gradient) is not supposed to handle NaN values, so we should deal with possible missing values on the pandas side to avoid this warning.
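
One way to picture "dealing with missing values on the pandas side" is a per-column style function that normalises with NaN-aware min/max and leaves NaN cells unstyled. This is a rough sketch under assumptions, not the fix that pandas adopted: `nan_safe_gradient` and its `low`/`high` parameters are hypothetical, and a plain linear RGB blend replaces the matplotlib colormap so that no NaN ever reaches cmap(vals).

```python
import numpy as np
import pandas as pd


def nan_safe_gradient(col, low=(255, 255, 255), high=(0, 100, 200)):
    """Return one CSS string per cell of `col`; NaN cells get an empty style.

    Hypothetical helper: blends linearly between the `low` and `high` RGB colors.
    """
    values = col.to_numpy(dtype=float)
    present = ~np.isnan(values)
    if not present.any():
        # Nothing to normalise against, so style nothing.
        return [""] * len(values)

    # Compute the range from non-missing values only.
    vmin, vmax = values[present].min(), values[present].max()
    span = vmax - vmin

    styles = []
    for value, ok in zip(values, present):
        if not ok:
            styles.append("")  # leave NaN cells for highlight_null() or similar
            continue
        t = 0.0 if span == 0 else (value - vmin) / span
        r, g, b = (int(round(lo + t * (hi - lo))) for lo, hi in zip(low, high))
        styles.append(f"background-color: #{r:02x}{g:02x}{b:02x}")
    return styles


example = pd.Series([1.0, np.nan, 3.0])
print(nan_safe_gradient(example))
```

Such a function could be passed to df.style.apply(nan_safe_gradient), which applies it column by column by default; rendering the Styler additionally requires jinja2.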

@jorisvandenbossche jorisvandenbossche added this to the Next Major Release milestone Sep 20, 2016
@stonebig
Contributor Author

after all this effort digging for the truth, it's sad the fix won't make it for Pandas 0.19 final

@jorisvandenbossche
Member

A PR is always welcome, and then it can be fixed in 0.19, no problem. The tag just indicates that it is not a priority for 0.19.

@stonebig
Contributor Author

Shouldn't it be tagged as a regression?

@jorisvandenbossche
Member

There is no change in behaviour, it's only a warning, so I don't think it's important enough for that.

@joelostblom
Contributor

joelostblom commented Sep 28, 2019

Note that this affects several of the examples in the styling documentation (without any explanation of what is happening): https://pandas.pydata.org/pandas-docs/stable/user_guide/style.html#Builtin-styles

@jnothman
Contributor

I think this remains an issue. NaN should not be included in colour normalisation for background_gradient.

@jayenashar

is it possible to pass a boolean DataFrame as the subset parameter of background_gradient? then we could use:

return (df
         .style
         .background_gradient(
            subset=~pd.isnull(df), # only apply gradient to the non-nan data
            cmap=sns.palettes.diverging_palette(
                 h_neg=h_neg, h_pos=h_pos, s=s, l=l, as_cmap=True)
         ).highlight_null()
       )
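
Whether subset accepts a boolean DataFrame is not settled in this thread. A coarser, label-based alternative is to restrict the gradient to columns that contain no NaN at all. This is a sketch only: the example frame and the name `complete_cols` are made up for illustration, and whole columns are skipped rather than individual cells.

```python
import numpy as np
import pandas as pd

# Small illustrative frame; column "B" contains a missing value.
df = pd.DataFrame({
    "A": [1.0, 2.0, 3.0],
    "B": [np.nan, 0.5, -0.5],
    "C": [0.1, 0.2, 0.3],
})

# Labels of the columns in which every value is present.
complete_cols = df.columns[df.notna().all()].tolist()
print(complete_cols)

# The labels can then be passed as the subset (needs matplotlib and jinja2):
# styled = df.style.background_gradient(subset=complete_cols).highlight_null()
```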

@jayenashar

jayenashar commented Dec 27, 2019

i got this to style nan as 0 in the worst way possible. this is for pandas 0.25.3 and may not work for other versions. breaks highlight_null as well.

def my_render(styler, **kwargs):
    # change the data before calling ._compute()
    # then change it back
    data = styler.data
    data_copy = data.copy()
    data_copy[data.isnull()] = 0
    styler.data = data_copy
    styler._compute()
    styler.data = data

    d = styler._translate()
    trimmed = [x for x in d["cellstyle"] if any(any(y) for y in x["props"])]
    d["cellstyle"] = trimmed
    d.update(kwargs)
    return styler.template.render(**d)

pd.io.formats.style.Styler.render = my_render


5 participants