
Scatter plot should have a discrete colorbar when 'c' is integer #12380


Closed
s-celles opened this issue Feb 18, 2016 · 3 comments · Fixed by #34293
@s-celles
Contributor

import pandas as pd
pd.set_option('max_rows', 10)
import sklearn.datasets as datasets
import pandas_ml as pdml  # https://github.com/pandas-ml/pandas-ml
import matplotlib.pyplot as plt
df = pdml.ModelFrame(datasets.load_iris())
print(df)
     .target  sepal length (cm)  sepal width (cm)  petal length (cm)  \
0          0                5.1               3.5                1.4
1          0                4.9               3.0                1.4
2          0                4.7               3.2                1.3
3          0                4.6               3.1                1.5
4          0                5.0               3.6                1.4
..       ...                ...               ...                ...
145        2                6.7               3.0                5.2
146        2                6.3               2.5                5.0
147        2                6.5               3.0                5.2
148        2                6.2               3.4                5.4
149        2                5.9               3.0                5.1

     petal width (cm)
0                 0.2
1                 0.2
2                 0.2
3                 0.2
4                 0.2
..                ...
145               2.3
146               1.9
147               2.0
148               2.3
149               1.8

[150 rows x 5 columns]

print(df.dtypes)
.target                int64
sepal length (cm)    float64
sepal width (cm)     float64
petal length (cm)    float64
petal width (cm)     float64
dtype: object
df.plot.scatter(x='sepal length (cm)', y='sepal width (cm)', c='.target')

plt.show()

[Image: iris]

Expected Output

A scatter plot with a colorbar that uses a discrete colormap.
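
For illustration, here is a minimal sketch of what such a discrete colorbar could look like when calling matplotlib directly. It is not how pandas implements anything; `discrete_scatter` is a hypothetical helper, and the sketch assumes `c` holds integers so that bin edges can sit at each level minus and plus 0.5.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import BoundaryNorm, ListedColormap


def discrete_scatter(x, y, c, cmap_name="viridis"):
    """Hypothetical helper: scatter plot with one colorbar band per distinct integer in c."""
    c = np.asarray(c)
    levels = np.unique(c)  # sorted distinct values
    # Sample one colour per level from the base colormap.
    colors = plt.get_cmap(cmap_name)(np.linspace(0, 1, len(levels)))
    cmap = ListedColormap(colors)
    # One bin per level: lower edges at level - 0.5, closing edge above the last level.
    bounds = np.concatenate([levels - 0.5, [levels[-1] + 0.5]])
    norm = BoundaryNorm(bounds, cmap.N)
    fig, ax = plt.subplots()
    points = ax.scatter(x, y, c=c, cmap=cmap, norm=norm)
    cbar = fig.colorbar(points, ax=ax, ticks=levels)  # tick each band at its level
    return fig, ax, cbar


# Values taken from rows 0, 1, 145 and 146 of the frame printed above.
fig, ax, cbar = discrete_scatter([5.1, 4.9, 6.7, 6.3], [3.5, 3.0, 3.0, 2.5], [0, 0, 2, 2])
```

With the iris frame, the same call would take `df['sepal length (cm)']`, `df['sepal width (cm)']` and `df['.target']` as arguments.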

Additional information

Converting the .target column to a category dtype doesn't fix the problem; it raises the error shown below.
Maybe I should open another issue about it.

df['.target'] = df['.target'].astype('category')
df.plot.scatter(x='sepal length (cm)', y='sepal width (cm)', c='.target')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
//anaconda/lib/python3.5/site-packages/matplotlib/colors.py in to_rgb(self, arg)
    305                     else:
--> 306                         fl = float(argl)
    307                         if fl < 0 or fl > 1:

ValueError: could not convert string to float: '.target'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
//anaconda/lib/python3.5/site-packages/matplotlib/colors.py in to_rgba(self, arg, alpha)
    369             else:
--> 370                 r, g, b = self.to_rgb(arg)
    371             if alpha is None:

//anaconda/lib/python3.5/site-packages/matplotlib/colors.py in to_rgb(self, arg)
    327             raise ValueError(
--> 328                 'to_rgb: Invalid rgb arg "%s"\n%s' % (str(arg), exc))
    329             # Error messages could be improved by handling TypeError

ValueError: to_rgb: Invalid rgb arg ".target"
could not convert string to float: '.target'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
//anaconda/lib/python3.5/site-packages/matplotlib/colors.py in to_rgba_array(self, c, alpha)
    398             # Single value? Put it in an array with a single row.
--> 399             return np.array([self.to_rgba(c, alpha)], dtype=np.float)
    400         except ValueError:

//anaconda/lib/python3.5/site-packages/matplotlib/colors.py in to_rgba(self, arg, alpha)
    375             raise ValueError(
--> 376                 'to_rgba: Invalid rgba arg "%s"\n%s' % (str(arg), exc))
    377

ValueError: to_rgba: Invalid rgba arg ".target"
to_rgb: Invalid rgb arg ".target"
could not convert string to float: '.target'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
//anaconda/lib/python3.5/site-packages/matplotlib/colors.py in to_rgb(self, arg)
    305                     else:
--> 306                         fl = float(argl)
    307                         if fl < 0 or fl > 1:

ValueError: could not convert string to float: '.'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
//anaconda/lib/python3.5/site-packages/matplotlib/colors.py in to_rgba(self, arg, alpha)
    369             else:
--> 370                 r, g, b = self.to_rgb(arg)
    371             if alpha is None:

//anaconda/lib/python3.5/site-packages/matplotlib/colors.py in to_rgb(self, arg)
    327             raise ValueError(
--> 328                 'to_rgb: Invalid rgb arg "%s"\n%s' % (str(arg), exc))
    329             # Error messages could be improved by handling TypeError

ValueError: to_rgb: Invalid rgb arg "."
could not convert string to float: '.'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-8-19145e105567> in <module>()
----> 1 df.plot.scatter(x='sepal length (cm)', y='sepal width (cm)', c='.target')

//anaconda/lib/python3.5/site-packages/pandas/tools/plotting.py in scatter(self, x, y, s, c, **kwds)
   3847         axes : matplotlib.AxesSubplot or np.array of them
   3848         """
-> 3849         return self(kind='scatter', x=x, y=y, c=c, s=s, **kwds)
   3850
   3851     def hexbin(self, x, y, C=None, reduce_C_function=None, gridsize=None,

//anaconda/lib/python3.5/site-packages/pandas/tools/plotting.py in __call__(self, x, y, kind, ax, subplots, sharex, sharey, layout, figsize, use_index, title, grid, legend, style, logx, logy, loglog, xticks, yticks, xlim, ylim, rot, fontsize, colormap, table, yerr, xerr, secondary_y, sort_columns, **kwds)
   3669                           fontsize=fontsize, colormap=colormap, table=table,
   3670                           yerr=yerr, xerr=xerr, secondary_y=secondary_y,
-> 3671                           sort_columns=sort_columns, **kwds)
   3672     __call__.__doc__ = plot_frame.__doc__
   3673

//anaconda/lib/python3.5/site-packages/pandas/tools/plotting.py in plot_frame(data, x, y, kind, ax, subplots, sharex, sharey, layout, figsize, use_index, title, grid, legend, style, logx, logy, loglog, xticks, yticks, xlim, ylim, rot, fontsize, colormap, table, yerr, xerr, secondary_y, sort_columns, **kwds)
   2554                  yerr=yerr, xerr=xerr,
   2555                  secondary_y=secondary_y, sort_columns=sort_columns,
-> 2556                  **kwds)
   2557
   2558

//anaconda/lib/python3.5/site-packages/pandas/tools/plotting.py in _plot(data, x, y, subplots, ax, kind, **kwds)
   2382         plot_obj = klass(data, subplots=subplots, ax=ax, kind=kind, **kwds)
   2383
-> 2384     plot_obj.generate()
   2385     plot_obj.draw()
   2386     return plot_obj.result

//anaconda/lib/python3.5/site-packages/pandas/tools/plotting.py in generate(self)
    985         self._compute_plot_data()
    986         self._setup_subplots()
--> 987         self._make_plot()
    988         self._add_table()
    989         self._make_legend()

//anaconda/lib/python3.5/site-packages/pandas/tools/plotting.py in _make_plot(self)
   1557             label = None
   1558         scatter = ax.scatter(data[x].values, data[y].values, c=c_values,
-> 1559                              label=label, cmap=cmap, **self.kwds)
   1560         if cb:
   1561             img = ax.collections[0]

//anaconda/lib/python3.5/site-packages/matplotlib/__init__.py in inner(ax, *args, **kwargs)
   1810                     warnings.warn(msg % (label_namer, func.__name__),
   1811                                   RuntimeWarning, stacklevel=2)
-> 1812             return func(ax, *args, **kwargs)
   1813         pre_doc = inner.__doc__
   1814         if pre_doc is None:

//anaconda/lib/python3.5/site-packages/matplotlib/axes/_axes.py in scatter(self, x, y, s, c, marker, cmap, norm, vmin, vmax, alpha, linewidths, verts, edgecolors, **kwargs)
   3891                 offsets=offsets,
   3892                 transOffset=kwargs.pop('transform', self.transData),
-> 3893                 alpha=alpha
   3894                 )
   3895         collection.set_transform(mtransforms.IdentityTransform())

//anaconda/lib/python3.5/site-packages/matplotlib/collections.py in __init__(self, paths, sizes, **kwargs)
    829         """
    830
--> 831         Collection.__init__(self, **kwargs)
    832         self.set_paths(paths)
    833         self.set_sizes(sizes)

//anaconda/lib/python3.5/site-packages/matplotlib/collections.py in __init__(self, edgecolors, facecolors, linewidths, linestyles, antialiaseds, offsets, transOffset, norm, cmap, pickradius, hatch, urls, offset_position, zorder, **kwargs)
    115
    116         self.set_edgecolor(edgecolors)
--> 117         self.set_facecolor(facecolors)
    118         self.set_linewidth(linewidths)
    119         self.set_linestyle(linestyles)

//anaconda/lib/python3.5/site-packages/matplotlib/collections.py in set_facecolor(self, c)
    610             c = mpl.rcParams['patch.facecolor']
    611         self._facecolors_original = c
--> 612         self._facecolors = mcolors.colorConverter.to_rgba_array(c, self._alpha)
    613         self.stale = True
    614

//anaconda/lib/python3.5/site-packages/matplotlib/colors.py in to_rgba_array(self, c, alpha)
    420             result = np.zeros((nc, 4), dtype=np.float)
    421             for i, cc in enumerate(c):
--> 422                 result[i] = self.to_rgba(cc, alpha)
    423             return result
    424

//anaconda/lib/python3.5/site-packages/matplotlib/colors.py in to_rgba(self, arg, alpha)
    374         except (TypeError, ValueError) as exc:
    375             raise ValueError(
--> 376                 'to_rgba: Invalid rgba arg "%s"\n%s' % (str(arg), exc))
    377
    378     def to_rgba_array(self, c, alpha=None):

ValueError: to_rgba: Invalid rgba arg "."
to_rgb: Invalid rgb arg "."
could not convert string to float: '.'
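
The messages in the traceback (`Invalid rgba arg ".target"`, then `"."`) suggest that the column name itself reached matplotlib as a colour specification. One possible workaround, sketched here as an assumption rather than a confirmed fix, is to bypass `df.plot.scatter` and hand matplotlib the integer category codes. The small frame below reuses rows 0, 1, 145 and 146 from the printed output instead of loading the full dataset.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Four rows copied from the frame printed above, with .target as a category.
df = pd.DataFrame({
    "sepal length (cm)": [5.1, 4.9, 6.7, 6.3],
    "sepal width (cm)": [3.5, 3.0, 3.0, 2.5],
    ".target": pd.Categorical([0, 0, 2, 2]),
})

# .cat.codes maps each category to its integer position, which matplotlib can colour-map.
codes = df[".target"].cat.codes

fig, ax = plt.subplots()
points = ax.scatter(df["sepal length (cm)"], df["sepal width (cm)"], c=codes)
fig.colorbar(points, ax=ax, label=".target")
```

Note that the codes are positions in the category list, not the original values, so the colorbar ticks would need relabelling to show the categories themselves.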

output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Darwin
OS-release: 14.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: fr_FR.UTF-8

pandas: 0.17.1
nose: 1.3.7
pip: 8.0.2
setuptools: 19.6.2
Cython: 0.23.4
numpy: 1.10.4
scipy: 0.17.0
statsmodels: 0.6.1
IPython: 4.0.1
sphinx: 1.3.1
patsy: 0.4.0
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.4.4
matplotlib: 1.5.1
openpyxl: 2.2.6
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.7.7
lxml: 3.4.4
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.9
pymysql: None
psycopg2: None
Jinja2: 2.8

Pinging @sinhrks

See also http://stanford.edu/~mwaskom/software/seaborn/examples/scatterplot_matrix.html

@jreback
Contributor

jreback commented Feb 18, 2016

@TomAugspurger

@TomAugspurger
Contributor

Similar seaborn issue here, which I now see was you @scls19fr :)

I suspect that this would be OK, right? Can we think of anything this would break? How many discrete colors can we handle before we should switch back to a continuous scale?

Also +1 for handling categoricals; we can do that here or in another issue. The only slight wrinkle is the appropriate colormap for ordered vs. unordered categoricals. We could use different defaults for each, but maybe that's up to the user to choose.

@s-celles
Contributor Author

For ordered categoricals (and integers), I think we should have a discrete colorbar (up to a given number of possible values) and a continuous colorbar beyond this value.
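
As a rough sketch of that rule: the helper name `use_discrete_colorbar` and the cutoff of 10 are both hypothetical, since the thread does not settle on a number, and this is not pandas' actual logic.

```python
import pandas as pd

MAX_DISCRETE_LEVELS = 10  # hypothetical cutoff; no value is agreed on in this thread


def use_discrete_colorbar(values, max_levels=MAX_DISCRETE_LEVELS):
    """Hypothetical helper: True when a discrete colorbar would apply under the proposed rule."""
    s = pd.Series(values)
    if isinstance(s.dtype, pd.CategoricalDtype):
        if not s.dtype.ordered:
            return False  # unordered categoricals are handled separately
        n_levels = len(s.dtype.categories)
    elif pd.api.types.is_integer_dtype(s.dtype):
        n_levels = s.nunique()
    else:
        return False  # floats and other dtypes keep the continuous colorbar
    # Beyond the cutoff, fall back to a continuous colorbar.
    return n_levels <= max_levels


iris_like = use_discrete_colorbar([0, 0, 1, 2])      # few integer levels
many_levels = use_discrete_colorbar(range(50))       # too many levels
```

Here `iris_like` comes out `True` and `many_levels` comes out `False`, matching the "discrete up to a limit, continuous beyond it" behaviour described above.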

For unordered categoricals, a legend like the one below could be a better idea.

[Image: legend]
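
A legend of this kind can be sketched with plain matplotlib by drawing one scatter call per category. `scatter_with_legend` is a hypothetical helper for illustration, not an existing pandas API, and the four rows are copied from the frame printed in the report.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def scatter_with_legend(df, x, y, by):
    """Hypothetical helper: one scatter call per category so each gets a legend entry."""
    fig, ax = plt.subplots()
    # observed=True skips categories that have no rows in the data.
    for level, group in df.groupby(by, observed=True):
        ax.scatter(group[x], group[y], label=str(level))
    ax.legend(title=by)
    return fig, ax


df = pd.DataFrame({
    "sepal length (cm)": [5.1, 4.9, 6.7, 6.3],
    "sepal width (cm)": [3.5, 3.0, 3.0, 2.5],
    ".target": pd.Categorical([0, 0, 2, 2]),
})
fig, ax = scatter_with_legend(df, "sepal length (cm)", "sepal width (cm)", ".target")
```

Each call to `ax.scatter` picks the next colour from matplotlib's default colour cycle, so no colormap or colorbar is involved.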

@jreback jreback added this to the 1.3 milestone Jan 25, 2021