Scatter requires x column to be numeric #18755

naught101 · 2017-12-13T02:03:18Z

In [30]: df = pd.DataFrame(dict(a=['A', 'B', 'C'], b=[2, 3, 4]))

In [31]: plt.scatter(df['a'], df['b'])
Out[31]: <matplotlib.collections.PathCollection at 0x7f17e17b41d0>

This works, and produces a scatter plot (image not preserved).

On the other hand, this doesn't:

In [32]: df.plot.scatter(x='a', y='b')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/data/documents/uni/phd/projects/FluxnetTrafficLights/scripts/plots/predictability_plots.py in <module>()                                                                                                     
----> 1 df.plot.scatter(x='a', y='b')

~/miniconda3/envs/science/lib/python3.6/site-packages/pandas/plotting/_core.py in scatter(self, x, y, s, c, **kwds)                                                                                           
   2803         axes : matplotlib.AxesSubplot or np.array of them
   2804         """
-> 2805         return self(kind='scatter', x=x, y=y, c=c, s=s, **kwds)
   2806 
   2807     def hexbin(self, x, y, C=None, reduce_C_function=None, gridsize=None,

~/miniconda3/envs/science/lib/python3.6/site-packages/pandas/plotting/_core.py in __call__(self, x, y, kind, ax, subplots, sharex, sharey, layout, figsize, use_index, title, grid, legend, style, logx, logy, loglog, xticks, yticks, xlim, ylim, rot, fontsize, colormap, table, yerr, xerr, secondary_y, sort_columns, **kwds)                                                                                           
   2625                           fontsize=fontsize, colormap=colormap, table=table,
   2626                           yerr=yerr, xerr=xerr, secondary_y=secondary_y,
-> 2627                           sort_columns=sort_columns, **kwds)
   2628     __call__.__doc__ = plot_frame.__doc__
   2629 

~/miniconda3/envs/science/lib/python3.6/site-packages/pandas/plotting/_core.py in plot_frame(data, x, y, kind, ax, subplots, sharex, sharey, layout, figsize, use_index, title, grid, legend, style, logx, logy, loglog, xticks, yticks, xlim, ylim, rot, fontsize, colormap, table, yerr, xerr, secondary_y, sort_columns, **kwds)                                                                                         
   1867                  yerr=yerr, xerr=xerr,
   1868                  secondary_y=secondary_y, sort_columns=sort_columns,
-> 1869                  **kwds)
   1870 
   1871 

~/miniconda3/envs/science/lib/python3.6/site-packages/pandas/plotting/_core.py in _plot(data, x, y, subplots, ax, kind, **kwds)
   1650         if isinstance(data, DataFrame):
   1651             plot_obj = klass(data, x=x, y=y, subplots=subplots, ax=ax,
-> 1652                              kind=kind, **kwds)
   1653         else:
   1654             raise ValueError("plot kind %r can only be used for data frames"

~/miniconda3/envs/science/lib/python3.6/site-packages/pandas/plotting/_core.py in __init__(self, data, x, y, s, c, **kwargs)
    808             # the handling of this argument later
    809             s = 20
--> 810         super(ScatterPlot, self).__init__(data, x, y, s=s, **kwargs)
    811         if is_integer(c) and not self.data.columns.holds_integer():
    812             c = self.data.columns[c]

~/miniconda3/envs/science/lib/python3.6/site-packages/pandas/plotting/_core.py in __init__(self, data, x, y, **kwargs)
    783             y = self.data.columns[y]
    784         if len(self.data[x]._get_numeric_data()) == 0:
--> 785             raise ValueError(self._kind + ' requires x column to be numeric')
    786         if len(self.data[y]._get_numeric_data()) == 0:
    787             raise ValueError(self._kind + ' requires y column to be numeric')

ValueError: scatter requires x column to be numeric

Why does pandas require x to be numeric if matplotlib doesn't?

using versions from conda, on Kubuntu 17.10:

matplotlib                2.1.0            py36hba5de38_0  
pandas                    0.20.3                   py36_0  
python                    3.6.3                h0ef2715_3


sinhrks · 2017-12-13T02:19:48Z

Related to #8113. Maybe we can pass x-values as it is?

naught101 · 2017-12-13T03:18:05Z

I tried commenting out the lines that raise the ValueError: https://github.com/pandas-dev/pandas/blob/master/pandas/plotting/_core.py#L811

But then I got this:

In [7]: df.plot.scatter(x='a', y='b')
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/miniconda3/envs/science/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)                                                                                    
   2441             try:
-> 2442                 return self._engine.get_loc(key)
   2443             except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5280)()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5126)()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas/_libs/hashtable.c:20523)()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas/_libs/hashtable.c:20477)()

KeyError: 'a'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-7-d7c052f80cc7> in <module>()
----> 1 df.plot.scatter(x='a', y='b')

~/miniconda3/envs/science/lib/python3.6/site-packages/pandas/plotting/_core.py in scatter(self, x, y, s, c, **kwds)                                                                                           
   2803         axes : matplotlib.AxesSubplot or np.array of them
   2804         """
-> 2805         return self(kind='scatter', x=x, y=y, c=c, s=s, **kwds)
   2806 
   2807     def hexbin(self, x, y, C=None, reduce_C_function=None, gridsize=None,

~/miniconda3/envs/science/lib/python3.6/site-packages/pandas/plotting/_core.py in __call__(self, x, y, kind, ax, subplots, sharex, sharey, layout, figsize, use_index, title, grid, legend, style, logx, logy, loglog, xticks, yticks, xlim, ylim, rot, fontsize, colormap, table, yerr, xerr, secondary_y, sort_columns, **kwds)                                                                                           
   2625                           fontsize=fontsize, colormap=colormap, table=table,
   2626                           yerr=yerr, xerr=xerr, secondary_y=secondary_y,
-> 2627                           sort_columns=sort_columns, **kwds)
   2628     __call__.__doc__ = plot_frame.__doc__
   2629 

~/miniconda3/envs/science/lib/python3.6/site-packages/pandas/plotting/_core.py in plot_frame(data, x, y, kind, ax, subplots, sharex, sharey, layout, figsize, use_index, title, grid, legend, style, logx, logy, loglog, xticks, yticks, xlim, ylim, rot, fontsize, colormap, table, yerr, xerr, secondary_y, sort_columns, **kwds)                                                                                         
   1867                  yerr=yerr, xerr=xerr,
   1868                  secondary_y=secondary_y, sort_columns=sort_columns,
-> 1869                  **kwds)
   1870 
   1871 

~/miniconda3/envs/science/lib/python3.6/site-packages/pandas/plotting/_core.py in _plot(data, x, y, subplots, ax, kind, **kwds)                                                                               
   1692         plot_obj = klass(data, subplots=subplots, ax=ax, kind=kind, **kwds)
   1693 
-> 1694     plot_obj.generate()
   1695     plot_obj.draw()
   1696     return plot_obj.result

~/miniconda3/envs/science/lib/python3.6/site-packages/pandas/plotting/_core.py in generate(self)
    243         self._compute_plot_data()
    244         self._setup_subplots()
--> 245         self._make_plot()
    246         self._add_table()
    247         self._make_legend()

~/miniconda3/envs/science/lib/python3.6/site-packages/pandas/plotting/_core.py in _make_plot(self)
    841         else:
    842             label = None
--> 843         scatter = ax.scatter(data[x].values, data[y].values, c=c_values,
    844                              label=label, cmap=cmap, **self.kwds)
    845         if cb:

~/miniconda3/envs/science/lib/python3.6/site-packages/pandas/core/frame.py in __getitem__(self, key)
   1962             return self._getitem_multilevel(key)
   1963         else:
-> 1964             return self._getitem_column(key)
   1965 
   1966     def _getitem_column(self, key):

~/miniconda3/envs/science/lib/python3.6/site-packages/pandas/core/frame.py in _getitem_column(self, key)                                                                                                      
   1969         # get column
   1970         if self.columns.is_unique:
-> 1971             return self._get_item_cache(key)
   1972 
   1973         # duplicate columns & possible reduce dimensionality

~/miniconda3/envs/science/lib/python3.6/site-packages/pandas/core/generic.py in _get_item_cache(self, item)                                                                                                   
   1643         res = cache.get(item)
   1644         if res is None:
-> 1645             values = self._data.get(item)
   1646             res = self._box_item_values(item, values)
   1647             cache[item] = res

~/miniconda3/envs/science/lib/python3.6/site-packages/pandas/core/internals.py in get(self, item, fastpath)                                                                                                   
   3588 
   3589             if not isnull(item):
-> 3590                 loc = self.items.get_loc(item)
   3591             else:
   3592                 indexer = np.arange(len(self.items))[isnull(self.items)]

~/miniconda3/envs/science/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2442                 return self._engine.get_loc(key)
   2443             except KeyError:
-> 2444                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2445 
   2446         indexer = self.get_indexer([key], method=method, tolerance=tolerance)

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5280)()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5126)()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas/_libs/hashtable.c:20523)()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas/_libs/hashtable.c:20477)()

KeyError: 'a'

The non-numeric columns seem to get dropped somehow?

bobhaffner · 2017-12-16T02:32:38Z

Adding to this: it looks like plt.scatter(df['a'], df['b']) won't work with older versions of matplotlib. With version '2.0.2' I get ValueError: could not convert string to float: 'C'.

bobhaffner · 2017-12-16T03:02:07Z

and @naught101 yeah, 'a' is getting dropped here
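A minimal sketch of what is probably happening, assuming the plot data is reduced to its numeric columns before `_make_plot` runs. The public `select_dtypes` is used here as a stand-in for the private `_get_numeric_data()` seen in the first traceback; it is not claimed to be the actual internal call.

```python
import pandas as pd

df = pd.DataFrame(dict(a=['A', 'B', 'C'], b=[2, 3, 4]))

# Stand-in for a numeric-only filter: object (string) columns are excluded.
numeric_only = df.select_dtypes(include='number')

print(list(df.columns))            # ['a', 'b']
print(list(numeric_only.columns))  # ['b']

# Looking up 'a' in the filtered frame raises KeyError, as in the traceback.
try:
    numeric_only['a']
except KeyError as err:
    print('KeyError:', err)
```

If the plotting code indexes the filtered frame with the original column name, removing the ValueError check alone would not be enough.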

jadolfbr · 2018-05-05T00:53:44Z

Any way around this issue still using the plot function?
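One possible workaround that still goes through `df.plot.scatter`, sketched under the assumption that the x values are a small set of string labels: encode the labels as integer codes, plot the codes, then relabel the ticks. The column name `a_code` is a hypothetical helper, not part of the original report.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd

df = pd.DataFrame(dict(a=['A', 'B', 'C'], b=[2, 3, 4]))

# Encode the string column as integer codes (0, 1, 2, ...).
cat = df['a'].astype('category')
df['a_code'] = cat.cat.codes

# Both columns are numeric now, so the numeric check passes.
ax = df.plot.scatter(x='a_code', y='b')

# Put the original labels back on the x axis.
ax.set_xticks(list(range(len(cat.cat.categories))))
ax.set_xticklabels(list(cat.cat.categories))
```

The codes follow the sorted category order, so the tick positions and labels line up.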

ckaldemeyer · 2018-07-13T13:57:12Z

I get the same error, but only when the DataFrame contains a column of type DateTime, even though that column is not selected for the scatter plot.

RahulRAhhy123 · 2018-07-17T10:56:51Z

I am also getting the same error, ValueError: scatter requires x column to be numeric, when I try to plot a column of datetime type. How can this be resolved, @naught101 @ckaldemeyer @sinhrks @bobhaffner @jadolfbr?

jadolfbr · 2018-07-31T17:00:37Z

@RahulRAhhy123 - I don't have a workaround for this. I wish I could use scatter directly from pandas with a jitter for the values. For some types of data, this is much clearer than a bar plot.
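A rough sketch of the jittered scatter described above, assuming the categories are first mapped to integer codes so that the x column is numeric. The sample data, the `a_jitter` column name, and the jitter width of 0.2 are all illustrative choices, not taken from the thread.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd

# Hypothetical data with repeated categories, where overlap hides points.
df = pd.DataFrame(dict(a=['A', 'A', 'B', 'B', 'C', 'C'],
                       b=[2, 2, 3, 3, 4, 4]))

rng = np.random.RandomState(0)  # fixed seed so the jitter is reproducible
codes = df['a'].astype('category').cat.codes

# Shift each point horizontally by a small random offset around its code.
df['a_jitter'] = codes + rng.uniform(-0.2, 0.2, size=len(df))

ax = df.plot.scatter(x='a_jitter', y='b')
```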

pietz · 2018-10-16T08:31:33Z

What I did when I tried to plot a Date feature over multiple years on the x-axis was to convert it to a year-float representation like so:

df['DateNum'] = df.Date.dt.year + df.Date.dt.month / 12. + df.Date.dt.day / 30.

2018-09-01 --> 2018.78

This isn't super accurate, nor is it pretty but it got the job done for me.
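A somewhat more accurate variant, offered only as a sketch: the formula above adds `day / 30` as a fraction of a whole year, so the result is not monotonic across month boundaries (Jan 31 maps to a larger number than Feb 1). Assuming a datetime column named `Date` as in the comment, the day of the year can be used instead. The sample dates are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Date': pd.to_datetime(['2018-01-01',
                                           '2018-09-01',
                                           '2018-12-31'])})

d = df['Date'].dt
# 365 or 366 depending on whether the date falls in a leap year.
days_in_year = 365 + d.is_leap_year.astype(int)

# Jan 1 maps to exactly the year; later days add the elapsed fraction.
df['DateNum'] = d.year + (d.dayofyear - 1) / days_in_year
```

With this version 2018-09-01 comes out at roughly 2018.67, and the values increase with the dates.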

fmagin · 2018-10-26T12:07:07Z

I am running into this issue when trying to scatterplot with x and y both initially being lists of datetime objects, but I have a workaround using matplotlib directly:

df = pd.DataFrame({'x':x,'y':y})

# Fails with 'ValueError: scatter requires x column to be numeric'
df.plot.scatter(x='x', y='y')

# Works, xdate=True is implicit
plt.plot_date(df['x'], df['y'], ydate=True)
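An alternative that keeps `df.plot.scatter`, sketched under the assumption that only the x column holds datetimes: convert it to matplotlib's numeric date representation with `matplotlib.dates.date2num`, plot the numeric column, and ask the axis to format the ticks as dates. The sample values and the `x_num` column name are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import pandas as pd

df = pd.DataFrame({
    'x': pd.to_datetime(['2018-10-01', '2018-10-02', '2018-10-03']),
    'y': [1.0, 2.0, 3.0],
})

# date2num returns floats counted in days, which pass the numeric check.
df['x_num'] = mdates.date2num(df['x'])

ax = df.plot.scatter(x='x_num', y='y')

# Interpret the x axis values as dates when drawing tick labels.
ax.xaxis_date()
```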

jasonbono · 2019-08-19T15:59:11Z

Some of the workarounds may work, but it's an inconvenient bug that I, and presumably others, are still encountering. Does anyone know if the actual bug is being addressed?

TomAugspurger · 2019-08-19T16:12:24Z

I don't think anyone is working on this right now. Are you interested @jasonbono?

jasonbono · 2019-08-19T16:19:06Z

That's good to know--thanks. I can't promise to try and fix it right now, but that is part of why I asked.

RahulRAhhy123 · 2019-08-20T19:19:36Z

@jasonbono

You can split the datetime column into separate columns (for example year, month and day). Finally, drop the datetime column and make sure the new columns are of integer type.
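The suggestion above might look like the following sketch. The column names `when` and `value` and the sample dates are hypothetical, and splitting into year, month and day is one reading of the advice.

```python
import pandas as pd

df = pd.DataFrame({
    'when': pd.to_datetime(['2019-08-19', '2019-08-20', '2019-08-21']),
    'value': [1, 2, 3],
})

# Pull the integer components out of the datetime column.
df['year'] = df['when'].dt.year
df['month'] = df['when'].dt.month
df['day'] = df['when'].dt.day

# Drop the original datetime column so only numeric columns remain.
df = df.drop(columns='when')
```

Any of the new integer columns can then be passed as `x` to `df.plot.scatter`, at the cost of losing a single continuous time axis.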

r02b · 2019-08-21T20:35:28Z

+1
This happens to me when I first plot some lines against a datetime index and then try to add scatter plots to the same axes, which fails :/

charlesdong1991 · 2019-12-23T18:10:54Z

take

dlainfiesta · 2020-06-15T15:02:31Z

I encounter the same bug. I have two columns, both Timestamp, and I get this error when I try to plot a scatter:

ax = patients.plot.scatter(
    title='Síntomas vs. Caso confirmado - COVID19',
    x='Onset',
    y='Confirmed',
    alpha=.1,
    lw=0,
    s=10,
    figsize=(6,6))

ValueError: scatter requires x column to be numeric

sinhrks added the Visualization and Dtype Conversions labels Dec 13, 2017

brownsarahm mentioned this issue Mar 14, 2018: BUG: xtick labels didn't work some times #20338 (Closed)

MarcoGorelli mentioned this issue Dec 23, 2019: df.plot() with kind='scatter' and datetime on x axis bug #30391 (Closed)

github-actions bot assigned charlesdong1991 Dec 23, 2019

charlesdong1991 mentioned this issue Dec 23, 2019: ENH: Allow scatter plot to plot objects and datetime type data #30434 (Merged)

jreback added this to the 1.0 milestone Jan 1, 2020

jreback closed this as completed in #30434 Jan 1, 2020
