Scatter requires x column to be numeric #18755

Closed
naught101 opened this issue Dec 13, 2017 · 17 comments · Fixed by #30434

@naught101

In [30]: df = pd.DataFrame(dict(a=['A', 'B', 'C'], b=[2, 3, 4]))

In [31]: plt.scatter(df['a'], df['b'])
Out[31]: <matplotlib.collections.PathCollection at 0x7f17e17b41d0>

This works, and produces this:

[Image: figure_1, the resulting scatter plot]

On the other hand, this doesn't:

In [32]: df.plot.scatter(x='a', y='b')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/data/documents/uni/phd/projects/FluxnetTrafficLights/scripts/plots/predictability_plots.py in <module>()                                                                                                     
----> 1 df.plot.scatter(x='a', y='b')

~/miniconda3/envs/science/lib/python3.6/site-packages/pandas/plotting/_core.py in scatter(self, x, y, s, c, **kwds)                                                                                           
   2803         axes : matplotlib.AxesSubplot or np.array of them
   2804         """
-> 2805         return self(kind='scatter', x=x, y=y, c=c, s=s, **kwds)
   2806 
   2807     def hexbin(self, x, y, C=None, reduce_C_function=None, gridsize=None,

~/miniconda3/envs/science/lib/python3.6/site-packages/pandas/plotting/_core.py in __call__(self, x, y, kind, ax, subplots, sharex, sharey, layout, figsize, use_index, title, grid, legend, style, logx, logy, loglog, xticks, yticks, xlim, ylim, rot, fontsize, colormap, table, yerr, xerr, secondary_y, sort_columns, **kwds)                                                                                           
   2625                           fontsize=fontsize, colormap=colormap, table=table,
   2626                           yerr=yerr, xerr=xerr, secondary_y=secondary_y,
-> 2627                           sort_columns=sort_columns, **kwds)
   2628     __call__.__doc__ = plot_frame.__doc__
   2629 

~/miniconda3/envs/science/lib/python3.6/site-packages/pandas/plotting/_core.py in plot_frame(data, x, y, kind, ax, subplots, sharex, sharey, layout, figsize, use_index, title, grid, legend, style, logx, logy, loglog, xticks, yticks, xlim, ylim, rot, fontsize, colormap, table, yerr, xerr, secondary_y, sort_columns, **kwds)                                                                                         
   1867                  yerr=yerr, xerr=xerr,
   1868                  secondary_y=secondary_y, sort_columns=sort_columns,
-> 1869                  **kwds)
   1870 
   1871 

~/miniconda3/envs/science/lib/python3.6/site-packages/pandas/plotting/_core.py in _plot(data, x, y, subplots, ax, kind, **kwds)
   1650         if isinstance(data, DataFrame):
   1651             plot_obj = klass(data, x=x, y=y, subplots=subplots, ax=ax,
-> 1652                              kind=kind, **kwds)
   1653         else:
   1654             raise ValueError("plot kind %r can only be used for data frames"

~/miniconda3/envs/science/lib/python3.6/site-packages/pandas/plotting/_core.py in __init__(self, data, x, y, s, c, **kwargs)
    808             # the handling of this argument later
    809             s = 20
--> 810         super(ScatterPlot, self).__init__(data, x, y, s=s, **kwargs)
    811         if is_integer(c) and not self.data.columns.holds_integer():
    812             c = self.data.columns[c]

~/miniconda3/envs/science/lib/python3.6/site-packages/pandas/plotting/_core.py in __init__(self, data, x, y, **kwargs)
    783             y = self.data.columns[y]
    784         if len(self.data[x]._get_numeric_data()) == 0:
--> 785             raise ValueError(self._kind + ' requires x column to be numeric')
    786         if len(self.data[y]._get_numeric_data()) == 0:
    787             raise ValueError(self._kind + ' requires y column to be numeric')

ValueError: scatter requires x column to be numeric

Why does pandas require x to be numeric if matplotlib doesn't?

Using these versions from conda, on Kubuntu 17.10:

matplotlib                2.1.0            py36hba5de38_0  
pandas                    0.20.3                   py36_0  
python                    3.6.3                h0ef2715_3  
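
Until the check is relaxed, one possible workaround is to give pandas a numeric stand-in for the string column and restore the labels afterwards. This is only a sketch: the helper column name `a_code` is made up for illustration, and the `Agg` backend is chosen just so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import pandas as pd

df = pd.DataFrame(dict(a=['A', 'B', 'C'], b=[2, 3, 4]))

# Encode the string labels as integer positions (0, 1, 2, ...).
cat = df['a'].astype('category')
df['a_code'] = cat.cat.codes  # 'a_code' is a hypothetical helper column

# The numeric helper column passes the "x column must be numeric" check.
ax = df.plot.scatter(x='a_code', y='b')

# Put the original labels back on the x-axis.
ax.set_xticks(range(len(cat.cat.categories)))
ax.set_xticklabels(cat.cat.categories)
```

The points land at the same positions matplotlib would pick for categorical data, but the ordering follows the sorted category order rather than the order of appearance.
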
@sinhrks
Member

sinhrks commented Dec 13, 2017

Related to #8113. Maybe we can pass the x-values through as they are?

@sinhrks sinhrks added Visualization plotting Dtype Conversions Unexpected or buggy dtype conversions labels Dec 13, 2017
@naught101
Author

I tried commenting out the lines that raise the ValueError: https://github.com/pandas-dev/pandas/blob/master/pandas/plotting/_core.py#L811

But then I got this:

In [7]: df.plot.scatter(x='a', y='b')
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/miniconda3/envs/science/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)                                                                                    
   2441             try:
-> 2442                 return self._engine.get_loc(key)
   2443             except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5280)()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5126)()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas/_libs/hashtable.c:20523)()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas/_libs/hashtable.c:20477)()

KeyError: 'a'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-7-d7c052f80cc7> in <module>()
----> 1 df.plot.scatter(x='a', y='b')

~/miniconda3/envs/science/lib/python3.6/site-packages/pandas/plotting/_core.py in scatter(self, x, y, s, c, **kwds)                                                                                           
   2803         axes : matplotlib.AxesSubplot or np.array of them
   2804         """
-> 2805         return self(kind='scatter', x=x, y=y, c=c, s=s, **kwds)
   2806 
   2807     def hexbin(self, x, y, C=None, reduce_C_function=None, gridsize=None,

~/miniconda3/envs/science/lib/python3.6/site-packages/pandas/plotting/_core.py in __call__(self, x, y, kind, ax, subplots, sharex, sharey, layout, figsize, use_index, title, grid, legend, style, logx, logy, loglog, xticks, yticks, xlim, ylim, rot, fontsize, colormap, table, yerr, xerr, secondary_y, sort_columns, **kwds)                                                                                           
   2625                           fontsize=fontsize, colormap=colormap, table=table,
   2626                           yerr=yerr, xerr=xerr, secondary_y=secondary_y,
-> 2627                           sort_columns=sort_columns, **kwds)
   2628     __call__.__doc__ = plot_frame.__doc__
   2629 

~/miniconda3/envs/science/lib/python3.6/site-packages/pandas/plotting/_core.py in plot_frame(data, x, y, kind, ax, subplots, sharex, sharey, layout, figsize, use_index, title, grid, legend, style, logx, logy, loglog, xticks, yticks, xlim, ylim, rot, fontsize, colormap, table, yerr, xerr, secondary_y, sort_columns, **kwds)                                                                                         
   1867                  yerr=yerr, xerr=xerr,
   1868                  secondary_y=secondary_y, sort_columns=sort_columns,
-> 1869                  **kwds)
   1870 
   1871 

~/miniconda3/envs/science/lib/python3.6/site-packages/pandas/plotting/_core.py in _plot(data, x, y, subplots, ax, kind, **kwds)                                                                               
   1692         plot_obj = klass(data, subplots=subplots, ax=ax, kind=kind, **kwds)
   1693 
-> 1694     plot_obj.generate()
   1695     plot_obj.draw()
   1696     return plot_obj.result

~/miniconda3/envs/science/lib/python3.6/site-packages/pandas/plotting/_core.py in generate(self)
    243         self._compute_plot_data()
    244         self._setup_subplots()
--> 245         self._make_plot()
    246         self._add_table()
    247         self._make_legend()

~/miniconda3/envs/science/lib/python3.6/site-packages/pandas/plotting/_core.py in _make_plot(self)
    841         else:
    842             label = None
--> 843         scatter = ax.scatter(data[x].values, data[y].values, c=c_values,
    844                              label=label, cmap=cmap, **self.kwds)
    845         if cb:

~/miniconda3/envs/science/lib/python3.6/site-packages/pandas/core/frame.py in __getitem__(self, key)
   1962             return self._getitem_multilevel(key)
   1963         else:
-> 1964             return self._getitem_column(key)
   1965 
   1966     def _getitem_column(self, key):

~/miniconda3/envs/science/lib/python3.6/site-packages/pandas/core/frame.py in _getitem_column(self, key)                                                                                                      
   1969         # get column
   1970         if self.columns.is_unique:
-> 1971             return self._get_item_cache(key)
   1972 
   1973         # duplicate columns & possible reduce dimensionality

~/miniconda3/envs/science/lib/python3.6/site-packages/pandas/core/generic.py in _get_item_cache(self, item)                                                                                                   
   1643         res = cache.get(item)
   1644         if res is None:
-> 1645             values = self._data.get(item)
   1646             res = self._box_item_values(item, values)
   1647             cache[item] = res

~/miniconda3/envs/science/lib/python3.6/site-packages/pandas/core/internals.py in get(self, item, fastpath)                                                                                                   
   3588 
   3589             if not isnull(item):
-> 3590                 loc = self.items.get_loc(item)
   3591             else:
   3592                 indexer = np.arange(len(self.items))[isnull(self.items)]

~/miniconda3/envs/science/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2442                 return self._engine.get_loc(key)
   2443             except KeyError:
-> 2444                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2445 
   2446         indexer = self.get_indexer([key], method=method, tolerance=tolerance)

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5280)()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5126)()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas/_libs/hashtable.c:20523)()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas/_libs/hashtable.c:20477)()

KeyError: 'a'

The non-numeric columns seem to get dropped somehow?
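
One reading of the traceback, offered as an assumption rather than a confirmed diagnosis: the plotting code keeps only numeric columns before drawing, so the later lookup of `'a'` fails. The sketch below uses the public `select_dtypes` as a stand-in for whatever the plotting internals actually do.

```python
import pandas as pd

df = pd.DataFrame(dict(a=['A', 'B', 'C'], b=[2, 3, 4]))

# Keep only numeric columns; the object-dtype column 'a' does not survive.
numeric = df.select_dtypes(include='number')
print(list(numeric.columns))  # ['b']

# Looking up the dropped column then raises KeyError, as in the traceback.
try:
    numeric['a']
    lookup_failed = False
except KeyError:
    lookup_failed = True
print(lookup_failed)  # True
```
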

@bobhaffner
Contributor

Adding to this: plt.scatter(df['a'], df['b']) doesn't work with older versions of matplotlib. With version '2.0.2' I get ValueError: could not convert string to float: 'C'

@bobhaffner
Contributor

and @naught101 yeah, 'a' is getting dropped here

@jadolfbr

jadolfbr commented May 5, 2018

Any way around this issue still using the plot function?

@ckaldemeyer

I get the same error, and it occurs only when the DataFrame contains a column of type DateTime, even though that column is not selected for the scatter plot.

@RahulRAhhy123

I am also getting the same error, ValueError: scatter requires x column to be numeric, when I try to plot a column of datetime type. How can I resolve this? @naught101 @ckaldemeyer @sinhrks @bobhaffner @jadolfbr

@jadolfbr

@RahulRAhhy123 - I don't have a workaround for this. I wish I could use scatter directly from pandas with a jitter for the values. For some types of data, this is much clearer than a bar plot.

@pietz

pietz commented Oct 16, 2018

What I did when I tried to plot a Date feature over multiple years on the x-axis was to convert it to a year-float representation like so:

df['DateNum'] = df.Date.dt.year + df.Date.dt.month / 12. + df.Date.dt.day / 365.

2018-09-01 --> 2018.75 (approximately)

This isn't super accurate, nor is it pretty but it got the job done for me.
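
A day-of-year variant of the same idea, sketched under the assumption that `Date` is already a datetime64 column (the sample dates are made up). It maps 1 January to exactly `year.0` and keeps every date in chronological order, so the resulting float column can be passed to `df.plot.scatter` as `x`.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({'Date': pd.to_datetime(['2018-01-01', '2018-09-01', '2018-09-30'])})

# Length of each date's year in days: 365, or 366 in leap years.
days_in_year = 365 + df['Date'].dt.is_leap_year.astype(int)

# Year plus the fraction of that year already elapsed.
df['DateNum'] = df['Date'].dt.year + (df['Date'].dt.dayofyear - 1) / days_in_year
```
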

@fmagin

fmagin commented Oct 26, 2018

I am running into this issue when trying to scatterplot with x and y both initially being lists of datetime objects, but I have a workaround using matplotlib directly:

df = pd.DataFrame({'x':x,'y':y})

# Fails with 'ValueError: scatter requires x column to be numeric'
df.plot.scatter(x='x', y='y')

# Works, xdate=True is implicit
plt.plot_date(df['x'], df['y'], ydate=True)
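
Another matplotlib-only route, sketched here with made-up sample dates, is to call `Axes.scatter` directly. Recent matplotlib versions convert datetime values on both axes through their unit handling, so no numeric conversion is needed; this bypasses the pandas plotting layer entirely rather than fixing it.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data: both columns hold timestamps.
x = pd.to_datetime(['2018-10-01', '2018-10-08', '2018-10-15'])
y = pd.to_datetime(['2018-10-03', '2018-10-12', '2018-10-20'])
df = pd.DataFrame({'x': x, 'y': y})

fig, ax = plt.subplots()
ax.scatter(df['x'], df['y'], s=10)  # datetimes are converted by matplotlib
fig.autofmt_xdate()  # rotate the date labels so they do not overlap
```
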

@jasonbono

Some of the workarounds may work, but it's an inconvenient bug that I, and presumably others, am still encountering. Does anyone know whether the actual bug is being addressed?

@TomAugspurger
Contributor

I don't think anyone is working on this right now. Are you interested @jasonbono?

@jasonbono

That's good to know--thanks. I can't promise to try and fix it right now, but that is part of why I asked.

@RahulRAhhy123

@jasonbono

You can split the datetime column into separate columns for its date parts (for example year, month, and day). Then drop the original datetime column and make sure the new columns are of integer type.
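
A minimal sketch of that splitting step, assuming a datetime64 column; the column names `when` and `value` and the sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical sample data with one datetime column.
df = pd.DataFrame({
    'when': pd.to_datetime(['2018-09-01', '2019-02-14']),
    'value': [1.5, 2.5],
})

# Extract integer date parts through the .dt accessor.
df['year'] = df['when'].dt.year
df['month'] = df['when'].dt.month
df['day'] = df['when'].dt.day

# Drop the original datetime column so only numeric columns remain.
df = df.drop(columns='when')
```

Any of the new integer columns can then serve as `x` in `df.plot.scatter`, at the cost of losing a single continuous time axis.
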

@r02b

r02b commented Aug 21, 2019

+1
This happens to me when I first plot some lines against a datetime index and then try to add scatter plots to the same figure; the scatter call fails :/

@charlesdong1991
Member

take

@dlainfiesta

I encountered the same bug. I have two columns, both of Timestamp type, and I get this error when I try to draw a scatter plot:

ax = patients.plot.scatter(
    title='Síntomas vs. Caso confirmado - COVID19',
    x='Onset',
    y='Confirmed',
    alpha=.1,
    lw=0,
    s=10,
    figsize=(6,6))

ValueError: scatter requires x column to be numeric
