KeyError when generating scatter plot of DataFrame columns #4493

fonnesbeck · 2013-08-07T02:32:17Z

I have two columns of a DataFrame that I am trying to plot using scatter in Matplotlib. Here are the two series:

mmsi
210234000    52.600000
211109000    62.250000
211200350    48.333333
211201390    52.887500
211203370    41.962500
211204500    32.233333
211205780    87.050000
211205790    42.500000
211207740    38.233333
211220290    53.233333
211262460    40.100000
211262630    31.716667
211327410    54.300000
211335760    43.175000
211378120    58.361290
...
636090986    62.812500
636091049    70.175000
636091052    45.600000
636091078    82.725000
636091137    59.075000
636091138    61.083333
636091195    75.030000
636091278    68.241667
636091394    57.700000
636091452    44.275000
636091501    63.480000
636091506    67.578571
636091545    44.500000
636091595    46.475000
636091800    61.850000
Name: pdgt10_early, Length: 235, dtype: float64

mmsi
210234000    True
211109000    True
211200350    True
211201390    True
211203370    True
211204500    True
211205780    True
211205790    True
211207740    True
211220290    True
211262460    True
211262630    True
211327410    True
211335760    True
211378120    True
...
636090986    True
636091049    True
636091052    True
636091078    True
636091137    True
636091138    True
636091195    True
636091278    True
636091394    True
636091452    True
636091501    True
636091506    True
636091545    True
636091595    True
636091800    True
Length: 235, dtype: bool

(Actually the latter is the result of a comparison operation on the column to see if values are greater or less than some threshold value)

Unfortunately, calling scatter on these two series raises a KeyError: 0 exception for some reason.

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-52-4b02ddd54f50> in <module>()
      1 for sma in sma_list:
----> 2     plot_logistic('Cargo', sma, nova_traces[np.where(sma_list==sma)[0][0]].traces[1], season_a=-3)

<ipython-input-51-94c541930a55> in plot_logistic(ship, sma, trace, dataset, season_a, season_b, nlines)
     14     print x
     15     print y
---> 16     scatter(x, y)
     17     title('{0} in {1}'.format(ship, sma))
     18 

/Library/Python/2.7/site-packages/matplotlib-1.4.x-py2.7-macosx-10.8-intel.egg/matplotlib/pyplot.pyc in scatter(x, y, s, c, marker, cmap, norm, vmin, vmax, alpha, linewidths, verts, hold, **kwargs)
   3088         ret = ax.scatter(x, y, s=s, c=c, marker=marker, cmap=cmap, norm=norm,
   3089                          vmin=vmin, vmax=vmax, alpha=alpha,
-> 3090                          linewidths=linewidths, verts=verts, **kwargs)
   3091         draw_if_interactive()
   3092     finally:

/Library/Python/2.7/site-packages/matplotlib-1.4.x-py2.7-macosx-10.8-intel.egg/matplotlib/axes/_axes.pyc in scatter(self, x, y, s, c, marker, cmap, norm, vmin, vmax, alpha, linewidths, verts, **kwargs)
   3210             self.cla()
   3211 
-> 3212         self._process_unit_info(xdata=x, ydata=y, kwargs=kwargs)
   3213         x = self.convert_xunits(x)
   3214         y = self.convert_yunits(y)

/Library/Python/2.7/site-packages/matplotlib-1.4.x-py2.7-macosx-10.8-intel.egg/matplotlib/axes/_base.pyc in _process_unit_info(self, xdata, ydata, kwargs)
   1652             # we only need to update if there is nothing set yet.
   1653             if not self.xaxis.have_units():
-> 1654                 self.xaxis.update_units(xdata)
   1655             #print '\tset from xdata', self.xaxis.units
   1656 

/Library/Python/2.7/site-packages/matplotlib-1.4.x-py2.7-macosx-10.8-intel.egg/matplotlib/axis.pyc in update_units(self, data)
   1330         """
   1331 
-> 1332         converter = munits.registry.get_converter(data)
   1333         if converter is None:
   1334             return False

/Library/Python/2.7/site-packages/matplotlib-1.4.x-py2.7-macosx-10.8-intel.egg/matplotlib/units.pyc in get_converter(self, x)
    135 
    136         if isinstance(x, np.ndarray) and x.size:
--> 137             converter = self.get_converter(x.ravel()[0])
    138             return converter
    139 

/Library/Python/2.7/site-packages/pandas-0.12.0_80_gfcaf9a6_20130729-py2.7-macosx-10.8-intel.egg/pandas/core/series.pyc in __getitem__(self, key)
    617     def __getitem__(self, key):
    618         try:
--> 619             return self.index.get_value(self, key)
    620         except InvalidIndexError:
    621             pass

/Library/Python/2.7/site-packages/pandas-0.12.0_80_gfcaf9a6_20130729-py2.7-macosx-10.8-intel.egg/pandas/core/index.pyc in get_value(self, series, key)
    722         """
    723         try:
--> 724             return self._engine.get_value(series, key)
    725         except KeyError as e1:
    726             if len(self) > 0 and self.inferred_type == 'integer':

/Library/Python/2.7/site-packages/pandas-0.12.0_80_gfcaf9a6_20130729-py2.7-macosx-10.8-intel.egg/pandas/index.so in pandas.index.IndexEngine.get_value (pandas/index.c:2646)()

/Library/Python/2.7/site-packages/pandas-0.12.0_80_gfcaf9a6_20130729-py2.7-macosx-10.8-intel.egg/pandas/index.so in pandas.index.IndexEngine.get_value (pandas/index.c:2461)()

/Library/Python/2.7/site-packages/pandas-0.12.0_80_gfcaf9a6_20130729-py2.7-macosx-10.8-intel.egg/pandas/index.so in pandas.index.IndexEngine.get_loc (pandas/index.c:3198)()

/Library/Python/2.7/site-packages/pandas-0.12.0_80_gfcaf9a6_20130729-py2.7-macosx-10.8-intel.egg/pandas/hashtable.so in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:6422)()

/Library/Python/2.7/site-packages/pandas-0.12.0_80_gfcaf9a6_20130729-py2.7-macosx-10.8-intel.egg/pandas/hashtable.so in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:6366)()

KeyError: 0

This worked like a charm prior to updating pandas and matplotlib a week or so ago.

Running Python 2.7.2 on OS X 10.8.4.


cpcloud · 2013-08-07T03:28:33Z

@fonnesbeck sorry to hear about all your plotting troubles! i'll check this one out

cpcloud · 2013-08-07T04:21:09Z

if i do

In [49]: mmsi = Series(normal(50, 10, size=1000), name='mmsi')

In [50]: mmsi_thr = mmsi < 40

In [51]: scatter(mmsi, mmsi_thr)

I get the following plot:

Here are my versions

INSTALLED VERSIONS
------------------
Python: 2.7.5.final.0
OS: Linux 3.10.5-1-ARCH #1 SMP PREEMPT Mon Aug 5 08:04:22 CEST 2013 x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

Cython: 0.19.1
Numpy: 1.7.1
Scipy: 0.12.0
statsmodels: 0.4.3
    patsy: 0.1.0
scikits.timeseries: 0.91.3
dateutil: 2.1
pytz: 2013b
bottleneck: 0.6.0
PyTables: 3.0.0
    numexpr: 2.1
matplotlib: 1.3.0
openpyxl: 1.6.2
xlrd: 0.9.2
xlwt: 0.7.5
sqlalchemy: 0.8.1
lxml: 3.2.3
bs4: 4.2.1
html5lib: 0.95-dev

fonnesbeck · 2013-08-07T13:34:36Z

Try going back to my example, and doing the following:

dataset = pd.read_csv("processed_data.csv", index_col=0, parse_dates=[-2,-1])
plt.scatter(dataset.pdgt10,dataset.loa)

cpcloud · 2013-08-07T14:34:55Z

whoa i get a totally different error than the one you showed above... you can't plot dataset.loa bc it has object dtype...try this:

converters = {'active': lambda x: bool(int(x))}
df = read_csv('processed_data.csv', index_col=0, parse_dates=[-2, -1], converters=converters)
df = df.convert_objects(convert_numeric=True)
scatter(df.pdgt10, df.loa)

You should get:

cpcloud · 2013-08-07T14:36:36Z

I wonder why loa is not being converted to float64 in read_csv

cpcloud · 2013-08-07T15:39:23Z

@fonnesbeck

in loa there are strings like "0.0/33.0":

to see it do

In [29]: df = read_csv('processed_data.csv', index_col=0, parse_dates=[-2, -1])

In [30]: counts = df.loa.str.count(r'\.')

In [31]: df.loa[counts > 1]
Out[31]:
2014    0.0/33.0
2015    0.0/33.0
2016    0.0/33.0
2017    0.0/33.0
2018    0.0/33.0
2019    0.0/33.0
2020    0.0/33.0
2021    0.0/33.0
2022    0.0/33.0
2023    0.0/33.0
2024    0.0/33.0
2025    0.0/33.0
2026    0.0/33.0
2027    0.0/33.0
2028    0.0/33.0
...
262353    0.0/33.0
262354    0.0/33.0
262355    0.0/33.0
262356    0.0/33.0
262357    0.0/33.0
262358    0.0/33.0
262359    0.0/33.0
262360    0.0/33.0
262361    0.0/33.0
262362    0.0/33.0
262363    0.0/33.0
262364    0.0/33.0
262365    0.0/33.0
262366    0.0/33.0
262367    0.0/33.0
Name: loa, Length: 38303, dtype: object

cpcloud · 2013-08-07T15:42:10Z

Also some that can't just evaluate to 0

In [37]: df.loa[counts > 1][~df.loa[counts > 1].str.startswith('0.0')]
Out[37]:
2619    276.0/277.0
2620    276.0/277.0
2621    276.0/277.0
2622    276.0/277.0
2623    276.0/277.0
2624    276.0/277.0
2625    276.0/277.0
2626    276.0/277.0
2627    276.0/277.0
2628    276.0/277.0
2629    276.0/277.0
2630    276.0/277.0
2631    276.0/277.0
2632    276.0/277.0
2633    276.0/277.0
...
262050    213.0/217.0
262128      44.0/78.0
262129      44.0/78.0
262130      44.0/78.0
262131      44.0/78.0
262132      44.0/78.0
262133      44.0/78.0
262134      44.0/78.0
262135      44.0/78.0
262136      44.0/78.0
262137      44.0/78.0
262138      44.0/78.0
262139      44.0/78.0
262140    142.0/148.0
262141    142.0/148.0
Name: loa, Length: 30000, dtype: object
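A quick way to locate such values without regexes is to coerce the column to numbers and see what fails to parse. This is only a sketch, assuming a recent pandas where `pd.to_numeric` is available; the small `loa` Series below is made-up stand-in data, not the real file.

```python
import pandas as pd

# Hypothetical stand-in for the loa column: mostly numeric strings, plus
# ratio-like entries such as the ones shown above.
loa = pd.Series(["12.5", "0.0/33.0", "276.0/277.0", "40"], name="loa")

# errors="coerce" turns anything unparseable into NaN instead of raising.
as_float = pd.to_numeric(loa, errors="coerce")

# Rows that were present but failed to parse are the offending strings.
bad = loa[as_float.isna() & loa.notna()]
print(bad.tolist())
```

The coerced `as_float` column has a float dtype, so it can be passed to scatter once the bad rows are dropped or handled some other way.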

fonnesbeck · 2013-08-07T17:13:14Z

Sorry, that was a bad choice of column. The one in my real model is the result of a merge on 2 tables like this and uses modified pdgt10 columns, without these unconvertible values. I will get a better example up when I get a chance.

fonnesbeck · 2013-08-07T17:21:23Z

Try the test_data.csv dataset, with the following:

sma_data = pd.read_csv('test_data.csv')
scatter(sma_data.pdgt10_early, sma_data.pdgt10_late<sma_data.pdgt10_early)

and see if you do not get the KeyError.

cpcloud · 2013-08-07T17:27:32Z

i don't get the error

In [1]: df = read_csv('test_data.csv')

In [2]: df
Out[2]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 143 entries, 0 to 142
Data columns (total 4 columns):
mmsi            143  non-null values
pdgt10_early    143  non-null values
pdgt10_late     143  non-null values
type            143  non-null values
dtypes: float64(2), int64(1), object(1)

In [3]: df.pdgt10_early
Out[3]:
0     77.500
1     84.467
2     27.037
3     54.200
4     25.248
5     28.750
6     44.362
7     92.600
8      9.300
9     26.525
10    63.400
11    36.750
12    12.050
13    20.867
14    85.150
...
128     2.581
129    19.320
130     6.444
131    21.477
132    36.575
133    46.950
134    14.450
135     0.100
136    91.700
137    10.418
138     0.014
139    11.393
140     7.045
141     1.257
142    51.050
Name: pdgt10_early, Length: 143, dtype: float64

In [4]: scatter(df.pdgt10_early, df.pdgt10_late < df.pdgt10_early)
Out[4]: <matplotlib.collections.PathCollection at 0x5ce0450>

fonnesbeck · 2013-08-07T17:29:42Z

The issue is probably related to this then:

In [2]: matplotlib.__version__
Out[2]: '1.4.x'

cpcloud · 2013-08-07T17:31:50Z

i'm guessing that's dev let me try it out

fonnesbeck · 2013-08-07T17:32:51Z

Yes, current master.

cpcloud · 2013-08-07T17:35:13Z

hm nope works with that too...

In [7]: mpl.__version__
Out[7]: '1.4.x'

fonnesbeck · 2013-08-07T17:41:39Z

Okay, back to the drawing board for me then.

cpcloud · 2013-08-07T17:41:45Z

can u try with .values on those Series? not sure why it should be different tho....

fonnesbeck · 2013-08-07T17:43:06Z

Will try. I wonder if your recent fix for my other issue was related? Getting on a plane; will update later. Thanks.

cpcloud · 2013-08-07T17:47:50Z

don't think so since these series have unique monotonic indices

fonnesbeck · 2013-08-07T19:54:51Z

Can you try importing the data with index_col=0 and see if you can replicate?

cpcloud · 2013-08-08T02:11:14Z

able to plot on 1.3.1 trying 1.4.x

cpcloud · 2013-08-08T02:15:19Z

NICE, GOT IT with 1.4.x

cpcloud · 2013-08-08T02:20:01Z

the issue is that matplotlib calls x.ravel()[0] on your Series objects, but they don't have a 0 index, thus the KeyError...the solution is to call .values

a little unsatisfying, but i'm not sure what else you can do...
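The failure mode can be reproduced without matplotlib at all. The sketch below assumes a recent pandas and NumPy; the Series `s` and its values are made up for illustration. With an integer index, `s[0]` is a label lookup, so it fails when no row is labeled 0, whereas the underlying array is always positional.

```python
import numpy as np
import pandas as pd

# Hypothetical data: an integer index that does not contain the label 0,
# like the mmsi index in the report.
s = pd.Series([52.6, 62.25, 48.3], index=[210234000, 211109000, 211200350])

# On an integer index, s[0] is treated as a label lookup, not a position.
try:
    s[0]
    raised = False
except KeyError:
    raised = True

# The underlying NumPy array is purely positional, so [0] is the first element.
first = s.values[0]
print(raised, first)
```

Passing `x.values` and `y.values` to scatter sidesteps the label lookup entirely, at the cost of dropping the index.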

cpcloud · 2013-08-08T02:30:39Z

i'll look at a diff of matplotlib for that function to see if there's anything we or they can do about it to make it compatible

@fonnesbeck thanks for the report!

fonnesbeck · 2013-08-17T01:59:20Z

BTW, I get similar behavior with RPlot (again, this worked a few weeks ago):

>>> cdystonia = pd.read_csv("../data/cdystonia.csv", index_col=None)
>>> cdystonia.head()

   patient  obs  week  site  id  treat  age sex  twstrs
0        1    1     0     1   1  5000U   65   F      32
1        1    2     2     1   1  5000U   65   F      30
2        1    3     4     1   1  5000U   65   F      24
3        1    4     8     1   1  5000U   65   F      37
4        1    5    12     1   1  5000U   65   F      39

>>> plt.figure(figsize=(12,12))
>>> bbp = RPlot(cdystonia, x='age', y='twstrs')
>>> bbp.add(TrellisGrid(['week', 'treat']))
>>> bbp.add(GeomScatter())
>>> bbp.add(GeomPolyFit(degree=2))
>>> _ = bbp.render(gcf())

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-30-4cbe319d1ffa> in <module>()
      4 bbp.add(GeomScatter())
      5 bbp.add(GeomPolyFit(degree=2))
----> 6 _ = bbp.render(gcf())

    /Library/Python/2.7/site-packages/pandas-0.12.0_146_g9cba0de_20130808-py2.7-macosx-10.8-intel.egg/pandas/tools/rplot.pyc in render(self, fig)
        878             # Prepare the subplots and draw on them
        879             new_layers = sequence_grids(new_layers)
    --> 880             axes_grids = [work_grid(grid, fig) for grid in new_layers]
        881             axes_grid = axes_grids[-1]
        882             adjust_subplots(fig, axes_grid, last_trellis, new_layers[-1])

    /Library/Python/2.7/site-packages/pandas-0.12.0_146_g9cba0de_20130808-py2.7-macosx-10.8-intel.egg/pandas/tools/rplot.pyc in work_grid(grid, fig)
        718         for col in range(ncols):
        719             axes[row][col] = fig.add_subplot(nrows, ncols, ncols * row + col + 1)
    --> 720             grid[row][col].work(ax=axes[row][col])
        721     return axes
        722 

    /Library/Python/2.7/site-packages/pandas-0.12.0_146_g9cba0de_20130808-py2.7-macosx-10.8-intel.egg/pandas/tools/rplot.pyc in work(self, fig, ax)
        462         x = self.data[self.aes['x']]
        463         y = self.data[self.aes['y']]
    --> 464         ax.scatter(x, y, marker=self.marker, c=self.colour, alpha=self.alpha)
        465         return fig, ax
        466 

    /Library/Python/2.7/site-packages/matplotlib-1.4.x-py2.7-macosx-10.8-intel.egg/matplotlib/axes/_axes.pyc in scatter(self, x, y, s, c, marker, cmap, norm, vmin, vmax, alpha, linewidths, verts, **kwargs)
       3210             self.cla()
       3211 
    -> 3212         self._process_unit_info(xdata=x, ydata=y, kwargs=kwargs)
       3213         x = self.convert_xunits(x)
       3214         y = self.convert_yunits(y)

    /Library/Python/2.7/site-packages/matplotlib-1.4.x-py2.7-macosx-10.8-intel.egg/matplotlib/axes/_base.pyc in _process_unit_info(self, xdata, ydata, kwargs)
       1652             # we only need to update if there is nothing set yet.
       1653             if not self.xaxis.have_units():
    -> 1654                 self.xaxis.update_units(xdata)
       1655             #print '\tset from xdata', self.xaxis.units
       1656 

    /Library/Python/2.7/site-packages/matplotlib-1.4.x-py2.7-macosx-10.8-intel.egg/matplotlib/axis.pyc in update_units(self, data)
       1330         """
       1331 
    -> 1332         converter = munits.registry.get_converter(data)
       1333         if converter is None:
       1334             return False

    /Library/Python/2.7/site-packages/matplotlib-1.4.x-py2.7-macosx-10.8-intel.egg/matplotlib/units.pyc in get_converter(self, x)
        135 
        136         if isinstance(x, np.ndarray) and x.size:
    --> 137             converter = self.get_converter(x.ravel()[0])
        138             return converter
        139 

    /Library/Python/2.7/site-packages/pandas-0.12.0_146_g9cba0de_20130808-py2.7-macosx-10.8-intel.egg/pandas/core/series.pyc in __getitem__(self, key)
        616     def __getitem__(self, key):
        617         try:
    --> 618             return self.index.get_value(self, key)
        619         except InvalidIndexError:
        620             pass

    /Library/Python/2.7/site-packages/pandas-0.12.0_146_g9cba0de_20130808-py2.7-macosx-10.8-intel.egg/pandas/core/index.pyc in get_value(self, series, key)
        722         """
        723         try:
    --> 724             return self._engine.get_value(series, key)
        725         except KeyError as e1:
        726             if len(self) > 0 and self.inferred_type == 'integer':

    /Library/Python/2.7/site-packages/pandas-0.12.0_146_g9cba0de_20130808-py2.7-macosx-10.8-intel.egg/pandas/index.so in pandas.index.IndexEngine.get_value (pandas/index.c:2646)()

    /Library/Python/2.7/site-packages/pandas-0.12.0_146_g9cba0de_20130808-py2.7-macosx-10.8-intel.egg/pandas/index.so in pandas.index.IndexEngine.get_value (pandas/index.c:2461)()

/Library/Python/2.7/site-packages/pandas-0.12.0_146_g9cba0de_20130808-py2.7-macosx-10.8-intel.egg/pandas/index.so in pandas.index.IndexEngine.get_loc (pandas/index.c:3198)()

/Library/Python/2.7/site-packages/pandas-0.12.0_146_g9cba0de_20130808-py2.7-macosx-10.8-intel.egg/pandas/hashtable.so in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:6422)()

/Library/Python/2.7/site-packages/pandas-0.12.0_146_g9cba0de_20130808-py2.7-macosx-10.8-intel.egg/pandas/hashtable.so in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:6366)()

KeyError: 0

cpcloud · 2013-08-17T02:06:53Z

there's a recent change to that ravel()[0] business in mpl (which is the issue here), checking it out now

cpcloud · 2013-08-17T02:10:02Z

@fonnesbeck can u try with latest matplotlib master? it's working on my end

cpcloud · 2013-08-17T02:12:07Z

here's the diff for the offending chunk from latest matplotlib

diff --git a/lib/matplotlib/units.py b/lib/matplotlib/units.py
index 8f08a64..73d363a 100644
--- a/lib/matplotlib/units.py
+++ b/lib/matplotlib/units.py
@@ -134,8 +134,19 @@ class Registry(dict):
             converter = self.get(classx)

         if isinstance(x, np.ndarray) and x.size:
-            converter = self.get_converter(x.ravel()[0])
-            return converter
+            xravel = x.ravel()
+            try:
+                # pass the first value of x that is not masked back to
+                # get_converter
+                if not np.all(xravel.mask):
+                    # some elements are not masked
+                    converter = self.get_converter(
+                        xravel[np.argmin(xravel.mask)])
+                    return converter
+            except AttributeError:
+                # not a masked_array
+                converter = self.get_converter(xravel[0])
+                return converter

         if converter is None and iterable(x):
             for thisx in x:

fonnesbeck · 2013-08-17T02:18:51Z

success!

cpcloud · 2013-08-17T02:20:06Z

excellent!

cpcloud · 2013-08-17T02:23:56Z

hm i'm not sure this is doing the right thing tho....

in pandas pre series->ndframe refactor, xravel will be a Series object and if not np.all(xravel.mask) will test False because mask is a method on Series objects, so this new mpl chunk of code will do nothing (it will exit the try suite) when i'm not sure it's really supposed to

in the post-refactor series world the AttributeError will be caught, since ravel now returns a raw numpy array (which means that it doesn't use the __array__ interface), and the conversion will succeed because xravel[0] won't raise. previously it could raise on a series any time the index held integers but 0 wasn't in the index (for example)
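To make the two branches of the patched chunk concrete, here is a standalone sketch of the same control flow, assuming only NumPy. The helper name `first_representative` is hypothetical, and returning None when everything is masked is a simplification; the real code falls through to the rest of get_converter in that case.

```python
import numpy as np

def first_representative(x):
    """Return the element the patched chunk would pass on to get_converter."""
    xravel = x.ravel()
    try:
        # Masked arrays expose .mask; pick the first element that is not masked.
        if not np.all(xravel.mask):
            return xravel[np.argmin(xravel.mask)]
        return None  # simplification: every element is masked
    except AttributeError:
        # Plain ndarray: no .mask attribute, so fall back to position 0.
        return xravel[0]

plain = np.array([3.0, 4.0, 5.0])
masked = np.ma.masked_array([3.0, 4.0, 5.0], mask=[True, False, False])
all_masked = np.ma.masked_array([3.0, 4.0], mask=[True, True])

print(first_representative(plain), first_representative(masked))
```

For a plain array the AttributeError branch runs and position 0 is used; for the masked array the first unmasked element (position 1 here) is used instead.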

cpcloud · 2013-08-17T02:31:48Z

OTOH maybe good enough is good enough

cpcloud · 2013-09-28T19:38:31Z

pushing to 0.14

cpcloud · 2014-02-27T17:48:47Z

closing as stale. @fonnesbeck ping us if this comes up again

joehand · 2014-06-12T19:48:10Z

Never mind, figured it out =). Was similar to issue #6127. Just passing .values instead of the Series.

@cpcloud, I'm getting what I think is the same error but when trying to create a histogram with matplotlib. Edit: After looking at the above errors a bit more, it's not quite the same. Let me know if I should create a new issue for this.

I have Pandas 0.14, but am still running matplotlib 1.3.1. This function was working previously on the same data (not sure what versions, it was a few months ago) but isn't now. I was on Pandas 0.13.x earlier and it wasn't working, tried upgrading and it still isn't working =(. Any ideas?

The histogram works with some subsets of the data but not others (example below). I haven't quite been able to find what differs between the ones that work and the ones that don't.

Update: This issue has something to do with series whose indices don't start at 0. I can get it working if I create a new index from 0 -> Length for the same series that wasn't working before.

One of the sub-sets it is not working on:

49663     868.002991
49664     502.933014
49665     506.434998
49666     976.049011
49667    1735.439941
49668    1230.319946
49669     993.213013
49670    1551.069946
49671     732.591980
49672    1018.109985
49673     899.023010
49674     667.893005
49675    1405.270019
49676    1018.869995
49677    1117.680054
...
53686    4426.439941
53687    2557.149902
53688    1884.449951
53689     961.927002
53690    1248.160034
53691    2091.310059
53692    1292.329956
53693     860.046021
53694    1279.329956
53695    4214.609863
53696    3223.159912
53697    3815.270020
53698    2677.679932
53699    1410.280029
53700     807.500000
Name: PopDen, Length: 1570, dtype: float64

And here are the errors:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-11-0d069c72fca2> in <module>()
----> 5 plots.plot_single(atl, 'PopDen')

/Users/joe/Dropbox/SDI_Papers_CODE/statistics_neighborhoods/plot_functions.py in plot_single(self, df, plot_col, bins, skew, fig_title, xlabel, ylabel)
     73             x, mean, sigma = self._calc_params(log_col)
     74 
---> 75             plt.hist(log_col, bins, normed=True,  color=self.colors[0])
     76             plt.plot(x,mlab.normpdf(x,mean,sigma), label='Log-Normal', lw=3, color=self.colors[1])
     77 

/usr/local/lib/python2.7/site-packages/matplotlib/pyplot.pyc in hist(x, bins, range, normed, weights, cumulative, bottom, histtype, align, orientation, rwidth, log, color, label, stacked, hold, **kwargs)
   2825                       histtype=histtype, align=align, orientation=orientation,
   2826                       rwidth=rwidth, log=log, color=color, label=label,
-> 2827                       stacked=stacked, **kwargs)
   2828         draw_if_interactive()
   2829     finally:

/usr/local/lib/python2.7/site-packages/matplotlib/axes.pyc in hist(self, x, bins, range, normed, weights, cumulative, bottom, histtype, align, orientation, rwidth, log, color, label, stacked, **kwargs)
   8247         # Massage 'x' for processing.
   8248         # NOTE: Be sure any changes here is also done below to 'weights'
-> 8249         if isinstance(x, np.ndarray) or not iterable(x[0]):
   8250             # TODO: support masked arrays;
   8251             x = np.asarray(x)

/usr/local/lib/python2.7/site-packages/pandas/core/series.pyc in __getitem__(self, key)
    477     def __getitem__(self, key):
    478         try:
--> 479             result = self.index.get_value(self, key)
    480 
    481             if not np.isscalar(result):

/usr/local/lib/python2.7/site-packages/pandas/core/index.pyc in get_value(self, series, key)
   1169 
   1170         try:
-> 1171             return self._engine.get_value(s, k)
   1172         except KeyError as e1:
   1173             if len(self) > 0 and self.inferred_type == 'integer':

/usr/local/lib/python2.7/site-packages/pandas/index.so in pandas.index.IndexEngine.get_value (pandas/index.c:2987)()

/usr/local/lib/python2.7/site-packages/pandas/index.so in pandas.index.IndexEngine.get_value (pandas/index.c:2802)()

/usr/local/lib/python2.7/site-packages/pandas/index.so in pandas.index.IndexEngine.get_loc (pandas/index.c:3528)()

/usr/local/lib/python2.7/site-packages/pandas/hashtable.so in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:7032)()

/usr/local/lib/python2.7/site-packages/pandas/hashtable.so in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:6973)()

KeyError: 0
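The two workarounds mentioned in this comment (a fresh index starting at 0, or passing the raw values) can be sketched as follows, assuming a recent pandas. The three-row `col` Series is made-up data that only mimics the shape of the PopDen subset above.

```python
import numpy as np
import pandas as pd

# Hypothetical slice of a larger frame: the integer labels start at 49663.
col = pd.Series([868.0, 502.9, 506.4], index=[49663, 49664, 49665], name="PopDen")

# Option 1: renumber the index from 0 so that the label 0 exists again.
renumbered = col.reset_index(drop=True)

# Option 2: hand the plotting call a plain ndarray, which has no labels at all.
arr = col.to_numpy()

print(renumbered[0], arr[0])
```

Either `renumbered` or `arr` can then be passed to the histogram call in place of the original column; the second option is the one that mirrors the `.values` fix.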

This commit prevents the KeyError raised when DataFrame.plot() is called with xerr or yerr being a Series or DataFrame whose index doesn't include 0. The error comes from matplotlib code which tries to access xerr[0] or yerr[0], so to solve the problem, we convert xerr and yerr from Pandas objects to NumPy ndarrays before sending them through to matplotlib. This is a different instance of the same type of problem in Github issues pandas-dev#4493 and pandas-dev#6127 (and perhaps others).

ghost assigned cpcloud Aug 7, 2013

jreback mentioned this issue Sep 27, 2013

ENH: Scatterplot Method added #3473

Merged

cpcloud closed this as completed Feb 27, 2014

diazona mentioned this issue Dec 17, 2015

Index without 0 in xerr/yerr causes KeyError #11858

Closed

wesm unassigned cpcloud Oct 12, 2016
