Since 0.13: passing pandas DataFrame/Series like numpy array breaks #6127

twiecki · 2014-01-27T16:16:52Z

As discussed in #6063:
I noticed that numpy-style access sometimes breaks under 0.13. While I haven't been able to pinpoint the issue, calls like pylab.hist(-df.ix[row, col_name]) fail with an x[0] index error, and I have to use pylab.hist(-df.ix[row, col_name].values) instead.

Here is a csv file for which this happens: https://gist.github.com/8651509

plt.hist(pd.read_csv('debug.csv'))

produces:

--------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-203-b20a1ff5d1db> in <module>()
----> 1 hist(pd.load('debug.pickle'))

/home/ipython/envs/ipynb/local/lib/python2.7/site-packages/matplotlib/pyplot.pyc in hist(x, bins, range, normed, weights, cumulative, bottom, histtype, align, orientation, rwidth, log, color, label, stacked, hold, **kwargs)
   2825                       histtype=histtype, align=align, orientation=orientation,
   2826                       rwidth=rwidth, log=log, color=color, label=label,
-> 2827                       stacked=stacked, **kwargs)
   2828         draw_if_interactive()
   2829     finally:

/home/ipython/envs/ipynb/local/lib/python2.7/site-packages/matplotlib/axes.pyc in hist(self, x, bins, range, normed, weights, cumulative, bottom, histtype, align, orientation, rwidth, log, color, label, stacked, **kwargs)
   8247         # Massage 'x' for processing.
   8248         # NOTE: Be sure any changes here is also done below to 'weights'
-> 8249         if isinstance(x, np.ndarray) or not iterable(x[0]):
   8250             # TODO: support masked arrays;
   8251             x = np.asarray(x)

/home/ipython/envs/ipynb/local/lib/python2.7/site-packages/pandas/core/series.pyc in __getitem__(self, key)
    482     def __getitem__(self, key):
    483         try:
--> 484             result = self.index.get_value(self, key)
    485             if isinstance(result, np.ndarray):
    486                 return self._constructor(result,index=[key]*len(result)).__finalize__(self)

/home/ipython/envs/ipynb/local/lib/python2.7/site-packages/pandas/core/index.pyc in get_value(self, series, key)
   1030 
   1031         try:
-> 1032             return self._engine.get_value(s, k)
   1033         except KeyError as e1:
   1034             if len(self) > 0 and self.inferred_type == 'integer':

/home/ipython/envs/ipynb/local/lib/python2.7/site-packages/pandas/index.so in pandas.index.IndexEngine.get_value (pandas/index.c:2890)()

/home/ipython/envs/ipynb/local/lib/python2.7/site-packages/pandas/index.so in pandas.index.IndexEngine.get_value (pandas/index.c:2702)()

/home/ipython/envs/ipynb/local/lib/python2.7/site-packages/pandas/index.so in pandas.index.IndexEngine.get_loc (pandas/index.c:3440)()

/home/ipython/envs/ipynb/local/lib/python2.7/site-packages/pandas/hashtable.so in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:6595)()

/home/ipython/envs/ipynb/local/lib/python2.7/site-packages/pandas/hashtable.so in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:6536)()

KeyError: 0

Passing .values works.


jreback · 2014-01-27T16:23:35Z

what numpy/matplotlib are you using here?

jreback · 2014-01-27T16:24:04Z

it could be something like this: http://pandas.pydata.org/pandas-docs/dev/whatsnew.html#internal-refactoring

jreback · 2014-01-27T16:32:02Z

I think your df has float headers, which may be the problem

In [1]: import pylab

In [2]: pylab.hist(pd.read_csv('debug.csv',header=None)
   ...: 
KeyboardInterrupt

In [2]: pylab.hist(pd.read_csv('debug.csv',header=None))
Out[2]: 
([array([1, 0, 0, 0, 0, 0, 0, 0, 0, 0]),
  array([0, 0, 0, 0, 0, 0, 0, 0, 0, 1])],
 array([ 0. ,  0.1,  0.2,  0.3,  0.4,  0.5,  0.6,  0.7,  0.8,  0.9,  1. ]),
 <a list of 2 Lists of Patches objects>)

INSTALLED VERSIONS
------------------
commit: 1112cb74264d40a91ce2a80f6bbbf24298a72f40
python: 2.7.3.final.0
python-bits: 64
OS: Linux
OS-release: 2.6.32-5-amd64
machine: x86_64
processor: 
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.13.0rc1-151-g1777c89
Cython: 0.20
numpy: 1.7.1
scipy: 0.12.0
statsmodels: 0.5.0
IPython: 1.0.0
sphinx: 1.1.3
patsy: None
scikits.timeseries: None
dateutil: 1.5
pytz: None
bottleneck: 0.6.0
tables: 3.0.0
numexpr: 2.1
matplotlib: 1.2.0
openpyxl: 1.5.7
xlrd: 0.9.0
xlwt: None
xlsxwriter: None
sqlalchemy: None
lxml: 2.3.4
bs4: None
html5lib: None
bq: v2.0.15
apiclient: 1.0

twiecki · 2014-01-27T19:28:11Z

This also happens with a df that was read in via a csv that had proper columns; i.e. the error does not occur only when I load from the file I provided.

http://pandas.pydata.org/pandas-docs/dev/whatsnew.html#internal-refactoring reads as if it could be the cause, but I'm obviously not familiar enough with the internals. I only observed this when I slice a dataframe and select a column with a df.ix[slice, col_name]-like call and pass the result to a function that expects numpy ndarrays.

INSTALLED VERSIONS
------------------
Python: 2.7.3.final.0
OS: Linux
Release: 3.2.0-29-generic
Processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: C

pandas: 0.13.0
Cython: 0.19.1
Numpy: 1.8.0
Scipy: 0.14.0.dev-a3e9c7f
statsmodels: 0.6.0.dev-fe6e688
    patsy: 0.2.1
scikits.timeseries: Not installed
dateutil: 2.2
pytz: 2013.9
bottleneck: Not installed
PyTables: 2.4.0
    numexpr: 2.2.2
matplotlib: 1.3.1
openpyxl: 1.7.0
xlrd: 0.9.2
xlwt: Not installed
xlsxwriter: Not installed
sqlalchemy: Not installed
lxml: 2.3.2
bs4: Not installed
html5lib: Not installed
bigquery: Not installed
apiclient: Not installed

jreback · 2014-01-27T19:32:10Z

can you try on master?

twiecki · 2014-01-27T19:41:22Z

So far couldn't reproduce on master!

twiecki · 2014-01-27T19:51:30Z

I'll close this and will reopen if problem resurfaces.

jreback · 2014-01-27T20:23:48Z

ok...gr8!

twiecki · 2014-01-28T21:33:26Z

ok, resurfaced.

Here is an updated file: https://gist.github.com/anonymous/8676957

I can trigger this by loading and passing this (loaded as anti_val):
hist(anti_val.ix[anti_val.cond == 'incong', 'rt'], bins=bins, histtype='step', normed=True);

jreback · 2014-01-28T22:01:39Z

can u put up the exact code u r using to load

twiecki · 2014-01-28T22:04:18Z

Hrm, I can't reproduce with the freshly loaded one, sorry... I guess I could pickle it but not sure how to upload that anywhere quick and easy.

jreback · 2014-01-28T22:28:22Z

Dropbox public link

jreback · 2014-01-29T00:45:56Z

any reason you dont use series.hist()?

jreback · 2014-01-29T23:45:15Z

@twiecki can you repro? about to release 0.13.1

twiecki · 2014-01-30T00:13:47Z

Sorry, here's the pickle that can reproduce it:
https://www.dropbox.com/s/1k9hln4cvoc1pev/anti_val.pickle

df = pd.read_pickle('/tmp/anti_val.pickle')
hist(df.ix[df.cond == 'incong', 'rt']);

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-164-bb7f9e1b7a1b> in <module>()
      1 df = pd.read_pickle('/tmp/anti_val.pickle')
----> 2 hist(df.ix[df.cond == 'incong', 'rt']);

/home/ipython/envs/ipynb/local/lib/python2.7/site-packages/matplotlib/pyplot.pyc in hist(x, bins, range, normed, weights, cumulative, bottom, histtype, align, orientation, rwidth, log, color, label, stacked, hold, **kwargs)
   2825                       histtype=histtype, align=align, orientation=orientation,
   2826                       rwidth=rwidth, log=log, color=color, label=label,
-> 2827                       stacked=stacked, **kwargs)
   2828         draw_if_interactive()
   2829     finally:

/home/ipython/envs/ipynb/local/lib/python2.7/site-packages/matplotlib/axes.pyc in hist(self, x, bins, range, normed, weights, cumulative, bottom, histtype, align, orientation, rwidth, log, color, label, stacked, **kwargs)
   8247         # Massage 'x' for processing.
   8248         # NOTE: Be sure any changes here is also done below to 'weights'
-> 8249         if isinstance(x, np.ndarray) or not iterable(x[0]):
   8250             # TODO: support masked arrays;
   8251             x = np.asarray(x)

/home/ipython/envs/ipynb/local/lib/python2.7/site-packages/pandas/core/series.pyc in __getitem__(self, key)
    487     def __getitem__(self, key):
    488         try:
--> 489             result = self.index.get_value(self, key)
    490             if isinstance(result, np.ndarray):
    491                 return self._constructor(result,index=[key]*len(result)).__finalize__(self)

/home/ipython/envs/ipynb/local/lib/python2.7/site-packages/pandas/core/index.pyc in get_value(self, series, key)
   1030 
   1031         try:
-> 1032             return self._engine.get_value(s, k)
   1033         except KeyError as e1:
   1034             if len(self) > 0 and self.inferred_type == 'integer':

/home/ipython/envs/ipynb/local/lib/python2.7/site-packages/pandas/index.so in pandas.index.IndexEngine.get_value (pandas/index.c:2957)()

/home/ipython/envs/ipynb/local/lib/python2.7/site-packages/pandas/index.so in pandas.index.IndexEngine.get_value (pandas/index.c:2772)()

/home/ipython/envs/ipynb/local/lib/python2.7/site-packages/pandas/index.so in pandas.index.IndexEngine.get_loc (pandas/index.c:3498)()

/home/ipython/envs/ipynb/local/lib/python2.7/site-packages/pandas/hashtable.so in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:6930)()

/home/ipython/envs/ipynb/local/lib/python2.7/site-packages/pandas/hashtable.so in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:6871)()

KeyError: 0

jreback · 2014-01-30T01:28:26Z

ok, this will reproduce it:

pylab.hist(Series([1,2,3],index=[1,2,3]))

Here's what is happening. Matplotlib assumes it is always passed an ndarray or an iterable, so it first checks whether x is an ndarray. Before 0.13 a Series WAS an ndarray, so the second part of the check was never reached. That second part evaluates x[0], which is normally the 0th element, but on a Series with an integer index it is a label lookup, and since your index has no label 0 this raises a KeyError.

If your index contains the label 0 then all is good.

Matplotlib should be trapping this exception, so I believe this is a trivial bug there.
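The failing check can be illustrated without matplotlib. This is a minimal sketch, assuming a reasonably recent pandas and numpy; the variable names are illustrative and not taken from matplotlib's source:

```python
import numpy as np
import pandas as pd

# Same data as the reproduction above: the integer index has no label 0.
s = pd.Series([1, 2, 3], index=[1, 2, 3])

# Since pandas 0.13 a Series is not an ndarray subclass, so the first
# half of matplotlib's check no longer short-circuits.
is_ndarray = isinstance(s, np.ndarray)

# The second half evaluates x[0]. With an integer index this is a
# label lookup, and there is no label 0.
try:
    s[0]
    raised = False
except KeyError:
    raised = True

# Positional access and explicit conversion are both unaffected.
first = s.iloc[0]
arr = np.asarray(s)

print(is_ndarray, raised, first, arr)
```

Here `s.iloc[0]` is the positional equivalent of what the `x[0]` check was meant to inspect.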

work-arounds:

use pandas .hist method
pass the actual .values
trap the exception (by wrapping .hist) with your own routine that does one of the above
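The second and third work-arounds could be combined in a small wrapper. This is only a sketch: `to_ndarray` and `safe_hist` are hypothetical helper names, and `np.histogram` stands in for `pylab.hist` so the example runs without a plotting backend:

```python
import numpy as np
import pandas as pd

def to_ndarray(x):
    """Return the underlying ndarray for pandas objects, else np.asarray(x)."""
    if isinstance(x, (pd.Series, pd.DataFrame)):
        return x.values
    return np.asarray(x)

def safe_hist(hist_func, x, **kwargs):
    """Call hist_func (e.g. pylab.hist) with x converted to a plain ndarray.

    Hypothetical wrapper: it avoids the x[0] label lookup instead of
    trapping the KeyError after the fact.
    """
    return hist_func(to_ndarray(x), **kwargs)

s = pd.Series([1, 2, 3], index=[1, 2, 3])

# np.histogram is used here only as a stand-in for a plotting call.
counts, edges = safe_hist(np.histogram, s, bins=3)
print(counts, edges)
```

With matplotlib available, the call would be `safe_hist(pylab.hist, s, bins=3)`.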

IIRC matplotlib < 1.3 doesn't have this issue.

twiecki · 2014-01-30T01:49:29Z

Hm, OK. I always thought it was a nice feature that a pandas df behaved like an ndarray. And I think this happened not only with hist(). Isn't there some way to fake the isinstance() check?

jreback · 2014-01-30T01:58:05Z

I have tried that, but it's essentially a C-level call, so it can't be faked, not even with a metaclass.
Well, it works for a lot of stuff, but numpy is hard-headed about it; there's no way to get around it.

blame it on matplotlib!!! maybe file a bug report!

twiecki · 2014-01-30T02:09:35Z

I see. Well, maybe it's better to be explicit in any case; I've just gotten used to passing it around like an ndarray. Agreed that it's a matplotlib problem.

This commit prevents the KeyError raised when DataFrame.plot() is called with xerr or yerr being a Series or DataFrame whose index doesn't include 0. The error comes from matplotlib code which tries to access xerr[0] or yerr[0], so to solve the problem, we convert xerr and yerr from Pandas objects to NumPy ndarrays before sending them through to matplotlib. This is a different instance of the same type of problem in Github issues pandas-dev#4493 and pandas-dev#6127 (and perhaps others).

twiecki mentioned this issue Jan 27, 2014

Changes in .ix behavior that break backwards compat #6063

Closed

twiecki closed this as completed Jan 27, 2014

twiecki reopened this Jan 28, 2014

jreback closed this as completed Jan 30, 2014

twiecki mentioned this issue Jan 30, 2014

Compatibility with pandas 0.13 matplotlib/matplotlib#2775

Closed

jreback mentioned this issue Mar 31, 2014

KeyError: 0L with pandas 0.13.1 #6750

Closed

joehand mentioned this issue Jun 12, 2014

KeyError when generating scatter plot of DataFrame columns #4493

Closed

diazona mentioned this issue Dec 17, 2015

Index without 0 in xerr/yerr causes KeyError #11858

Closed
