
Weird IndexError for plotting after DF row selection #10891

Closed
michaelaye opened this issue Aug 23, 2015 · 7 comments

@michaelaye
Contributor

I'm doing this:

df[df.CHANNEL=='MUV'].loc['2015'].DET_TEMP.plot()

The index is a DatetimeIndex. If I don't plot, everything works fine:

df[df.CHANNEL=='MUV'].loc['2015'][['CHANNEL','DET_TEMP']].head()

[Screenshot from 2015-08-22 23:47:18 showing the output of the command above]

But when I try to plot, I get this error:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-84-28500c6c30a0> in <module>()
----> 1 df[df.CHANNEL=='MUV'].loc['2015'].DET_TEMP.plot()

/usr/local/python3/miniconda/lib/python3.4/site-packages/pandas-0.16.2+409.g73dcb95-py3.4-linux-x86_64.egg/pandas/tools/plotting.py in plot_series(data, kind, ax, figsize, use_index, title, grid, legend, style, logx, logy, loglog, xticks, yticks, xlim, ylim, rot, fontsize, colormap, table, yerr, xerr, label, secondary_y, **kwds)
   2526                  yerr=yerr, xerr=xerr,
   2527                  label=label, secondary_y=secondary_y,
-> 2528                  **kwds)
   2529 
   2530 

/usr/local/python3/miniconda/lib/python3.4/site-packages/pandas-0.16.2+409.g73dcb95-py3.4-linux-x86_64.egg/pandas/tools/plotting.py in _plot(data, x, y, subplots, ax, kind, **kwds)
   2331         plot_obj = klass(data, subplots=subplots, ax=ax, kind=kind, **kwds)
   2332 
-> 2333     plot_obj.generate()
   2334     plot_obj.draw()
   2335     return plot_obj.result

/usr/local/python3/miniconda/lib/python3.4/site-packages/pandas-0.16.2+409.g73dcb95-py3.4-linux-x86_64.egg/pandas/tools/plotting.py in generate(self)
    940         self._make_legend()
    941         self._post_plot_logic()
--> 942         self._adorn_subplots()
    943 
    944     def _args_adjust(self):

/usr/local/python3/miniconda/lib/python3.4/site-packages/pandas-0.16.2+409.g73dcb95-py3.4-linux-x86_64.egg/pandas/tools/plotting.py in _adorn_subplots(self)
   1068                                 naxes=nrows * ncols, nrows=nrows,
   1069                                 ncols=ncols, sharex=self.sharex,
-> 1070                                 sharey=self.sharey)
   1071 
   1072         for ax in to_adorn:

/usr/local/python3/miniconda/lib/python3.4/site-packages/pandas-0.16.2+409.g73dcb95-py3.4-linux-x86_64.egg/pandas/tools/plotting.py in _handle_shared_axes(axarr, nplots, naxes, nrows, ncols, sharex, sharey)
   3333         layout = np.zeros((nrows+1,ncols+1), dtype=np.bool)
   3334         for ax in axarr:
-> 3335             layout[ax.rowNum, ax.colNum] = ax.get_visible()
   3336 
   3337         if sharex and nrows > 1:

IndexError: index 5 is out of bounds for axis 1 with size 3

System: Conda, all packages up to date; pandas '0.16.2+409.g73dcb95' (73dcb95)
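The failing line in `_handle_shared_axes` builds a boolean grid of shape `(nrows+1, ncols+1)` and indexes it with each axis's `rowNum`/`colNum`. The sketch below uses plain NumPy with made-up sizes (an assumption chosen only to match the message, not the values pandas actually computed here) to show what the error means: an axes object reported a column position (5) outside a grid sized for only 3 columns. It does not reproduce the pandas bug itself.

```python
import numpy as np

# Hypothetical grid size: ncols + 1 == 3 matches "axis 1 with size 3" in the traceback
nrows, ncols = 1, 2
layout = np.zeros((nrows + 1, ncols + 1), dtype=bool)  # builtin bool; shape (2, 3)

# Hypothetical subplot position whose column number (5) lies outside that grid
row_num, col_num = 0, 5

message = None
try:
    layout[row_num, col_num] = True  # mirrors layout[ax.rowNum, ax.colNum] = ...
except IndexError as exc:
    message = str(exc)

print(message)  # index 5 is out of bounds for axis 1 with size 3
```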

@michaelaye
Contributor Author

The error is the same when I use .loc for both selections:

df.loc[df.CHANNEL=='MUV'].loc['2015'].DET_TEMP.plot()

@michaelaye
Contributor Author

The selection works in this form:

df.loc[df.CHANNEL=='MUV', ['DET_TEMP']]['2015'].plot()

@jorisvandenbossche
Member

Would it be possible to provide a reproducible example? A subset of the data (or dummy data) that shows the problem?

@sinhrks
Member

sinhrks commented Aug 23, 2015

The error is raised in the same code path as the problem fixed in #10879, but this looks like a different problem, caused by the chained operations. I'd recommend doing the selection in a single operation, such as:

df.loc[df.index.year == 2015 & df.CHANNEL=='MUV', 'DET_TEMP'].plot()
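A minimal sketch of the one-step selection suggested above, on hypothetical dummy data (the frame below is made up, not the reporter's data). Each condition is wrapped in parentheses, and the sketch only checks that the chained and single-step selections return the same rows; it does not plot anything.

```python
import numpy as np
import pandas as pd

# Hypothetical dummy data: quarterly month-start timestamps, alternating channels
index = pd.date_range('2012-01-01', freq='3MS', periods=20)
df = pd.DataFrame(
    {
        'DET_TEMP': np.arange(20, dtype=float),
        'CHANNEL': ['MUV', 'FUV'] * 10,
    },
    index=index,
)

# Chained selection: boolean filter, then partial-string date slice, then column
chained = df[df.CHANNEL == 'MUV'].loc['2015', 'DET_TEMP']

# Single selection: one combined boolean mask (each condition parenthesized) plus the column
mask = (df.index.year == 2015) & (df.CHANNEL == 'MUV')
single = df.loc[mask, 'DET_TEMP']

print(chained.equals(single))  # True
```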

@michaelaye
Contributor Author

I tried to set up a dummy example, but it does not fail. :(

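import pandas as pd
import numpy as np

# 20 quarterly month-end timestamps starting in 2012 (same as freq='3M')
index = pd.date_range('2012', freq=pd.offsets.MonthEnd(3), periods=20)
df = pd.DataFrame(np.random.rand(20, 2), index=index, columns=['CASE_TEMP', 'DET_TEMP'])

# Random channel labels: 1 -> 'MUV', 2 -> 'FUV' (np.random.random_integers is deprecated)
df['CHANNEL'] = np.random.randint(1, 3, size=20)
df['CHANNEL'] = df['CHANNEL'].map(lambda x: 'MUV' if x == 1 else 'FUV')

# Insert a few missing values (.ix was removed in pandas 1.0, so use positional .iloc)
df.iloc[4, df.columns.get_loc('DET_TEMP')] = np.nan
df.iloc[17, df.columns.get_loc('CHANNEL')] = np.nan
df.iloc[12, df.columns.get_loc('CASE_TEMP')] = np.nan

# Same chained selection as in the original report
df[df.CHANNEL == 'MUV'].loc['2015'].DET_TEMP.plot()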

@michaelaye
Contributor Author

@sinhrks I had to put parentheses around the row selectors; before that, the expression choked. With the parentheses the selection works, but plotting my data still gives the same IndexError as above:

df.loc[(df.index.year == 2015) & (df.CHANNEL=='MUV'), 'DET_TEMP'].plot()
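The parentheses matter because of Python's operator precedence: `&` binds more tightly than `==`. The sketch below uses only the standard-library `ast` module to show how each version of the mask is parsed. Nothing is evaluated, so no data is needed. Without parentheses, Python sees a chained comparison rather than the intended element-wise `&` of two boolean masks.

```python
import ast

# The unparenthesized mask, parsed but not evaluated
expr = ast.parse("df.index.year == 2015 & df.CHANNEL == 'MUV'", mode="eval").body

# `&` binds tighter than `==`, so this is a chained comparison:
#     df.index.year == (2015 & df.CHANNEL) == 'MUV'
print(type(expr).__name__)                     # Compare
print([type(op).__name__ for op in expr.ops])  # ['Eq', 'Eq']
print(type(expr.comparators[0].op).__name__)   # BitAnd

# With parentheses, the top-level node is the intended element-wise `&`
fixed = ast.parse("(df.index.year == 2015) & (df.CHANNEL == 'MUV')", mode="eval").body
print(type(fixed.op).__name__)                 # BitAnd
```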

@jorisvandenbossche As I wrote above, I don't get an error without the plot() call, and describe() doesn't show anything unusual about the data:

count    18890.000000
mean       -23.543606
std          0.646694
min        -24.554800
25%        -23.805800
50%        -23.656000
75%        -23.356400
max        -14.368400
Name: DET_TEMP, dtype: float64

Even if I store the selection in a Series and call plot() on that, it still fails.
I saved this Series to an HDF5 file under the key 'df'; it can be downloaded here:
https://dl.dropboxusercontent.com/u/139035/df_selection_data.h5
(It's only 307 KB.)

@mroeschke
Member

Looks like the link to the data 404s, and we would need a minimal example to reproduce the bug. Going to close due to needing more info but happy to reopen if we get a self-contained example.


4 participants