IndexError: index out of bounds for large Series with MultiIndex #5727

Closed
Marigold opened this issue Dec 18, 2013 · 8 comments · Fixed by #5730
Labels: API Design, Bug, Indexing (related to indexing on series/frames, not to indexes themselves), MultiIndex

@Marigold

I am on 0.13.0rc1-64-gceec8bf (master). I get an IndexError when trying to print a large Series with a MultiIndex full of NaNs. The following script raises the error:

import pandas as pd
import numpy as np

n = 3500000
arrays = [range(n), np.empty(n)]
index = pd.MultiIndex.from_tuples(zip(*arrays))
s = pd.Series(np.zeros(n), index=index)

print(s)

And here is the traceback:

Traceback (most recent call last):
  File "/home/mvinkler/Documents/analysis/models/test.py", line 12, in <module>
    print(s)
  File "/home/mvinkler/Downloads/pandas/pandas/core/base.py", line 35, in __str__
    return self.__bytes__()
  File "/home/mvinkler/Downloads/pandas/pandas/core/base.py", line 47, in __bytes__
    return self.__unicode__().encode(encoding, 'replace')
  File "/home/mvinkler/Downloads/pandas/pandas/core/series.py", line 857, in __unicode__
    result = self._tidy_repr(min(30, max_rows - 4))
  File "/home/mvinkler/Downloads/pandas/pandas/core/series.py", line 876, in _tidy_repr
    head = self.iloc[:num]._get_repr(print_header=True, length=False,
  File "/home/mvinkler/Downloads/pandas/pandas/core/generic.py", line 946, in _indexer
    setattr(self, iname, indexer(self, name))
  File "/home/mvinkler/Downloads/pandas/pandas/core/generic.py", line 1607, in __setattr__
    elif name in self._info_axis:
  File "/home/mvinkler/Downloads/pandas/pandas/core/index.py", line 2498, in __contains__
    self.get_loc(key)
  File "/home/mvinkler/Downloads/pandas/pandas/core/index.py", line 2986, in get_loc
    return self._get_level_indexer(key, level=0)
  File "/home/mvinkler/Downloads/pandas/pandas/core/index.py", line 3123, in _get_level_indexer
    loc = level_index.get_loc(key)
  File "/home/mvinkler/Downloads/pandas/pandas/core/index.py", line 1017, in get_loc
    return self._engine.get_loc(_values_from_object(key))
  File "index.pyx", line 129, in pandas.index.IndexEngine.get_loc (pandas/index.c:3560)
  File "index.pyx", line 138, in pandas.index.IndexEngine.get_loc (pandas/index.c:3314)
  File "util.pxd", line 40, in util.get_value_at (pandas/index.c:13101)
IndexError: index out of bounds
@jreback
Contributor

jreback commented Dec 18, 2013

Thanks for the report!

@jreback
Contributor

jreback commented Dec 18, 2013

Please update to master and give it a try. Thanks for the report.

@Marigold
Author

Works great, thanks a lot.

@Marigold
Author

Marigold commented Jan 9, 2014

There seems to be another problem when assigning a value through indexing on a large Series. I am on current master.

import pandas as pd
import numpy as np

n = 3500000
arrays = [range(n), np.empty(n)]
index = pd.MultiIndex.from_tuples(zip(*arrays))
s = pd.Series(np.zeros(n), index=index)

s[s==0] = 1

It fails with this traceback:

Traceback (most recent call last):
  File "/home/mvinkler/Documents/analysis/load_db/test.py", line 9, in <module>
    s[s==0] = 1
  File "/home/mvinkler/Downloads/pandas/pandas/core/series.py", line 628, in __setitem__
    self.where(~key, value, inplace=True)
  File "/home/mvinkler/Downloads/pandas/pandas/core/generic.py", line 2895, in where
    self._update_inplace(new_data)
  File "/home/mvinkler/Downloads/pandas/pandas/core/generic.py", line 1244, in _update_inplace
    self._reset_cache()
  File "/home/mvinkler/Downloads/pandas/pandas/core/base.py", line 92, in _reset_cache
    if getattr(self, '_cache', None) is None:
  File "/home/mvinkler/Downloads/pandas/pandas/core/generic.py", line 1610, in __getattr__
    if name in self._info_axis:
  File "/home/mvinkler/Downloads/pandas/pandas/core/index.py", line 2507, in __contains__
    self.get_loc(key)
  File "/home/mvinkler/Downloads/pandas/pandas/core/index.py", line 2995, in get_loc
    return self._get_level_indexer(key, level=0)
  File "/home/mvinkler/Downloads/pandas/pandas/core/index.py", line 3132, in _get_level_indexer
    loc = level_index.get_loc(key)
  File "/home/mvinkler/Downloads/pandas/pandas/core/index.py", line 1017, in get_loc
    return self._engine.get_loc(_values_from_object(key))
  File "index.pyx", line 129, in pandas.index.IndexEngine.get_loc (pandas/index.c:3560)
  File "index.pyx", line 138, in pandas.index.IndexEngine.get_loc (pandas/index.c:3314)
  File "util.pxd", line 40, in util.get_value_at (pandas/index.c:13101)
IndexError: index out of bounds

@jreback
Contributor

jreback commented Jan 9, 2014

@Marigold thanks. #5892 fixes this on master.

It was not a bug related to multi-indexing per se, but an internal cache issue.
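
Both tracebacks above pass through attribute handling (`__setattr__` in the first, `__getattr__` in the second) into `name in self._info_axis`, which is how a cache check such as `getattr(self, '_cache', None)` ends up calling the index's `get_loc`. A minimal sketch of that fall-through follows. `ToyIndex` and `ToySeries` are hypothetical classes written for illustration; they are not pandas internals and do not reproduce the bug itself.

```python
class ToyIndex:
    """Hypothetical stand-in for an index; records every membership test."""

    def __init__(self, labels):
        self._labels = set(labels)
        self.lookups = []

    def __contains__(self, key):
        # Every `key in index` test lands here, like Index.__contains__
        # calling get_loc in the tracebacks.
        self.lookups.append(key)
        return key in self._labels


class ToySeries:
    """Hypothetical container whose attribute access falls back to labels."""

    def __init__(self, data):
        # Write through __dict__ so these names are found by normal lookup.
        self.__dict__["_data"] = dict(data)
        self.__dict__["_index"] = ToyIndex(data)

    def __getattr__(self, name):
        # Python calls this only when normal attribute lookup fails.
        if name in self._index:
            return self._data[name]
        raise AttributeError(name)


s = ToySeries({"a": 1, "b": 2})
value = s.a                         # label found through __getattr__
cache = getattr(s, "_cache", None)  # missing attribute: the index is consulted
lookups = list(s._index.lookups)    # ["a", "_cache"]
```

In this toy, a missing private attribute such as `_cache` still triggers a lookup against the index, so any fault inside the index lookup surfaces from code that looks unrelated to indexing.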

@genghis88

I have a similar problem. Can anyone please help?

import pandas as pd
allData = pd.io.parsers.read_csv('companies.csv',sep=',',low_memory=False,error_bad_lines=False)
#print allData['Database']
df = allData['Company']
print df['Company'].str.contains('/^((?!Amazon).)*$/',regex=True)

I get this error:
Traceback (most recent call last):
  File "get_count.py", line 5, in <module>
    print df['Company'].str.contains('/^((?!Amazon).)*$/',regex=True)
  File "C:\Users\kprabh\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\series.py", line 491, in __getitem__
    result = self.index.get_value(self, key)
  File "C:\Users\kprabh\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\index.py", line 1032, in get_value
    return self._engine.get_value(s, k)
  File "index.pyx", line 97, in pandas.index.IndexEngine.get_value (pandas\index.c:2957)
  File "index.pyx", line 105, in pandas.index.IndexEngine.get_value (pandas\index.c:2772)
  File "index.pyx", line 138, in pandas.index.IndexEngine.get_loc (pandas\index.c:3378)
  File "util.pxd", line 40, in util.get_value_at (pandas\index.c:13607)
IndexError: index out of bounds

@jreback
Contributor

jreback commented Jun 11, 2014

OK, please post the output of pd.show_versions() and a link to your file.

@genghis88

It was my mistake, I realized! I had not converted df to a DataFrame; it was a Series. But the error was misleading. Googling it brought me right here. Thank you for your quick reply.
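
For readers who land here with the same symptom, here is a small sketch of the mix-up described above, using made-up data in place of `companies.csv` (the company names and variable names are assumptions). Selecting one column with single brackets returns a Series, so a second `['Company']` is treated as an index label lookup rather than a column selection. Recent pandas versions should report this as a `KeyError`; the 2014 build in the traceback raised `IndexError`.

```python
import pandas as pd

# Made-up stand-in for the CSV in the report.
all_data = pd.DataFrame({"Company": ["Amazon", "Google", "Amazon Web Services"]})

col = all_data["Company"]      # one label in single brackets: a Series
frame = all_data[["Company"]]  # a list of labels: a DataFrame

# On the Series, "Company" is looked up as a row label, not a column name,
# so the lookup fails because no row carries that label.
try:
    col["Company"]
    lookup_failed = False
except (KeyError, IndexError):
    lookup_failed = True

# Rows whose name does not contain "Amazon": negate a plain substring match
# instead of using a negative-lookahead regex.
not_amazon = ~col.str.contains("Amazon", regex=False)
```

Calling `.str.contains` directly on the Series (without the second `['Company']`) avoids the failing label lookup entirely.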
