No automatic type casting in complete indexing of a large MultiIndex #18818

Closed
toobaz opened this issue Dec 18, 2017 · 0 comments · Fixed by #19074
Labels
Bug Dtype Conversions Unexpected or buggy dtype conversions MultiIndex
toobaz commented Dec 18, 2017

Code Sample, a copy-pastable example if possible

In [2]: pd.MultiIndex.from_product([[1., 2.], range(5000)]).get_loc((1, 0))
Out[2]: 0

In [3]: pd.MultiIndex.from_product([[1., 2.], range(5001)]).get_loc((1., 0))
Out[3]: 0

In [4]: pd.MultiIndex.from_product([[1., 2.], range(5001)]).get_loc(1)
Out[4]: slice(0, 5001, None)

In [5]: pd.MultiIndex.from_product([[1., 2.], range(5001)]).get_loc((1, 0))
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-5-a8f6fcc9f5d9> in <module>()
----> 1 pd.MultiIndex.from_product([[1., 2.], range(5001)]).get_loc((1, 0))

/home/nobackup/repo/pandas/pandas/core/indexes/multi.py in get_loc(self, key, method)
   2141             key = _values_from_object(key)
   2142             key = tuple(map(_maybe_str_to_time_stamp, key, self.levels))
-> 2143             return self._engine.get_loc(key)
   2144 
   2145         # -- partial selection or non-unique index

/home/nobackup/repo/pandas/pandas/_libs/index.pyx in pandas._libs.index.MultiIndexHashEngine.get_loc (pandas/_libs/index.c:15856)()
    645         return algos.pad_object(values, other, limit=limit)
    646 
--> 647     cpdef get_loc(self, object val):
    648         if is_definitely_invalid_key(val):
    649             raise TypeError("'{val}' is an invalid key".format(val=val))

/home/nobackup/repo/pandas/pandas/_libs/index.pyx in pandas._libs.index.MultiIndexHashEngine.get_loc (pandas/_libs/index.c:15703)()
    654 
    655         try:
--> 656             return self.mapping.get_item(val)
    657         except TypeError:
    658             raise KeyError(val)

/home/nobackup/repo/pandas/pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.MultiIndexHashTable.get_item (pandas/_libs/hashtable.c:24621)()
   1446             return False
   1447 
-> 1448     cpdef get_item(self, object key):
   1449         cdef:
   1450             khiter_t k

/home/nobackup/repo/pandas/pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.MultiIndexHashTable.get_item (pandas/_libs/hashtable.c:24576)()
   1460             return loc
   1461         else:
-> 1462             raise KeyError(key)
   1463 
   1464     cpdef set_item(self, object key, Py_ssize_t val):

KeyError: (1, 0)

Problem description

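Related to #18519. This is an inherent limitation of the current engine design for large MultiIndexes: a complete key is not cast to the dtypes of the levels before the lookup, so the integer 1 does not match the float level value 1. In the example above, the same lookup succeeds on an index of 10,000 entries (In [2]) but fails on one of 10,002 entries (In [5]), while the float key (In [3]) and the partial key (In [4]) both succeed on the larger index. An improved MultiIndexEngine should solve this too.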
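
To illustrate the kind of failure described here, the following is a toy sketch in plain Python, not pandas' actual engine. It assumes a lookup table keyed on the typed byte representation of each tuple element, so that an integer and an equal-valued float produce different keys. All names (typed_bytes, key_bytes, get_loc_naive, get_loc_cast, LEVEL_TYPES) are hypothetical and exist only for this illustration.

```python
import struct

# Hypothetical level dtypes: first level is float, second level is int.
LEVEL_TYPES = (float, int)


def typed_bytes(value):
    """Encode a value together with its type, so 1 and 1.0 differ."""
    if isinstance(value, float):
        return b"f" + struct.pack("<d", value)
    return b"i" + struct.pack("<q", value)


def key_bytes(key):
    """Encode a complete key (one value per level) as a single bytes object."""
    return b"".join(typed_bytes(v) for v in key)


# Build the toy table for the product [1., 2.] x range(3).
entries = [(a, b) for a in (1.0, 2.0) for b in range(3)]
TABLE = {key_bytes(entry): pos for pos, entry in enumerate(entries)}


def get_loc_naive(key):
    """Look the key up as given; an int in a float level misses."""
    return TABLE[key_bytes(key)]


def get_loc_cast(key):
    """Cast each element to its level's type first, then look it up."""
    cast_key = tuple(t(v) for t, v in zip(LEVEL_TYPES, key))
    return TABLE[key_bytes(cast_key)]
```

In this sketch, get_loc_naive((1., 0)) finds position 0, get_loc_naive((1, 0)) raises KeyError, and get_loc_cast((1, 0)) finds position 0 again because the key is cast to the level types before it is encoded. Whether the real engine should cast keys or compare level codes instead is a design question this toy does not settle.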

Expected Output

Out[5]: 0

Output of pd.show_versions()

INSTALLED VERSIONS

commit: b5f1e71
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.0-4-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: it_IT.UTF-8
LOCALE: it_IT.UTF-8

pandas: 0.22.0.dev0+371.gb5f1e716d
pytest: 3.2.3
pip: 9.0.1
setuptools: 36.7.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.0dev
tables: 3.3.0
numexpr: 2.6.1
feather: 0.3.1
matplotlib: 2.0.0
openpyxl: 2.3.0
xlrd: 1.0.0
xlwt: 1.3.0
xlsxwriter: 0.9.6
lxml: 4.1.1
bs4: 4.5.3
html5lib: 0.999999999
sqlalchemy: 1.0.15
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.2.1

@jreback jreback added this to the Next Major Release milestone Dec 18, 2017
@jreback jreback modified the milestones: Next Major Release, 0.23.0 Jan 17, 2018
2 participants