MultiIndex.isin fails in 0.20.x when values is empty iterable #16777

Closed
drudd opened this issue Jun 27, 2017 · 2 comments
Labels: MultiIndex, Regression (functionality that used to work in a prior pandas version)

drudd commented Jun 27, 2017

In 0.19.x, the implementation of MultiIndex.isin allowed for an empty iterable to be passed:

In [3]: pd.MultiIndex([['foo', 'bar'],['a', 'b']], [[0, 1], [1, 0]]).isin([])
Out[3]: array([False, False], dtype=bool)

However, in 0.20.2 the same call raises an exception:

In [1]: pd.MultiIndex([['foo', 'bar'],['a', 'b']], [[0, 1], [1, 0]]).isin([])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-57-94faf7c7f3d5> in <module>()
----> 1 pd.MultiIndex([['foo', 'bar'],['a', 'b']], [[0, 1], [1, 0]]).isin([])

/usr/local/lib/python2.7/site-packages/pandas/core/indexes/multi.pyc in isin(self, values, level)
   2625         if level is None:
   2626             return algos.isin(self.values,
-> 2627                               MultiIndex.from_tuples(values).values)
   2628         else:
   2629             num = self._get_level_number(level)

/usr/local/lib/python2.7/site-packages/pandas/core/indexes/multi.pyc in from_tuples(cls, tuples, sortorder, names)
   1136         if len(tuples) == 0:
   1137             # I think this is right? Not quite sure...
-> 1138             raise TypeError('Cannot infer number of levels from empty list')
   1139
   1140         if isinstance(tuples, (np.ndarray, Index)):

TypeError: Cannot infer number of levels from empty list

Problem description

This occurs because the 0.20.x implementation constructs a new MultiIndex from the values parameter via MultiIndex.from_tuples, which cannot build an empty MultiIndex (see Issue #263).
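
To illustrate the expected semantics without pandas, here is a minimal pure-Python sketch. The helper `isin_tuples` is hypothetical and is not the pandas implementation; it only shows that a membership test against an empty collection has a well-defined answer and needs no information about the number of levels.

```python
def isin_tuples(index_tuples, values):
    """Hypothetical helper: elementwise membership of index tuples in values."""
    # An empty `values` simply gives an empty lookup set; nothing has to be
    # inferred about how many levels the tuples have.
    lookup = set(values)
    return [t in lookup for t in index_tuples]


# The tuples encoded by the levels/labels used in the example above.
index_tuples = [('foo', 'b'), ('bar', 'a')]

print(isin_tuples(index_tuples, []))              # [False, False]
print(isin_tuples(index_tuples, [('bar', 'a')]))  # [False, True]
```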

I attempted to fix this by changing the isin function:

return algos.isin(self.values,
                  MultiIndex.from_tuples(values, level=self._levels).values)

and also fixing the from_tuples function to only error out if names does not provide a way to infer the number of levels:

if len(tuples) == 0 and names is None:
    raise TypeError('Cannot infer number of levels from empty list')
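
The relaxed check can be sketched in isolation. This is only an illustration under stated assumptions: `infer_nlevels` is a hypothetical stand-in and not the actual from_tuples code. It raises only when neither the tuples nor names give a level count.

```python
def infer_nlevels(tuples, names=None):
    """Hypothetical helper: decide how many levels a MultiIndex would have."""
    tuples = list(tuples)
    if len(tuples) == 0:
        if names is None:
            # Nothing to infer from: same message as in the traceback.
            raise TypeError('Cannot infer number of levels from empty list')
        # Empty input, but names says how many levels to create.
        return len(names)
    return len(tuples[0])


print(infer_nlevels([('foo', 'b'), ('bar', 'a')]))   # 2
print(infer_nlevels([], names=['first', 'second']))  # 2
```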

However, MultiIndex.from_arrays also did not handle the empty lists, and the change needed there is a bit ugly:

    if names is not None and len(names) > 0 and len(levels) == 0:
        levels = [[]*len(names)]
        labels = [[]*len(names)]
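
Note that in plain Python `[]*len(names)` is an empty list regardless of `len(names)`, so `[[]*len(names)]` evaluates to `[[]]` (a single empty list) rather than one empty list per name. A small sketch of the difference, using an assumed example value for `names`:

```python
names = ['first', 'second']  # assumed example value, not taken from the issue

# As written in the snippet: the inner product is just [], wrapped once.
as_written = [[] * len(names)]

# One independent empty list per name.
per_name = [[] for _ in names]

print(as_written)  # [[]]
print(per_name)    # [[], []]
```

The list comprehension is also preferable to `[[]] * len(names)`, which would repeat references to the same inner list.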

Happy to turn this into a PR if there are no objections.

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Darwin
OS-release: 16.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.20.2
pytest: 3.0.3
pip: 9.0.1
setuptools: 36.0.1
Cython: 0.25.2
numpy: 1.13.0
scipy: 0.19.1
xarray: None
IPython: 5.3.0
sphinx: 1.5.1
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.0
tables: None
numexpr: 2.6.1
feather: None
matplotlib: 2.0.0
openpyxl: 2.3.5
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.3
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.999999999
sqlalchemy: 1.1.11
pymysql: None
psycopg2: 2.7.1 (dt dec pq3 ext lo64)
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

jorisvandenbossche commented

I think it is reasonable to allow this (since we also allow it on a normal Index/Series).
And given the comment in the code ("# I think this is right? Not quite sure..."), it seems that raising was not a very firm decision :)

PR welcome. We can discuss whether there is something less ugly there.

jorisvandenbossche added the Regression label Jun 27, 2017

jreback commented Jul 7, 2017

closed by #16782
