Clarify is_bool_indexer for Extension dtypes #22326

Closed
TomAugspurger opened this issue Aug 13, 2018 · 4 comments · Fixed by #22667
Labels
ExtensionArray Extending pandas with custom dtypes or arrays. Indexing Related to indexing on series/frames, not to indexes themselves Sparse Sparse Data Type
Milestone
0.24.0

Comments

@TomAugspurger
Contributor

What do we want here?

In [1]: import pandas as pd

In [2]: pd.core.common.is_bool_indexer(pd.Categorical([True, True]))
Out[2]: False

Working around this in #22325.
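A minimal sketch of one possible behaviour follows. It assumes we want a Categorical whose values are all booleans to count as a boolean indexer. `is_bool_like_indexer` is a hypothetical helper, not pandas API. It checks the materialised values rather than the dtype, because `Categorical.dtype` by itself does not say whether the categories are boolean.

```python
import numpy as np
import pandas as pd


def is_bool_like_indexer(key) -> bool:
    """Hypothetical helper: treat `key` as a boolean mask if every value is a bool."""
    # np.asarray materialises a Categorical into its values, so a
    # Categorical of booleans is judged by its content, not its dtype.
    values = np.asarray(key)
    if values.dtype == np.bool_:
        return True
    # Depending on the pandas version, booleans coming out of a Categorical
    # may have object dtype; infer_dtype inspects the actual elements.
    return values.dtype == object and pd.api.types.infer_dtype(values, skipna=False) == "boolean"
```

With this check, `pd.Categorical([True, True])` would be accepted as a mask, while a Categorical of strings or a list of integers would not.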

@TomAugspurger TomAugspurger added Indexing Related to indexing on series/frames, not to indexes themselves ExtensionArray Extending pandas with custom dtypes or arrays. labels Aug 13, 2018
@TomAugspurger
Contributor Author

Likewise for is_bool_dtype.
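At the dtype level, one option is to look through a CategoricalDtype to its categories. This is a sketch under that assumption. `is_bool_dtype_unwrapping_categorical` is a hypothetical name, not pandas API. It runs `pd.api.types.infer_dtype` on the categories because, depending on the pandas version, boolean categories may be stored with either bool or object dtype.

```python
import numpy as np
import pandas as pd


def is_bool_dtype_unwrapping_categorical(dtype) -> bool:
    """Hypothetical dtype check that looks through a CategoricalDtype."""
    if isinstance(dtype, pd.CategoricalDtype):
        if dtype.categories is None:
            # Categories not known yet: we cannot claim the dtype is boolean.
            return False
        # Decide from the category values, not from the categorical's own kind ('O').
        return pd.api.types.infer_dtype(dtype.categories, skipna=False) == "boolean"
    if isinstance(dtype, np.dtype):
        # Plain NumPy dtypes: 'b' is the kind code for booleans.
        return dtype.kind == "b"
    return False
```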

@TomAugspurger
Contributor Author

This manifests as failures in .loc with a CategoricalIndex holding booleans:

In [3]: pd.Series([1, 2, 3]).loc[pd.Categorical([True, False, True])]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-3-0532517e2922> in <module>()
----> 1 pd.Series([1, 2, 3]).loc[pd.Categorical([True, False, True])]

~/sandbox/pandas/pandas/core/indexing.py in __getitem__(self, key)
   1500
   1501             maybe_callable = com.apply_if_callable(key, self.obj)
-> 1502             return self._getitem_axis(maybe_callable, axis=axis)
   1503
   1504     def _is_scalar_access(self, key):

~/sandbox/pandas/pandas/core/indexing.py in _getitem_axis(self, key, axis)
   1902                     raise ValueError('Cannot index with multidimensional key')
   1903
-> 1904                 return self._getitem_iterable(key, axis=axis)
   1905
   1906             # nested tuple slicing

~/sandbox/pandas/pandas/core/indexing.py in _getitem_iterable(self, key, axis)
   1203             # A collection of keys
   1204             keyarr, indexer = self._get_listlike_indexer(key, axis,
-> 1205                                                          raise_missing=False)
   1206             return self.obj._reindex_with_indexers({axis: [keyarr, indexer]},
   1207                                                    copy=True, allow_dups=True)

~/sandbox/pandas/pandas/core/indexing.py in _get_listlike_indexer(self, key, axis, raise_missing)
   1159         self._validate_read_indexer(keyarr, indexer,
   1160                                     o._get_axis_number(axis),
-> 1161                                     raise_missing=raise_missing)
   1162         return keyarr, indexer
   1163

~/sandbox/pandas/pandas/core/indexing.py in _validate_read_indexer(self, key, indexer, axis, raise_missing)
   1244                 raise KeyError(
   1245                     u"None of [{key}] are in the [{axis}]".format(
-> 1246                         key=key, axis=self.obj._get_axis_name(axis)))
   1247
   1248             # We (temporarily) allow for some missing keys with .loc, except in

KeyError: "None of [Index([True, False, True], dtype='object')] are in the [index]"

That should select the rows where the mask is True, i.e. return a Series with values [1, 3] (index [0, 2]).
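One possible workaround is to convert the Categorical key into a plain NumPy boolean array, so that .loc treats it as a mask rather than as a list of labels. This is a sketch only and not necessarily what #22325 does:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3])
key = pd.Categorical([True, False, True])

# Materialise the Categorical as an ordinary boolean ndarray so .loc
# sees a boolean mask instead of an object Index of labels.
mask = np.asarray(key, dtype=bool)
result = s.loc[mask]
print(result)  # values [1, 3] at index labels [0, 2]
```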

@jorisvandenbossche
Member

Should we have something on the dtype similar to what we have for numeric data (the recently added ExtensionDtype._is_numeric), to indicate that a certain dtype is considered boolean?

Otherwise, those inspection functions would need to know how to inspect each dtype (for a categorical, by checking the dtype of the categories).

I am a bit worried about a possible proliferation of such attributes, though.

(That said, a categorical with boolean categories doesn't sound that useful. We could also say that boolean indexing requires an actual boolean dtype.)
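A toy model of the dtype-attribute idea follows. It assumes a boolean flag analogous to `_is_numeric`, here called `_is_boolean`. `ToyDtype`, `ToyBooleanDtype`, `ToyCategoricalDtype` and `is_bool_dtype_sketch` are hypothetical classes for illustration only, not pandas API. The point is that the categorical answers for itself by deferring to its categories, so the generic check needs no dtype-specific knowledge.

```python
class ToyDtype:
    """Hypothetical base dtype: not boolean unless a subclass says so."""
    name = "toy"
    _is_boolean = False


class ToyBooleanDtype(ToyDtype):
    name = "toy_bool"
    _is_boolean = True


class ToyCategoricalDtype(ToyDtype):
    name = "toy_category"

    def __init__(self, categories_dtype: ToyDtype) -> None:
        self.categories_dtype = categories_dtype

    @property
    def _is_boolean(self) -> bool:
        # A categorical counts as boolean iff its categories do.
        return self.categories_dtype._is_boolean


def is_bool_dtype_sketch(dtype) -> bool:
    # The generic check only reads the flag; it never inspects categories itself.
    return bool(getattr(dtype, "_is_boolean", False))
```

The trade-off raised above is visible here: every new question ("is it boolean?", "is it numeric?") adds another flag that each dtype must answer.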

@TomAugspurger
Contributor Author

I suppose this is what the .kind attribute is for?

Right now, Categorical.dtype.kind is always 'O':

In [8]: pd.Categorical([True, False]).dtype.kind
Out[8]: 'O'

If we changed that to be .categories.dtype.kind, we wouldn't need a ._is_boolean, though we would need to implement a BooleanIndex.
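The .kind alternative can be sketched the same way. The sketch assumes a categorical dtype whose `kind` mirrors the NumPy kind of its categories instead of always returning 'O'. `KindDelegatingCategoricalDtype` and `is_bool_by_kind` are hypothetical names, and the sketch does not address the BooleanIndex that this approach would also require.

```python
import numpy as np


class KindDelegatingCategoricalDtype:
    """Hypothetical categorical dtype whose .kind follows its categories."""

    def __init__(self, categories) -> None:
        # Store categories as an ndarray so they carry a NumPy dtype.
        self.categories = np.asarray(categories)

    @property
    def kind(self) -> str:
        # Delegate to the categories' dtype instead of always returning 'O'.
        return self.categories.dtype.kind


def is_bool_by_kind(dtype) -> bool:
    # Works for NumPy dtypes and for any object exposing a NumPy-style .kind.
    return dtype.kind == "b"
```

With this design one generic check, `dtype.kind == "b"`, covers both NumPy dtypes and the categorical, with no extra `_is_boolean` attribute.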

@TomAugspurger TomAugspurger added the Sparse Sparse Data Type label Aug 31, 2018
TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue Sep 11, 2018
@jreback jreback added this to the 0.24.0 milestone Sep 13, 2018
TomAugspurger added a commit that referenced this issue Sep 20, 2018
Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this issue Oct 1, 2018

3 participants