Skip to content

Trying to reindex category Series with invalid fill_value raises error without message #18185

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
toobaz opened this issue Nov 9, 2017 · 7 comments · Fixed by #37720
Closed
Labels
Bug Categorical Categorical Data Type Error Reporting Incorrect or improved errors from pandas Indexing Related to indexing on series/frames, not to indexes themselves
Milestone

Comments

@toobaz
Copy link
Member

toobaz commented Nov 9, 2017

Code Sample, a copy-pastable example if possible

In [2]: pd.Series([1, 2, 3, 1], dtype='category').reindex([1,2,3,4,5], fill_value=-1)
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-2-dd2f847ae90c> in <module>()
----> 1 pd.Series([1, 2, 3, 1], dtype='category').reindex([1,2,3,4,5], fill_value=-1)

~/nobackup/repo/pandas/pandas/core/series.py in reindex(self, index, **kwargs)
   2638     @Appender(generic._shared_docs['reindex'] % _shared_doc_kwargs)
   2639     def reindex(self, index=None, **kwargs):
-> 2640         return super(Series, self).reindex(index=index, **kwargs)
   2641 
   2642     @Appender(generic._shared_docs['fillna'] % _shared_doc_kwargs)

~/nobackup/repo/pandas/pandas/core/generic.py in reindex(self, *args, **kwargs)
   3005         # perform the reindex on the axes
   3006         return self._reindex_axes(axes, level, limit, tolerance, method,
-> 3007                                   fill_value, copy).__finalize__(self)
   3008 
   3009     def _reindex_axes(self, axes, level, limit, tolerance, method, fill_value,

~/nobackup/repo/pandas/pandas/core/generic.py in _reindex_axes(self, axes, level, limit, tolerance, method, fill_value, copy)
   3023             obj = obj._reindex_with_indexers({axis: [new_index, indexer]},
   3024                                              fill_value=fill_value,
-> 3025                                              copy=copy, allow_dups=False)
   3026 
   3027         return obj

~/nobackup/repo/pandas/pandas/core/generic.py in _reindex_with_indexers(self, reindexers, fill_value, copy, allow_dups)
   3126                                                 fill_value=fill_value,
   3127                                                 allow_dups=allow_dups,
-> 3128                                                 copy=copy)
   3129 
   3130         if copy and new_data is self._data:

~/nobackup/repo/pandas/pandas/core/internals.py in reindex_indexer(self, new_axis, indexer, axis, fill_value, allow_dups, copy)
   4139         if axis == 0:
   4140             new_blocks = self._slice_take_blocks_ax0(indexer,
-> 4141                                                      fill_tuple=(fill_value,))
   4142         else:
   4143             new_blocks = [blk.take_nd(indexer, axis=axis, fill_tuple=(

~/nobackup/repo/pandas/pandas/core/internals.py in _slice_take_blocks_ax0(self, slice_or_indexer, fill_tuple)
   4178                 return [blk.take_nd(slobj, axis=0,
   4179                                     new_mgr_locs=slice(0, sllen),
-> 4180                                     fill_tuple=fill_tuple)]
   4181 
   4182         if sl_type in ('slice', 'mask'):

~/nobackup/repo/pandas/pandas/core/internals.py in take_nd(self, indexer, axis, new_mgr_locs, fill_tuple)
   2402         # but are passed the axis depending on the calling routing
   2403         # if its REALLY axis 0, then this will be a reindex and not a take
-> 2404         new_values = self.values.take_nd(indexer, fill_value=fill_value)
   2405 
   2406         # if we are a 1-dim object, then always place at 0

~/nobackup/repo/pandas/pandas/core/categorical.py in take_nd(self, indexer, allow_fill, fill_value)
   1712         # filling must always be None/nan here
   1713         # but is passed thru internally
-> 1714         assert isna(fill_value)
   1715 
   1716         codes = take_1d(self._codes, indexer, allow_fill=True, fill_value=-1)

AssertionError: 

Problem description

At least, the error can be improved (after all, nothing in the docs suggests fill_value must be a missing_value, so the fact that this actually works with fill_value=np.nan is an implementation detail), but maybe we could even actually support this? The effort of just adding one element to the categories and set new elements to it should be minimal.

Expected Output

Either the new elements set to -1, or at least a more meaningful error message.

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.0-3-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: it_IT.UTF-8
LOCALE: it_IT.UTF-8

pandas: 0.22.0.dev0+84.g8dac63314
pytest: 3.0.6
pip: 9.0.1
setuptools: 36.6.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.0dev
tables: 3.3.0
numexpr: 2.6.1
feather: 0.3.1
matplotlib: 2.0.0
openpyxl: None
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.6
lxml: None
bs4: 4.5.3
html5lib: 0.999999999
sqlalchemy: 1.0.15
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.2.1

@toobaz
Copy link
Member Author

toobaz commented Nov 9, 2017

Argument for supporting this: in non-categorical Series, .reindex() is expected anyway to make a copy of data.

@gfyoung gfyoung added Categorical Categorical Data Type Indexing Related to indexing on series/frames, not to indexes themselves Error Reporting Incorrect or improved errors from pandas labels Nov 10, 2017
@gfyoung
Copy link
Member

gfyoung commented Nov 10, 2017

+1 for improving the error message.

-0 for adding support. The whole point is to remove NA values, and -1 wouldn't work because it would still be NA for this example.

@toobaz
Copy link
Member Author

toobaz commented Nov 10, 2017

The whole point is to remove NA values

Yeah, I could have provided a better example - I didn't mean that reindex is a good way to insert missing values encoded differently than NA. This is actually what the current behavior implicitly assumes, and I'm suggesting we should change it.

So take for instance

pd.Series([1, 2, 3, 1], dtype='category').reindex([1,2,3,4,5], fill_value=2)

... which also fails: this is surprising even for those who have clear how categoricals work - that is, that 1) they have native support for NA and 2) they don't like to change the categories (which, by the way, is true when assigning, but reindex is different, so I would allow changing them).

@gfyoung gfyoung added the Bug label Nov 10, 2017
@gfyoung
Copy link
Member

gfyoung commented Nov 10, 2017

Ah, gotcha. That's very odd indeed. Feel free to have a look and see what's going on there.

@jreback
Copy link
Contributor

jreback commented Nov 10, 2017

  1. they have native support for NA

I actually considered this to make real boolean with NAs as a dtype, because this is certainly possible!

About this issue. I think you would need to coerce to object type to be semantically correct. Though I suppose you could add the new category. Not sure how odd this would be.

@phofl
Copy link
Member

phofl commented Nov 8, 2020

This raises now

Traceback (most recent call last):
  File "/home/developer/.config/JetBrains/PyCharm2020.2/scratches/scratch_4.py", line 27, in <module>
    pd.Series([1, 2, 3, 1], dtype='category').reindex([1,2,3,4,5], fill_value=-1)
  File "/home/developer/PycharmProjects/pandas/pandas/core/series.py", line 4388, in reindex
    return super().reindex(index=index, **kwargs)
  File "/home/developer/PycharmProjects/pandas/pandas/core/generic.py", line 4718, in reindex
    return self._reindex_axes(
  File "/home/developer/PycharmProjects/pandas/pandas/core/generic.py", line 4738, in _reindex_axes
    obj = obj._reindex_with_indexers(
  File "/home/developer/PycharmProjects/pandas/pandas/core/generic.py", line 4781, in _reindex_with_indexers
    new_data = new_data.reindex_indexer(
  File "/home/developer/PycharmProjects/pandas/pandas/core/internals/managers.py", line 1274, in reindex_indexer
    new_blocks = self._slice_take_blocks_ax0(indexer, fill_value=fill_value)
  File "/home/developer/PycharmProjects/pandas/pandas/core/internals/managers.py", line 1341, in _slice_take_blocks_ax0
    blk.take_nd(
  File "/home/developer/PycharmProjects/pandas/pandas/core/internals/blocks.py", line 1821, in take_nd
    new_values = self.values.take(indexer, fill_value=fill_value, allow_fill=True)
  File "/home/developer/PycharmProjects/pandas/pandas/core/arrays/_mixins.py", line 63, in take
    fill_value = self._validate_fill_value(fill_value)
  File "/home/developer/PycharmProjects/pandas/pandas/core/arrays/categorical.py", line 1213, in _validate_fill_value
    raise ValueError(
ValueError: 'fill_value=-1' is not present in this Categorical's categories

Process finished with exit code 1

Error message is appropriate, so could close this?

@jreback
Copy link
Contributor

jreback commented Nov 9, 2020

yes that message looks good
can u confirm we are testing this fully (think so)

if so we can just close

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Categorical Categorical Data Type Error Reporting Incorrect or improved errors from pandas Indexing Related to indexing on series/frames, not to indexes themselves
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants