REGR: empty CategoricalIndex constructor by passing no data (but passing categories) no longer working #38944

Closed
jorisvandenbossche opened this issue Jan 4, 2021 · 1 comment · Fixed by #41612
Labels: Categorical, Needs Discussion, Regression

jorisvandenbossche commented Jan 4, 2021

See https://github.com/pandas-dev/pandas/pull/38614/files#r551313565

#38614 changed the CategoricalIndex constructor to disallow scalar values, but at the same time it also disallowed passing no values to create an empty index (which in practice means passing the default of None):

In [17]: pd.CategoricalIndex(categories=['a', 'b'])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-17-6f0ae73d9374> in <module>
----> 1 pd.CategoricalIndex(categories=['a', 'b'])

~/scipy/pandas/pandas/core/indexes/category.py in __new__(cls, data, categories, ordered, dtype, copy, name)
    187         dtype: Optional[Dtype] = None,
    188         copy=False,
--> 189         name=None,
    190     ):
    191 

TypeError: CategoricalIndex(...) must be called with a collection of some kind, None was passed

If we do make this change, it should at least go through a deprecation IMO (e.g. it breaks dask parquet reading).
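For code affected in the meantime, one possible workaround (an assumption on my part, not something proposed in this thread) is to pass an empty list explicitly instead of relying on the `None` default. A minimal sketch:

```python
import pandas as pd

# Pass an explicit empty collection as data rather than omitting it,
# so the constructor never sees the default of None.
empty = pd.CategoricalIndex([], categories=["a", "b"])

# The index has no elements but still carries the requested categories.
print(len(empty))
print(list(empty.categories))
```

This avoids the `None` code path that now raises, so it should behave the same before and after #38614.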

But I am not sure if we should change it, since other constructors like Series() allow similar creation of empty objects:

cc @jbrockmendel

jorisvandenbossche added the Regression, Categorical, and Needs Discussion labels Jan 4, 2021
jorisvandenbossche added this to the 1.3 milestone Jan 4, 2021

jorisvandenbossche commented

> But I am not sure if we should change it, since other constructors like Series() allow similar creation of empty objects:

So while Series indeed allows this (and pd.DataFrame() as well):

In [23]: pd.Series(dtype="int64")
Out[23]: Series([], dtype: int64)

it seems that other Index constructors actually consistently disallow this:

In [29]: pd.Index(dtype=object)
...
TypeError: Index(...) must be called with a collection of some kind, None was passed

In [30]: pd.Int64Index()
...
TypeError: Int64Index(...) must be called with a collection of some kind, None was passed

So it is indeed good to be consistent about this across Index constructors. But I think we should still do it with a deprecation.
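To illustrate what "with a deprecation" could look like, here is a rough, pandas-independent sketch. The function `make_categorical_index` and its return value are hypothetical stand-ins for the constructor, not pandas code and not necessarily how the eventual fix was implemented; the idea is only that `data=None` keeps working for now but emits a `FutureWarning`.

```python
import warnings


def make_categorical_index(data=None, categories=None):
    """Hypothetical stand-in for an index constructor (not pandas code)."""
    if data is None:
        # Keep the old behaviour for now, but tell callers it will go away.
        warnings.warn(
            "Constructing an empty index without passing data is deprecated; "
            "pass an empty list instead.",
            FutureWarning,
            stacklevel=2,
        )
        data = []
    # A plain dict stands in for the real index object in this sketch.
    return {"data": list(data), "categories": list(categories or [])}


# Calling without data still works, but a FutureWarning is recorded.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = make_categorical_index(categories=["a", "b"])
```

Once the deprecation period is over, the `data is None` branch would raise the same `TypeError` that the other Index constructors raise.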
