BUG: Series constructor with category dtype does not raise with unknown categories #59204

Closed
3 tasks done
WillAyd opened this issue Jul 7, 2024 · 5 comments
Labels
Bug Categorical Categorical Data Type Constructors Series/DataFrame/Index/pd.array Constructors

Comments

@WillAyd
Member

WillAyd commented Jul 7, 2024

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

Assignment fails as expected:

```python
>>> import pandas as pd
>>> cat = pd.CategoricalDtype(["foo", "bar", "baz"])
>>> ser = pd.Series(["foo", "baz", "baz"], dtype=cat)
>>> ser.iloc[2] = "qux"
TypeError: Cannot setitem on a Categorical with a new category (qux), set the categories first
```

But the constructor just maps this to a missing value:

```python
>>> pd.Series(["foo", "baz", "baz", "qux"], dtype=cat)
Out[13]:
0    foo
1    baz
2    baz
3    NaN
dtype: category
Categories (3, object): ['foo', 'bar', 'baz']
```

### Issue Description

The Series constructor accepts values that are not among the dtype's categories and silently maps them to missing values (NaN).
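To make the silent mapping visible, here is a minimal sketch that compares the input against the constructed Series. The names `values`, `coerced`, and `lost` are only illustrative and are not part of pandas.

```python
import pandas as pd

cat = pd.CategoricalDtype(["foo", "bar", "baz"])
values = ["foo", "baz", "baz", "qux"]
ser = pd.Series(values, dtype=cat)

# A position was coerced if the input was present but the result is missing.
coerced = pd.Series(values).notna() & ser.isna()

# Collect the original values that were dropped to NaN.
lost = [v for v, flag in zip(values, coerced) if flag]
print(lost)
```

A check like this only reports what was lost after the fact; it does not change the constructor's behavior.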

### Expected Behavior

I think the constructor should raise, just as assignment does.
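Until something like that exists, a user-side workaround might look like the sketch below. `strict_categorical_series` is a hypothetical helper, not a pandas API, and it assumes the input holds scalar values; the choice of `ValueError` is also an assumption.

```python
import pandas as pd


def strict_categorical_series(values, dtype):
    """Hypothetical helper: build a categorical Series, raising on unknown values."""
    values = list(values)
    known = set(dtype.categories)
    # Missing values stay allowed; only non-missing values must be known categories.
    unknown = [v for v in values if not pd.isna(v) and v not in known]
    if unknown:
        raise ValueError(f"values not in categories: {unknown}")
    return pd.Series(values, dtype=dtype)


cat = pd.CategoricalDtype(["foo", "bar", "baz"])
ser = strict_categorical_series(["foo", "baz", "baz"], cat)

try:
    strict_categorical_series(["foo", "baz", "baz", "qux"], cat)
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```

Validating before construction keeps the check independent of whatever the constructor itself decides to do with unknown values.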

### Installed Versions

main
@WillAyd WillAyd added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Jul 7, 2024
@WillAyd
Member Author

WillAyd commented Jul 7, 2024

This might be another one of those constructor quirks that @Aloqeely and I were chatting about on Slack recently.

@WillAyd
Member Author

WillAyd commented Jul 7, 2024

Though our documentation does rely on this behavior: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.cat.set_categories.html

Maybe it is considered a feature?
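For context, a rough sketch of the kind of usage that depends on values outside the categories becoming missing. The data here is made up for illustration and is not copied from the linked page.

```python
import pandas as pd

# "A" is not one of the categories, so it is stored with code -1 (missing).
raw = pd.Categorical(["a", "b", "c", "A"], categories=["a", "b", "c"])
codes = raw.codes.tolist()

# set_categories likewise turns values outside the new categories into NaN.
ser = pd.Series(["foo", "bar", "baz"], dtype="category")
narrowed = ser.cat.set_categories(["foo", "bar"])
print(codes)
print(narrowed.isna().tolist())
```

Making the constructor raise would presumably leave `set_categories` alone, but the `pd.Categorical(...)` call above goes through the same values-versus-categories mapping.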

@WillAyd WillAyd added Categorical Categorical Data Type Constructors Series/DataFrame/Index/pd.array Constructors and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Jul 7, 2024
@jbrockmendel
Member

Xref #40996

@WillAyd
Member Author

WillAyd commented Jul 8, 2024

Ah thanks - this is a dupe then

@WillAyd WillAyd closed this as completed Jul 8, 2024
@jbrockmendel
Member

jbrockmendel commented Jul 8, 2024 via email

2 participants