Add fix to raise error when category value 'x' is not predefined but is assigned through df.loc[..]=x #34011
Closed
chrispe wants to merge 79 commits into pandas-dev:master from chrispe:fix-category-index-issue-33952
Changes from 30 commits
Commits
4487bec Add fix to raise error when category value is not predefined (chrispe)
10098ab Fix linting (chrispe)
cb34580 Added new test (chrispe)
1622663 Merge branch 'master' of https://github.com/pandas-dev/pandas into fi… (chrispe)
c627fa6 Add test case for unused categories (chrispe)
ba3a751 Remove trailing whitespace (chrispe)
51dcdfe Fix linting (chrispe)
9057b26 Fix linting (chrispe)
06fdc3e Remove temporary fix from generic.py (chrispe)
8c8f794 Merge branch 'master' of https://github.com/pandas-dev/pandas into fi… (chrispe)
582c023 First fix try through indexing.py (chrispe)
730fc2b Fix lint (chrispe)
c275eb9 Fix import ordering (chrispe)
944ae24 Fix Update (chrispe)
8372bdb Fix lint (chrispe)
0e5e418 Include more related test cases (chrispe)
eea359a Fix linting (chrispe)
5f72d4e Update test_indexing.py (chrispe)
781322a Merge branch 'master' of https://github.com/pandas-dev/pandas into fi… (chrispe)
26f474b import missing dtypes function (chrispe)
215943e Fix linting (chrispe)
5bacde9 Include requested changes (chrispe)
96e4318 Fix import ordering/format (chrispe)
993be66 Merge branch 'master' of https://github.com/pandas-dev/pandas into fi… (chrispe)
42d968b Merge branch 'master' of https://github.com/pandas-dev/pandas into fi… (chrispe)
e7ce246 Update test_categorical.py (chrispe)
31ef609 Fix format (chrispe)
ce3f463 Remove commas (chrispe)
a825269 Update test_categorical.py (chrispe)
3816789 Merge branch 'master' of https://github.com/pandas-dev/pandas into fi… (chrispe)
8651d25 Merge branch 'master' of https://github.com/pandas-dev/pandas into fi… (chrispe)
72726a0 Update solution (chrispe)
0f738c3 Merge branch 'master' of https://github.com/pandas-dev/pandas into fi… (chrispe)
51e2032 Fix lint (chrispe)
7d64357 Fix format issues (chrispe)
d68f215 Update indexing.py (chrispe)
5ea8ab1 Update indexing.py (chrispe)
b08efc1 Update indexing.py (chrispe)
69f4e62 Update test_categorical.py (chrispe)
4c33040 Update concat.py (chrispe)
e936736 Update cast.py (chrispe)
8031f8f Update cast.py (chrispe)
1e1c094 Merge branch 'master' of https://github.com/pandas-dev/pandas into fi… (chrispe)
c889b1b Merge branch 'master' of https://github.com/pandas-dev/pandas into fi… (chrispe)
c08c6c0 Update test_categorical.py (chrispe)
e6c3a4c Merge branch 'master' of https://github.com/pandas-dev/pandas into fi… (chrispe)
c862d99 Revert previous approach and include concat changes (chrispe)
5baa314 Remove non-required convertion (chrispe)
cb5d8e4 Update concat.py (chrispe)
7d7da20 Update concat.py (chrispe)
ecad50f Update cast.py (chrispe)
ab5af93 Merge branch 'master' of https://github.com/pandas-dev/pandas into fi… (chrispe)
0c6b68a Merge branch 'master' of https://github.com/pandas-dev/pandas into fi… (chrispe)
4197a74 Merge branch 'master' of https://github.com/pandas-dev/pandas into fi… (chrispe)
4bc05c6 Merge branch 'master' of https://github.com/pandas-dev/pandas into fi… (chrispe)
af5e141 Add new version with raise (chrispe)
6d45570 Add format fixes (chrispe)
31612ed Update test_categorical.py (chrispe)
4a2a8e8 Merge branch 'master' of https://github.com/pandas-dev/pandas into fi… (chrispe)
dd7e3ca Merge branch 'master' of https://github.com/pandas-dev/pandas into fi… (chrispe)
6d9e667 Update (chrispe)
e0da655 Use prio_cat_dtype only for EAs (chrispe)
92d1f14 Revert usage of first_ea (chrispe)
9b9b382 Fix mypy errors (chrispe)
d3df994 Use unique1d in _cast_to_common_type (chrispe)
41aa9e3 Fix isort error (chrispe)
ca0eb1f Renamed input variable for find_common_type (chrispe)
e2cfb79 Merge branch 'master' of https://github.com/pandas-dev/pandas into fi… (chrispe)
931d6c8 Remove new argument in find_common_type (chrispe)
8065ddb Add check to _get_common_dtype (chrispe)
5d533dd Merge branch 'master' of https://github.com/pandas-dev/pandas into fi… (chrispe)
b21326b Update dtypes.py (chrispe)
335fc06 Update dtypes.py (chrispe)
950dcc4 Update dtypes.py (chrispe)
2ee1df8 Update dtypes.py (chrispe)
17120f0 Test (chrispe)
439b49f Add flag in get_common_type (chrispe)
c6e3435 Revert (chrispe)
fc40817 Update dtypes.py (chrispe)
@@ -0,0 +1,54 @@

    import pytest

    import pandas as pd
    from pandas import Categorical, Index
    import pandas._testing as tm


    class TestCategoricalSeries:
        def test_loc_new_category_series_raises(self):
            ser = pd.Series(Categorical(["a", "b", "c"]))
            msg = "Cannot setitem on a Categorical with a new category"
            with pytest.raises(ValueError, match=msg):
                ser.loc[3] = "d"

        def test_unused_category_retention(self):
            # Init case
            exp_cats = Index(["a", "b", "c", "d"])
            ser = pd.Series(Categorical(["a", "b", "c"], categories=exp_cats))
            tm.assert_index_equal(ser.cat.categories, exp_cats)

            # Modify case
            ser.loc[0] = "b"
            expected = pd.Series(Categorical(["b", "b", "c"], categories=exp_cats))
            tm.assert_index_equal(ser.cat.categories, exp_cats)
            tm.assert_series_equal(ser, expected)

        def test_loc_new_category_row_raises(self):
            df = pd.DataFrame(
                {
                    "int": [0, 1, 2],
                    "cat": Categorical(["a", "b", "c"], categories=["a", "b", "c"]),
                }
            )
            msg = "Cannot setitem on a Categorical with a new category"
            with pytest.raises(ValueError, match=msg):
                df.loc[3] = [3, "d"]

        def test_loc_new_row_category_dtype_retention(self):
            df = pd.DataFrame(
                {
                    "int": [0, 1, 2],
                    "cat": pd.Categorical(["a", "b", "c"], categories=["a", "b", "c"]),
                }
            )
            df.loc[3] = [3, "c"]

            expected = pd.DataFrame(
                {
                    "int": [0, 1, 2, 3],
                    "cat": pd.Categorical(["a", "b", "c", "c"], categories=["a", "b", "c"]),
                }
            )

            tm.assert_frame_equal(df, expected)
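
Two of the behaviours these tests rely on can be tried outside the test suite. The following is a minimal sketch and not part of the PR; the variable names are illustrative. It also assumes that the exception type raised for an unseen category may differ between pandas versions, so both `ValueError` and `TypeError` are caught:

```python
import pandas as pd

# "d" is declared as a category but never used in the data.
cats = ["a", "b", "c", "d"]
ser = pd.Series(pd.Categorical(["a", "b", "c"], categories=cats))

# Assigning an existing category in place keeps the full category set,
# including the unused "d".
ser.iloc[0] = "b"
retained = list(ser.cat.categories)

# Assigning a value outside the categories is rejected by the Categorical
# itself. The exception type is an assumption that depends on the pandas
# version, so both candidates are caught.
cat = pd.Categorical(["a", "b", "c"])
try:
    cat[0] = "z"
    rejected = False
except (ValueError, TypeError):
    rejected = True
```

The tests in the diff go one step further than this sketch: they assign to a label that does not exist yet (`ser.loc[3]`, `df.loc[3]`), which is the index-expansion path this PR is about.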
This entire implementation should not be here, but rather in Categorical indexing itself. Indexing is already very complicated and we are trying to remove things from core/indexing.py; this is dtype-specific.
pls address this comment
I removed the dtype-specific code from `core/indexing.py`. However, some code changes still remain there, because I couldn't find an alternative way. The latest changes address the issue, so the `Categorical` dtype is now preserved.

Some additional points related to the changes in this PR:
- […] `pd.DataFrame` then that value is replaced with nan.
- […] `pd.Series` in this PR. This means that the dtype of the series changes to object when an unseen category is appended through index expansion.
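
For comparison, here is a minimal sketch of appending a value while keeping the categorical dtype, without going through index expansion at all. The helper `append_category_value` is hypothetical and is not taken from this PR or from pandas. It relies on `pd.concat` keeping the dtype when both Series share the same categorical dtype:

```python
import pandas as pd


def append_category_value(ser, label, value):
    """Hypothetical helper: append `value` at `label`, keeping ser's dtype."""
    # Refuse values that are not among the predefined categories.
    if value not in ser.cat.categories:
        raise ValueError(
            f"{value!r} is not a predefined category; "
            "add it first with ser.cat.add_categories"
        )
    # Build the new row with the same categorical dtype so that the
    # concatenation does not fall back to object.
    new_row = pd.Series([value], index=[label], dtype=ser.dtype)
    return pd.concat([ser, new_row])


ser = pd.Series(pd.Categorical(["a", "b", "c"]))
out = append_category_value(ser, 3, "c")
```

Here `out` keeps the dtype of `ser`, while a call such as `append_category_value(ser, 4, "d")` raises instead of silently changing the dtype.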