Add fix to raise error when category value 'x' is not predefined but is assigned through df.loc[..]=x #34011

Closed

79 commits
4487bec
Add fix to raise error when category value is not predefined
chrispe May 5, 2020
10098ab
Fix linting
chrispe May 5, 2020
cb34580
Added new test
chrispe May 5, 2020
1622663
Merge branch 'master' of https://github.com/pandas-dev/pandas into fi…
chrispe May 16, 2020
c627fa6
Add test case for unused categories
chrispe May 16, 2020
ba3a751
Remove trailing whitespace
chrispe May 16, 2020
51dcdfe
Fix linting
chrispe May 16, 2020
9057b26
Fix linting
chrispe May 16, 2020
06fdc3e
Remove temporary fix from generic.py
chrispe May 23, 2020
8c8f794
Merge branch 'master' of https://github.com/pandas-dev/pandas into fi…
chrispe May 23, 2020
582c023
First fix try through indexing.py
chrispe May 24, 2020
730fc2b
Fix lint
chrispe May 24, 2020
c275eb9
Fix import ordering
chrispe May 24, 2020
944ae24
Fix Update
chrispe May 24, 2020
8372bdb
Fix lint
chrispe May 24, 2020
0e5e418
Include more related test cases
chrispe May 24, 2020
eea359a
Fix linting
chrispe May 24, 2020
5f72d4e
Update test_indexing.py
chrispe May 24, 2020
781322a
Merge branch 'master' of https://github.com/pandas-dev/pandas into fi…
chrispe Oct 8, 2020
26f474b
import missing dtypes function
chrispe Oct 8, 2020
215943e
Fix linting
chrispe Oct 8, 2020
5bacde9
Include requested changes
chrispe Oct 10, 2020
96e4318
Fix import ordering/format
chrispe Oct 10, 2020
993be66
Merge branch 'master' of https://github.com/pandas-dev/pandas into fi…
chrispe Oct 10, 2020
42d968b
Merge branch 'master' of https://github.com/pandas-dev/pandas into fi…
chrispe Oct 11, 2020
e7ce246
Update test_categorical.py
chrispe Oct 11, 2020
31ef609
Fix format
chrispe Oct 11, 2020
ce3f463
Remove commas
chrispe Oct 11, 2020
a825269
Update test_categorical.py
chrispe Oct 11, 2020
3816789
Merge branch 'master' of https://github.com/pandas-dev/pandas into fi…
chrispe Oct 12, 2020
8651d25
Merge branch 'master' of https://github.com/pandas-dev/pandas into fi…
chrispe Oct 17, 2020
72726a0
Update solution
chrispe Oct 17, 2020
0f738c3
Merge branch 'master' of https://github.com/pandas-dev/pandas into fi…
chrispe Oct 17, 2020
51e2032
Fix lint
chrispe Oct 17, 2020
7d64357
Fix format issues
chrispe Oct 17, 2020
d68f215
Update indexing.py
chrispe Oct 17, 2020
5ea8ab1
Update indexing.py
chrispe Oct 18, 2020
b08efc1
Update indexing.py
chrispe Oct 18, 2020
69f4e62
Update test_categorical.py
chrispe Oct 18, 2020
4c33040
Update concat.py
chrispe Oct 18, 2020
e936736
Update cast.py
chrispe Oct 18, 2020
8031f8f
Update cast.py
chrispe Oct 18, 2020
1e1c094
Merge branch 'master' of https://github.com/pandas-dev/pandas into fi…
chrispe Oct 18, 2020
c889b1b
Merge branch 'master' of https://github.com/pandas-dev/pandas into fi…
chrispe Nov 4, 2020
c08c6c0
Update test_categorical.py
chrispe Nov 4, 2020
e6c3a4c
Merge branch 'master' of https://github.com/pandas-dev/pandas into fi…
chrispe Nov 28, 2020
c862d99
Revert previous approach and include concat changes
chrispe Nov 28, 2020
5baa314
Remove non-required convertion
chrispe Nov 28, 2020
cb5d8e4
Update concat.py
chrispe Nov 28, 2020
7d7da20
Update concat.py
chrispe Nov 28, 2020
ecad50f
Update cast.py
chrispe Dec 6, 2020
ab5af93
Merge branch 'master' of https://github.com/pandas-dev/pandas into fi…
chrispe Dec 6, 2020
0c6b68a
Merge branch 'master' of https://github.com/pandas-dev/pandas into fi…
chrispe Jan 21, 2021
4197a74
Merge branch 'master' of https://github.com/pandas-dev/pandas into fi…
chrispe Feb 13, 2021
4bc05c6
Merge branch 'master' of https://github.com/pandas-dev/pandas into fi…
chrispe Feb 15, 2021
af5e141
Add new version with raise
chrispe Feb 15, 2021
6d45570
Add format fixes
chrispe Feb 15, 2021
31612ed
Update test_categorical.py
chrispe Feb 15, 2021
4a2a8e8
Merge branch 'master' of https://github.com/pandas-dev/pandas into fi…
chrispe Feb 17, 2021
dd7e3ca
Merge branch 'master' of https://github.com/pandas-dev/pandas into fi…
chrispe Feb 20, 2021
6d9e667
Update
chrispe Feb 20, 2021
e0da655
Use prio_cat_dtype only for EAs
chrispe Feb 20, 2021
92d1f14
Revert usage of first_ea
chrispe Feb 20, 2021
9b9b382
Fix mypy errors
chrispe Feb 20, 2021
d3df994
Use unique1d in _cast_to_common_type
chrispe Feb 20, 2021
41aa9e3
Fix isort error
chrispe Feb 20, 2021
ca0eb1f
Renamed input variable for find_common_type
chrispe Feb 20, 2021
e2cfb79
Merge branch 'master' of https://github.com/pandas-dev/pandas into fi…
chrispe Mar 7, 2021
931d6c8
Remove new argument in find_common_type
chrispe Mar 7, 2021
8065ddb
Add check to _get_common_dtype
chrispe Mar 13, 2021
5d533dd
Merge branch 'master' of https://github.com/pandas-dev/pandas into fi…
chrispe Mar 13, 2021
b21326b
Update dtypes.py
chrispe Mar 13, 2021
335fc06
Update dtypes.py
chrispe Mar 13, 2021
950dcc4
Update dtypes.py
chrispe Mar 13, 2021
2ee1df8
Update dtypes.py
chrispe Mar 13, 2021
17120f0
Test
chrispe Mar 13, 2021
439b49f
Add flag in get_common_type
chrispe Mar 13, 2021
c6e3435
Revert
chrispe Mar 13, 2021
fc40817
Update dtypes.py
chrispe Mar 13, 2021
24 changes: 23 additions & 1 deletion pandas/core/dtypes/cast.py
@@ -88,6 +88,7 @@
     pandas_dtype,
 )
 from pandas.core.dtypes.dtypes import (
+    CategoricalDtype,
     DatetimeTZDtype,
     ExtensionDtype,
     IntervalDtype,
@@ -1849,13 +1850,16 @@ def ensure_nanosecond_dtype(dtype: DtypeObj) -> DtypeObj:
     return dtype
 
 
-def find_common_type(types: List[DtypeObj]) -> DtypeObj:
+def find_common_type(
+    types: List[DtypeObj], promote_categorical: Optional[bool] = False
+) -> DtypeObj:
     """
     Find a common data type among the given dtypes.
 
     Parameters
     ----------
     types : list of dtypes
+    promote_categorical : find if possible, a categorical dtype that fits all the dtypes
 
     Returns
     -------
@@ -1876,6 +1880,24 @@ def find_common_type(types: List[DtypeObj]) -> DtypeObj:
     if all(is_dtype_equal(first, t) for t in types[1:]):
         return first
 
+    # special case for categorical
+    if promote_categorical:
+        if any(is_categorical_dtype(t) for t in types):
+            cat_dtypes = []
+            for t in types:
+                if isinstance(t, CategoricalDtype) and t.categories is not None:
+                    if any(~isna(t.categories.values)):
+                        cat_values_dtype = t.categories.values.dtype
+                        if all(
+                            is_categorical_dtype(x) or np.can_cast(x, cat_values_dtype)
+                            for x in types
+                        ):
+                            cat_dtypes.append(t)
+            if len(cat_dtypes) > 0:
+                dtype_ref = cat_dtypes[0]
+                if all(is_dtype_equal(dtype, dtype_ref) for dtype in cat_dtypes[1:]):
+                    return dtype_ref
+
     # get unique types (dict.fromkeys is used as order-preserving set())
     types = list(dict.fromkeys(types).keys())
 
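The `promote_categorical` branch added to `find_common_type` amounts to one rule: keep a categorical dtype only if every other input dtype can be cast into the dtype of its categories, and only if all categorical dtypes that pass this test agree. The sketch below models that rule in plain Python under stated assumptions. `CatDtype`, `can_cast`, and `promote_to_categorical` are hypothetical stand-ins (not pandas or NumPy APIs), the casting table is a toy subset, and returning `None` stands in for falling through to the regular common-type logic.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass(frozen=True)
class CatDtype:
    """Hypothetical stand-in for CategoricalDtype: categories plus their value dtype."""

    categories: Tuple[object, ...]
    values_dtype: str  # e.g. "object" or "int64"


Dtype = Union[str, CatDtype]  # plain (non-categorical) dtypes are modelled as strings

# Toy subset of safe casts (src, dst); an assumption, not NumPy's full casting rules.
_SAFE_CASTS = {
    ("int64", "int64"),
    ("int64", "float64"),
    ("int64", "object"),
    ("float64", "float64"),
    ("float64", "object"),
    ("object", "object"),
}


def can_cast(src: str, dst: str) -> bool:
    return (src, dst) in _SAFE_CASTS


def promote_to_categorical(types: List[Dtype]) -> Optional[CatDtype]:
    """Return the categorical dtype every input fits into, or None to fall through."""
    candidates = []
    for t in types:
        # Only categoricals with at least one non-missing category qualify.
        if isinstance(t, CatDtype) and any(c is not None for c in t.categories):
            # Every input must be categorical or castable to the category value dtype.
            if all(isinstance(x, CatDtype) or can_cast(x, t.values_dtype) for x in types):
                candidates.append(t)
    # Promote only if all qualifying categorical dtypes are equal
    # (tuple equality is order-sensitive, a simplification).
    if candidates and all(c == candidates[0] for c in candidates[1:]):
        return candidates[0]
    return None
```

Requiring all candidates to be equal means two different categorical dtypes are never merged by this branch; that case is left to the existing logic.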
21 changes: 20 additions & 1 deletion pandas/core/dtypes/concat.py
@@ -17,11 +17,14 @@
     is_extension_array_dtype,
     is_sparse,
 )
+from pandas.core.dtypes.dtypes import CategoricalDtype
 from pandas.core.dtypes.generic import (
     ABCCategoricalIndex,
     ABCSeries,
 )
+from pandas.core.dtypes.missing import isna
 
+from pandas.core.algorithms import unique1d
 from pandas.core.arrays import ExtensionArray
 from pandas.core.arrays.sparse import SparseArray
 from pandas.core.construction import (
@@ -35,6 +38,16 @@ def _cast_to_common_type(arr: ArrayLike, dtype: DtypeObj) -> ArrayLike:
     Helper function for `arr.astype(common_dtype)` but handling all special
     cases.
     """
+    if isinstance(dtype, CategoricalDtype):
+        # if casting an array to a categorical dtype, then we need to ensure
+        # that its unique values are predefined as categories in that dtype
+        unique_values = unique1d(arr[~isna(arr)])
+        if any(val not in dtype.categories for val in unique_values.tolist()):
+            raise ValueError(
+                "Cannot setitem on a Categorical with a new category, "
+                "set the categories first"
+            )
+
     if (
         is_categorical_dtype(arr.dtype)
         and isinstance(dtype, np.dtype)
@@ -116,12 +129,18 @@ def is_nonempty(x) -> bool:
     all_empty = not len(non_empties)
     single_dtype = len({x.dtype for x in to_concat}) == 1
     any_ea = any(is_extension_array_dtype(x.dtype) for x in to_concat)
+    first_ea = isinstance(to_concat[0], ExtensionArray)
+    arr_index_expansion = (
+        first_ea and len(to_concat) == 2 and to_concat[1].shape[0] == 1
+    )
 
     if any_ea:
         # we ignore axis here, as internally concatting with EAs is always
         # for axis=0
         if not single_dtype:
-            target_dtype = find_common_type([x.dtype for x in to_concat])
+            target_dtype = find_common_type(
+                [x.dtype for x in to_concat], promote_categorical=arr_index_expansion
+            )
             to_concat = [_cast_to_common_type(arr, target_dtype) for arr in to_concat]
 
         if isinstance(to_concat[0], ExtensionArray):
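The membership check added to `_cast_to_common_type` and the `arr_index_expansion` flag in the concatenation code can be modelled without pandas. This is a simplified sketch, not the pandas implementation: `check_values_in_categories`, `is_single_row_expansion`, and `_is_missing` are hypothetical helpers, plain Python lists stand in for arrays, and only `None` and float NaN are treated as missing (pandas' `isna` recognizes more).

```python
import math
from typing import Hashable, Iterable, Sequence

NEW_CATEGORY_MSG = (
    "Cannot setitem on a Categorical with a new category, "
    "set the categories first"
)


def _is_missing(value: object) -> bool:
    # Simplified stand-in for a scalar isna(): only None and float NaN count.
    return value is None or (isinstance(value, float) and math.isnan(value))


def check_values_in_categories(
    values: Iterable[Hashable], categories: Sequence[Hashable]
) -> None:
    """Raise ValueError if any non-missing value is not an existing category."""
    allowed = set(categories)
    # Drop missing values, then de-duplicate (dict.fromkeys keeps first-seen order).
    unique_values = dict.fromkeys(v for v in values if not _is_missing(v))
    if any(v not in allowed for v in unique_values):
        raise ValueError(NEW_CATEGORY_MSG)


def is_single_row_expansion(lengths: Sequence[int], first_is_extension_array: bool) -> bool:
    """Model of arr_index_expansion: an extension array followed by one length-1 piece."""
    return first_is_extension_array and len(lengths) == 2 and lengths[1] == 1
```

Skipping missing values mirrors the `arr[~isna(arr)]` filter in the diff, so assigning a missing value to a categorical does not trigger the error.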
54 changes: 54 additions & 0 deletions pandas/tests/series/test_categorical.py
@@ -0,0 +1,54 @@
+import pytest
+
+import pandas as pd
+from pandas import Categorical
+import pandas._testing as tm
+
+
+class TestCategoricalSeries:
+    def test_setitem_undefined_category_raises(self):
+        ser = pd.Series(Categorical(["a", "b", "c"]))
+        msg = (
+            "Cannot setitem on a Categorical with a new category, "
+            "set the categories first"
+        )
+        with pytest.raises(ValueError, match=msg):
+            ser.loc[2] = "d"
+
+    def test_concat_undefined_category_raises(self):
+        ser = pd.Series(Categorical(["a", "b", "c"]))
+        msg = (
+            "Cannot setitem on a Categorical with a new category, "
+            "set the categories first"
+        )
+        with pytest.raises(ValueError, match=msg):
+            ser.loc[3] = "d"
+
+    def test_loc_category_dtype_retention(self):
+        # Case 1
+        df = pd.DataFrame(
+            {
+                "int": [0, 1, 2],
+                "cat": Categorical(["a", "b", "c"], categories=["a", "b", "c"]),
+            }
+        )
+        df.loc[3] = [3, "c"]
+        expected = pd.DataFrame(
+            {
+                "int": [0, 1, 2, 3],
+                "cat": Categorical(["a", "b", "c", "c"], categories=["a", "b", "c"]),
+            }
+        )
+        tm.assert_frame_equal(df, expected)
+
+        # Case 2
+        ser = pd.Series(Categorical(["a", "b", "c"]))
+        ser.loc[3] = "c"
+        expected = pd.Series(Categorical(["a", "b", "c", "c"]))
+        tm.assert_series_equal(ser, expected)
+
+        # Case 3
+        ser = pd.Series(Categorical([1, 2, 3]))
+        ser.loc[3] = 3
+        expected = pd.Series(Categorical([1, 2, 3, 3]))
+        tm.assert_series_equal(ser, expected)
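
The three tests encode one rule for `.loc` assignment on a categorical Series: a value that is already a category is accepted and the categories are kept, whether the label exists (label 2, an in-place set) or is new (label 3, an enlargement), while a value outside the categories raises `ValueError`. The toy container below models only that rule. `ToyCategoricalSeries` and `loc_set` are hypothetical names, and the sketch ignores dtypes, NaN handling, and the actual pandas code paths.

```python
from typing import Hashable, List, Optional, Sequence

NEW_CATEGORY_MSG = (
    "Cannot setitem on a Categorical with a new category, "
    "set the categories first"
)


class ToyCategoricalSeries:
    """Hypothetical stand-in for a categorical Series with a fixed category set."""

    def __init__(
        self, values: Sequence[Hashable], categories: Optional[Sequence[Hashable]] = None
    ) -> None:
        # Default categories: sorted unique values, as Categorical infers for sortable data.
        self.categories: List[Hashable] = (
            list(categories) if categories is not None else sorted(set(values))
        )
        self.index: List[int] = list(range(len(values)))
        self.values: List[Hashable] = list(values)

    def loc_set(self, label: int, value: Hashable) -> None:
        """Assign like ser.loc[label] = value, enlarging if the label is new."""
        # The same rule applies to in-place assignment and to enlargement.
        if value not in self.categories:
            raise ValueError(NEW_CATEGORY_MSG)
        if label in self.index:
            self.values[self.index.index(label)] = value
        else:
            self.index.append(label)
            self.values.append(value)
```

Checking membership before mutating means a rejected assignment leaves the container unchanged.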