From e4145d45b1604aae226cb2d3b420250911f11dc9 Mon Sep 17 00:00:00 2001
From: Zhengbo Wang <77875500+luke396@users.noreply.github.com>
Date: Tue, 6 Feb 2024 11:51:00 +0800
Subject: [PATCH 1/5] Add doc and codes

---
 doc/source/user_guide/categorical.rst | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/doc/source/user_guide/categorical.rst b/doc/source/user_guide/categorical.rst
index 8fb991dca02db..b5640c5d83e6c 100644
--- a/doc/source/user_guide/categorical.rst
+++ b/doc/source/user_guide/categorical.rst
@@ -263,6 +263,18 @@ All instances of ``CategoricalDtype`` compare equal to the string ``'category'``
 
     c1 == "category"
 
+But notice that the ``categories_dtype`` should be considered, especially when comparing with
+two empty ``CategoricalDtype`` instances:
+
+.. ipython:: python
+    import numpy as np
+
+    c2 = pd.Categorical(np.full(2, np.nan, dtype=object))
+    c3 = pd.Categorical(np.full(2, np.nan, dtype=float))
+
+    # Unequal, since their categories_dtype are different
+    c2.dtype == c3.dtype
+
 Description
 -----------
 

From b4dd2b3968c2b116da403737989b68b7295fcdb0 Mon Sep 17 00:00:00 2001
From: Zhengbo Wang <77875500+luke396@users.noreply.github.com>
Date: Tue, 6 Feb 2024 13:15:50 +0800
Subject: [PATCH 2/5] Update doc

---
 doc/source/user_guide/categorical.rst | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/doc/source/user_guide/categorical.rst b/doc/source/user_guide/categorical.rst
index b5640c5d83e6c..3dc1812fd9ec4 100644
--- a/doc/source/user_guide/categorical.rst
+++ b/doc/source/user_guide/categorical.rst
@@ -269,10 +269,12 @@ two empty ``CategoricalDtype`` instances:
 .. ipython:: python
     import numpy as np
 
-    c2 = pd.Categorical(np.full(2, np.nan, dtype=object))
-    c3 = pd.Categorical(np.full(2, np.nan, dtype=float))
+    c2 = pd.Categorical(np.array([], dtype=object))
+    c3 = pd.Categorical(np.array([], dtype=float))
 
     # Unequal, since their categories_dtype are different
+    # c2.dtype = CategoricalDtype(categories=[], ordered=False, categories_dtype=object)
+    # c3.dtype = CategoricalDtype(categories=[], ordered=False, categories_dtype=float64)
     c2.dtype == c3.dtype
 
 Description

From 136e65d5df1e8cab979f839ec008f0a217e23ae0 Mon Sep 17 00:00:00 2001
From: Zhengbo Wang <77875500+luke396@users.noreply.github.com>
Date: Fri, 16 Feb 2024 20:41:28 +0800
Subject: [PATCH 3/5] Improve and fix

---
 doc/source/user_guide/categorical.rst | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/doc/source/user_guide/categorical.rst b/doc/source/user_guide/categorical.rst
index 3dc1812fd9ec4..db9db3b05c9ec 100644
--- a/doc/source/user_guide/categorical.rst
+++ b/doc/source/user_guide/categorical.rst
@@ -244,7 +244,7 @@ Equality semantics
 ~~~~~~~~~~~~~~~~~~
 
 Two instances of :class:`~pandas.api.types.CategoricalDtype` compare equal
-whenever they have the same categories and order. When comparing two
+whenever they have the same categories, order and dtypes. When comparing two
 unordered categoricals, the order of the ``categories`` is not considered.
 
 .. ipython:: python
@@ -263,16 +263,16 @@ All instances of ``CategoricalDtype`` compare equal to the string ``'category'``
 
     c1 == "category"
 
-But notice that the ``categories_dtype`` should be considered, especially when comparing with
-two empty ``CategoricalDtype`` instances:
+Notice that the ``categories_dtype`` should be considered, especially when comparing with
+two empty ``CategoricalDtype`` instances.
 
 .. ipython:: python
+
     import numpy as np
 
     c2 = pd.Categorical(np.array([], dtype=object))
     c3 = pd.Categorical(np.array([], dtype=float))
 
-    # Unequal, since their categories_dtype are different
     # c2.dtype = CategoricalDtype(categories=[], ordered=False, categories_dtype=object)
     # c3.dtype = CategoricalDtype(categories=[], ordered=False, categories_dtype=float64)
     c2.dtype == c3.dtype

From 822a52f1be8039724b1f273b7e7019f403e0d669 Mon Sep 17 00:00:00 2001
From: Zhengbo Wang <77875500+luke396@users.noreply.github.com>
Date: Sat, 17 Feb 2024 09:05:50 +0800
Subject: [PATCH 4/5] Improve doc

---
 doc/source/user_guide/categorical.rst | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/doc/source/user_guide/categorical.rst b/doc/source/user_guide/categorical.rst
index db9db3b05c9ec..2826fdcae5de4 100644
--- a/doc/source/user_guide/categorical.rst
+++ b/doc/source/user_guide/categorical.rst
@@ -244,8 +244,9 @@ Equality semantics
 ~~~~~~~~~~~~~~~~~~
 
 Two instances of :class:`~pandas.api.types.CategoricalDtype` compare equal
-whenever they have the same categories, order and dtypes. When comparing two
-unordered categoricals, the order of the ``categories`` is not considered.
+whenever they have the same categories, order. When comparing two
+unordered categoricals, the order of the ``categories`` is not considered. Note
+that categories with different dtypes are not the same.
 
 .. ipython:: python
 
@@ -268,13 +269,9 @@ two empty ``CategoricalDtype`` instances.
 
 .. ipython:: python
 
-    import numpy as np
-
     c2 = pd.Categorical(np.array([], dtype=object))
     c3 = pd.Categorical(np.array([], dtype=float))
 
-    # c2.dtype = CategoricalDtype(categories=[], ordered=False, categories_dtype=object)
-    # c3.dtype = CategoricalDtype(categories=[], ordered=False, categories_dtype=float64)
     c2.dtype == c3.dtype
 
 Description

From 6518dd0ac2e23f97de5bffd939ba7e0da7f2c5cb Mon Sep 17 00:00:00 2001
From: Zhengbo Wang <77875500+luke396@users.noreply.github.com>
Date: Sat, 17 Feb 2024 19:42:01 +0800
Subject: [PATCH 5/5] Update doc/source/user_guide/categorical.rst

Co-authored-by: Richard Shadrach <45562402+rhshadrach@users.noreply.github.com>
---
 doc/source/user_guide/categorical.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/source/user_guide/categorical.rst b/doc/source/user_guide/categorical.rst
index 2826fdcae5de4..7b2fd32303845 100644
--- a/doc/source/user_guide/categorical.rst
+++ b/doc/source/user_guide/categorical.rst
@@ -244,7 +244,7 @@ Equality semantics
 ~~~~~~~~~~~~~~~~~~
 
 Two instances of :class:`~pandas.api.types.CategoricalDtype` compare equal
-whenever they have the same categories, order. When comparing two
+whenever they have the same categories and order. When comparing two
 unordered categoricals, the order of the ``categories`` is not considered. Note
 that categories with different dtypes are not the same.
 
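
Reviewer note: the equality rules in the final wording of the "Equality semantics" paragraph can be spot-checked with a short script. This is a minimal sketch, not part of the patch series. It assumes a pandas release in which unordered ``CategoricalDtype`` comparison takes the dtype of the categories into account, and the variable names are illustrative only. It builds the dtypes directly with non-empty integer and float categories instead of the empty arrays used in the patch, so the dtype of the categories is explicit.

```python
from pandas.api.types import CategoricalDtype

# Unordered dtypes: the order of the categories is not considered.
unordered_ab = CategoricalDtype(["a", "b"])
unordered_ba = CategoricalDtype(["b", "a"])
same_unordered = unordered_ab == unordered_ba

# Ordered dtypes: the order of the categories matters.
ordered_ab = CategoricalDtype(["a", "b"], ordered=True)
ordered_ba = CategoricalDtype(["b", "a"], ordered=True)
same_ordered = ordered_ab == ordered_ba

# Any CategoricalDtype compares equal to the string "category".
equals_string = unordered_ab == "category"

# Unordered, numerically equal values, but int64 vs float64 categories:
# categories with different dtypes are not treated as the same.
int_cats = CategoricalDtype([1, 2])
float_cats = CategoricalDtype([1.0, 2.0])
same_across_dtypes = int_cats == float_cats

print(same_unordered, same_ordered, equals_string, same_across_dtypes)
```

Under that assumption the first and third comparisons should be truthy and the other two falsy. The dtype case is deliberately left unordered here, because the patch only demonstrates the rule for unordered categoricals.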