Commit 7e18be0

expand whatsnew and comments
1 parent 0b07426 commit 7e18be0

2 files changed

+40
-3
lines changed

doc/source/whatsnew/v0.23.0.txt

+23-1
@@ -460,6 +460,29 @@ To restore previous behavior, simply set ``expand`` to ``False``:
     extracted
     type(extracted)
 
+.. _whatsnew_0230.api_breaking.cdt_ordered:
+
+Default value for the ``ordered`` parameter of ``CategoricalDtype``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The default value of the ``ordered`` parameter for :class:`~pandas.api.types.CategoricalDtype` has changed from ``False`` to ``None`` to allow updating of ``categories`` without impacting ``ordered``. Behavior should remain consistent for downstream objects, such as :class:`Categorical` (:issue:`18790`).
+
+In previous versions, the default value for the ``ordered`` parameter was ``False``. If ``ordered`` was not explicitly specified when updating ``categories``, it silently defaulted to ``False``, so an existing ``ordered=True`` could be unintentionally changed to ``False``. The new behavior for ``ordered=None`` is to retain the existing value of ``ordered``.
+
+New Behavior:
+
+.. ipython:: python
+
+    from pandas.api.types import CategoricalDtype
+    cat = pd.Categorical(list('abcaba'), ordered=True, categories=list('cba'))
+    cat
+    cdt = CategoricalDtype(categories=list('cbad'))
+    cat.astype(cdt)
+
+Notice in the example above that the converted ``Categorical`` has retained ``ordered=True``. Had the default value for ``ordered`` remained as ``False``, the converted ``Categorical`` would have become unordered, despite ``ordered=False`` never being explicitly specified. To change the value of ``ordered``, explicitly pass it to the new dtype, e.g. ``CategoricalDtype(categories=list('cbad'), ordered=False)``.
+
+Note that the unintentional conversion of ``ordered`` discussed above did not arise in previous versions due to separate bugs that prevented ``astype`` from doing any type of category to category conversion (:issue:`10696`, :issue:`18593`). These bugs have been fixed in this release, and motivated changing the default value of ``ordered``.
+
 .. _whatsnew_0230.api:
 
 Other API Changes
@@ -507,7 +530,6 @@ Other API Changes
 - Set operations (union, difference...) on :class:`IntervalIndex` with incompatible index types will now raise a ``TypeError`` rather than a ``ValueError`` (:issue:`19329`)
 - :class:`DateOffset` objects render more simply, e.g. "<DateOffset: days=1>" instead of "<DateOffset: kwds={'days': 1}>" (:issue:`19403`)
 - :func:`pandas.merge` provides a more informative error message when trying to merge on timezone-aware and timezone-naive columns (:issue:`15800`)
-- The default value of the ``ordered`` parameter for :class:`~pandas.api.types.CategoricalDtype` has changed from ``False`` to ``None`` to allow updating of ``categories`` without impacting ``ordered``. Behavior should remain consistent for downstream objects, such as :class:`Categorical` (:issue:`18790`)
 
 .. _whatsnew_0230.deprecations:
 
pandas/core/dtypes/dtypes.py

+17-2
@@ -206,6 +206,17 @@ def __hash__(self):
         return int(self._hash_categories(self.categories, self.ordered))
 
     def __eq__(self, other):
+        """
+        Rules for CDT equality:
+        1) Any CDT is equal to the string 'category'
+        2) Any CDT is equal to a CDT with categories=None regardless of ordered
+        3) A CDT with ordered=True is only equal to another CDT with
+           ordered=True and identical categories in the same order
+        4) A CDT with ordered={False, None} is only equal to another CDT with
+           ordered={False, None} and identical categories, but same order is
+           not required. There is no distinction between False/None.
+        5) Any other comparison returns False
+        """
         if isinstance(other, compat.string_types):
             return other == self.name
 
@@ -219,11 +230,15 @@ def __eq__(self, other):
             # CDT(., .) = CDT(None, True).
             return True
         elif self.ordered or other.ordered:
-            # at least one ordered
+            # At least one has ordered=True; equal if both have ordered=True
+            # and the same values for categories in the same order.
             return ((self.ordered == other.ordered) and
                     self.categories.equals(other.categories))
         else:
-            # both unordered; this could probably be optimized / cached
+            # Neither has ordered=True; equal if both have the same categories,
+            # but same order is not necessary. There is no distinction between
+            # ordered=False and ordered=None: CDT(., False) and CDT(., None)
+            # will be equal if they have the same categories.
             return hash(self) == hash(other)
 
     def __repr__(self):
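
The five equality rules documented in the ``__eq__`` docstring can be restated as a small standalone function. This is only an illustrative sketch, not the pandas implementation: ``cdt_equal`` and the ``(categories, ordered)`` tuple representation are hypothetical names introduced here, and rule 4 is checked with a set comparison (assuming unique categories) rather than the hash comparison used in the diff.

```python
from typing import Optional, Sequence, Tuple, Union

# Hypothetical stand-in for a CategoricalDtype: a (categories, ordered) pair.
CDT = Tuple[Optional[Sequence[str]], Optional[bool]]


def cdt_equal(left: CDT, right: Union[CDT, str, object]) -> bool:
    """Restate the five documented equality rules on plain tuples."""
    # Rule 1: any CDT equals the string 'category' (and no other string).
    if isinstance(right, str):
        return right == "category"
    # Rule 5: anything that is not a (categories, ordered) pair is unequal.
    if not (isinstance(right, tuple) and len(right) == 2):
        return False
    left_cats, left_ordered = left
    right_cats, right_ordered = right
    # Rule 2: categories=None matches any CDT, regardless of ordered.
    if left_cats is None or right_cats is None:
        return True
    # Rule 3: if either side is ordered, both must be, in the same order.
    if left_ordered or right_ordered:
        return (left_ordered == right_ordered
                and list(left_cats) == list(right_cats))
    # Rule 4: False and None are interchangeable; category order is ignored.
    # Assumption: categories are unique, so a set comparison suffices here,
    # whereas the real method compares hashes.
    return set(left_cats) == set(right_cats)
```

For instance, under this sketch ``cdt_equal((list('ab'), False), (list('ba'), None))`` holds by rule 4, while ``cdt_equal((list('ab'), True), (list('ba'), True))`` does not, because rule 3 requires the same category order.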
