
Commit 221b493

API: deprecate setting of .ordered directly (GH9347, GH9190)
add set_ordered method for setting ordered
default for Categorical is now to NOT order unless explicitly specified
whatsnew doc updates for categorical api changes
add ability to specify keywords to astype for creation defaults
fix issue with grouping with sort=True on an unordered Categorical
update categorical.rst docs
test unsortable when ordered=True
v0.16.0.txt / release notes updates
clean up check for ordering
allow groupby to work on an unordered categorical
1 parent 28b5ef9 commit 221b493

16 files changed

+461
-111
lines changed

doc/source/api.rst

+2
@@ -585,6 +585,8 @@ following usable methods and properties (all available as ``Series.cat.<method_o
585585
Categorical.remove_categories
586586
Categorical.remove_unused_categories
587587
Categorical.set_categories
588+
Categorical.as_ordered
589+
Categorical.as_unordered
588590
Categorical.codes
589591

590592
To create a Series of dtype ``category``, use ``cat = s.astype("category")``.

doc/source/categorical.rst

+27-13
@@ -90,8 +90,6 @@ By using some special functions:
9090
See :ref:`documentation <reshaping.tile.cut>` for :func:`~pandas.cut`.
9191

9292
By passing a :class:`pandas.Categorical` object to a `Series` or assigning it to a `DataFrame`.
93-
This is the only possibility to specify differently ordered categories (or no order at all) at
94-
creation time and the only reason to use :class:`pandas.Categorical` directly:
9593

9694
.. ipython:: python
9795
@@ -103,6 +101,14 @@ creation time and the only reason to use :class:`pandas.Categorical` directly:
103101
df["B"] = raw_cat
104102
df
105103
104+
You can also specify differently ordered categories or make the resulting data ordered, by passing these arguments to ``astype()``:
105+
106+
.. ipython:: python
107+
108+
s = Series(["a","b","c","a"])
109+
s_cat = s.astype("category", categories=["b","c","d"], ordered=False)
110+
s_cat
111+
106112
Categorical data has a specific ``category`` :ref:`dtype <basics.dtypes>`:
107113

108114
.. ipython:: python
@@ -176,10 +182,9 @@ It's also possible to pass in the categories in a specific order:
176182
s.cat.ordered
177183
178184
.. note::
179-
New categorical data is automatically ordered if the passed in values are sortable or a
180-
`categories` argument is supplied. This is a difference to R's `factors`, which are unordered
181-
unless explicitly told to be ordered (``ordered=TRUE``). You can of course overwrite that by
182-
passing in an explicit ``ordered=False``.
185+
186+
New categorical data are NOT automatically ordered. You must explicitly pass ``ordered=True`` to
187+
indicate an ordered ``Categorical``.
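
A minimal sketch of the new default, assuming pandas is installed and imported as ``pd`` (the alias and the variable names are illustrative and not part of this commit):

```python
import pandas as pd

# No ``ordered`` keyword is passed, so the Categorical is unordered.
default_cat = pd.Categorical(["a", "b", "c", "a"])

# Ordering has to be requested explicitly.
ordered_cat = pd.Categorical(["a", "b", "c", "a"], ordered=True)

print(default_cat.ordered)
print(ordered_cat.ordered)
```

The first flag should be falsy and the second truthy, which is the behavior the note describes.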
183188

184189

185190
Renaming categories
@@ -270,6 +275,10 @@ Sorting and Order
270275

271276
.. _categorical.sort:
272277

278+
.. warning::
279+
280+
The default for construction has changed in v0.16.0 to ``ordered=False``, from the prior implicit ``ordered=True``.
281+
273282
If categorical data is ordered (``s.cat.ordered == True``), then the order of the categories has a
274283
meaning and certain operations are possible. If the categorical is unordered, a `TypeError` is
275284
raised.
@@ -281,18 +290,26 @@ raised.
281290
s.sort()
282291
except TypeError as e:
283292
print("TypeError: " + str(e))
284-
s = Series(["a","b","c","a"], dtype="category") # ordered per default!
293+
s = Series(["a","b","c","a"]).astype('category', ordered=True)
285294
s.sort()
286295
s
287296
s.min(), s.max()
288297
298+
You can set categorical data to be ordered by using ``as_ordered()`` or unordered by using ``as_unordered()``. These will by
299+
default return a *new* object.
300+
301+
.. ipython:: python
302+
303+
s.cat.as_ordered()
304+
s.cat.as_unordered()
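
As a rough illustration of the "returns a new object" point, the sketch below checks that the original Series keeps its ordering flag. It assumes pandas is imported as ``pd``; the variable names are hypothetical.

```python
import pandas as pd

s = pd.Series(["a", "b", "c", "a"], dtype="category")

# as_ordered() hands back a new Series; ``s`` itself is left alone.
ordered = s.cat.as_ordered()

# as_unordered() is the inverse and also returns a new Series.
unordered = ordered.cat.as_unordered()

print(s.cat.ordered, ordered.cat.ordered, unordered.cat.ordered)
```

Because nothing is modified in place, the result has to be assigned back (``s = s.cat.as_ordered()``) for the change to stick.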
305+
289306
Sorting will use the order defined by categories, not any lexical order present on the data type.
290307
This is even true for strings and numeric data:
291308

292309
.. ipython:: python
293310
294311
s = Series([1,2,3,1], dtype="category")
295-
s.cat.categories = [2,3,1]
312+
s = s.cat.set_categories([2,3,1], ordered=True)
296313
s
297314
s.sort()
298315
s
@@ -310,7 +327,7 @@ necessarily make the sort order the same as the categories order.
310327
.. ipython:: python
311328
312329
s = Series([1,2,3,1], dtype="category")
313-
s = s.cat.reorder_categories([2,3,1])
330+
s = s.cat.reorder_categories([2,3,1], ordered=True)
314331
s
315332
s.sort()
316333
s
@@ -339,7 +356,7 @@ The ordering of the categorical is determined by the ``categories`` of that colu
339356

340357
.. ipython:: python
341358
342-
dfs = DataFrame({'A' : Categorical(list('bbeebbaa'),categories=['e','a','b']),
359+
dfs = DataFrame({'A' : Categorical(list('bbeebbaa'),categories=['e','a','b'],ordered=True),
343360
'B' : [1,2,1,2,2,1,2,1] })
344361
dfs.sort(['A','B'])
345362
@@ -664,9 +681,6 @@ The following differences to R's factor functions can be observed:
664681

665682
* R's `levels` are named `categories`
666683
* R's `levels` are always of type string, while `categories` in pandas can be of any dtype.
667-
* New categorical data is automatically ordered if the passed in values are sortable or a
668-
`categories` argument is supplied. This is a difference to R's `factors`, which are unordered
669-
unless explicitly told to be ordered (``ordered=TRUE``).
670684
* It's not possible to specify labels at creation time. Use ``s.cat.rename_categories(new_labels)``
671685
afterwards.
672686
* In contrast to R's `factor` function, using categorical data as the sole input to create a

doc/source/release.rst

+1
@@ -59,6 +59,7 @@ Highlights include:
5959
- ``Series.to_coo/from_coo`` methods to interact with ``scipy.sparse``, see :ref:`here <whatsnew_0160.enhancements.sparse>`
6060
- Backwards incompatible change to ``Timedelta`` to conform the ``.seconds`` attribute with ``datetime.timedelta``, see :ref:`here <whatsnew_0160.api_breaking.timedelta>`
6161
- Changes to the ``.loc`` slicing API to conform with the behavior of ``.ix`` see :ref:`here <whatsnew_0160.api_breaking.indexing>`
62+
- Changes to the default for ordering in the ``Categorical`` constructor, see :ref:`here <whatsnew_0160.api_breaking.categorical>`
6263

6364
See the :ref:`v0.16.0 Whatsnew <whatsnew_0160>` overview or the issue tracker on GitHub for an extensive list
6465
of all API changes, enhancements and bugs that have been fixed in 0.16.0.

doc/source/whatsnew/v0.16.0.txt

+145
@@ -13,6 +13,7 @@ users upgrade to this version.
1313
* ``Series.to_coo/from_coo`` methods to interact with ``scipy.sparse``, see :ref:`here <whatsnew_0160.enhancements.sparse>`
1414
* Backwards incompatible change to ``Timedelta`` to conform the ``.seconds`` attribute with ``datetime.timedelta``, see :ref:`here <whatsnew_0160.api_breaking.timedelta>`
1515
* Changes to the ``.loc`` slicing API to conform with the behavior of ``.ix`` see :ref:`here <whatsnew_0160.api_breaking.indexing>`
16+
* Changes to the default for ordering in the ``Categorical`` constructor, see :ref:`here <whatsnew_0160.api_breaking.categorical>`
1617

1718
- Check the :ref:`API Changes <whatsnew_0160.api>` and :ref:`deprecations <whatsnew_0160.deprecations>` before updating
1819

@@ -366,6 +367,150 @@ API Changes
366367
- ``Series.describe`` for categorical data will now give counts and frequencies of 0, not ``NaN``, for unused categories (:issue:`9443`)
367368

368369

370+
Categorical Changes
371+
~~~~~~~~~~~~~~~~~~~
372+
373+
.. _whatsnew_0160.api_breaking.categorical:
374+
375+
In prior versions, ``Categoricals`` that had an unspecified ordering (meaning no ``ordered`` keyword was passed) were defaulted as ``ordered`` Categoricals. Going forward, the ``ordered`` keyword in the ``Categorical`` constructor will default to ``False``. Ordering must now be explicit.
376+
377+
Furthermore, previously you *could* change the ``ordered`` attribute of a Categorical by just setting the attribute, e.g. ``cat.ordered=True``. This is now deprecated; use ``cat.as_ordered()`` or ``cat.as_unordered()`` instead. By default these return a **new** object and do not modify the existing object. (:issue:`9347`, :issue:`9190`)
378+
379+
Previous Behavior
380+
381+
.. code-block:: python
382+
383+
In [3]: s = Series([0,1,2], dtype='category')
384+
385+
In [4]: s
386+
Out[4]:
387+
0 0
388+
1 1
389+
2 2
390+
dtype: category
391+
Categories (3, int64): [0 < 1 < 2]
392+
393+
In [5]: s.cat.ordered
394+
Out[5]: True
395+
396+
In [6]: s.cat.ordered = False
397+
398+
In [7]: s
399+
Out[7]:
400+
0 0
401+
1 1
402+
2 2
403+
dtype: category
404+
Categories (3, int64): [0, 1, 2]
405+
406+
New Behavior
407+
408+
.. ipython:: python
409+
410+
s = Series([0,1,2], dtype='category')
411+
s
412+
s.cat.ordered
413+
s = s.cat.as_ordered()
414+
s
415+
s.cat.ordered
416+
417+
# you can set in the constructor of the Categorical
418+
s = Series(Categorical([0,1,2],ordered=True))
419+
s
420+
s.cat.ordered
421+
422+
For ease of creation of series of categorical data, we have added the ability to pass keywords when calling ``.astype()``. These are passed directly to the constructor.
423+
424+
.. ipython:: python
425+
426+
s = Series(["a","b","c","a"]).astype('category',ordered=True)
427+
s
428+
s = Series(["a","b","c","a"]).astype('category',categories=list('abcdef'),ordered=False)
429+
s
430+
431+
.. warning::
432+
433+
This simple API change may have surprising effects if a user is relying on the previous defaulted behavior implicitly. In particular,
434+
sorting operations with a ``Categorical`` will now raise an error:
435+
436+
.. code-block:: python
437+
438+
In [1]: df = DataFrame({ 'A' : Series(list('aabc')).astype('category'), 'B' : np.arange(4) })
439+
440+
In [2]: df['A'].order()
441+
TypeError: Categorical is not ordered for operation argsort
442+
you can use .as_ordered() to change the Categorical to an ordered one
443+
444+
The solution is to make 'A' orderable, e.g. ``df['A'] = df['A'].cat.as_ordered()``
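
A hedged sketch of that fix follows. It uses ``min()`` and ``max()`` as the order-dependent operations, which is a choice made for this sketch because the names of the sorting methods have changed across pandas releases; ``pd`` and ``np`` are the usual import aliases.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": pd.Series(list("aabc")).astype("category"),
                   "B": np.arange(4)})

# On an unordered categorical, an order-dependent reduction is rejected.
try:
    df["A"].min()
    raised = False
except TypeError:
    raised = True

# Make the column ordered, then the same reduction is allowed.
df["A"] = df["A"].cat.as_ordered()
smallest = df["A"].min()
largest = df["A"].max()
print(raised, smallest, largest)
```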
445+
446+
447+
Indexing Changes
448+
~~~~~~~~~~~~~~~~
449+
450+
.. _whatsnew_0160.api_breaking.indexing:
451+
452+
The behavior of a small subset of edge cases for using ``.loc`` has changed (:issue:`8613`). Furthermore, we have improved the content of the error messages that are raised:
453+
454+
- slicing with ``.loc`` where the start and/or stop bound is not found in the index is now allowed; this previously would raise a ``KeyError``. This makes the behavior the same as ``.ix`` in this case. This change is only for slicing, not when indexing with a single label.
455+
456+
.. ipython:: python
457+
458+
df = DataFrame(np.random.randn(5,4),
459+
columns=list('ABCD'),
460+
index=date_range('20130101',periods=5))
461+
df
462+
s = Series(range(5),[-2,-1,1,2,3])
463+
s
464+
465+
Previous Behavior
466+
467+
.. code-block:: python
468+
469+
In [4]: df.loc['2013-01-02':'2013-01-10']
470+
KeyError: 'stop bound [2013-01-10] is not in the [index]'
471+
472+
In [6]: s.loc[-10:3]
473+
KeyError: 'start bound [-10] is not the [index]'
474+
475+
New Behavior
476+
477+
.. ipython:: python
478+
479+
df.loc['2013-01-02':'2013-01-10']
480+
s.loc[-10:3]
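
The slicing change can be checked with a small self-contained sketch. It mirrors the example above but fills the frame with ``np.arange`` instead of random numbers so the result is deterministic; that substitution and the variable names are assumptions of this sketch.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(20).reshape(5, 4),
                  columns=list("ABCD"),
                  index=pd.date_range("20130101", periods=5))
s = pd.Series(range(5), index=[-2, -1, 1, 2, 3])

# The stop bound lies past the end of the index; the slice is clipped.
late = df.loc["2013-01-02":"2013-01-10"]

# The start bound lies before the first label; the slice is clipped too.
clipped = s.loc[-10:3]

print(len(late), list(clipped.index))
```

Only slices are clipped this way; looking up a single missing label with ``.loc`` still raises.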
481+
482+
- allow slicing with float-like values on an integer index for ``.ix``. Previously this was only enabled for ``.loc``:
483+
484+
Previous Behavior
485+
486+
.. code-block:: python
487+
488+
In [8]: s.ix[-1.0:2]
489+
TypeError: the slice start value [-1.0] is not a proper indexer for this index type (Int64Index)
490+
491+
New Behavior
492+
493+
.. ipython:: python
494+
495+
s.ix[-1.0:2]
496+
497+
- provide a useful exception for indexing with an invalid type for that index when using ``.loc``. For example trying to use ``.loc`` on an index of type ``DatetimeIndex`` or ``PeriodIndex`` or ``TimedeltaIndex``, with an integer (or a float).
498+
499+
Previous Behavior
500+
501+
.. code-block:: python
502+
503+
In [4]: df.loc[2:3]
504+
KeyError: 'start bound [2] is not the [index]'
505+
506+
New Behavior
507+
508+
.. code-block:: python
509+
510+
In [4]: df.loc[2:3]
511+
TypeError: Cannot do slice indexing on <class 'pandas.tseries.index.DatetimeIndex'> with <type 'int'> keys
512+
513+
369514
.. _whatsnew_0160.deprecations:
370515

371516
Deprecations
