ENH: concat and append now can handle unordered categories (#13767)
Concatting categoricals with non-matching categories will now return object dtype instead of raising an error.
* ENH: concat and append now can handle unordered categories
* remove union_categoricals kw from concat
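The behavior described in the commit message can be sketched as follows. This is an illustration rather than code from the commit: it assumes a recent pandas release, where `union_categoricals` is importable from `pandas.api.types` (the diff below imports it from `pandas.types.concat`, the location at the time of this commit). The exact fallback dtype may also depend on the pandas version; the commit describes it as `object`.

```python
import pandas as pd
# Assumption: recent pandas; the commit itself used pandas.types.concat
from pandas.api.types import union_categoricals

s1 = pd.Series(["a", "b"], dtype="category")
s2 = pd.Series(["b", "c"], dtype="category")

# The categories differ, so concat no longer raises; the result
# falls back to a non-categorical dtype (object, per the commit).
combined = pd.concat([s1, s2], ignore_index=True)
print(combined.dtype)

# To keep a categorical result, union the categories explicitly.
unioned = union_categoricals([s1, s2])
print(list(unioned.categories))
```

Passing `ignore_index=True` only renumbers the resulting index; it does not affect the dtype of the result.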
doc/source/whatsnew/v0.19.0.txt (+49 -16)
@@ -15,6 +15,8 @@ Highlights include:
 
 - :func:`merge_asof` for asof-style time-series joining, see :ref:`here <whatsnew_0190.enhancements.asof_merge>`
 - ``.rolling()`` are now time-series aware, see :ref:`here <whatsnew_0190.enhancements.rolling_ts>`
+- :func:`read_csv` now supports parsing ``Categorical`` data, see :ref:`here <whatsnew_0190.enhancements.read_csv_categorical>`
+- A function :func:`union_categoricals` has been added for combining categoricals, see :ref:`here <whatsnew_0190.enhancements.union_categoricals>`
 - pandas development api, see :ref:`here <whatsnew_0190.dev_api>`
 - ``PeriodIndex`` now has its own ``period`` dtype, and changed to be more consistent with other ``Index`` classes. See :ref:`here <whatsnew_0190.api.period>`
 - Sparse data structures now gained enhanced support of ``int`` and ``bool`` dtypes, see :ref:`here <whatsnew_0190.sparse>`
@@ -218,7 +220,7 @@ they are in the file or passed in as the ``names`` parameter (:issue:`7160`, :is
 data = '0,1,2\n3,4,5'
 names = ['a', 'b', 'a']
 
-Previous behaviour:
+Previous Behavior:
 
 .. code-block:: ipython
 
@@ -231,7 +233,7 @@ Previous behaviour:
 The first ``a`` column contains the same data as the second ``a`` column, when it should have
 contained the values ``[0, 3]``.
 
-New behaviour:
+New Behavior:
 
 .. ipython :: python
 
@@ -277,6 +279,38 @@ Individual columns can be parsed as a ``Categorical`` using a dict specification
+- A function :func:`union_categoricals` has been added for combining categoricals, see :ref:`Unioning Categoricals<categorical.union>` (:issue:`13361`, :issue:`13763`, :issue:`13846`)
+
+.. ipython:: python
+
+from pandas.types.concat import union_categoricals
+a = pd.Categorical(["b", "c"])
+b = pd.Categorical(["a", "b"])
+union_categoricals([a, b])
+
+- ``concat`` and ``append`` now can concat ``category`` dtypes with different
+``categories`` as ``object`` dtype (:issue:`13524`)
+
+Previous Behavior:
+
+.. code-block:: ipython
+
+In [1]: s1 = pd.Series(['a', 'b'], dtype='category')
+In [2]: s2 = pd.Series(['b', 'c'], dtype='category')
+In [3]: pd.concat([s1, s2])
+ValueError: incompatible categories in categorical concat
 The ``pd.get_dummies`` function now returns dummy-encoded columns as small integers, rather than floats (:issue:`8725`). This should provide an improved memory footprint.
 
-Previous behaviour:
+Previous Behavior:
 
 .. code-block:: ipython
 
-In [1]: pd.get_dummies(['a', 'b', 'a', 'c']).dtypes
+In [1]: pd.get_dummies(['a', 'b', 'a', 'c']).dtypes
 
 Out[1]:
 a float64
@@ -404,7 +438,7 @@ Other enhancements
 
 - The ``.get_credentials()`` method of ``GbqConnector`` can now first try to fetch `the application default credentials <https://developers.google.com/identity/protocols/application-default-credentials>`__. See the :ref:`docs <io.bigquery_authentication>` for more details (:issue:`13577`).
 
-- The ``.tz_localize()`` method of ``DatetimeIndex`` and ``Timestamp`` has gained the ``errors`` keyword, so you can potentially coerce nonexistent timestamps to ``NaT``. The default behaviour remains to raising a ``NonExistentTimeError`` (:issue:`13057`)
+- The ``.tz_localize()`` method of ``DatetimeIndex`` and ``Timestamp`` has gained the ``errors`` keyword, so you can potentially coerce nonexistent timestamps to ``NaT``. The default behavior remains to raise a ``NonExistentTimeError`` (:issue:`13057`)
 - ``pd.to_numeric()`` now accepts a ``downcast`` parameter, which will downcast the data if possible to smallest specified numerical dtype (:issue:`13352`)
 
 .. ipython:: python
@@ -448,7 +482,6 @@ Other enhancements
 - ``DataFrame`` has gained the ``.asof()`` method to return the last non-NaN values according to the selected subset (:issue:`13358`)
 - The ``DataFrame`` constructor will now respect key ordering if a list of ``OrderedDict`` objects are passed in (:issue:`13304`)
 - ``pd.read_html()`` has gained support for the ``decimal`` option (:issue:`12907`)
-- A function :func:`union_categorical` has been added for combining categoricals, see :ref:`Unioning Categoricals<categorical.union>` (:issue:`13361`, :issue:`:13763`, :issue:`13846`)
 - ``Series`` has gained the properties ``.is_monotonic``, ``.is_monotonic_increasing``, ``.is_monotonic_decreasing``, similar to ``Index`` (:issue:`13336`)
 - ``DataFrame.to_sql()`` now allows a single value as the SQL type for all columns (:issue:`11886`).
 - ``Series.append`` now supports the ``ignore_index`` option (:issue:`13677`)
@@ -512,7 +545,7 @@ API changes
 ``Series.tolist()`` will now return Python types
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-``Series.tolist()`` will now return Python types in the output, mimicking NumPy ``.tolist()`` behaviour (:issue:`10904`)
+``Series.tolist()`` will now return Python types in the output, mimicking NumPy ``.tolist()`` behavior (:issue:`10904`)
 
 
 .. ipython:: python
@@ -547,7 +580,7 @@ including ``DataFrame`` (:issue:`1134`, :issue:`4581`, :issue:`13538`)
 
 .. warning::
 Until 0.18.1, comparing ``Series`` with the same length, would succeed even if
-the ``.index`` are different (the result ignores ``.index``). As of 0.19.0, this will raises ``ValueError`` to be more strict. This section also describes how to keep previous behaviour or align different indexes, using the flexible comparison methods like ``.eq``.
+the ``.index`` are different (the result ignores ``.index``). As of 0.19.0, this will raise ``ValueError`` to be more strict. This section also describes how to keep previous behavior or align different indexes, using the flexible comparison methods like ``.eq``.
 
 
 As a result, ``Series`` and ``DataFrame`` operators behave as below:
@@ -615,7 +648,7 @@ Logical operators
 
 Logical operators align both ``.index``.
 
-Previous Behavior (``Series``), only left hand side ``index`` is kept:
+Previous behavior (``Series``), only left hand side ``index`` is kept:
 
 .. code-block:: ipython
 
@@ -935,7 +968,7 @@ Index ``+`` / ``-`` no longer used for set operations
 Addition and subtraction of the base Index type and of DatetimeIndex
 (not the numeric index types)
 previously performed set operations (set union and difference). This
-behaviour was already deprecated since 0.15.0 (in favor using the specific
+behavior was already deprecated since 0.15.0 (in favor of using the specific
 ``.union()`` and ``.difference()`` methods), and is now disabled. When
 possible, ``+`` and ``-`` are now used for element-wise operations, for
 example for concatenating strings or subtracting datetimes
@@ -956,13 +989,13 @@ The same operation will now perform element-wise addition:
 pd.Index(['a', 'b']) + pd.Index(['a', 'c'])
 
 Note that numeric Index objects already performed element-wise operations.
-For example, the behaviour of adding two integer Indexes:
+For example, the behavior of adding two integer Indexes:
 
 .. ipython:: python
 
 pd.Index([1, 2, 3]) + pd.Index([2, 3, 4])
 
-is unchanged. The base ``Index`` is now made consistent with this behaviour.
+is unchanged. The base ``Index`` is now made consistent with this behavior.
 
 Further, because of this change, it is now possible to subtract two
 DatetimeIndex objects resulting in a TimedeltaIndex:
@@ -1130,7 +1163,7 @@ the result of calling :func:`read_csv` without the ``chunksize=`` argument.
 
 data = 'A,B\n0,1\n2,3\n4,5\n6,7'
 
-Previous behaviour:
+Previous Behavior:
 
 .. code-block:: ipython
 
@@ -1142,7 +1175,7 @@ Previous behaviour:
 0 4 5
 1 6 7
 
-New behaviour:
+New Behavior:
 
 .. ipython :: python
 
@@ -1268,7 +1301,7 @@ These types are the same on many platform, but for 64 bit python on Windows,
 ``np.int_`` is 32 bits, and ``np.intp`` is 64 bits. Changing this behavior improves performance for many