
Commit 96a372e

committed
remove union_categoricals kw from concat
1 parent 589d88d commit 96a372e

10 files changed, +61 -242 lines changed

doc/source/categorical.rst
+16 -13

@@ -702,30 +702,33 @@ Concatenation
 
 This section describes concatenations specific to ``category`` dtype. See :ref:`Concatenating objects<merging.concat>` for general description.
 
-By default, ``Series`` or ``DataFrame`` concatenation which contains different
-categories results in ``object`` dtype.
+By default, ``Series`` or ``DataFrame`` concatenation which contains the same categories
+results in ``category`` dtype, otherwise results in ``object`` dtype.
+Use ``.astype`` or ``union_categoricals`` to get ``category`` result.
 
 .. ipython:: python
 
+    # same categories
     s1 = pd.Series(['a', 'b'], dtype='category')
-    s2 = pd.Series(['b', 'c'], dtype='category')
+    s2 = pd.Series(['a', 'b', 'a'], dtype='category')
     pd.concat([s1, s2])
 
-Specifying ``union_categoricals=True`` allows to concat categories following
-``union_categoricals`` rule.
+    # different categories
+    s3 = pd.Series(['b', 'c'], dtype='category')
+    pd.concat([s1, s3])
 
-.. ipython:: python
+    pd.concat([s1, s3]).astype('category')
+    union_categoricals([s1.values, s3.values])
 
-    pd.concat([s1, s2], union_categoricals=True)
 
 Following table summarizes the results of ``Categoricals`` related concatenations.
 
-| arg1 | arg2 | default | ``union_categoricals=True`` |
-|---------|-------------------------------------------|---------|------------------------------|
-| category | category (identical categories) | category | category |
-| category | category (different categories, both not ordered) | object (dtype is inferred) | category |
-| category | category (different categories, either one is ordered) | object (dtype is inferred) | object (dtype is inferred) |
-| category | not category | object (dtype is inferred) | object (dtype is inferred) |
+| arg1 | arg2 | result |
+|---------|-------------------------------------------|---------|
+| category | category (identical categories) | category |
+| category | category (different categories, both not ordered) | object (dtype is inferred) |
+| category | category (different categories, either one is ordered) | object (dtype is inferred) |
+| category | not category | object (dtype is inferred) |
 
 Getting Data In/Out
 -------------------
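
As a quick check of the matrix above (not part of the commit; the ``union_categoricals`` import path is an assumption, it lived under ``pandas.types.concat`` in the 0.19 era and under ``pandas.api.types`` in later releases):

    import pandas as pd
    # import path varies by pandas version; adjust if needed
    from pandas.api.types import union_categoricals

    s1 = pd.Series(['a', 'b'], dtype='category')
    s2 = pd.Series(['a', 'b', 'a'], dtype='category')  # same categories as s1
    s3 = pd.Series(['b', 'c'], dtype='category')       # different categories

    pd.concat([s1, s2]).dtype   # category  (identical categories)
    pd.concat([s1, s3]).dtype   # object    (different categories)

    # two ways to get a category result back for the different-categories case
    pd.concat([s1, s3]).astype('category').dtype   # category, categories inferred from the values
    union_categoricals([s1.values, s3.values])     # Categorical with the union ['a', 'b', 'c']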

doc/source/merging.rst
+1 -6

@@ -79,7 +79,7 @@ some configurable handling of "what to do with the other axes":
 
     pd.concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False,
               keys=None, levels=None, names=None, verify_integrity=False,
-              union_categoricals=False, copy=True)
+              copy=True)
 
 - ``objs`` : a sequence or mapping of Series, DataFrame, or Panel objects. If a
   dict is passed, the sorted keys will be used as the `keys` argument, unless
@@ -107,11 +107,6 @@ some configurable handling of "what to do with the other axes":
 - ``verify_integrity`` : boolean, default False. Check whether the new
   concatenated axis contains duplicates. This can be very expensive relative
   to the actual data concatenation.
-- ``union_categoricals`` : boolean, default False.
-  If True, use union_categoricals rule to concat category dtype.
-  If False, category dtype is kept if both categories are identical,
-  otherwise results in object dtype.
-  See :ref:`Categoricals Concatenation<categorical.concat>` for detail.
 - ``copy`` : boolean, default True. If False, do not copy data unnecessarily.
 
 Without a little bit of context and example many of these arguments don't make
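
For reference, a small usage sketch of the remaining keywords documented above (the frames and keys are made up for illustration):

    import pandas as pd

    df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
    df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

    # label each piece and fail loudly if the resulting index has duplicates
    pd.concat([df1, df2], axis=0, join='outer',
              keys=['first', 'second'], verify_integrity=True)

    # or discard the original index entirely
    pd.concat([df1, df2], ignore_index=True)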

doc/source/whatsnew/v0.19.0.txt
+25 -24

@@ -220,7 +220,7 @@ they are in the file or passed in as the ``names`` parameter (:issue:`7160`, :is
    data = '0,1,2\n3,4,5'
    names = ['a', 'b', 'a']
 
-Previous behaviour:
+Previous Behavior:
 
 .. code-block:: ipython
 
@@ -233,7 +233,7 @@ Previous behaviour:
 The first ``a`` column contains the same data as the second ``a`` column, when it should have
 contained the values ``[0, 3]``.
 
-New behaviour:
+New Behavior:
 
 .. ipython :: python
 
@@ -293,22 +293,23 @@ Categorical Concatenation
     b = pd.Categorical(["a", "b"])
     union_categoricals([a, b])
 
-- ``concat`` and ``append`` now can concat unordered ``category`` dtypes using ``union_categorical`` internally. (:issue:`13524`)
+- ``concat`` and ``append`` now can concat ``category`` dtypes with different
+  ``categories`` as ``object`` dtype (:issue:`13524`)
 
-  By default, different categories results in ``object`` dtype.
+  Previous Behavior:
 
-  .. ipython:: python
+  .. code-block:: ipython
 
-      s1 = pd.Series(['a', 'b'], dtype='category')
-      s2 = pd.Series(['b', 'c'], dtype='category')
-      pd.concat([s1, s2])
+      In [1]: s1 = pd.Series(['a', 'b'], dtype='category')
+      In [2]: s2 = pd.Series(['b', 'c'], dtype='category')
+      In [3]: pd.concat([s1, s2])
+      ValueError: incompatible categories in categorical concat
 
-  Specifying ``union_categoricals=True`` allows to concat categories following
-  ``union_categoricals`` rule.
+  New Behavior:
 
   .. ipython:: python
 
-      pd.concat([s1, s2], union_categoricals=True)
+      pd.concat([s1, s2])
 
 .. _whatsnew_0190.enhancements.semi_month_offsets:
 
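Since the bullet covers ``append`` as well, the same fallback applies there; a sketch (``Series.append`` existed in this pandas generation, later releases removed it in favor of ``pd.concat``):

    import pandas as pd

    s1 = pd.Series(['a', 'b'], dtype='category')
    s2 = pd.Series(['b', 'c'], dtype='category')

    # previously raised ValueError; now falls back to object dtype
    s1.append(s2).dtype         # dtype('O') on 0.19-era pandas
    pd.concat([s1, s2]).dtype   # same result, and the spelling that survives in current pandas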

@@ -411,11 +412,11 @@ get_dummies dtypes
 
 The ``pd.get_dummies`` function now returns dummy-encoded columns as small integers, rather than floats (:issue:`8725`). This should provide an improved memory footprint.
 
-Previous behaviour:
+Previous Behavior:
 
 .. code-block:: ipython
 
-   In [1]: pd.get_dummies(['a', 'b', 'a', 'c']).dtypes
+   In [1]: pd.get_dummies(['a', 'b', 'a', 'c']).dtypes
 
    Out[1]:
    a    float64
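
A quick way to see the new dtypes (output depends on version: 0.19-era pandas reports ``uint8``, recent releases switched the default to ``bool``):

    import pandas as pd

    pd.get_dummies(['a', 'b', 'a', 'c']).dtypes
    # uint8 per column in 0.19-era pandas (bool in recent releases),
    # instead of the float64 columns returned before this change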
@@ -437,7 +438,7 @@ Other enhancements
 
 - The ``.get_credentials()`` method of ``GbqConnector`` can now first try to fetch `the application default credentials <https://developers.google.com/identity/protocols/application-default-credentials>`__. See the :ref:`docs <io.bigquery_authentication>` for more details (:issue:`13577`).
 
-- The ``.tz_localize()`` method of ``DatetimeIndex`` and ``Timestamp`` has gained the ``errors`` keyword, so you can potentially coerce nonexistent timestamps to ``NaT``. The default behaviour remains to raising a ``NonExistentTimeError`` (:issue:`13057`)
+- The ``.tz_localize()`` method of ``DatetimeIndex`` and ``Timestamp`` has gained the ``errors`` keyword, so you can potentially coerce nonexistent timestamps to ``NaT``. The default behavior remains to raise a ``NonExistentTimeError`` (:issue:`13057`)
 - ``pd.to_numeric()`` now accepts a ``downcast`` parameter, which will downcast the data if possible to smallest specified numerical dtype (:issue:`13352`)
 
 .. ipython:: python
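
The ipython block referenced above is not reproduced in this diff; a rough sketch of what ``downcast`` does (input values are illustrative only):

    import pandas as pd

    s = pd.Series([1, 2, 3])
    s.dtype                                        # int64 by default
    pd.to_numeric(s, downcast='unsigned').dtype    # uint8, the smallest unsigned type that fits
    pd.to_numeric(s, downcast='float').dtype       # float32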
@@ -544,7 +545,7 @@ API changes
 ``Series.tolist()`` will now return Python types
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-``Series.tolist()`` will now return Python types in the output, mimicking NumPy ``.tolist()`` behaviour (:issue:`10904`)
+``Series.tolist()`` will now return Python types in the output, mimicking NumPy ``.tolist()`` behavior (:issue:`10904`)
 
 
 .. ipython:: python
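
The example block is likewise not shown in this diff; the change boils down to something like this (a sketch, not the changelog's own snippet):

    import pandas as pd

    s = pd.Series([1, 2, 3])
    type(s.tolist()[0])   # plain Python int rather than numpy.int64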
@@ -579,7 +580,7 @@ including ``DataFrame`` (:issue:`1134`, :issue:`4581`, :issue:`13538`)
 
 .. warning::
    Until 0.18.1, comparing ``Series`` with the same length, would succeed even if
-   the ``.index`` are different (the result ignores ``.index``). As of 0.19.0, this will raises ``ValueError`` to be more strict. This section also describes how to keep previous behaviour or align different indexes, using the flexible comparison methods like ``.eq``.
+   the ``.index`` are different (the result ignores ``.index``). As of 0.19.0, this will raise ``ValueError`` to be more strict. This section also describes how to keep previous behavior or align different indexes, using the flexible comparison methods like ``.eq``.
 
 
 As a result, ``Series`` and ``DataFrame`` operators behave as below:
@@ -647,7 +648,7 @@ Logical operators
 
 Logical operators align both ``.index``.
 
-Previous Behavior (``Series``), only left hand side ``index`` is kept:
+Previous behavior (``Series``), only left hand side ``index`` is kept:
 
 .. code-block:: ipython
 
@@ -966,7 +967,7 @@ Index ``+`` / ``-`` no longer used for set operations
 
 Addition and subtraction of the base Index type (not the numeric subclasses)
 previously performed set operations (set union and difference). This
-behaviour was already deprecated since 0.15.0 (in favor using the specific
+behavior was already deprecated since 0.15.0 (in favor of using the specific
 ``.union()`` and ``.difference()`` methods), and is now disabled. When
 possible, ``+`` and ``-`` are now used for element-wise operations, for
 example for concatenating strings (:issue:`8227`, :issue:`14127`).
@@ -986,13 +987,13 @@ The same operation will now perform element-wise addition:
    pd.Index(['a', 'b']) + pd.Index(['a', 'c'])
 
 Note that numeric Index objects already performed element-wise operations.
-For example, the behaviour of adding two integer Indexes:
+For example, the behavior of adding two integer Indexes:
 
 .. ipython:: python
 
    pd.Index([1, 2, 3]) + pd.Index([2, 3, 4])
 
-is unchanged. The base ``Index`` is now made consistent with this behaviour.
+is unchanged. The base ``Index`` is now made consistent with this behavior.
 
 
 .. _whatsnew_0190.api.difference:
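
Putting the two snippets quoted above side by side (the exact repr of the integer result varies by pandas version):

    import pandas as pd

    pd.Index(['a', 'b']) + pd.Index(['a', 'c'])        # element-wise: Index(['aa', 'bc'], dtype='object')
    pd.Index([1, 2, 3]) + pd.Index([2, 3, 4])          # element-wise: [3, 5, 7]
    pd.Index(['a', 'b']).union(pd.Index(['a', 'c']))   # the old set-union result, now spelled explicitly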
@@ -1143,7 +1144,7 @@ the result of calling :func:`read_csv` without the ``chunksize=`` argument.
 
    data = 'A,B\n0,1\n2,3\n4,5\n6,7'
 
-Previous behaviour:
+Previous Behavior:
 
 .. code-block:: ipython
 
@@ -1155,7 +1156,7 @@ Previous behaviour:
    0  4  5
    1  6  7
 
-New behaviour:
+New Behavior:
 
 .. ipython :: python
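
A hedged sketch of the equivalence this section is about, reusing the ``data`` string shown above (the ``equals`` check holds from 0.19 on, when chunked reads keep the original row numbering):

    import pandas as pd
    from io import StringIO

    data = 'A,B\n0,1\n2,3\n4,5\n6,7'

    full = pd.read_csv(StringIO(data))
    chunked = pd.concat(pd.read_csv(StringIO(data), chunksize=2))

    full.equals(chunked)   # True once the chunked index matches the unchunked one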

@@ -1281,7 +1282,7 @@ These types are the same on many platform, but for 64 bit python on Windows,
 ``np.int_`` is 32 bits, and ``np.intp`` is 64 bits. Changing this behavior improves performance for many
 operations on that platform.
 
-Previous behaviour:
+Previous Behavior:
 
 .. code-block:: ipython
 
@@ -1290,7 +1291,7 @@ Previous behaviour:
    In [2]: i.get_indexer(['b', 'b', 'c']).dtype
    Out[2]: dtype('int32')
 
-New behaviour:
+New Behavior:
 
 .. code-block:: ipython
 
pandas/core/frame.py
+2 -8

@@ -4322,8 +4322,7 @@ def infer(x):
     # ----------------------------------------------------------------------
     # Merging / joining methods
 
-    def append(self, other, ignore_index=False, verify_integrity=False,
-               union_categoricals=False):
+    def append(self, other, ignore_index=False, verify_integrity=False):
         """
         Append rows of `other` to the end of this frame, returning a new
         object. Columns not in this frame are added as new columns.
@@ -4336,10 +4335,6 @@ def append(self, other, ignore_index=False, verify_integrity=False,
             If True, do not use the index labels.
         verify_integrity : boolean, default False
             If True, raise ValueError on creating index with duplicates.
-        union_categoricals : bool, default False
-            If True, use union_categoricals rule to concat category dtype.
-            If False, category dtype is kept if both categories are identical,
-            otherwise results in object dtype.
 
         Returns
         -------
@@ -4416,8 +4411,7 @@ def append(self, other, ignore_index=False, verify_integrity=False,
         else:
             to_concat = [self, other]
         return concat(to_concat, ignore_index=ignore_index,
-                      verify_integrity=verify_integrity,
-                      union_categoricals=union_categoricals)
+                      verify_integrity=verify_integrity)
 
     def join(self, other, on=None, how='left', lsuffix='', rsuffix='',
              sort=False):
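
With the keyword gone, ``DataFrame.append`` is back to ``ignore_index`` and ``verify_integrity`` only; a usage sketch (``append`` itself was deprecated and removed in later pandas releases, so the ``pd.concat`` form is shown too):

    import pandas as pd

    df1 = pd.DataFrame({'x': pd.Series(['a', 'b'], dtype='category')})
    df2 = pd.DataFrame({'x': pd.Series(['a', 'b'], dtype='category')})

    # identical categories, so the category dtype survives the append
    df1.append(df2, ignore_index=True).dtypes        # x    category  (pandas < 2.0 only)
    pd.concat([df1, df2], ignore_index=True).dtypes  # same result in any version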

pandas/core/internals.py
+6 -14

@@ -1144,7 +1144,7 @@ def get_result(other):
                 return self._try_coerce_result(result)
 
         # error handler if we have an issue operating with the function
-        def handle_error(detail):
+        def handle_error():
 
             if raise_on_error:
                 raise TypeError('Could not operate %s with block values %s' %
@@ -1165,7 +1165,7 @@ def handle_error(detail):
         except ValueError as detail:
             raise
         except Exception as detail:
-            result = handle_error(detail)
+            result = handle_error()
 
         # technically a broadcast error in numpy can 'work' by returning a
         # boolean False
@@ -4771,8 +4771,7 @@ def _putmask_smart(v, m, n):
     return nv
 
 
-def concatenate_block_managers(mgrs_indexers, axes, concat_axis,
-                               copy, union_categoricals=False):
+def concatenate_block_managers(mgrs_indexers, axes, concat_axis, copy):
     """
     Concatenate block managers into one.
 
@@ -4782,19 +4781,14 @@ def concatenate_block_managers(mgrs_indexers, axes, concat_axis,
    axes : list of Index
    concat_axis : int
    copy : bool
-   union_categoricals : bool, default False
-       If True, use union_categoricals rule to concat CategoricalBlock.
-       If False, CategoricalBlock is kept if both categories are
-       identical, otherwise results in ObjectBlock.
 
    """
    concat_plan = combine_concat_plans(
        [get_mgr_concatenation_plan(mgr, indexers)
         for mgr, indexers in mgrs_indexers], concat_axis)
 
    blocks = [make_block(
-       concatenate_join_units(join_units, concat_axis, copy=copy,
-                              union_categoricals=union_categoricals),
+       concatenate_join_units(join_units, concat_axis, copy=copy),
        placement=placement) for placement, join_units in concat_plan]
 
    return BlockManager(blocks, axes)
@@ -4880,8 +4874,7 @@ def get_empty_dtype_and_na(join_units):
     raise AssertionError("invalid dtype determination in get_concat_dtype")
 
 
-def concatenate_join_units(join_units, concat_axis, copy,
-                           union_categoricals=False):
+def concatenate_join_units(join_units, concat_axis, copy):
     """
     Concatenate values from several join units along selected axis.
     """
@@ -4901,8 +4894,7 @@ def concatenate_join_units(join_units, concat_axis, copy,
     if copy and concat_values.base is not None:
         concat_values = concat_values.copy()
     else:
-        concat_values = _concat._concat_compat(
-            to_concat, axis=concat_axis, union_categoricals=union_categoricals)
+        concat_values = _concat._concat_compat(to_concat, axis=concat_axis)
 
     return concat_values
 
pandas/core/series.py
+2 -8

@@ -1525,8 +1525,7 @@ def searchsorted(self, v, side='left', sorter=None):
     # -------------------------------------------------------------------
     # Combination
 
-    def append(self, to_append, ignore_index=False, verify_integrity=False,
-               union_categoricals=False):
+    def append(self, to_append, ignore_index=False, verify_integrity=False):
         """
         Concatenate two or more Series.
 
@@ -1540,10 +1539,6 @@ def append(self, to_append, ignore_index=False, verify_integrity=False,
 
         verify_integrity : boolean, default False
             If True, raise Exception on creating index with duplicates
-        union_categoricals : bool, default False
-            If True, use union_categoricals rule to concat category dtype.
-            If False, category dtype is kept if both categories are identical,
-            otherwise results in object dtype.
 
         Returns
         -------
@@ -1597,8 +1592,7 @@ def append(self, to_append, ignore_index=False, verify_integrity=False,
         else:
             to_concat = [self, to_append]
         return concat(to_concat, ignore_index=ignore_index,
-                      verify_integrity=verify_integrity,
-                      union_categoricals=union_categoricals)
+                      verify_integrity=verify_integrity)
 
     def _binop(self, other, func, level=None, fill_value=None):
         """
