Commit 6b0c7e7

jreback authored and jorisvandenbossche committed
API/BUG: .apply will correctly infer output shape when axis=1 (pandas-dev#18577)
closes pandas-dev#16353 closes pandas-dev#17348 closes pandas-dev#17437 closes pandas-dev#18573 closes pandas-dev#17970 closes pandas-dev#17892 closes pandas-dev#17602 closes pandas-dev#18775 closes pandas-dev#18901 closes pandas-dev#18919
1 parent a7d1103 commit 6b0c7e7

9 files changed: +885 -192 lines changed

doc/source/basics.rst

+8-2
@@ -793,8 +793,14 @@ The :meth:`~DataFrame.apply` method will also dispatch on a string method name.
    df.apply('mean')
    df.apply('mean', axis=1)

-Depending on the return type of the function passed to :meth:`~DataFrame.apply`,
-the result will either be of lower dimension or the same dimension.
+The return type of the function passed to :meth:`~DataFrame.apply` affects the
+type of the ultimate output from ``DataFrame.apply``:
+
+* If the applied function returns a ``Series``, the ultimate output is a ``DataFrame``.
+  The columns match the index of the ``Series`` returned by the applied function.
+* If the applied function returns any other type, the ultimate output is a ``Series``.
+* A ``result_type`` kwarg is accepted with the options ``reduce``, ``broadcast``, and ``expand``.
+  These determine whether list-like return values are expanded to a ``DataFrame``.

 :meth:`~DataFrame.apply` combined with some cleverness can be used to answer many questions
 about a data set. For example, suppose we wanted to extract the date where the
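
A minimal sketch of the return-type rule described in the added lines above, assuming pandas 0.23 or later. The frame ``df`` and the helper names ``summarize`` and ``total`` are illustrative only and do not appear in the commit.

```python
import pandas as pd

# Illustrative data; not taken from the commit.
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})


def summarize(row):
    # Returns a Series, so apply() assembles a DataFrame whose
    # columns are the index of the returned Series.
    return pd.Series({"total": row["A"] + row["B"], "diff": row["B"] - row["A"]})


def total(row):
    # Returns a scalar, so apply() assembles a Series.
    return row["A"] + row["B"]


as_frame = df.apply(summarize, axis=1)
as_series = df.apply(total, axis=1)

print(type(as_frame).__name__, list(as_frame.columns))
print(type(as_series).__name__, as_series.tolist())
```

Returning a ``Series`` is the way to name the output columns explicitly, since the labels come from the returned index.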

doc/source/whatsnew/v0.23.0.txt

+71-2
@@ -142,7 +142,7 @@ Previous Behavior:
    4 NaN
    dtype: float64

-Current Behavior
+Current Behavior:

 .. ipython:: python

@@ -167,7 +167,7 @@ Previous Behavior:
    3 2.5
    dtype: float64

-Current Behavior
+Current Behavior:

 .. ipython:: python

@@ -332,6 +332,73 @@ Convert to an xarray DataArray

    p.to_xarray()

+.. _whatsnew_0230.api_breaking.apply:
+
+Apply Changes
+~~~~~~~~~~~~~
+
+:func:`DataFrame.apply` was inconsistent when applying an arbitrary user-defined function that returned a list-like with ``axis=1``. Several bugs and inconsistencies
+are resolved. If the applied function returns a Series, then pandas will return a DataFrame; otherwise a Series will be returned. This includes the case
+where a list-like (e.g. ``tuple`` or ``list``) is returned (:issue:`16353`, :issue:`17437`, :issue:`17970`, :issue:`17348`, :issue:`17892`, :issue:`18573`,
+:issue:`17602`, :issue:`18775`, :issue:`18901`, :issue:`18919`).
+
+.. ipython:: python
+
+   df = pd.DataFrame(np.tile(np.arange(3), 6).reshape(6, -1) + 1, columns=['A', 'B', 'C'])
+   df
+
+Previous Behavior. If the returned shape happened to match the original columns, this would return a ``DataFrame``.
+If the return shape did not match, a ``Series`` with lists was returned.
+
+.. code-block:: python
+
+   In [3]: df.apply(lambda x: [1, 2, 3], axis=1)
+   Out[3]:
+      A  B  C
+   0  1  2  3
+   1  1  2  3
+   2  1  2  3
+   3  1  2  3
+   4  1  2  3
+   5  1  2  3
+
+   In [4]: df.apply(lambda x: [1, 2], axis=1)
+   Out[4]:
+   0    [1, 2]
+   1    [1, 2]
+   2    [1, 2]
+   3    [1, 2]
+   4    [1, 2]
+   5    [1, 2]
+   dtype: object
+
+
+New Behavior. The behavior is consistent. These will *always* return a ``Series``.
+
+.. ipython:: python
+
+   df.apply(lambda x: [1, 2, 3], axis=1)
+   df.apply(lambda x: [1, 2], axis=1)
+
+To have expanded columns, you can use ``result_type='expand'``:
+
+.. ipython:: python
+
+   df.apply(lambda x: [1, 2, 3], axis=1, result_type='expand')
+
+To broadcast the result across the original columns, you can use ``result_type='broadcast'``. The shape
+must match the original columns.
+
+.. ipython:: python
+
+   df.apply(lambda x: [1, 2, 3], axis=1, result_type='broadcast')
+
+Returning a ``Series`` allows one to control the exact return structure and column names:
+
+.. ipython:: python
+
+   df.apply(lambda x: pd.Series([1, 2, 3], index=x.index), axis=1)
+

 .. _whatsnew_0230.api_breaking.build_changes:
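
The difference between the new default and ``result_type='expand'`` can be sketched as follows, assuming pandas 0.23 or later. The helper ``pair`` and the sample frame are hypothetical and chosen so that the returned list is shorter than the number of columns.

```python
import pandas as pd

# Illustrative frame with three columns; not the one from the whatsnew entry.
df = pd.DataFrame([[1, 2, 3]] * 4, columns=["A", "B", "C"])


def pair(row):
    # Returns a plain list with two elements per row.
    return [row["A"], row["B"] + row["C"]]


# Default: a list-like return value is kept as-is, giving a Series of lists.
as_lists = df.apply(pair, axis=1)

# 'expand': each list element becomes its own column, labelled by position.
expanded = df.apply(pair, axis=1, result_type="expand")

print(as_lists)
print(expanded)
```

The expanded columns are labelled positionally here. Returning a ``Series`` with an explicit index, as the entry above notes, is the option to use when specific column names are wanted.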

@@ -456,6 +523,8 @@ Deprecations
 - The ``is_copy`` attribute is deprecated and will be removed in a future version (:issue:`18801`).
 - ``IntervalIndex.from_intervals`` is deprecated in favor of the :class:`IntervalIndex` constructor (:issue:`19263`)
 - :func:``DataFrame.from_items`` is deprecated. Use :func:``DataFrame.from_dict()`` instead, or :func:``DataFrame.from_dict(OrderedDict())`` if you wish to preserve the key order (:issue:`17320`)
+- The ``broadcast`` parameter of ``.apply()`` is deprecated in favor of ``result_type='broadcast'`` (:issue:`18577`)
+- The ``reduce`` parameter of ``.apply()`` is deprecated in favor of ``result_type='reduce'`` (:issue:`18577`)

 .. _whatsnew_0230.prior_deprecations: