
Commit b3f07b2

jalammar authored and TomAugspurger committed
DOC: add reshaping visuals to the docs (Reshaping and Pivot Tables) (#20960)
1 parent 3dd90a2 commit b3f07b2

7 files changed, +32 −19 lines changed

doc/source/_static/reshaping_melt.png
doc/source/_static/reshaping_pivot.png
doc/source/_static/reshaping_stack.png
doc/source/_static/reshaping_unstack.png
doc/source/_static/reshaping_unstack_0.png
doc/source/_static/reshaping_unstack_1.png

(six new binary images; inline previews omitted)

doc/source/reshaping.rst

+32 −19
@@ -60,6 +60,8 @@ To select out everything for variable ``A`` we could do:
 
    df[df['variable'] == 'A']
 
+.. image:: _static/reshaping_pivot.png
+
 But suppose we wish to do time series operations with the variables. A better
 representation would be where the ``columns`` are the unique variables and an
 ``index`` of dates identifies individual observations. To reshape the data into
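
For readers skimming the diff, the new ``reshaping_pivot.png`` illustrates the long-to-wide reshape described in the surrounding text. A minimal sketch of that operation, using made-up data rather than the doc's own ``df``::

    import pandas as pd

    # Long format: one row per (date, variable) observation
    long_df = pd.DataFrame({
        'date': pd.to_datetime(['2018-01-01', '2018-01-01',
                                '2018-01-02', '2018-01-02']),
        'variable': ['A', 'B', 'A', 'B'],
        'value': [0.5, -1.2, 0.3, 0.7],
    })

    # Wide format: one column per unique variable, indexed by date
    wide_df = long_df.pivot(index='date', columns='variable', values='value')
    print(wide_df)
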
@@ -96,10 +98,12 @@ are homogeneously-typed.
 Reshaping by stacking and unstacking
 ------------------------------------
 
-Closely related to the :meth:`~DataFrame.pivot` method are the related
-:meth:`~DataFrame.stack` and :meth:`~DataFrame.unstack` methods available on
-``Series`` and ``DataFrame``. These methods are designed to work together with
-``MultiIndex`` objects (see the section on :ref:`hierarchical indexing
+.. image:: _static/reshaping_stack.png
+
+Closely related to the :meth:`~DataFrame.pivot` method are the related
+:meth:`~DataFrame.stack` and :meth:`~DataFrame.unstack` methods available on
+``Series`` and ``DataFrame``. These methods are designed to work together with
+``MultiIndex`` objects (see the section on :ref:`hierarchical indexing
 <advanced.hierarchical>`). Here are essentially what these methods do:
 
 - ``stack``: "pivot" a level of the (possibly hierarchical) column labels,
@@ -109,6 +113,8 @@ Closely related to the :meth:`~DataFrame.pivot` method are the related
   (possibly hierarchical) row index to the column axis, producing a reshaped
   ``DataFrame`` with a new inner-most level of column labels.
 
+.. image:: _static/reshaping_unstack.png
+
 The clearest way to explain is by example. Let's take a prior example data set
 from the hierarchical indexing section:
 
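
The ``reshaping_stack.png`` / ``reshaping_unstack.png`` figures added in these two hunks depict the column-to-index round trip. A small self-contained sketch of the behaviour the surrounding text describes, on an illustrative frame rather than the doc's example::

    import pandas as pd

    df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=['x', 'y'])

    # stack: pivot the column labels into a new inner-most index level
    stacked = df.stack()        # Series with a (row label, column label) MultiIndex
    print(stacked)

    # unstack: the inverse, moving the inner index level back to columns
    print(stacked.unstack())
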
@@ -149,13 +155,18 @@ unstacks the **last level**:
 
 .. _reshaping.unstack_by_name:
 
+.. image:: _static/reshaping_unstack_1.png
+
 If the indexes have names, you can use the level names instead of specifying
 the level numbers:
 
 .. ipython:: python
 
    stacked.unstack('second')
 
+
+.. image:: _static/reshaping_unstack_0.png
+
 Notice that the ``stack`` and ``unstack`` methods implicitly sort the index
 levels involved. Hence a call to ``stack`` and then ``unstack``, or vice versa,
 will result in a **sorted** copy of the original ``DataFrame`` or ``Series``:
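
To see how ``stacked.unstack('second')`` and the new ``reshaping_unstack_0/1.png`` figures line up, here is a small stand-alone version with named levels; the data is invented, not the hierarchical-indexing example the doc reuses::

    import pandas as pd

    index = pd.MultiIndex.from_product([['bar', 'baz'], ['one', 'two']],
                                       names=['first', 'second'])
    stacked = pd.Series(range(4), index=index)

    # Unstacking by level name or by level number gives the same result
    print(stacked.unstack('second'))
    print(stacked.unstack(1))
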
@@ -266,11 +277,13 @@ the right thing:
 Reshaping by Melt
 -----------------
 
+.. image:: _static/reshaping_melt.png
+
 The top-level :func:`~pandas.melt` function and the corresponding :meth:`DataFrame.melt`
-are useful to massage a ``DataFrame`` into a format where one or more columns
-are *identifier variables*, while all other columns, considered *measured
-variables*, are "unpivoted" to the row axis, leaving just two non-identifier
-columns, "variable" and "value". The names of those columns can be customized
+are useful to massage a ``DataFrame`` into a format where one or more columns
+are *identifier variables*, while all other columns, considered *measured
+variables*, are "unpivoted" to the row axis, leaving just two non-identifier
+columns, "variable" and "value". The names of those columns can be customized
 by supplying the ``var_name`` and ``value_name`` parameters.
 
 For instance,
@@ -285,7 +298,7 @@ For instance,
    cheese.melt(id_vars=['first', 'last'])
    cheese.melt(id_vars=['first', 'last'], var_name='quantity')
 
-Another way to transform is to use the :func:`~pandas.wide_to_long` panel data
+Another way to transform is to use the :func:`~pandas.wide_to_long` panel data
 convenience function. It is less flexible than :func:`~pandas.melt`, but more
 user-friendly.
 
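
As a companion to the new ``reshaping_melt.png``, a short sketch of :meth:`DataFrame.melt` and :func:`~pandas.wide_to_long`; the ``cheese`` values here are made up, and only the column layout mirrors the doc's example::

    import pandas as pd

    cheese = pd.DataFrame({'first': ['John', 'Mary'],
                           'last': ['Doe', 'Bo'],
                           'height': [5.5, 6.0],
                           'weight': [130, 150]})

    # 'first'/'last' stay as identifier columns; height/weight are unpivoted
    print(cheese.melt(id_vars=['first', 'last'], var_name='quantity'))

    # wide_to_long works from stubnames plus a suffix that becomes column 'year'
    wide = pd.DataFrame({'A1970': ['a', 'b'], 'A1980': ['d', 'e'],
                         'B1970': [2.5, 1.2], 'B1980': [3.2, 1.3],
                         'id': [0, 1]})
    print(pd.wide_to_long(wide, stubnames=['A', 'B'], i='id', j='year'))
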
@@ -332,8 +345,8 @@ While :meth:`~DataFrame.pivot` provides general purpose pivoting with various
 data types (strings, numerics, etc.), pandas also provides :func:`~pandas.pivot_table`
 for pivoting with aggregation of numeric data.
 
-The function :func:`~pandas.pivot_table` can be used to create spreadsheet-style
-pivot tables. See the :ref:`cookbook<cookbook.pivot>` for some advanced
+The function :func:`~pandas.pivot_table` can be used to create spreadsheet-style
+pivot tables. See the :ref:`cookbook<cookbook.pivot>` for some advanced
 strategies.
 
 It takes a number of arguments:
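
The argument list that sentence introduces sits outside this hunk; for context, a minimal spreadsheet-style aggregation with invented data::

    import pandas as pd

    df = pd.DataFrame({'A': ['one', 'one', 'two', 'two'],
                       'B': ['x', 'y', 'x', 'y'],
                       'C': [1.0, 2.0, 3.0, 4.0]})

    # Rows from A, columns from B, mean of C per cell
    print(pd.pivot_table(df, values='C', index='A', columns='B',
                         aggfunc='mean'))
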
@@ -485,7 +498,7 @@ using the ``normalize`` argument:
    pd.crosstab(df.A, df.B, normalize='columns')
 
 ``crosstab`` can also be passed a third ``Series`` and an aggregation function
-(``aggfunc``) that will be applied to the values of the third ``Series`` within
+(``aggfunc``) that will be applied to the values of the third ``Series`` within
 each group defined by the first two ``Series``:
 
 .. ipython:: python
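
The ``aggfunc`` paragraph above describes this pattern; a small sketch with invented data (not the doc's ``df``)::

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'A': ['foo', 'foo', 'bar', 'bar'],
                       'B': ['one', 'two', 'one', 'two'],
                       'C': [1.0, 2.0, 3.0, 4.0]})

    # Plain frequency table of A against B
    print(pd.crosstab(df.A, df.B))

    # Same cells, but aggregating a third Series (C) with the given aggfunc
    print(pd.crosstab(df.A, df.B, values=df.C, aggfunc=np.sum))
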
@@ -508,8 +521,8 @@ Finally, one can also add margins or normalize this output.
 Tiling
 ------
 
-The :func:`~pandas.cut` function computes groupings for the values of the input
-array and is often used to transform continuous variables to discrete or
+The :func:`~pandas.cut` function computes groupings for the values of the input
+array and is often used to transform continuous variables to discrete or
 categorical variables:
 
 .. ipython:: python
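
A quick sketch of the binning the Tiling paragraph describes, on an invented array of ages::

    import numpy as np
    import pandas as pd

    ages = np.array([10, 15, 13, 12, 23, 25, 28, 59, 60])

    # Three equal-width bins chosen from the data range
    print(pd.cut(ages, bins=3))

    # Explicit bin edges
    print(pd.cut(ages, bins=[0, 18, 35, 70]))
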
@@ -539,8 +552,8 @@ used to bin the passed data.::
 Computing indicator / dummy variables
 -------------------------------------
 
-To convert a categorical variable into a "dummy" or "indicator" ``DataFrame``,
-for example a column in a ``DataFrame`` (a ``Series``) which has ``k`` distinct
+To convert a categorical variable into a "dummy" or "indicator" ``DataFrame``,
+for example a column in a ``DataFrame`` (a ``Series``) which has ``k`` distinct
 values, can derive a ``DataFrame`` containing ``k`` columns of 1s and 0s using
 :func:`~pandas.get_dummies`:
 
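
A minimal sketch of the ``k``-column indicator frame the paragraph above describes, using an invented ``Series``::

    import pandas as pd

    s = pd.Series(['a', 'b', 'a', 'c'])

    # k = 3 distinct values -> 3 indicator columns of 0s and 1s
    print(pd.get_dummies(s))
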
@@ -577,7 +590,7 @@ This function is often used along with discretization functions like ``cut``:
 See also :func:`Series.str.get_dummies <pandas.Series.str.get_dummies>`.
 
 :func:`get_dummies` also accepts a ``DataFrame``. By default all categorical
-variables (categorical in the statistical sense, those with `object` or
+variables (categorical in the statistical sense, those with `object` or
 `categorical` dtype) are encoded as dummy variables.
 
 
@@ -587,7 +600,7 @@ variables (categorical in the statistical sense, those with `object` or
                      'C': [1, 2, 3]})
    pd.get_dummies(df)
 
-All non-object columns are included untouched in the output. You can control
+All non-object columns are included untouched in the output. You can control
 the columns that are encoded with the ``columns`` keyword.
 
 .. ipython:: python
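
To illustrate the two statements above (only categorical-like columns are encoded, and ``columns`` restricts which ones), a sketch with a small frame of the same shape as the doc's example::

    import pandas as pd

    df = pd.DataFrame({'A': ['a', 'b', 'a'],
                       'B': ['c', 'c', 'b'],
                       'C': [1, 2, 3]})

    # Object columns A and B are encoded; numeric C passes through untouched
    print(pd.get_dummies(df))

    # Encode only column B, leaving A as-is
    print(pd.get_dummies(df, columns=['B']))
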
@@ -640,7 +653,7 @@ When a column contains only one level, it will be omitted in the result.
 
    pd.get_dummies(df, drop_first=True)
 
-By default new columns will have ``np.uint8`` dtype.
+By default new columns will have ``np.uint8`` dtype.
 To choose another dtype, use the``dtype`` argument:
 
 .. ipython:: python
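
Finally, a short sketch of picking a non-default dtype via the ``dtype`` argument mentioned in the last hunk (invented frame)::

    import pandas as pd

    df = pd.DataFrame({'A': ['a', 'b', 'c'], 'B': [1.1, 2.2, 3.3]})

    # Indicator columns with bool dtype instead of the uint8 default noted above
    print(pd.get_dummies(df, dtype=bool).dtypes)
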

0 commit comments
