@@ -60,6 +60,8 @@ To select out everything for variable ``A`` we could do:
 
    df[df['variable'] == 'A']
 
+.. image:: _static/reshaping_pivot.png
+
 But suppose we wish to do time series operations with the variables. A better
 representation would be where the ``columns`` are the unique variables and an
 ``index`` of dates identifies individual observations. To reshape the data into
@@ -96,10 +98,12 @@ are homogeneously-typed.
 Reshaping by stacking and unstacking
 ------------------------------------
 
-Closely related to the :meth:`~DataFrame.pivot` method are the related
-:meth:`~DataFrame.stack` and :meth:`~DataFrame.unstack` methods available on
-``Series`` and ``DataFrame``. These methods are designed to work together with
-``MultiIndex`` objects (see the section on :ref:`hierarchical indexing
+.. image:: _static/reshaping_stack.png
+
+Closely related to the :meth:`~DataFrame.pivot` method are the related
+:meth:`~DataFrame.stack` and :meth:`~DataFrame.unstack` methods available on
+``Series`` and ``DataFrame``. These methods are designed to work together with
+``MultiIndex`` objects (see the section on :ref:`hierarchical indexing
 <advanced.hierarchical>`). Here are essentially what these methods do:
 
 - ``stack``: "pivot" a level of the (possibly hierarchical) column labels,
@@ -109,6 +113,8 @@ Closely related to the :meth:`~DataFrame.pivot` method are the related
   (possibly hierarchical) row index to the column axis, producing a reshaped
   ``DataFrame`` with a new inner-most level of column labels.
 
+.. image:: _static/reshaping_unstack.png
+
 The clearest way to explain is by example. Let's take a prior example data set
 from the hierarchical indexing section:
 
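The ``stack``/``unstack`` descriptions in this hunk lend themselves to a minimal sketch. The frame below is hypothetical and is not the data set from the hierarchical indexing section that the docs use next. It shows which labels move where, and that ``stack`` followed by ``unstack`` restores the original frame.

```python
import pandas as pd

# Hypothetical frame with a two-level row index named "first" and "second"
index = pd.MultiIndex.from_tuples(
    [("bar", "one"), ("bar", "two"), ("baz", "one"), ("baz", "two")],
    names=["first", "second"],
)
df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8]}, index=index)

# stack: the column labels ("A", "B") become a new inner-most row level,
# so the result is a Series with a three-level index
stacked = df.stack()

# unstack: the inner-most row level ("second") moves to the columns,
# giving two-level column labels such as ("A", "one")
wide = df.unstack()

# unstacking the stacked Series moves "A"/"B" back to the columns
roundtrip = stacked.unstack()
```

Recent pandas releases may emit a ``FutureWarning`` from ``stack`` about its new implementation. For this data, which has no missing values, the result is the same either way.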
@@ -149,13 +155,18 @@ unstacks the **last level**:
 
 .. _reshaping.unstack_by_name:
 
+.. image:: _static/reshaping_unstack_1.png
+
 If the indexes have names, you can use the level names instead of specifying
 the level numbers:
 
 .. ipython:: python
 
    stacked.unstack('second')
 
+
+.. image:: _static/reshaping_unstack_0.png
+
 Notice that the ``stack`` and ``unstack`` methods implicitly sort the index
 levels involved. Hence a call to ``stack`` and then ``unstack``, or vice versa,
 will result in a **sorted** copy of the original ``DataFrame`` or ``Series``:
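The following minimal sketch covers both unstacking by level name and the implicit sorting described above. It uses a hypothetical ``Series`` whose index is deliberately out of order. The name ``'second'`` and the position ``1`` select the same level, and the round trip comes back sorted.

```python
import pandas as pd

# Hypothetical Series with named index levels, listed in reverse order
index = pd.MultiIndex.from_tuples(
    [("baz", "two"), ("baz", "one"), ("bar", "two"), ("bar", "one")],
    names=["first", "second"],
)
s = pd.Series([1, 2, 3, 4], index=index)

# Select the level to unstack by name or by position: same result
by_name = s.unstack("second")
by_number = s.unstack(1)

# Row and column labels come out sorted, so stacking back yields a
# sorted copy of s rather than s in its original order
roundtrip = by_name.stack()
```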
@@ -266,11 +277,13 @@ the right thing:
 Reshaping by Melt
 -----------------
 
+.. image:: _static/reshaping_melt.png
+
 The top-level :func:`~pandas.melt` function and the corresponding :meth:`DataFrame.melt`
-are useful to massage a ``DataFrame`` into a format where one or more columns
-are *identifier variables*, while all other columns, considered *measured
-variables*, are "unpivoted" to the row axis, leaving just two non-identifier
-columns, "variable" and "value". The names of those columns can be customized
+are useful to massage a ``DataFrame`` into a format where one or more columns
+are *identifier variables*, while all other columns, considered *measured
+variables*, are "unpivoted" to the row axis, leaving just two non-identifier
+columns, "variable" and "value". The names of those columns can be customized
 by supplying the ``var_name`` and ``value_name`` parameters.
 
 For instance,
@@ -285,7 +298,7 @@ For instance,
    cheese.melt(id_vars=['first', 'last'])
    cheese.melt(id_vars=['first', 'last'], var_name='quantity')
 
-Another way to transform is to use the :func:`~pandas.wide_to_long` panel data
+Another way to transform is to use the :func:`~pandas.wide_to_long` panel data
 convenience function. It is less flexible than :func:`~pandas.melt`, but more
 user-friendly.
 
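The ``melt`` calls and the :func:`~pandas.wide_to_long` mention above can be sketched as follows. The ``cheese`` frame here is a hypothetical stand-in for the one the docs define earlier. The ``panel`` frame and its ``A1970``/``A1980`` columns are illustrative names chosen to fit the stub-plus-numeric-suffix pattern that ``wide_to_long`` matches by default.

```python
import pandas as pd

# Hypothetical wide table: "first" and "last" identify each row, while
# "height" and "weight" are the measured variables
cheese = pd.DataFrame({
    "first": ["John", "Mary"],
    "last": ["Doe", "Bo"],
    "height": [5.5, 6.0],
    "weight": [130, 150],
})

# Default names for the two unpivoted columns: "variable" and "value"
long_default = cheese.melt(id_vars=["first", "last"])

# var_name and value_name rename those two columns
long_named = cheese.melt(id_vars=["first", "last"], var_name="quantity",
                         value_name="amount")

# wide_to_long: columns sharing the stub "A" ("A1970", "A1980") are split
# into the stub and a numeric suffix, which becomes the "year" index level
panel = pd.DataFrame({"id": [0, 1], "A1970": [1.0, 2.0], "A1980": [3.0, 4.0]})
panel_long = pd.wide_to_long(panel, stubnames="A", i="id", j="year")
```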
@@ -332,8 +345,8 @@ While :meth:`~DataFrame.pivot` provides general purpose pivoting with various
 data types (strings, numerics, etc.), pandas also provides :func:`~pandas.pivot_table`
 for pivoting with aggregation of numeric data.
 
-The function :func:`~pandas.pivot_table` can be used to create spreadsheet-style
-pivot tables. See the :ref:`cookbook<cookbook.pivot>` for some advanced
+The function :func:`~pandas.pivot_table` can be used to create spreadsheet-style
+pivot tables. See the :ref:`cookbook<cookbook.pivot>` for some advanced
 strategies.
 
 It takes a number of arguments:
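As a minimal sketch of a spreadsheet-style pivot table, the hypothetical ``sales`` frame below is summarised by region and product. The column names and the choice of ``aggfunc="sum"`` are illustrative. ``margins=True`` adds an "All" row and column that hold the totals.

```python
import pandas as pd

# Hypothetical sales records (column names are illustrative)
sales = pd.DataFrame({
    "region": ["east", "east", "west", "west", "west"],
    "product": ["a", "b", "a", "a", "b"],
    "units": [1, 2, 3, 4, 5],
})

# Sum "units" for every (region, product) pair: regions become rows,
# products become columns
table = pd.pivot_table(sales, values="units", index="region",
                       columns="product", aggfunc="sum")

# margins=True appends an "All" row and an "All" column with the totals
with_totals = pd.pivot_table(sales, values="units", index="region",
                             columns="product", aggfunc="sum", margins=True)
```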
@@ -485,7 +498,7 @@ using the ``normalize`` argument:
    pd.crosstab(df.A, df.B, normalize='columns')
 
 ``crosstab`` can also be passed a third ``Series`` and an aggregation function
-(``aggfunc``) that will be applied to the values of the third ``Series`` within
+(``aggfunc``) that will be applied to the values of the third ``Series`` within
 each group defined by the first two ``Series``:
 
 .. ipython:: python
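Here is a minimal sketch of the three-``Series`` form, using hypothetical Series ``a``, ``b`` and ``c``. Without ``values``, ``crosstab`` counts co-occurrences. With ``values=c, aggfunc="mean"`` it averages ``c`` within each (``A``, ``B``) group. Combinations with no data, such as (``y``, ``v``) here, come out as NaN rather than 0.

```python
import pandas as pd

# Hypothetical inputs: two grouping Series and one value Series
a = pd.Series(["x", "x", "y", "y"], name="A")
b = pd.Series(["u", "v", "u", "u"], name="B")
c = pd.Series([1.0, 2.0, 3.0, 5.0], name="C")

# Without a third Series, crosstab counts how often each (A, B) pair occurs
counts = pd.crosstab(a, b)

# With values= and aggfunc=, the values of c in each (A, B) group are averaged
means = pd.crosstab(a, b, values=c, aggfunc="mean")
```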
@@ -508,8 +521,8 @@ Finally, one can also add margins or normalize this output.
 Tiling
 ------
 
-The :func:`~pandas.cut` function computes groupings for the values of the input
-array and is often used to transform continuous variables to discrete or
+The :func:`~pandas.cut` function computes groupings for the values of the input
+array and is often used to transform continuous variables to discrete or
 categorical variables:
 
 .. ipython:: python
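Here is a minimal sketch of :func:`~pandas.cut` on a hypothetical array of ages. An integer ``bins`` gives equal-width bins over the data's range. A list of edges gives right-closed intervals such as ``(0, 18]``. The ``labels`` are illustrative names. For array input the result is a ``Categorical``, whose ``codes`` give each value's bin number.

```python
import numpy as np
import pandas as pd

# Hypothetical continuous data
ages = np.array([10, 15, 13, 12, 23, 25, 28, 59, 60])

# Three equal-width bins spanning the range of the data
equal_width = pd.cut(ages, bins=3)

# Explicit edges give the intervals (0, 18], (18, 35], (35, 70]
explicit = pd.cut(ages, bins=[0, 18, 35, 70])

# Integer bin number for each value
codes = explicit.codes

# Same edges, with illustrative names instead of interval labels
labelled = pd.cut(ages, bins=[0, 18, 35, 70], labels=["low", "mid", "high"])
```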
@@ -539,8 +552,8 @@ used to bin the passed data.::
 Computing indicator / dummy variables
 -------------------------------------
 
-To convert a categorical variable into a "dummy" or "indicator" ``DataFrame``,
-for example a column in a ``DataFrame`` (a ``Series``) which has ``k`` distinct
+To convert a categorical variable into a "dummy" or "indicator" ``DataFrame``,
+for example a column in a ``DataFrame`` (a ``Series``) which has ``k`` distinct
 values, can derive a ``DataFrame`` containing ``k`` columns of 1s and 0s using
 :func:`~pandas.get_dummies`:
 
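This minimal sketch shows the ``k``-columns behaviour on a hypothetical ``Series`` with three distinct values. The ``prefix`` argument only changes the generated column names. The indicator dtype differs between pandas versions, so the sketch also takes an integer view to show the 1s and 0s.

```python
import pandas as pd

# Hypothetical categorical Series with k = 3 distinct values: a, b, c
s = pd.Series(list("bbacab"))

# One indicator column per distinct value, in sorted order
dummies = pd.get_dummies(s)

# Integer view of the indicators: exactly one 1 per row
as_int = dummies.astype(int)

# prefix= changes the generated column names to "key_a", "key_b", ...
prefixed = pd.get_dummies(s, prefix="key")
```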
@@ -577,7 +590,7 @@ This function is often used along with discretization functions like ``cut``:
 See also :func:`Series.str.get_dummies <pandas.Series.str.get_dummies>`.
 
 :func:`get_dummies` also accepts a ``DataFrame``. By default all categorical
-variables (categorical in the statistical sense, those with `object` or
+variables (categorical in the statistical sense, those with `object` or
 `categorical` dtype) are encoded as dummy variables.
 
 
@@ -587,7 +600,7 @@ variables (categorical in the statistical sense, those with `object` or
                       'C': [1, 2, 3]})
    pd.get_dummies(df)
 
-All non-object columns are included untouched in the output. You can control
+All non-object columns are included untouched in the output. You can control
 the columns that are encoded with the ``columns`` keyword.
 
 .. ipython:: python
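This minimal sketch shows the ``columns`` keyword on a hypothetical three-column frame. By default both object columns are encoded and the integer column ``C`` is passed through unchanged. With ``columns=["A"]``, ``B`` is left untouched as well. Unencoded columns come first in the output, followed by the ``<column>_<value>`` indicators.

```python
import pandas as pd

# Hypothetical frame: two object columns and one integer column
df = pd.DataFrame({"A": ["a", "b", "a"], "B": ["c", "c", "b"], "C": [1, 2, 3]})

# Default: every object column ("A" and "B") is encoded; "C" is kept as-is
all_encoded = pd.get_dummies(df)

# Only the columns listed in ``columns`` are encoded
only_a = pd.get_dummies(df, columns=["A"])
```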
@@ -640,7 +653,7 @@ When a column contains only one level, it will be omitted in the result.
 
    pd.get_dummies(df, drop_first=True)
 
-By default new columns will have ``np.uint8`` dtype.
+By default new columns will have ``np.uint8`` dtype.
 To choose another dtype, use the``dtype`` argument:
 
 .. ipython:: python
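Here is a minimal sketch of the ``dtype`` argument on a hypothetical frame. Requesting ``np.float64`` gives float indicator columns, and the same argument combines with ``drop_first=True``. The ``np.uint8`` default described in this hunk applies to the pandas version these docs were written for. pandas 2.0 changed the default to ``bool``.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one object column to encode, one float column kept as-is
df = pd.DataFrame({"A": list("abc"), "B": [1.1, 2.2, 3.3]})

# Ask for float64 indicator columns instead of the default dtype
as_float = pd.get_dummies(df, dtype=np.float64)

# dtype also works together with drop_first=True, which drops "A_a"
reduced = pd.get_dummies(df, drop_first=True, dtype=np.float64)
```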