Commit e50d397

ResidentMario authored and jreback committed
API: add top-level melt function as method to DataFrame
xref #12640
xref #14876

Author: Aleksey Bilogur <[email protected]>

Closes #15521 from ResidentMario/12640 and squashes the following commits:

1657246 [Aleksey Bilogur] two doc changes
28a38f2 [Aleksey Bilogur] tweak whatsnew entry.
5f306a9 [Aleksey Bilogur] +whatsnew
ff895fe [Aleksey Bilogur] Add tests, update docs.
11f3fe4 [Aleksey Bilogur] rm stray debug.
3cbbed5 [Aleksey Bilogur] Melt docstring.
d54dc2f [Aleksey Bilogur] +pd.DataFrame.melt.

1 parent faf6401 commit e50d397

6 files changed: +182 -133 lines changed


doc/source/api.rst

+1
@@ -933,6 +933,7 @@ Reshaping, sorting, transposing
    DataFrame.swaplevel
    DataFrame.stack
    DataFrame.unstack
+   DataFrame.melt
    DataFrame.T
    DataFrame.to_panel
    DataFrame.to_xarray

doc/source/reshaping.rst

+6 -5
@@ -265,8 +265,8 @@ the right thing:
 Reshaping by Melt
 -----------------
 
-The :func:`~pandas.melt` function is useful to massage a
-DataFrame into a format where one or more columns are identifier variables,
+The top-level :func:`melt` and :func:`~DataFrame.melt` functions are useful to
+massage a DataFrame into a format where one or more columns are identifier variables,
 while all other columns, considered measured variables, are "unpivoted" to the
 row axis, leaving just two non-identifier columns, "variable" and "value". The
 names of those columns can be customized by supplying the ``var_name`` and
@@ -281,10 +281,11 @@ For instance,
                           'height' : [5.5, 6.0],
                           'weight' : [130, 150]})
    cheese
-   pd.melt(cheese, id_vars=['first', 'last'])
-   pd.melt(cheese, id_vars=['first', 'last'], var_name='quantity')
+   cheese.melt(id_vars=['first', 'last'])
+   cheese.melt(id_vars=['first', 'last'], var_name='quantity')
 
-Another way to transform is to use the ``wide_to_long`` panel data convenience function.
+Another way to transform is to use the ``wide_to_long`` panel data convenience
+function.
 
 .. ipython:: python
 
doc/source/whatsnew/v0.20.0.txt

+1
@@ -324,6 +324,7 @@ Other Enhancements
 - ``Series.sort_index`` accepts parameters ``kind`` and ``na_position`` (:issue:`13589`, :issue:`14444`)
 
 - ``DataFrame`` has gained a ``nunique()`` method to count the distinct values over an axis (:issue:`14336`).
+- ``DataFrame`` has gained a ``melt()`` method, equivalent to ``pd.melt()``, for unpivoting from a wide to long format (:issue:`12640`).
 - ``DataFrame.groupby()`` has gained a ``.nunique()`` method to count the distinct values for all columns within each group (:issue:`14336`, :issue:`15197`).
 
 - ``pd.read_excel()`` now preserves sheet order when using ``sheetname=None`` (:issue:`9930`)
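
As a rough illustration of the equivalence the whatsnew entry describes, the sketch below calls both spellings on a small frame modelled on the `cheese` example from the reshaping docs. The `first` and `last` values are assumed for illustration (the diff only shows the `height` and `weight` columns), and the names `via_method` and `via_function` are hypothetical. It needs pandas 0.20.0 or later, where `DataFrame.melt` exists.

```python
import pandas as pd

# Wide-format frame; the 'first' and 'last' values are assumed for illustration.
cheese = pd.DataFrame({'first': ['John', 'Mary'],
                       'last': ['Doe', 'Bo'],
                       'height': [5.5, 6.0],
                       'weight': [130, 150]})

# Method form added by this commit.
via_method = cheese.melt(id_vars=['first', 'last'], var_name='quantity')

# Pre-existing top-level function form.
via_function = pd.melt(cheese, id_vars=['first', 'last'], var_name='quantity')

# Both calls should yield the same long-format frame:
# one row per (identifier row, measured column) pair.
print(via_method)
print(via_method.equals(via_function))
```

If the two results compare equal, as expected, replacing `pd.melt(df, ...)` with `df.melt(...)` is a stylistic choice; the method form tends to read more naturally inside a method chain.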

pandas/core/frame.py

+104
@@ -4051,6 +4051,110 @@ def unstack(self, level=-1, fill_value=None):
         from pandas.core.reshape import unstack
         return unstack(self, level, fill_value)
 
+    _shared_docs['melt'] = ("""
+    "Unpivots" a DataFrame from wide format to long format, optionally
+    leaving identifier variables set.
+
+    This function is useful to massage a DataFrame into a format where one
+    or more columns are identifier variables (`id_vars`), while all other
+    columns, considered measured variables (`value_vars`), are "unpivoted" to
+    the row axis, leaving just two non-identifier columns, 'variable' and
+    'value'.
+
+    %(versionadded)s
+    Parameters
+    ----------
+    frame : DataFrame
+    id_vars : tuple, list, or ndarray, optional
+        Column(s) to use as identifier variables.
+    value_vars : tuple, list, or ndarray, optional
+        Column(s) to unpivot. If not specified, uses all columns that
+        are not set as `id_vars`.
+    var_name : scalar
+        Name to use for the 'variable' column. If None it uses
+        ``frame.columns.name`` or 'variable'.
+    value_name : scalar, default 'value'
+        Name to use for the 'value' column.
+    col_level : int or string, optional
+        If columns are a MultiIndex then use this level to melt.
+
+    See also
+    --------
+    %(other)s
+    pivot_table
+    DataFrame.pivot
+
+    Examples
+    --------
+    >>> import pandas as pd
+    >>> df = pd.DataFrame({'A': {0: 'a', 1: 'b', 2: 'c'},
+    ...                    'B': {0: 1, 1: 3, 2: 5},
+    ...                    'C': {0: 2, 1: 4, 2: 6}})
+    >>> df
+       A  B  C
+    0  a  1  2
+    1  b  3  4
+    2  c  5  6
+
+    >>> %(caller)sid_vars=['A'], value_vars=['B'])
+       A variable  value
+    0  a        B      1
+    1  b        B      3
+    2  c        B      5
+
+    >>> %(caller)sid_vars=['A'], value_vars=['B', 'C'])
+       A variable  value
+    0  a        B      1
+    1  b        B      3
+    2  c        B      5
+    3  a        C      2
+    4  b        C      4
+    5  c        C      6
+
+    The names of 'variable' and 'value' columns can be customized:
+
+    >>> %(caller)sid_vars=['A'], value_vars=['B'],
+    ...         var_name='myVarname', value_name='myValname')
+       A myVarname  myValname
+    0  a         B          1
+    1  b         B          3
+    2  c         B          5
+
+    If you have multi-index columns:
+
+    >>> df.columns = [list('ABC'), list('DEF')]
+    >>> df
+       A  B  C
+       D  E  F
+    0  a  1  2
+    1  b  3  4
+    2  c  5  6
+
+    >>> %(caller)scol_level=0, id_vars=['A'], value_vars=['B'])
+       A variable  value
+    0  a        B      1
+    1  b        B      3
+    2  c        B      5
+
+    >>> %(caller)sid_vars=[('A', 'D')], value_vars=[('B', 'E')])
+      (A, D) variable_0 variable_1  value
+    0      a          B          E      1
+    1      b          B          E      3
+    2      c          B          E      5
+
+    """)
+
+    @Appender(_shared_docs['melt'] %
+              dict(caller='df.melt(',
+                   versionadded='.. versionadded:: 0.20.0\n',
+                   other='melt'))
+    def melt(self, id_vars=None, value_vars=None, var_name=None,
+             value_name='value', col_level=None):
+        from pandas.core.reshape import melt
+        return melt(self, id_vars=id_vars, value_vars=value_vars,
+                    var_name=var_name, value_name=value_name,
+                    col_level=col_level)
+
     # ----------------------------------------------------------------------
     # Time series-related
 
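
The shared-docstring arrangement in this file (one template, filled in once for the method and once for the top-level function) can be sketched without pandas. The `appender` decorator below is a simplified, hypothetical stand-in for `pandas.util.decorators.Appender`, whose real implementation may differ, and `SHARED_DOC`, `melt_method`, and `melt_function` are illustrative names only.

```python
def appender(text):
    """Hypothetical, simplified stand-in for pandas' Appender decorator:
    appends ``text`` to the decorated function's docstring."""
    def decorate(func):
        func.__doc__ = (func.__doc__ or '') + text
        return func
    return decorate


# One template with named placeholders, filled in via %-formatting.
SHARED_DOC = """
Unpivot a table from wide format to long format.

%(versionadded)s
See also
--------
%(other)s

Examples
--------
>>> %(caller)sid_vars=['A'], value_vars=['B'])
"""


# Method flavour: examples are spelled df.melt(...), with a versionadded note.
@appender(SHARED_DOC % dict(caller='df.melt(',
                            versionadded='.. versionadded:: 0.20.0\n',
                            other='melt'))
def melt_method(self, id_vars=None, value_vars=None):
    pass


# Function flavour: examples are spelled pd.melt(df, ...), no versionadded note.
@appender(SHARED_DOC % dict(caller='pd.melt(df, ',
                            versionadded='',
                            other='DataFrame.melt'))
def melt_function(frame, id_vars=None, value_vars=None):
    pass


print(melt_method.__doc__)
print(melt_function.__doc__)
```

The `caller` placeholder carries the opening of the call (`df.melt(` versus `pd.melt(df, `), which is why each doctest line in the template begins with `%(caller)s` followed directly by the keyword arguments.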

pandas/core/reshape.py

+6 -90
@@ -28,6 +28,8 @@
 import pandas.core.algorithms as algos
 from pandas._libs import algos as _algos, reshape as _reshape
 
+from pandas.core.frame import _shared_docs
+from pandas.util.decorators import Appender
 from pandas.core.index import MultiIndex, _get_na_value
 
 
@@ -701,98 +703,12 @@ def _convert_level_number(level_num, columns):
     return result
 
 
+@Appender(_shared_docs['melt'] %
+          dict(caller='pd.melt(df, ',
+               versionadded="",
+               other='DataFrame.melt'))
 def melt(frame, id_vars=None, value_vars=None, var_name=None,
          value_name='value', col_level=None):
-    """
-    "Unpivots" a DataFrame from wide format to long format, optionally leaving
-    identifier variables set.
-
-    This function is useful to massage a DataFrame into a format where one
-    or more columns are identifier variables (`id_vars`), while all other
-    columns, considered measured variables (`value_vars`), are "unpivoted" to
-    the row axis, leaving just two non-identifier columns, 'variable' and
-    'value'.
-
-    Parameters
-    ----------
-    frame : DataFrame
-    id_vars : tuple, list, or ndarray, optional
-        Column(s) to use as identifier variables.
-    value_vars : tuple, list, or ndarray, optional
-        Column(s) to unpivot. If not specified, uses all columns that
-        are not set as `id_vars`.
-    var_name : scalar
-        Name to use for the 'variable' column. If None it uses
-        ``frame.columns.name`` or 'variable'.
-    value_name : scalar, default 'value'
-        Name to use for the 'value' column.
-    col_level : int or string, optional
-        If columns are a MultiIndex then use this level to melt.
-
-    See also
-    --------
-    pivot_table
-    DataFrame.pivot
-
-    Examples
-    --------
-    >>> import pandas as pd
-    >>> df = pd.DataFrame({'A': {0: 'a', 1: 'b', 2: 'c'},
-    ...                    'B': {0: 1, 1: 3, 2: 5},
-    ...                    'C': {0: 2, 1: 4, 2: 6}})
-    >>> df
-       A  B  C
-    0  a  1  2
-    1  b  3  4
-    2  c  5  6
-
-    >>> pd.melt(df, id_vars=['A'], value_vars=['B'])
-       A variable  value
-    0  a        B      1
-    1  b        B      3
-    2  c        B      5
-
-    >>> pd.melt(df, id_vars=['A'], value_vars=['B', 'C'])
-       A variable  value
-    0  a        B      1
-    1  b        B      3
-    2  c        B      5
-    3  a        C      2
-    4  b        C      4
-    5  c        C      6
-
-    The names of 'variable' and 'value' columns can be customized:
-
-    >>> pd.melt(df, id_vars=['A'], value_vars=['B'],
-    ...         var_name='myVarname', value_name='myValname')
-       A myVarname  myValname
-    0  a         B          1
-    1  b         B          3
-    2  c         B          5
-
-    If you have multi-index columns:
-
-    >>> df.columns = [list('ABC'), list('DEF')]
-    >>> df
-       A  B  C
-       D  E  F
-    0  a  1  2
-    1  b  3  4
-    2  c  5  6
-
-    >>> pd.melt(df, col_level=0, id_vars=['A'], value_vars=['B'])
-       A variable  value
-    0  a        B      1
-    1  b        B      3
-    2  c        B      5
-
-    >>> pd.melt(df, id_vars=[('A', 'D')], value_vars=[('B', 'E')])
-      (A, D) variable_0 variable_1  value
-    0      a          B          E      1
-    1      b          B          E      3
-    2      c          B          E      5
-
-    """
     # TODO: what about the existing index?
     if id_vars is not None:
         if not is_list_like(id_vars):
