ENH: DataFrame sort columns by rows: sort_values(axis=1) #13622

Closed
wants to merge 5 commits into from
7 changes: 7 additions & 0 deletions doc/source/whatsnew/v0.19.0.txt
@@ -229,6 +229,13 @@ Other enhancements
 - ``pd.read_html()`` has gained support for the ``decimal`` option (:issue:`12907`)
 - A top-level function :func:`union_categorical` has been added for combining categoricals, see :ref:`Unioning Categoricals<categorical.union>` (:issue:`13361`)
 - ``Series`` has gained the properties ``.is_monotonic``, ``.is_monotonic_increasing``, ``.is_monotonic_decreasing``, similar to ``Index`` (:issue:`13336`)
+- ``DataFrame`` has gained support to re-order the columns based on the values in a row using ``df.sort_values(by='index_label', axis=1)`` (:issue:`10806`)

Contributor

Just this sentence is fine, but write the call as df.sort_values(by=...., axis=1).

+.. ipython:: python
+
+   df = pd.DataFrame({'A': [2, 7], 'B': [3, 5], 'C': [4, 8]},
+                     index=['row1', 'row2'])
+   df.sort_values(by='row2', axis=1)

 .. _whatsnew_0190.api:
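For reference, here is a minimal sketch of what the whatsnew example above is expected to produce, assuming a pandas release that includes this enhancement. The name `result` is only for illustration and does not appear in the patch.

```python
import pandas as pd

df = pd.DataFrame({'A': [2, 7], 'B': [3, 5], 'C': [4, 8]},
                  index=['row1', 'row2'])

# Reorder the columns by the values found in the row labelled 'row2'.
# Those values are A=7, B=5, C=8, so ascending order is B, A, C.
result = df.sort_values(by='row2', axis=1)
print(list(result.columns))  # ['B', 'A', 'C']
```

With `axis=1`, the `by` argument names an index label rather than a column label.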

7 changes: 3 additions & 4 deletions pandas/core/frame.py
@@ -3127,9 +3127,8 @@ def sort_values(self, by, axis=0, ascending=True, inplace=False,
                     kind='quicksort', na_position='last'):

Contributor

Can you update the .sort_values docstring to match the others? For the axis argument, see sortlevel for an example.

         axis = self._get_axis_number(axis)
+        other_axis = 0 if axis == 1 else 1

-        if axis != 0:
-            raise ValueError('When sorting by column, axis must be 0 (rows)')
         if not isinstance(by, list):
             by = [by]
         if com.is_sequence(ascending) and len(by) != len(ascending):
@@ -3145,7 +3144,7 @@ def trans(v):

             keys = []
             for x in by:
-                k = self[x].values
+                k = self.xs(x, axis=other_axis).values
                 if k.ndim == 2:
                     raise ValueError('Cannot sort by duplicate column %s' %
                                      str(x))
@@ -3157,7 +3156,7 @@ def trans(v):
             from pandas.core.groupby import _nargsort

             by = by[0]
-            k = self[by].values
+            k = self.xs(by, axis=other_axis).values
             if k.ndim == 2:

                 # try to be helpful
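The change in frame.py swaps the column lookup `self[x]` for a cross-section taken along the other axis. A rough sketch of that idea follows. `sort_keys` is a hypothetical helper written for this illustration, not a function in pandas, and the final reordering step is a simplification of what the real method does.

```python
import pandas as pd

df = pd.DataFrame({'A': [2, 7], 'B': [3, 5], 'C': [4, 8]},
                  index=['row1', 'row2'])


def sort_keys(frame, label, axis):
    """Hypothetical helper: fetch the values to sort by, as the patch does."""
    # Sorting along one axis means the keys live on the other axis.
    other_axis = 0 if axis == 1 else 1
    return frame.xs(label, axis=other_axis).values


col_keys = sort_keys(df, 'A', axis=0)     # same values as df['A']
row_keys = sort_keys(df, 'row2', axis=1)  # same values as df.loc['row2']

# Applying the argsort of the row keys along the columns reorders them.
reordered = df.iloc[:, row_keys.argsort()]
print(list(reordered.columns))  # ['B', 'A', 'C']
```

Because `xs` accepts either axis, the same lookup covers both the existing `axis=0` path and the new `axis=1` path.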
34 changes: 30 additions & 4 deletions pandas/tests/frame/test_sorting.py
@@ -84,7 +84,7 @@ def test_sort_values(self):
         frame = DataFrame([[1, 1, 2], [3, 1, 0], [4, 5, 6]],
                           index=[1, 2, 3], columns=list('ABC'))

-        # by column
+        # by column (axis=0)
         sorted_df = frame.sort_values(by='A')
         indexer = frame['A'].argsort().values
         expected = frame.ix[frame.index[indexer]]
@@ -116,9 +116,26 @@ def test_sort_values(self):
         self.assertRaises(ValueError, lambda: frame.sort_values(
             by=['A', 'B'], axis=2, inplace=True))

-        msg = 'When sorting by column, axis must be 0'
-        with assertRaisesRegexp(ValueError, msg):
-            frame.sort_values(by='A', axis=1)
+        # by row (axis=1): GH 10806
+        sorted_df = frame.sort_values(by=3, axis=1)

Contributor

Use axis='columns' in place of one of the axis=1 calls.

Contributor Author

Do you mean that the string 'columns' should be supported as a valid axis value? That's neat. Can you point me to where that's documented, though? I can't find a reference to it.

Contributor

This is what _get_axis_number(..) does.

The docstring should be something like:

axis : {index (0), columns (1)}

which we generate programmatically (this example is from DataFrame.min). Look in generic.py to see how.
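To illustrate the point about axis names, here is a small sketch that uses only the public `sort_values` call and assumes a pandas release that includes this enhancement. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [2, 7], 'B': [3, 5], 'C': [4, 8]},
                  index=['row1', 'row2'])

# The integer and the string form of the axis are resolved to the same axis,
# so both calls should reorder the columns identically.
by_number = df.sort_values(by='row2', axis=1)
by_name = df.sort_values(by='row2', axis='columns')

print(by_number.equals(by_name))  # True
```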

+        expected = frame
+        assert_frame_equal(sorted_df, expected)
+
+        sorted_df = frame.sort_values(by=3, axis=1, ascending=False)
+        expected = frame.reindex(columns=['C', 'B', 'A'])
+        assert_frame_equal(sorted_df, expected)
+
+        sorted_df = frame.sort_values(by=[1, 2], axis='columns')
+        expected = frame.reindex(columns=['B', 'A', 'C'])
+        assert_frame_equal(sorted_df, expected)
+
+        sorted_df = frame.sort_values(by=[1, 3], axis=1,
+                                      ascending=[True, False])
+        assert_frame_equal(sorted_df, expected)
+
+        sorted_df = frame.sort_values(by=[1, 3], axis=1, ascending=False)
+        expected = frame.reindex(columns=['C', 'B', 'A'])
+        assert_frame_equal(sorted_df, expected)

         msg = r'Length of ascending \(5\) != length of by \(2\)'
         with assertRaisesRegexp(ValueError, msg):
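The expected column orders in the new assertions can be checked by hand against the test frame. The sketch below reruns the same calls outside the test suite, assuming a pandas release that includes this enhancement. `column_order` is a hypothetical helper added only to keep the example short.

```python
import pandas as pd

frame = pd.DataFrame([[1, 1, 2], [3, 1, 0], [4, 5, 6]],
                     index=[1, 2, 3], columns=list('ABC'))


def column_order(**kwargs):
    """Hypothetical helper: column labels after sorting along axis=1."""
    return list(frame.sort_values(axis=1, **kwargs).columns)


# Row 3 is (4, 5, 6), so descending order reverses the columns.
single_desc = column_order(by=3, ascending=False)
# Row 1 is (1, 1, 2): A and B tie, and row 2 (3, 1, 0) puts B before A.
two_keys = column_order(by=[1, 2])
# Same tie on row 1, broken by row 3 descending (B=5 before A=4).
mixed = column_order(by=[1, 3], ascending=[True, False])
# Row 1 descending puts C first, then row 3 descending orders B before A.
both_desc = column_order(by=[1, 3], ascending=False)

print(single_desc, two_keys, mixed, both_desc)
```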
@@ -133,6 +150,11 @@ def test_sort_values_inplace(self):
         expected = frame.sort_values(by='A')
         assert_frame_equal(sorted_df, expected)

+        sorted_df = frame.copy()
+        sorted_df.sort_values(by=1, axis=1, inplace=True)
+        expected = frame.sort_values(by=1, axis=1)
+        assert_frame_equal(sorted_df, expected)
+
         sorted_df = frame.copy()
         sorted_df.sort_values(by='A', ascending=False, inplace=True)
         expected = frame.sort_values(by='A', ascending=False)
@@ -179,6 +201,10 @@ def test_sort_nan(self):
         sorted_df = df.sort_values(['A'], na_position='first', ascending=False)
         assert_frame_equal(sorted_df, expected)

+        expected = df.reindex(columns=['B', 'A'])
+        sorted_df = df.sort_values(by=1, axis=1, na_position='first')
+        assert_frame_equal(sorted_df, expected)
+
         # na_position='last', order
         expected = DataFrame(
             {'A': [1, 1, 2, 4, 6, 8, nan],
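The `na_position` case added to test_sort_nan can be sketched the same way. The frame below is illustrative data chosen for this example, not the frame used in the test, which is not shown in full in this diff. It assumes a pandas release that includes this enhancement.

```python
import numpy as np
import pandas as pd

# Illustrative data: the row labelled 1 holds A=2.0 and B=NaN.
df = pd.DataFrame({'A': [1.0, 2.0], 'B': [9.0, np.nan]})

# With na_position='first', the column holding NaN in that row moves to the front.
nan_first = list(df.sort_values(by=1, axis=1, na_position='first').columns)
# With na_position='last' (the default), it stays at the end.
nan_last = list(df.sort_values(by=1, axis=1, na_position='last').columns)

print(nan_first, nan_last)  # ['B', 'A'] ['A', 'B']
```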