Skip to content

ENH: DataFrame sort columns by rows: sort_values(axis=1) #13622

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 5 commits into from
Closed
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.19.0.txt
Original file line number Diff line number Diff line change
Expand Up @@ -229,6 +229,7 @@ Other enhancements
- ``pd.read_html()`` has gained support for the ``decimal`` option (:issue:`12907`)
- A top-level function :func:`union_categorical` has been added for combining categoricals, see :ref:`Unioning Categoricals<categorical.union>` (:issue:`13361`)
- ``Series`` has gained the properties ``.is_monotonic``, ``.is_monotonic_increasing``, ``.is_monotonic_decreasing``, similar to ``Index`` (:issue:`13336`)
- ``DataFrame.sort_values()`` has gained `axis=1` support to re-order columns with `by=index_label` (:issue:`10806`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use double-backquotes around axis=1 and by=index_label (though I am not entirely sure what that means). you can say by= I think the index_label is confusing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by=index_label is straight from your issue title: #10806 .

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahh, yeah it is. I don't even remember what it is! hah, and a user certainly won't.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The double backticks I understand. As for re-wording with respect to by, my previous commit did not specify this but you said that was unclear too. Could you please specify how you'd like this line to read?

Copy link
Member

@jorisvandenbossche jorisvandenbossche Jul 14, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback What do you think of ".. support to re-order the columns based on the values in a row using df.sort_values(by='index_label', axis=1))"

Could also add a small example


.. _whatsnew_0190.api:

Expand Down
7 changes: 3 additions & 4 deletions pandas/core/frame.py
Original file line number Diff line number Diff line change
Expand Up @@ -3127,9 +3127,8 @@ def sort_values(self, by, axis=0, ascending=True, inplace=False,
kind='quicksort', na_position='last'):

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you update the .sort_values doc-string to look like others, e.g. look at sortlevel (for the axis arg)

axis = self._get_axis_number(axis)
other_axis = 0 if axis == 1 else 1

if axis != 0:
raise ValueError('When sorting by column, axis must be 0 (rows)')
if not isinstance(by, list):
by = [by]
if com.is_sequence(ascending) and len(by) != len(ascending):
Expand All @@ -3145,7 +3144,7 @@ def trans(v):

keys = []
for x in by:
k = self[x].values
k = self.xs(x, axis=other_axis).values
if k.ndim == 2:
raise ValueError('Cannot sort by duplicate column %s' %
str(x))
Expand All @@ -3157,7 +3156,7 @@ def trans(v):
from pandas.core.groupby import _nargsort

by = by[0]
k = self[by].values
k = self.xs(by, axis=other_axis).values
if k.ndim == 2:

# try to be helpful
Expand Down
25 changes: 21 additions & 4 deletions pandas/tests/frame/test_sorting.py
Original file line number Diff line number Diff line change
Expand Up @@ -84,7 +84,7 @@ def test_sort_values(self):
frame = DataFrame([[1, 1, 2], [3, 1, 0], [4, 5, 6]],
index=[1, 2, 3], columns=list('ABC'))

# by column
# by column (axis=0)
sorted_df = frame.sort_values(by='A')
indexer = frame['A'].argsort().values
expected = frame.ix[frame.index[indexer]]
Expand Down Expand Up @@ -116,9 +116,26 @@ def test_sort_values(self):
self.assertRaises(ValueError, lambda: frame.sort_values(
by=['A', 'B'], axis=2, inplace=True))

msg = 'When sorting by column, axis must be 0'
with assertRaisesRegexp(ValueError, msg):
frame.sort_values(by='A', axis=1)
# by row (axis=1): GH 10806
sorted_df = frame.sort_values(by=3, axis=1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

throw in an axis='columns' instead of one of the axis=1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean that the string columns should be supported as a valid axis value? That's neat. Can you point me to where that's documented though? I can't find reference to it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is what _get_axis_number(..) does

the doc-string should be something like:

axis : {index (0), columns (1)}

which we programatically generate (this is from DataFrame.min for example), look in generic.py to see how

expected = frame
assert_frame_equal(sorted_df, expected)

sorted_df = frame.sort_values(by=3, axis=1, ascending=False)
expected = frame.reindex(columns=['C', 'B', 'A'])
assert_frame_equal(sorted_df, expected)

sorted_df = frame.sort_values(by=[1, 2], axis=1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a test for the other options, na_position (also add some nas), and inplace.

expected = frame.reindex(columns=['B', 'A', 'C'])
assert_frame_equal(sorted_df, expected)

sorted_df = frame.sort_values(by=[1, 3], axis=1,
ascending=[True, False])
assert_frame_equal(sorted_df, expected)

sorted_df = frame.sort_values(by=[1, 3], axis=1, ascending=False)
expected = frame.reindex(columns=['C', 'B', 'A'])
assert_frame_equal(sorted_df, expected)

msg = r'Length of ascending \(5\) != length of by \(2\)'
with assertRaisesRegexp(ValueError, msg):
Expand Down