ENH: allow index of col names in set_index GH10797 #11944

Closed
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.18.0.txt
@@ -110,6 +110,7 @@ Other enhancements
 - ``DataFrame`` has gained a ``_repr_latex_`` method in order to allow for automatic conversion to LaTeX in an IPython/Jupyter notebook using nbconvert. Options ``display.latex.escape`` and ``display.latex.longtable`` have been added to the configuration and are used automatically by the ``to_latex`` method. (:issue:`11778`)
 - ``sys.getsizeof(obj)`` returns the memory usage of a pandas object, including the
   values it contains (:issue:`11597`)
+- ``set_index`` now interprets views of the columns index passed to the keys parameter as lists of existing columns to use as the index (:issue:`10797`)

 .. _whatsnew_0180.enhancements.rounding:

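For comparison, a minimal sketch of how the same selection can be written without this change, by converting the slice of `df.columns` to a plain list of labels first. The frame reuses the data from this PR's test; the variable names `labels` and `indexed` are illustrative only and do not appear in the PR.

```python
import pandas as pd

# Same data as the frame used in the PR's test.
df = pd.DataFrame({
    "col1": [1, 2, 3, 4, 5, 6],
    "col2": ["a", "b", "c", "a", "b", "c"],
    "col3": [0.0, 0.0, 1.0, 1.0, 2.0, 2.0],
})

# A list of labels is unambiguous: each entry names an existing column,
# so no inspection of how the keys object was created is needed.
labels = list(df.columns[1:])
indexed = df.set_index(labels)

print(indexed.index.names)    # names of the columns moved into the index
print(list(indexed.columns))  # columns left in the frame
```

The explicit `list(...)` call costs a few characters but makes the intent independent of whether the keys object happens to share memory with `df.columns`.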
24 changes: 21 additions & 3 deletions pandas/core/frame.py
@@ -2727,11 +2727,16 @@ def set_index(self, keys, drop=True, append=False, inplace=False,
                   verify_integrity=False):
         """
         Set the DataFrame index (row labels) using one or more existing
-        columns. By default yields a new object.
+        columns and/or new arrays of values. By default yields a new object.

         Parameters
         ----------
-        keys : column label or list of column labels / arrays
+        keys : column label (str), Index, Series, array, or a list of these things
+            Existing columns to set as the index (when given column labels)
+            and/or new values to set as new index values. If an Index is given,
+            it will be used as a new index unless it is a view of the column
+            index, in which case it will be interpreted as a set of existing
+            columns to set as the index.
         drop : boolean, default True
             Delete columns to be used as the new index
         append : boolean, default False
@@ -2748,13 +2753,26 @@ def set_index(self, keys, drop=True, append=False, inplace=False,
         >>> indexed_df = df.set_index(['A', 'B'])
         >>> indexed_df2 = df.set_index(['A', [0, 1, 2, 0, 1, 2]])
         >>> indexed_df3 = df.set_index([[0, 1, 2, 0, 1, 2]])
+        >>> indexed_df4 = df.set_index(df.columns[:2])

         Returns
         -------
         dataframe : DataFrame
         """
         if not isinstance(keys, list):
-            keys = [keys]
+            if isinstance(keys, Index):
+                # if the index is a slice of the column index, treat it like
+                # a list of column labels; otherwise, treat it like a new index
+                keys_base = keys.base
Contributor

this is not a reliable approach, nor should you be using the private .base.

I am not convinced we can do this in a reliable way.
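One way to picture the reliability concern is sketched below with plain NumPy arrays rather than pandas `Index` objects, so it is an analogy and not the code path in this PR. It shows that two objects holding identical labels can differ in whether they are a view of the original, which means a check based on `.base` depends on how the keys were produced, not on what they contain. The names `labels`, `sliced`, and `copied` are made up for the illustration.

```python
import numpy as np

labels = np.array(["col1", "col2", "col3"], dtype=object)

sliced = labels[:2]          # basic slicing returns a view of `labels`
copied = labels[:2].copy()   # same values, but its own memory

# The two candidates are indistinguishable by value...
same_values = bool((sliced == copied).all())

# ...yet only one of them traces back to the original array.
sliced_is_view = sliced.base is labels
copied_is_view = copied.base is labels

print(same_values, sliced_is_view, copied_is_view)
```

Any intermediate operation that copies (for example, round-tripping through a list) would flip the outcome of such a check without changing the labels themselves.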

+                while isinstance(keys_base, Index):
+                    keys_base = keys_base.base
+                cols_base = self.columns.base
+                while isinstance(cols_base, Index):
+                    cols_base = cols_base.base
+                if keys_base is not cols_base:
+                    keys = [keys]
+            else:
+                keys = [keys]

         if inplace:
             frame = self
36 changes: 36 additions & 0 deletions pandas/tests/test_frame.py
@@ -2583,6 +2583,42 @@ def test_set_index_empty_column(self):
         result = df.set_index(['a', 'x'])
         repr(result)

+    def test_set_index_with_index(self):
+        # GH10797: It should be possible to use a slice of the column index as
+        # the `keys` parameter in set_index().
+
+        # Test that setting the first two columns as the index can be done
+        # either with a list of column labels or a slice of the column index.
+        df = DataFrame({'col1': [1, 2, 3, 4, 5, 6],
+                        'col2': ['a', 'b', 'c', 'a', 'b', 'c'],
+                        'col3': [0.0, 0.0, 1.0, 1.0, 2.0, 2.0]})
+        expected_index = MultiIndex(levels=[['a', 'b', 'c'], [0.0, 1.0, 2.0]],
+                                    labels=[[0, 1, 2, 0, 1, 2],
+                                            [0, 0, 1, 1, 2, 2]],
+                                    names=['col2', 'col3'])
+        expected_df = DataFrame(data={'col1': [1, 2, 3, 4, 5, 6]},
+                                index=expected_index)
+        list_df = df.set_index(['col2', 'col3'])
+        assert_frame_equal(expected_df, list_df)
+        index_df = df.set_index(df.columns[1:])
Contributor

so add some tests here where this could fail.
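One case that seems worth covering, sketched here under the assumption of a pandas release that does not include this PR: when the frame has as many rows as columns, an `Index` of the column labels is also a valid set of new row labels, so the two readings of `keys` give different results. The frame and the names `as_values` and `as_labels` are hypothetical and not taken from the PR.

```python
import pandas as pd

# Two rows and two columns, so an Index of the column labels has the
# right length to serve as new row labels as well.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# An Index passed as keys is used as the new row labels; columns stay.
as_values = df.set_index(pd.Index(["a", "b"]))

# A list of labels selects existing columns and moves them into the index.
as_labels = df.set_index(["a", "b"])

print(list(as_values.index), list(as_values.columns))
print(as_labels.index.nlevels, len(as_labels.columns))
```

A test built on a square frame like this would exercise exactly the situation where the proposed view detection decides the outcome, since both interpretations are otherwise valid.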

+        assert_frame_equal(expected_df, index_df)
+
+        # Test that passing the entire index results in an empty dataframe (i.e.
+        # all columns become part of the index).
+        empty_df = df.set_index(df.columns)
+        assert_equal(len(empty_df.columns), 0)
+        assert_equal(empty_df.index.nlevels, 3)
+
+        # Test that an index that is created independently of the column index
+        # is used as a new index - not as a set of column labels.
+        new_index = Index(data=['col1', 'col1', 'col2', 'col2', 'col3', 'col3'])
+        expected_df2 = DataFrame({'col1': [1, 2, 3, 4, 5, 6],
+                                  'col2': ['a', 'b', 'c', 'a', 'b', 'c'],
+                                  'col3': [0.0, 0.0, 1.0, 1.0, 2.0, 2.0]},
+                                 index=new_index)
+        col_name_index_df = df.set_index(new_index)
+        assert_frame_equal(expected_df2, col_name_index_df)
+
     def test_set_columns(self):
         cols = Index(np.arange(len(self.mixed_frame.columns)))
         self.mixed_frame.columns = cols