ENH: Rename multi-level columns or indices using their tupelized names #38069

Closed
normanius opened this issue Nov 25, 2020 · 3 comments
Labels
Duplicate Report, Enhancement, MultiIndex

Comments

@normanius

It's currently quite difficult to rename a single column (or row) in a multi-indexed DataFrame. The problem is also illustrated in this SO article.

The current approach requires at least 3 steps:

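import pandas as pd

df = pd.DataFrame([[1,2,3],[3,4,5],[5,6,7],[7,8,9]])
df.columns = pd.MultiIndex.from_tuples([('i','a'),('i','b'),('ii','a')])

# Alternative 1: locate the column, edit a list of tuples, rebuild the MultiIndex
i = df.columns.get_loc(('i','b'))
cols = df.columns.tolist()  # a copy, so the existing index is not mutated in place
cols[i] = ('i','c')
df.columns = pd.MultiIndex.from_tuples(cols)

# Alternative 2: flatten to an Index of tuples, rename, rebuild the MultiIndex
# (run on a fresh df; after Alternative 1 the column ('i','b') no longer exists)
df.columns = df.columns.to_flat_index()
df = df.rename(columns={('i','b'):('i','c')})
df.columns = pd.MultiIndex.from_tuples(df.columns)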
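
For reference, the three steps above can be wrapped in a small helper. This is only a sketch: `rename_column_tuple` is a hypothetical name, not a pandas function, and it assumes the columns are a MultiIndex whose labels are tuples.

```python
import pandas as pd


def rename_column_tuple(df, old, new):
    """Return a copy of df in which the column labelled `old` is relabelled `new`.

    Hypothetical helper for illustration; not part of pandas.
    """
    # Work on a plain list of tuples so the original index is left untouched.
    cols = [new if col == old else col for col in df.columns.tolist()]
    out = df.copy()
    # Rebuild the MultiIndex, carrying over the existing level names.
    out.columns = pd.MultiIndex.from_tuples(cols, names=df.columns.names)
    return out


df = pd.DataFrame([[1, 2, 3], [3, 4, 5], [5, 6, 7], [7, 8, 9]])
df.columns = pd.MultiIndex.from_tuples([("i", "a"), ("i", "b"), ("ii", "a")])

renamed = rename_column_tuple(df, ("i", "b"), ("i", "c"))
```

The helper returns a new frame instead of modifying `df`, which mirrors how `rename()` behaves by default.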

Enhancement:

In the presence of a MultiIndex, rename() should accept the tuple representation of a column label.

df = pd.DataFrame([[1,2,3],[3,4,5],[5,6,7], [7,8,9]])
df.columns = pd.MultiIndex.from_tuples([('i','a'),('i','b'),('ii','a')])

# Rename using tuples:
df.rename(columns={('i','b'):('ii','b')})
# Rename using the current column name (equivalent to the call above)
col = df[('i','b')]
df.rename(columns={col.name: ('ii','b')})

This would improve the consistency of the API, since many methods accept a tuple to address the name of a multi-index element:

df[('i','a')]
df.drop(('i','a'), axis=1)
df.pop(('i','a'))
...
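
To make the inconsistency concrete, here is a small sketch of the behaviour I'm describing, as far as I can tell from the pandas versions current when this issue was filed. The tuple is accepted for selection and `drop`, but `rename` looks up individual level labels, so a tuple key silently matches nothing. The `level=` variant is included for comparison; the label `"z"` is just an illustrative value.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [3, 4, 5], [5, 6, 7], [7, 8, 9]])
df.columns = pd.MultiIndex.from_tuples([("i", "a"), ("i", "b"), ("ii", "a")])

# Selection and drop both accept the full tuple.
selected = df[("i", "b")]
dropped = df.drop(("i", "b"), axis=1)

# rename applies the mapping to each level's labels, so a tuple key matches
# nothing and the columns come back unchanged (default errors="ignore").
unchanged = df.rename(columns={("i", "b"): ("i", "c")})

# A level-wise rename works, but it relabels every "a" in level 1,
# not only the single column ("i", "a").
levelwise = df.rename(columns={"a": "z"}, level=1)
```

So `rename(..., level=1)` is not a substitute either: it cannot target one column when the same label appears under several top-level groups.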

API breaking implications

I regard this improvement as an extension of the current API.

normanius added the Enhancement and Needs Triage labels Nov 25, 2020
@jreback
Contributor

jreback commented Nov 25, 2020

duplicate of #20421

you can actually just do this though: https://pandas.pydata.org/pandas-docs/dev/reference/api/pandas.Index.set_names.html?highlight=set_names

df.index = df.index.set_names(...)

jreback closed this as completed Nov 25, 2020
jreback added the MultiIndex and Duplicate Report labels and removed the Needs Triage label Nov 25, 2020
jreback added this to the No action milestone Nov 25, 2020
@normanius
Author

normanius commented Nov 25, 2020

@jreback Is it possible that you misinterpreted my question? It's not about the names of index levels. It's about the index labels themselves (i.e. the name of a column). I don't see how the below could be done with set_names, nor how this is related to #20421.

# Input:
   i    ii
   a  b  a
0  1  2  3
1  3  4  5
2  5  6  7
3  7  8  9

# Output: same df, except that one column has different index
   i    ii
   a  c  a
0  1  2  3
1  3  4  5
2  5  6  7
3  7  8  9
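
A short sketch of why `set_names` doesn't help here: it names the levels and leaves the labels alone. The level names `"outer"` and `"inner"` are made up for the illustration.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [3, 4, 5], [5, 6, 7], [7, 8, 9]])
df.columns = pd.MultiIndex.from_tuples([("i", "a"), ("i", "b"), ("ii", "a")])

# set_names assigns a name to each level of the MultiIndex ...
named = df.columns.set_names(["outer", "inner"])

# ... while the column labels (the tuples) stay exactly as they were.
labels_before = df.columns.tolist()
labels_after = named.tolist()
```

Here `labels_after` still contains `("i", "b")`, so the output shown above cannot be produced this way.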

@jreback
Contributor

jreback commented Nov 25, 2020

sorry wrong issue, see #4160 which is a long time open issue.
