
BUG: Pandas column rename function not working for multilevel columns #55169


Open
2 of 3 tasks
DavidKingGH opened this issue Sep 16, 2023 · 6 comments
Labels
Bug MultiIndex Needs Triage Issue that has not been reviewed by a pandas team member

Comments

@DavidKingGH

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

# Create a DataFrame with multi-level columns
data = {('Apple Inc.', 'abstract'): [1, 2], ('Apple Inc.', 'web_url'): [3, 4],
        ('Adobe Inc.', 'abstract'): [5, 6], ('Adobe Inc.', 'web_url'): [7, 8]}

test_df = pd.DataFrame(data)

# It will look something like this:
# (Apple Inc., abstract) (Apple Inc., web_url) (Adobe Inc., abstract) (Adobe Inc., web_url)         
# 1                               3                      5                      7
# 2                               4                      6                      8

# Create a DataFrame with company-ticker mapping
NASDAQ_Ticker = pd.DataFrame({'Company': ['Apple Inc.', 'Adobe Inc.'],
                              'Ticker': ['AAPL', 'ADBE']})

def company_to_ticker_index(df):
    new_columns = {}
    
    for item in df.columns:
        # Directly unpack the tuple into variables
        company_name, label = item
        
        # Find the corresponding ticker symbol for the company
        ticker = NASDAQ_Ticker.loc[NASDAQ_Ticker['Company'] == company_name]['Ticker'].squeeze()
        
        # Create the new column label
        new_label = (ticker, label)
        
        # Add the new label to the dictionary
        new_columns[item] = new_label
    
    # Rename the columns using the dictionary
    df.rename(columns=new_columns, inplace=True)

# Test the function
company_to_ticker_index(test_df)

# Print the new column names to check
print(test_df.columns)

Issue Description

Here I attempt to rename the columns, which should now be tuples with ticker symbols instead of company names. However, the resulting dataframe still unexpectedly reflects the company labels.

Expected Behavior

Relabeling of the dataframe multi-index columns from (company, X) to (ticker, X).

Installed Versions

INSTALLED VERSIONS
------------------
commit : 2e218d1
python : 3.9.13.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.22000
machine : AMD64
processor : Intel64 Family 6 Model 140 Stepping 1, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United States.1252

pandas : 1.5.3
numpy : 1.24.3
pytz : 2022.7
dateutil : 2.8.2
setuptools : 68.0.0
pip : 23.2.1
Cython : None
pytest : 7.4.0
hypothesis : None
sphinx : 5.0.2
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.2
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.12.0
pandas_datareader: 0.10.0
bs4 : 4.12.2
bottleneck : 1.3.5
brotli :
fastparquet : None
fsspec : 2023.4.0
gcsfs : None
matplotlib : 3.7.1
numba : 0.57.1
numexpr : 2.8.4
odfpy : None
openpyxl : 3.0.10
pandas_gbq : None
pyarrow : 11.0.0
pyreadstat : None
pyxlsb : None
s3fs : 2023.4.0
scipy : 1.10.1
snappy :
sqlalchemy : 1.4.39
tables : 3.8.0
tabulate : 0.8.10
xarray : 2023.6.0
xlrd : None
xlwt : None
zstandard : 0.19.0
tzdata : 2023.3

@DavidKingGH DavidKingGH added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Sep 16, 2023
@miltonsin345

That's messed

@hedeershowk
Contributor

I think you just need something more like df.rename(columns={'Apple Inc.': 'AAPL'}). You don't need the tuple there; the key is the label on a single level. See this Stack Overflow reply for more details.
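
A minimal sketch of that suggestion applied to the frame from the original report. The level=0 argument and the ticker_map name are additions for illustration, not part of the comment above; level=0 restricts the mapping to the company level.

```python
import pandas as pd

# Same frame as in the original report (tuple keys give MultiIndex columns).
data = {('Apple Inc.', 'abstract'): [1, 2], ('Apple Inc.', 'web_url'): [3, 4],
        ('Adobe Inc.', 'abstract'): [5, 6], ('Adobe Inc.', 'web_url'): [7, 8]}
test_df = pd.DataFrame(data)

# Hypothetical mapping name; keys are plain labels, not tuples.
ticker_map = {'Apple Inc.': 'AAPL', 'Adobe Inc.': 'ADBE'}

# Apply the mapping to the top column level only.
renamed = test_df.rename(columns=ticker_map, level=0)

print(renamed.columns.tolist())
```

The second-level labels ('abstract', 'web_url') are left untouched because only level 0 is mapped.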

@nickzoic

Yeah, I've found the same thing, I think, and this is maybe an easier demonstration:

import pandas as pd

df1 = pd.DataFrame([{'a': 1, 'b': 2}, {'a': 3, 'b': 4}]).groupby('a').agg({'b': ('count', 'sum')})

print("DF1:")
print(df1)

df2 = df1.rename(columns={('b', 'count'): 'fnord'}, errors='raise')

print("DF2:")
print(df2)

df1.rename(columns={('b', 'fnord'): 'c'}, errors='raise')

produces the output:

DF1:
      b    
  count sum
a          
1     1   2
3     1   4
DF2:
      b    
  count sum
a          
1     1   2
3     1   4
Traceback (most recent call last):
  File "/home/nick/Work/wehi/pandas_test/test2.py", line 13, in <module>
    df1.rename(columns={('b'): 'c'}, errors='raise')
  File "/home/nick/Work/wehi/pandas_test/.direnv/python-3.10.7/lib/python3.10/site-packages/pandas/core/frame.py", line 5640, in rename
    return super()._rename(
  File "/home/nick/Work/wehi/pandas_test/.direnv/python-3.10.7/lib/python3.10/site-packages/pandas/core/generic.py", line 1090, in _rename
    raise KeyError(f"{missing_labels} not found in axis")
KeyError: "[('b', 'fnord')] not found in axis"

You can see that df2 is unchanged from df1.

It isn't just that the column ('b', 'count') isn't found: I've set errors='raise', and if you try renaming some other column combination, e.g. ('b', 'fnord'), it raises a KeyError.

Seems to do the same in 2.0.3, 2.1.1 and e0d6051

@nickzoic

nickzoic commented Oct 18, 2023

OK looking at the source code a piece of the puzzle falls into place:

  • the checking of the allowed values is done by pandas.core.generic._rename

  • the transforming of the index values is done by pandas.core.indexes.base._transform_index.

    • this substitutes each part of the index value tuple on whichever level
      • (unless level is not None, in which case only on that level)
    • this works fine if errors != "raise".

So these are incompatible, and in the case where errors == "raise" you can't rename multi level indexes.
Either the checking should be fixed or the transforming should be changed.
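
The per-level substitution described above can be seen in a small sketch. The frame and the labels 'a', 'b', 'x' are made up for illustration, and errors is left at its default so that only the transforming step is exercised.

```python
import pandas as pd

# Hypothetical frame where the label 'a' appears on both column levels.
cols = pd.MultiIndex.from_tuples([('a', 'a'), ('a', 'b')])
df = pd.DataFrame([[1, 2], [3, 4]], columns=cols)

# No level given: the mapping is applied to each element of each tuple.
all_levels = df.rename(columns={'a': 'x'})

# level=1: only the second element of each tuple is mapped.
inner_only = df.rename(columns={'a': 'x'}, level=1)

print(all_levels.columns.tolist())
print(inner_only.columns.tolist())
```

Because the mapping keys are matched against individual level labels, a tuple key such as ('a', 'b') never matches anything in this step.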

The former is probably less problematic (even though it doesn't solve my problem[1]) as people will have used the rename-on-every-level behaviour without errors == "raise" and changing this would break stuff.

[1] ... which is better solved by to_flat_index ...
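
One way the to_flat_index route might look, as a sketch only: joining the flattened tuples with '_' is an assumption about how the footnote would be applied, and the names flat and renamed are hypothetical.

```python
import pandas as pd

# Same aggregation as in the earlier demonstration, with a list of functions.
df1 = (pd.DataFrame([{'a': 1, 'b': 2}, {'a': 3, 'b': 4}])
       .groupby('a')
       .agg({'b': ['count', 'sum']}))

flat = df1.copy()
# to_flat_index() yields one tuple per column; join each into a plain string.
flat.columns = ['_'.join(col) for col in flat.columns.to_flat_index()]

# With single-level string columns, rename and errors='raise' agree.
renamed = flat.rename(columns={'b_count': 'fnord'}, errors='raise')

print(renamed.columns.tolist())
```

After flattening, a missing label raises KeyError and an existing one is renamed, so the check and the transform behave consistently.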

@TabLand

TabLand commented Jan 16, 2024

I was stung by this issue earlier today. I was able to work around it by exporting to a dict, performing my column renames there and then creating a new DataFrame. Whilst trying to rename a MultiIndex column using a tuple feels intuitive, it introduces additional complexity in terms of adding or removing levels... I also experienced some circumstances where a tuple key is treated as an individual column name by pandas (e.g. when all sub and super column names are unique and / or rows are not indexed, which makes sense).
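
A rough sketch of that dict round trip, using the frame from the original report. The exact steps are an assumption, since the workaround is only described in words; orient='list' and the names ticker_map, as_dict and rebuilt are illustrative choices. Note that orient='list' drops the row index, which is harmless here because the frame has a default index.

```python
import pandas as pd

data = {('Apple Inc.', 'abstract'): [1, 2], ('Apple Inc.', 'web_url'): [3, 4],
        ('Adobe Inc.', 'abstract'): [5, 6], ('Adobe Inc.', 'web_url'): [7, 8]}
test_df = pd.DataFrame(data)

# Hypothetical mapping from company name to ticker.
ticker_map = {'Apple Inc.': 'AAPL', 'Adobe Inc.': 'ADBE'}

# Export: the tuple column labels become the dict keys.
as_dict = test_df.to_dict(orient='list')

# Rename the first element of each tuple key, leaving unknown names as-is.
renamed_dict = {(ticker_map.get(company, company), label): values
                for (company, label), values in as_dict.items()}

# Rebuild: tuple keys produce MultiIndex columns again.
rebuilt = pd.DataFrame(renamed_dict)

print(rebuilt.columns.tolist())
```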

From what I currently understand of the complexity, I agree that it makes sense to have the checking code consistently match the actual behaviour of the transformation code, and perhaps improving the documentation to clarify that all columns including sub-columns are renamed individually.

I will try to draft a PR within the next week...

@TabLand

TabLand commented Feb 19, 2024

Hi Team,
Could someone review & test the patch provided in #56936?

Projects
None yet
6 participants