BUG: Pandas column rename function not working for multilevel columns #55169

DavidKingGH · 2023-09-16T15:25:11Z

Pandas version checks

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

# Create a DataFrame with multi-level columns
data = {('Apple Inc.', 'abstract'): [1, 2], ('Apple Inc.', 'web_url'): [3, 4],
        ('Adobe Inc.', 'abstract'): [5, 6], ('Adobe Inc.', 'web_url'): [7, 8]}

test_df = pd.DataFrame(data)

# It will look something like this:
# (Apple Inc., abstract) (Apple Inc., web_url) (Adobe Inc., abstract) (Adobe Inc., web_url)         
# 1                               3                      5                      7
# 2                               4                      6                      8

# Create a DataFrame with company-ticker mapping
NASDAQ_Ticker = pd.DataFrame({'Company': ['Apple Inc.', 'Adobe Inc.'],
                              'Ticker': ['AAPL', 'ADBE']})

def company_to_ticker_index(df):
    new_columns = {}
    
    for item in df.columns:
        # Directly unpack the tuple into variables
        company_name, label = item
        
        # Find the corresponding ticker symbol for the company
        ticker = NASDAQ_Ticker.loc[NASDAQ_Ticker['Company'] == company_name]['Ticker'].squeeze()
        
        # Create the new column label
        new_label = (ticker, label)
        
        # Add the new label to the dictionary
        new_columns[item] = new_label
    
    # Rename the columns using the dictionary
    df.rename(columns=new_columns, inplace=True)

# Test the function
company_to_ticker_index(test_df)

# Print the new column names to check
print(test_df.columns)

Issue Description

Here I attempt to rename the columns, which should now be tuples with ticker symbols instead of company names. However, the resulting dataframe still unexpectedly reflects the company labels.

Expected Behavior

Relabeling of the dataframe multi-index columns from (company, X) to (ticker, X).

Installed Versions

INSTALLED VERSIONS
------------------
commit : 2e218d1
python : 3.9.13.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.22000
machine : AMD64
processor : Intel64 Family 6 Model 140 Stepping 1, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United States.1252

pandas : 1.5.3
numpy : 1.24.3
pytz : 2022.7
dateutil : 2.8.2
setuptools : 68.0.0
pip : 23.2.1
Cython : None
pytest : 7.4.0
hypothesis : None
sphinx : 5.0.2
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.2
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.12.0
pandas_datareader: 0.10.0
bs4 : 4.12.2
bottleneck : 1.3.5
brotli :
fastparquet : None
fsspec : 2023.4.0
gcsfs : None
matplotlib : 3.7.1
numba : 0.57.1
numexpr : 2.8.4
odfpy : None
openpyxl : 3.0.10
pandas_gbq : None
pyarrow : 11.0.0
pyreadstat : None
pyxlsb : None
s3fs : 2023.4.0
scipy : 1.10.1
snappy :
sqlalchemy : 1.4.39
tables : 3.8.0
tabulate : 0.8.10
xarray : 2023.6.0
xlrd : None
xlwt : None
zstandard : 0.19.0
tzdata : 2023.3


miltonsin345 · 2023-09-17T05:23:36Z

That's messed

hedeershowk · 2023-09-18T02:36:31Z

I think you just need something more like df.rename(columns={'Apple Inc.': 'AAPL'}). You don't need the tuple there. See this Stack Overflow reply for more details.
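For reference, a minimal sketch of that per-level approach applied to the example from the original post. The `mapping` name is made up for illustration, and passing `level=0` is an optional extra that restricts the rename to the company level; the reply above does not mention it.

```python
import pandas as pd

data = {('Apple Inc.', 'abstract'): [1, 2], ('Apple Inc.', 'web_url'): [3, 4],
        ('Adobe Inc.', 'abstract'): [5, 6], ('Adobe Inc.', 'web_url'): [7, 8]}
test_df = pd.DataFrame(data)

NASDAQ_Ticker = pd.DataFrame({'Company': ['Apple Inc.', 'Adobe Inc.'],
                              'Ticker': ['AAPL', 'ADBE']})

# Hypothetical helper mapping: company name -> ticker symbol
mapping = dict(zip(NASDAQ_Ticker['Company'], NASDAQ_Ticker['Ticker']))

# Keys are single level labels, not (company, label) tuples
renamed = test_df.rename(columns=mapping, level=0)

print(renamed.columns.tolist())
```

This returns a new DataFrame and leaves test_df untouched, so there is no need for inplace=True.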

nickzoic · 2023-10-18T00:54:06Z

Yeah, I've found the same thing, I think, and this is maybe an easier demonstration:

import pandas as pd

df1 = pd.DataFrame([{'a': 1, 'b': 2}, {'a': 3, 'b': 4}]).groupby('a').agg({'b': ('count', 'sum')})

print("DF1:")
print(df1)

df2 = df1.rename(columns={('b', 'count'): 'fnord'}, errors='raise')

print("DF2:")
print(df2)

df1.rename(columns={('b', 'fnord'): 'c'}, errors='raise')

produces the output:

DF1:
      b    
  count sum
a          
1     1   2
3     1   4
DF2:
      b    
  count sum
a          
1     1   2
3     1   4
Traceback (most recent call last):
  File "/home/nick/Work/wehi/pandas_test/test2.py", line 13, in <module>
    df1.rename(columns={('b'): 'c'}, errors='raise')
  File "/home/nick/Work/wehi/pandas_test/.direnv/python-3.10.7/lib/python3.10/site-packages/pandas/core/frame.py", line 5640, in rename
    return super()._rename(
  File "/home/nick/Work/wehi/pandas_test/.direnv/python-3.10.7/lib/python3.10/site-packages/pandas/core/generic.py", line 1090, in _rename
    raise KeyError(f"{missing_labels} not found in axis")
KeyError: "[('b', 'fnord')] not found in axis"

You can see that df2 is unchanged from df1.

It isn't just that the column ('b', 'count') isn't found: I've set errors='raise', and if you try renaming some other column combination, e.g. ('b', 'fnord'), it raises a KeyError.

Seems to do the same in 2.0.3, 2.1.1 and e0d6051

nickzoic · 2023-10-18T02:40:33Z

OK looking at the source code a piece of the puzzle falls into place:

- the checking of the allowed values is done by pandas.core.generic._rename
  - the check is done only if errors == "raise" (see ENH: add errors='raise' option to rename #13473)
  - this checks for the whole index value tuple's existence, not the individual levels.
- the transforming of the index values is done by pandas.core.indexes.base._transform_index
  - this substitutes each part of the index value tuple on whichever level
    (unless level is not None, in which case only on that level)
  - this works fine if errors != "raise".
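
The two steps above can be sketched roughly as follows. This is an illustration only: `transform_sketch` is a made-up stand-in for the per-level substitution, not the pandas implementation, and the lookup via `get_indexer_for` is used here just to show what a whole-tuple check sees.

```python
import pandas as pd

cols = pd.MultiIndex.from_tuples([("b", "count"), ("b", "sum")])
df = pd.DataFrame([[1, 2], [1, 4]], columns=cols)

# Check step: labels are looked up as whole tuples.
found = cols.get_indexer_for([("b", "count")])    # position of the tuple
missing = cols.get_indexer_for([("b", "fnord")])  # -1 means "not found"

def transform_sketch(index, mapping):
    """Illustrative stand-in, not the pandas implementation:
    apply the mapping to each part of each tuple separately."""
    return [tuple(mapping.get(part, part) for part in tup) for tup in index]

# A tuple key passes the check but never matches an individual part.
tuple_key = transform_sketch(cols, {("b", "count"): "fnord"})
# A per-level key is what the transform step actually matches.
level_key = transform_sketch(cols, {"count": "n"})

# The real rename (without errors="raise") agrees with the sketch.
actual = df.rename(columns={"count": "n"}).columns.tolist()
print(found, missing, tuple_key, level_key, actual)
```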

So these are incompatible, and in the case where errors == "raise" you can't rename multi level indexes.
Either the checking should be fixed or the transforming should be changed.

The former is probably less problematic (even though it doesn't solve my problem[1]) as people will have used the rename-on-every-level behaviour without errors == "raise" and changing this would break stuff.

[1] ... which is better solved by to_flat_index ...
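
A minimal sketch of the to_flat_index route from the footnote, assuming the goal is a single-level set of columns where each old (level 0, level 1) pair is one label. The frame and the new name 'fnord' mirror the demonstration above; `flat` is just an illustrative variable name.

```python
import pandas as pd

cols = pd.MultiIndex.from_tuples([("b", "count"), ("b", "sum")])
df = pd.DataFrame([[1, 2], [1, 4]], columns=cols)

flat = df.copy()
# Collapse the two levels into a single Index whose labels are tuples
flat.columns = flat.columns.to_flat_index()

# Each tuple is now an ordinary label, so a tuple key matches it directly
flat = flat.rename(columns={("b", "count"): "fnord"})

print(flat.columns.tolist())
```

The trade-off is that the columns are no longer a MultiIndex afterwards, so this only helps when the multi-level structure is not needed downstream.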

TabLand · 2024-01-16T22:31:45Z

I was stung by this issue earlier today. I was able to work around it by exporting to a dict, performing my column renames there and then creating a new DataFrame. Whilst trying to rename a MultiIndex column using a tuple feels intuitive, it introduces additional complexity in terms of adding or removing levels... I also experienced some circumstances where a tuple key is treated as an individual column name by pandas (e.g. when all sub and super column names are unique and/or rows are not indexed, which makes sense).
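
A rough sketch of that kind of dict round trip, using the data from the original post. It is not necessarily the exact code used; the `tickers` lookup and the choice of orient='list' are assumptions made for illustration.

```python
import pandas as pd

data = {('Apple Inc.', 'abstract'): [1, 2], ('Apple Inc.', 'web_url'): [3, 4],
        ('Adobe Inc.', 'abstract'): [5, 6], ('Adobe Inc.', 'web_url'): [7, 8]}
test_df = pd.DataFrame(data)
tickers = {'Apple Inc.': 'AAPL', 'Adobe Inc.': 'ADBE'}  # hypothetical lookup

# Export: keys are the (company, label) column tuples, values are lists
exported = test_df.to_dict(orient='list')

# Rename in plain Python, then rebuild; tuple keys become a MultiIndex again
renamed = {(tickers.get(company, company), label): values
           for (company, label), values in exported.items()}
rebuilt = pd.DataFrame(renamed)

print(rebuilt.columns.tolist())
```

Note that orient='list' does not carry the row index along, so the rebuilt frame gets a default integer index.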

From what I currently understand of the complexity, I agree that it makes sense to have the checking code consistently match the actual behaviour of the transformation code, and perhaps to improve the documentation to clarify that all columns, including sub-columns, are renamed individually.

I will try to draft a PR within the next week...

TabLand · 2024-02-19T21:53:25Z

Hi Team,
Could someone review & test the patch provided in #56936?

DavidKingGH added the Bug and Needs Triage labels Sep 16, 2023

TabLand mentioned this issue Jan 18, 2024

BUG: Improve MultiIndex label rename checks, docs and tests #56936

Closed

5 tasks

simonjayhawkins added the MultiIndex label Feb 7, 2024

TabLand mentioned this issue Mar 29, 2024

Improve MultiIndex label rename checks #58082

Closed

5 tasks
