Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
from pandas import DataFrame
from pandas.testing import assert_frame_equal

df_1 = DataFrame(
    {
        "name": ["John", "Anna", "Peter", "Linda"],
        "surname": ["Smith", "Jones", "Brown", "Wilson"],
        "age": [23, 36, 33, 26],
        "city": ["New York", "Paris", "Berlin", "London"],
    }
).set_index(["name", "surname"])

df_surnames = (
    df_1.reset_index()
    .assign(surname=["A", "B", "C", "D"])
    .set_index(["name", "surname"])
)

try:
    assert_frame_equal(df_1, df_surnames, obj="df_surnames")
except AssertionError as e:
    print(f"AssertionError: {e}")
# AssertionError: MultiIndex level [1] are different <== the obj parameter is ignored!

# I prepared a repo for this so here's the link: https://github.com/serl/pandas-assert-index-bug/blob/main/main.py
Issue Description
assert_frame_equal has an obj argument, which we rely on heavily to locate failures in our test suite. (For context: we keep dozens of dataframes in a "data store" object and wrote a small function to compare two stores; obj is very useful for pointing at the faulty dataframe.)
In the example above, if the difference were in the city column, assert_frame_equal(df_1, df_cities, obj="df_cities") would report df_cities.iloc[:, 1] (column name="city") are different.
The problem is that for a MultiIndex, obj is not passed down, so the message is only MultiIndex level [1] are different.
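To make the contrast easier to check, here is a minimal sketch that captures both failure messages side by side. The helper name failure_message and the two-row frames are made up for illustration; only assert_frame_equal and its obj parameter come from pandas. The exact wording of the messages may differ between pandas versions, so the sketch prints them instead of hard-coding them.

```python
import pandas as pd
from pandas.testing import assert_frame_equal


def failure_message(left, right, obj):
    """Hypothetical helper: return the AssertionError text, or None if the frames match."""
    try:
        assert_frame_equal(left, right, obj=obj)
    except AssertionError as exc:
        return str(exc)
    return None


base = pd.DataFrame(
    {
        "name": ["John", "Anna"],
        "surname": ["Smith", "Jones"],
        "age": [23, 36],
        "city": ["New York", "Paris"],
    }
).set_index(["name", "surname"])

# Same index, one different value in the "city" column.
df_cities = base.assign(city=["New York", "Rome"])

# Same columns, different values in the second index level.
df_surnames = (
    base.reset_index().assign(surname=["A", "B"]).set_index(["name", "surname"])
)

column_msg = failure_message(base, df_cities, "df_cities")
index_msg = failure_message(base, df_surnames, "df_surnames")

# Print only the first line of each message for comparison.
print(column_msg.splitlines()[0])
print(index_msg.splitlines()[0])
```

On the version reported in this issue, the first printed line names df_cities while the second does not name df_surnames.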
Expected Behavior
obj should appear somewhere in the exception message, for example df_surnames.index[1] are different.
The problem seems to be here. I would love to prepare a small PR to tackle this if we agree on a solution.
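Until a solution is agreed on, a possible workaround on the calling side is to wrap the assertion and prepend the label whenever pandas leaves it out. This is only a sketch: assert_frame_equal_labelled and compare_stores are hypothetical names, and the "data store" is assumed here to be a plain dict of dataframes, which may not match a real store object.

```python
import pandas as pd
from pandas.testing import assert_frame_equal


def assert_frame_equal_labelled(left, right, label, **kwargs):
    """Hypothetical wrapper: guarantee that `label` appears in the failure message."""
    try:
        assert_frame_equal(left, right, obj=label, **kwargs)
    except AssertionError as exc:
        message = str(exc)
        if label in message:
            raise  # pandas already named the frame; keep the original error
        raise AssertionError(f"{label}: {message}") from exc


def compare_stores(expected, actual):
    """Hypothetical store comparison; both stores are assumed to be dicts of DataFrames."""
    assert expected.keys() == actual.keys(), "stores hold different frames"
    for name, frame in expected.items():
        assert_frame_equal_labelled(frame, actual[name], label=name)


left = pd.DataFrame(
    {"name": ["John"], "surname": ["Smith"], "age": [23]}
).set_index(["name", "surname"])
right = pd.DataFrame(
    {"name": ["John"], "surname": ["A"], "age": [23]}
).set_index(["name", "surname"])

try:
    compare_stores({"df_surnames": left}, {"df_surnames": right})
except AssertionError as exc:
    # The first line of the message now mentions the store key.
    print(str(exc).splitlines()[0])
```

The wrapper re-raises the original error untouched when the label is already present, so it should stay harmless if a later pandas release starts including obj in index messages.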
Installed Versions
INSTALLED VERSIONS
commit : d9cdd2e
python : 3.12.4.final.0
python-bits : 64
OS : Darwin
OS-release : 23.5.0
Version : Darwin Kernel Version 23.5.0: Wed May 1 20:16:51 PDT 2024; root:xnu-10063.121.3~5/RELEASE_ARM64_T8103
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 2.2.2
numpy : 2.0.1
pytz : 2024.1
dateutil : 2.9.0.post0
setuptools : None
pip : 24.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2024.1
qtpy : None
pyqt5 : None