pd.testing.assert_frame_equal check_like not working like expected #22052

lassebenni · 2018-07-25T15:50:17Z

Code Sample

import pandas as pd

pd.testing.assert_frame_equal(
    pd.DataFrame([{'filename': 'a'}, {'filename': 'b'}]),
    pd.DataFrame([{'filename': 'b'}, {'filename': 'a'}]),
    check_like=True)

AssertionError: DataFrame.iloc[:, 0] are different

DataFrame.iloc[:, 0] values are different (100.0 %)
[left]:  [a, b]
[right]: [b, a]

Problem description

According to the documentation (version 0.23.3), pandas.testing.assert_frame_equal takes a "check_like" parameter which, when set to True, makes the function ignore the order of columns and rows:

check_like : bool, default False
If true, ignore the order of rows & columns

This does not work as I expect. When I create two DataFrames from lists of dicts, the assertion reports them as different because of the order of the rows.

Expected Output

I would expect the test to pass.
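A minimal sketch of what `check_like=True` does and does not tolerate, assuming a pandas version that provides `pd.testing.assert_frame_equal` (0.23 does). The option matches rows and columns by label, so permuting them is fine as long as each label moves together with its data. The frames in the report, however, both receive the default `RangeIndex`, so label 0 is paired with `'a'` in one frame and `'b'` in the other. The `size` column is made up for illustration.

```python
import pandas as pd

left = pd.DataFrame({'filename': ['a', 'b'], 'size': [1, 2]})

# Permute rows and columns via .loc, so every index label keeps its own row
shuffled = left.loc[[1, 0], ['size', 'filename']]
pd.testing.assert_frame_equal(left, shuffled, check_like=True)  # passes

# The frames from the report both get the default RangeIndex [0, 1]
a = pd.DataFrame([{'filename': 'a'}, {'filename': 'b'}])
b = pd.DataFrame([{'filename': 'b'}, {'filename': 'a'}])
print(a.index.tolist(), b.index.tolist())          # [0, 1] [0, 1]
print(a.loc[0, 'filename'], b.loc[0, 'filename'])  # a b

# Label 0 is paired with different values, so check_like cannot reconcile them
try:
    pd.testing.assert_frame_equal(a, b, check_like=True)
except AssertionError:
    print("frames differ: label 0 holds 'a' on the left and 'b' on the right")
```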

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.93-linuxkit-aufs
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.23.3
pytest: 3.6.2
pip: 9.0.1
setuptools: 33.1.1
Cython: None
numpy: 1.15.0
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None


TomAugspurger · 2018-07-25T15:59:27Z

The order doesn't matter, but each label has to be paired with the same data in both frames, e.g.

In [31]: a = pd.DataFrame([{'filename':'a'}, {'filename': 'b'}], index=['a', 'b'])

In [32]: b = pd.DataFrame([{'filename':'b'}, {'filename': 'a'}], index=['b', 'a'])

In [33]: pd.util.testing.assert_frame_equal(a, b)

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-33-302d0eeab6d2> in <module>()
----> 1 pd.util.testing.assert_frame_equal(a, b)

~/Envs/dask-dev/lib/python3.6/site-packages/pandas/util/testing.py in assert_frame_equal(left, right, check_dtype, check_index_type, check_column_type, check_frame_type, check_less_precise, check_names, by_blocks, check_exact, check_datetimelike_compat, check_categorical, check_like, obj)
   1330                        check_exact=check_exact,
   1331                        check_categorical=check_categorical,
-> 1332                        obj='{obj}.index'.format(obj=obj))
   1333
   1334     # column comparison

~/Envs/dask-dev/lib/python3.6/site-packages/pandas/util/testing.py in assert_index_equal(left, right, exact, check_names, check_less_precise, check_exact, check_categorical, obj)
    858                                      check_less_precise=check_less_precise,
    859                                      check_dtype=exact,
--> 860                                      obj=obj, lobj=left, robj=right)
    861
    862     # metadata comparison

pandas/_libs/testing.pyx in pandas._libs.testing.assert_almost_equal()

pandas/_libs/testing.pyx in pandas._libs.testing.assert_almost_equal()

~/Envs/dask-dev/lib/python3.6/site-packages/pandas/util/testing.py in raise_assert_detail(obj, message, left, right, diff)
   1033         msg += "\n[diff]: {diff}".format(diff=diff)
   1034
-> 1035     raise AssertionError(msg)
   1036
   1037

AssertionError: DataFrame.index are different

DataFrame.index values are different (100.0 %)
[left]:  Index(['a', 'b'], dtype='object')
[right]: Index(['b', 'a'], dtype='object')

But this passes:

In [34]: pd.util.testing.assert_frame_equal(a, b, check_like=True)
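A self-contained version of the example above, using the public `pd.testing` module (the `pd.util.testing` path used in the session was later deprecated). The second assertion is a hand-written approximation of the idea behind `check_like`: align one frame to the other's labels with `reindex_like`, then compare in order. It is a sketch of the concept, not pandas' exact code path.

```python
import pandas as pd

a = pd.DataFrame([{'filename': 'a'}, {'filename': 'b'}], index=['a', 'b'])
b = pd.DataFrame([{'filename': 'b'}, {'filename': 'a'}], index=['b', 'a'])

# In both frames label 'a' holds 'a' and label 'b' holds 'b'; only the row order differs
pd.testing.assert_frame_equal(a, b, check_like=True)  # passes

# Roughly the same idea by hand: reorder a's rows and columns to b's labels,
# then use the default, order-sensitive comparison
pd.testing.assert_frame_equal(a.reindex_like(b), b)  # passes
```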

TomAugspurger · 2018-07-25T16:00:07Z

@lassebenni could you make a PR updating the docstring of assert_frame_equal to clarify this?

lassebenni · 2018-07-26T14:09:38Z

@TomAugspurger

Thank you for the quick response!

I am not very familiar with pandas DataFrames, so I missed that the index has to be defined explicitly for the data.

My use case is as follows: I have some code that applies transformations to Spark DataFrames. For testing, I want to compare the expected result to the actual result, so I convert the Spark DataFrames to pandas DataFrames with df.toPandas() and then compare the two with pd.testing.assert_frame_equal(expected, actual, check_like=True).

Since the transformations may produce rows and columns in a different order than expected, I was hoping that check_like=True would handle the differences without my having to sort the resulting DataFrame by column(s).

But it seems that I will have to sort the DataFrame either way, since the index labels of the values have to match:

In [31]: a = pd.DataFrame([{'filename':'a'}, {'filename': 'b'}], index=['a', 'b'])

In [32]: b = pd.DataFrame([{'filename':'b'}, {'filename': 'a'}], index=['b', 'a'])

In short, I hoped that:

a = pd.DataFrame([{'filename':'a'}, {'filename': 'b'}], index=['a', 'b'])
b = create_dataframe_transformation('filename', ['b', 'a'])

pd.testing.assert_frame_equal(a, b, check_like=True)

would pass, without having to do:

a = a.sort_values('filename').reset_index(drop=True)
b = b.sort_values('filename').reset_index(drop=True)

TL;DR: for check_like to pass, each index label must be paired with the same values in both DataFrames; only the order of rows and columns is ignored.
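For frames whose index carries no meaning (such as the default `RangeIndex` a fresh conversion produces), one way to compare rows as an unordered collection is to sort both frames by all of their columns and discard the old labels before comparing. Note that `sort_values` alone keeps the original labels, so `check_like=True` would realign the rows to their old positions and still fail; `reset_index(drop=True)` is what removes them. The helper name `assert_frame_equal_ignoring_row_order` is hypothetical and not part of pandas; the sketch assumes every column holds sortable values.

```python
import pandas as pd


def assert_frame_equal_ignoring_row_order(left, right, **kwargs):
    """Hypothetical helper (not part of pandas): compare rows as an unordered collection.

    Assumes the index carries no meaning and every column holds sortable values.
    """
    def normalize(df):
        cols = sorted(df.columns)  # common column order, also used as sort keys
        # Sort rows by all columns, then drop the old labels so only values remain
        return df[cols].sort_values(by=cols).reset_index(drop=True)

    pd.testing.assert_frame_equal(normalize(left), normalize(right), **kwargs)


expected = pd.DataFrame([{'filename': 'a'}, {'filename': 'b'}])
actual = pd.DataFrame([{'filename': 'b'}, {'filename': 'a'}])
assert_frame_equal_ignoring_row_order(expected, actual)  # passes
```

If a column (or combination of columns) uniquely identifies each row, an alternative is to move it into the index with `set_index` on both frames; `check_like=True` can then match rows by that key instead of by position.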

TomAugspurger added the Docs label Jul 25, 2018

WillAyd added the good first issue label Jul 26, 2018

jorisvandenbossche added this to the Contributions Welcome milestone Jul 26, 2018

DeanLa mentioned this issue Jul 28, 2018

DOC: Clarify check_like behavior in assert_frame_equal #22106 (merged)

jreback modified the milestones: Contributions Welcome, 0.24.0 Jul 31, 2018

WillAyd closed this as completed in #22106 Jul 31, 2018
