pd.testing.assert_frame_equal check_like not working like expected #22052
The order doesn't matter, but the same labels need to be associated with the same data. For example:
In [31]: a = pd.DataFrame([{'filename':'a'}, {'filename': 'b'}], index=['a', 'b'])
In [32]: b = pd.DataFrame([{'filename':'b'}, {'filename': 'a'}], index=['b', 'a'])
In [33]: pd.util.testing.assert_frame_equal(a, b)
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-33-302d0eeab6d2> in <module>()
----> 1 pd.util.testing.assert_frame_equal(a, b)
~/Envs/dask-dev/lib/python3.6/site-packages/pandas/util/testing.py in assert_frame_equal(left, right, check_dtype, check_index_type, check_column_type, check_frame_type, check_less_precise, check_names, by_blocks, check_exact, check_datetimelike_compat, check_categorical, check_like, obj)
1330 check_exact=check_exact,
1331 check_categorical=check_categorical,
-> 1332 obj='{obj}.index'.format(obj=obj))
1333
1334 # column comparison
~/Envs/dask-dev/lib/python3.6/site-packages/pandas/util/testing.py in assert_index_equal(left, right, exact, check_names, check_less_precise, check_exact, check_categorical, obj)
858 check_less_precise=check_less_precise,
859 check_dtype=exact,
--> 860 obj=obj, lobj=left, robj=right)
861
862 # metadata comparison
pandas/_libs/testing.pyx in pandas._libs.testing.assert_almost_equal()
pandas/_libs/testing.pyx in pandas._libs.testing.assert_almost_equal()
~/Envs/dask-dev/lib/python3.6/site-packages/pandas/util/testing.py in raise_assert_detail(obj, message, left, right, diff)
1033 msg += "\n[diff]: {diff}".format(diff=diff)
1034
-> 1035 raise AssertionError(msg)
1036
1037
AssertionError: DataFrame.index are different
DataFrame.index values are different (100.0 %)
[left]: Index(['a', 'b'], dtype='object')
[right]: Index(['b', 'a'], dtype='object')
But this passes
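A minimal sketch of the point above, written against the public `pd.testing.assert_frame_equal` (`pd.util.testing` is the older location used in the session). With `check_like=True`, rows are aligned by index label before the values are compared, so `a` and `b` compare equal. The frame `c` is introduced here only for illustration: it uses the same labels but attaches them to different values, so it still fails.

```python
import pandas as pd

# Same rows, listed in a different order, each row labelled by its index.
a = pd.DataFrame([{"filename": "a"}, {"filename": "b"}], index=["a", "b"])
b = pd.DataFrame([{"filename": "b"}, {"filename": "a"}], index=["b", "a"])

# check_like=True aligns the rows of `a` to the labels of `b` before comparing,
# so only the label -> value pairing matters, not the row order.
pd.testing.assert_frame_equal(a, b, check_like=True)  # passes

# Hypothetical frame `c`: same labels, but each label holds a different value.
c = pd.DataFrame([{"filename": "b"}, {"filename": "a"}], index=["a", "b"])
try:
    pd.testing.assert_frame_equal(a, c, check_like=True)
except AssertionError:
    print("a and c differ: label 'a' holds a different value in each frame")
```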
@lassebenni could you make a PR updating the docstring of …
Thank you for the quick response! I am not very familiar with pandas DataFrames, so I missed the part about explicitly defining an index for the data.
My use case is as follows: I have some code that applies transformations to Spark DataFrames. For testing purposes, I want to compare the expected result to the actual result, so I convert the Spark DataFrames to pandas DataFrames. Since the transformations may produce rows and columns in a different order than expected, I was hoping that check_like would take care of this. But it seems that I will have to sort the DataFrames either way, since the index labels of the values have to match.
In short, I hoped that the comparison with check_like=True would pass without my having to sort the DataFrames first.
TL;DR: for …
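For the Spark use case above, where the index is only a default row position and carries no meaning, one workaround is to normalise both frames before comparing: put the columns in a fixed order, sort the rows by their values, and reset the index. This is a sketch under stated assumptions, not a pandas feature: `assert_frame_equal_ignoring_order` and `_normalise` are hypothetical helper names, and the code assumes the index can be discarded and that all column labels and values are sortable.

```python
import pandas as pd


def _normalise(df):
    # Fixed column order, rows sorted by all values, positional index discarded.
    cols = sorted(df.columns)
    return df[cols].sort_values(by=cols).reset_index(drop=True)


def assert_frame_equal_ignoring_order(left, right, **kwargs):
    """Hypothetical helper: compare two DataFrames, ignoring row and column order.

    Assumes the index carries no meaning (e.g. a default RangeIndex after
    converting from Spark) and that all column labels and values are sortable.
    Extra keyword arguments are forwarded to pd.testing.assert_frame_equal.
    """
    pd.testing.assert_frame_equal(_normalise(left), _normalise(right), **kwargs)


expected = pd.DataFrame({"filename": ["a", "b"], "size": [1, 2]})
actual = pd.DataFrame({"size": [2, 1], "filename": ["b", "a"]})

# Passes: same rows and columns, only their order differs.
assert_frame_equal_ignoring_order(expected, actual)
```

Sorting on every column treats the rows as a multiset, so duplicate rows must appear the same number of times in both frames. If a single column uniquely identifies each row, setting it as the index and passing `check_like=True` is a lighter alternative.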
Code Sample
Problem description
According to the documentation (version 0.23.3), pandas.testing.assert_frame_equal takes a "check_like" parameter, which can be set to True to make the function ignore the order of columns and rows.
This does not work as I expect. When I create two DataFrames using a dict, the assertion reports them as different because of the order of the rows.
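The failure described here can be reproduced roughly as follows, assuming the two frames are built from lists of records without an explicit index (the original code sample is not shown). Both frames then get the same default index (0, 1), so `check_like` has nothing to reorder and compares "a" against "b" at label 0. If a column uniquely identifies each row (assumed here for `filename`), making it the index gives `check_like` real labels to align on.

```python
import pandas as pd

# Records in a different order; no explicit index, so both frames get 0, 1.
left = pd.DataFrame([{"filename": "a"}, {"filename": "b"}])
right = pd.DataFrame([{"filename": "b"}, {"filename": "a"}])

# Labels already match, so check_like cannot reorder anything: label 0 holds
# "a" on the left but "b" on the right, and the assertion fails.
try:
    pd.testing.assert_frame_equal(left, right, check_like=True)
    print("equal")
except AssertionError:
    print("different")

# Assumption: 'filename' is unique per row. Using it as the index (and keeping
# it as a column) lets check_like align rows by a meaningful label.
pd.testing.assert_frame_equal(
    left.set_index("filename", drop=False),
    right.set_index("filename", drop=False),
    check_like=True,
)
```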
Expected Output
I would expect the test to pass.
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.93-linuxkit-aufs
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.23.3
pytest: 3.6.2
pip: 9.0.1
setuptools: 33.1.1
Cython: None
numpy: 1.15.0
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None