assert_series_equal(..., check_exact=True) reports identically constructed Series are not equal. #22400

Closed
RhysU opened this issue Aug 17, 2018 · 4 comments · Fixed by #47627
Labels: good first issue, Needs Tests (Unit test(s) needed to prevent regressions)

Comments

@RhysU

RhysU commented Aug 17, 2018

Code Sample

#!/usr/bin/env python
import numpy as np
import pandas as pd
import pandas.testing as pdt

# Notice these are identically constructed objects
x = pd.Series([0,
               0.0131142231938,
               1.77774652865e-05,
               np.array([0.4722720840328748, 0.4216929783681722])])
y = pd.Series([0,
               0.0131142231938,
               1.77774652865e-05,
               np.array([0.4722720840328748, 0.4216929783681722])])

print(x)

pdt.assert_series_equal(x, x)  # Works as expected
pdt.assert_series_equal(x, x, check_exact=True)  # Works as expected
pdt.assert_series_equal(x, y)  # Works as expected
pdt.assert_series_equal(x, y, check_exact=True)  # Unexpectedly fails

Problem description

With check_exact=True, assert_series_equal reports that two identically constructed Series are different. Additionally, this operation emits unexpected warnings. That is, I see the following:

$ ./recreate
0                                           0
1                                   0.0131142
2                                 1.77775e-05
3    [0.4722720840328748, 0.4216929783681722]
dtype: object
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Exception ignored in: 'pandas._libs.lib.array_equivalent_object'
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Traceback (most recent call last):
  File "./recreate", line 21, in <module>
    pdt.assert_series_equal(x, y, check_exact=True)  # Unexpectedly fails
  File "/nas/dft/ire/rhys/envs/testing/lib/python3.5/site-packages/pandas/util/testing.py", line 1215, in assert_series_equal
    obj='{obj}'.format(obj=obj),)
  File "/nas/dft/ire/rhys/envs/testing/lib/python3.5/site-packages/pandas/util/testing.py", line 1104, in assert_numpy_array_equal
    _raise(left, right, err_msg)
  File "/nas/dft/ire/rhys/envs/testing/lib/python3.5/site-packages/pandas/util/testing.py", line 1098, in _raise
    raise_assert_detail(obj, msg, left, right)
  File "/nas/dft/ire/rhys/envs/testing/lib/python3.5/site-packages/pandas/util/testing.py", line 1035, in raise_assert_detail
    raise AssertionError(msg)
AssertionError: Series are different

Series values are different (0.0 %)
[left]:  [0, 0.0131142231938, 1.77774652865e-05, [0.4722720840328748, 0.4216929783681722]]
[right]: [0, 0.0131142231938, 1.77774652865e-05, [0.4722720840328748, 0.4216929783681722]]
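
For context on the ValueError lines above: NumPy raises this error whenever an array with more than one element is coerced to a single bool. The "Exception ignored in: 'pandas._libs.lib.array_equivalent_object'" line suggests (an inference, not something confirmed in this thread) that comparing the nested array elements is what triggers it. A minimal NumPy-only sketch of that failure mode:

```python
import numpy as np

# The fourth element of each Series in the report is a length-2 float array.
a = np.array([0.4722720840328748, 0.4216929783681722])
b = np.array([0.4722720840328748, 0.4216929783681722])

# Comparing two arrays with == yields an element-wise boolean array,
# not a single True/False.
elementwise = a == b
print(elementwise)  # [ True  True]

# Coercing that array to one bool is ambiguous, so NumPy raises ValueError.
message = None
try:
    bool(elementwise)
except ValueError as exc:
    message = str(exc)
print(message)
```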

Expected Output

I expect two identically constructed Series to be equal, which would give:

$ ./recreate
0                                           0
1                                   0.0131142
2                                 1.77775e-05
3    [0.4722720840328748, 0.4216929783681722]
dtype: object

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.5.final.0
python-bits: 64
OS: Linux
OS-release: 4.1.35-pv-ts2
machine: x86_64
processor:
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.utf8
LOCALE: en_US.UTF-8

pandas: 0.23.4
pytest: None
pip: 10.0.1
setuptools: 40.0.0
Cython: None
numpy: 1.15.0
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
None

@TomAugspurger
Contributor

Might want to check with NumPy, though it's probably doing the right thing here.

In [51]: np.testing.assert_array_equal(x.values, y.values)
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-51-5b0b6c40059c> in <module>()
----> 1 np.testing.assert_array_equal(x.values, y.values)

~/sandbox/numpy/numpy/testing/_private/utils.py in assert_array_equal(x, y, err_msg, verbose)
    860     __tracebackhide__ = True  # Hide traceback for py.test
    861     assert_array_compare(operator.__eq__, x, y, err_msg=err_msg,
--> 862                          verbose=verbose, header='Arrays are not equal')
    863
    864

~/sandbox/numpy/numpy/testing/_private/utils.py in assert_array_compare(comparison, x, y, err_msg, verbose, header, precision, equal_nan, equal_inf)
    784                                 verbose=verbose, header=header,
    785                                 names=('x', 'y'), precision=precision)
--> 786             raise AssertionError(msg)
    787     except ValueError:
    788         import traceback

AssertionError:
Arrays are not equal

(mismatch 100.0%)
 x: array([0, 0.0131142231938, 1.77774652865e-05,
       array([0.47227208, 0.42169298])], dtype=object)
 y: array([0, 0.0131142231938, 1.77774652865e-05,
       array([0.47227208, 0.42169298])], dtype=object)

@gfyoung gfyoung added the Testing label (pandas testing functions or related to the test suite) Aug 19, 2018
@gfyoung
Member

gfyoung commented Aug 19, 2018

Indeed, I think the nested arrays are tripping numpy up somehow. Under the hood, we are using numpy's equality overload to compare these arrays.

>>> x.values == y.values
False

>>> a = np.array([1, 2])
>>> b = np.array([1, 2])
>>> a == b
array([ True,  True])

With nested arrays, numpy is somehow not doing the element-wise comparison I would expect (if you check element by element, you'll see that all of them are equal). As things currently stand, we interpret the single False that is returned as meaning that no elements are equal (as numpy itself does above), which is why we see a 100% mismatch.
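
The element-by-element check mentioned above can be sketched as follows. make_values and elementwise_equal are hypothetical helpers for illustration, not pandas or NumPy APIs; to keep the sketch NumPy-only, it builds object arrays equivalent to x.values directly instead of going through pd.Series. Because np.array_equal always reduces a pair (scalar or array) to a single bool, the nested array no longer hits the ambiguous truth-value error:

```python
import numpy as np


def make_values():
    # Hypothetical helper: an object array shaped like x.values in the report.
    values = np.empty(4, dtype=object)
    values[0] = 0
    values[1] = 0.0131142231938
    values[2] = 1.77774652865e-05
    values[3] = np.array([0.4722720840328748, 0.4216929783681722])
    return values


def elementwise_equal(left, right):
    # Hypothetical helper: compare two 1-D object arrays one element at a time.
    # np.array_equal returns one bool per pair, even for nested arrays.
    if len(left) != len(right):
        return False
    return all(np.array_equal(a, b) for a, b in zip(left, right))


x_values = make_values()
y_values = make_values()
print(elementwise_equal(x_values, y_values))  # True

# Changing one entry of the nested array makes the comparison fail.
z_values = make_values()
z_values[3] = np.array([0.4722720840328748, 0.0])
print(elementwise_equal(x_values, z_values))  # False
```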

cc @charris

@mroeschke
Member

This no longer appears to raise an AssertionError on master. We could use a test to confirm this on our side.

In [3]: import numpy as np
   ...: import pandas as pd
   ...: import pandas.testing as pdt
   ...:
   ...: # Notice these are identically constructed objects
   ...: x = pd.Series([0,
   ...:                0.0131142231938,
   ...:                1.77774652865e-05,
   ...:                np.array([0.4722720840328748, 0.4216929783681722])])
   ...: y = pd.Series([0,
   ...:                0.0131142231938,
   ...:                1.77774652865e-05,
   ...:                np.array([0.4722720840328748, 0.4216929783681722])])
   ...:
   ...: print(x)
   ...:
   ...: pdt.assert_series_equal(x, x)  # Works as expected
   ...: pdt.assert_series_equal(x, x, check_exact=True)  # Works as expected
   ...: pdt.assert_series_equal(x, y)  # Works as expected
   ...: pdt.assert_series_equal(x, y, check_exact=True)
0                                           0
1                                    0.013114
2                                    0.000018
3    [0.4722720840328748, 0.4216929783681722]
dtype: object
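
Since the behavior appears fixed on master, a regression test along these lines could lock it in. This is a sketch in pytest style; make_nested_series and the test name are hypothetical, and the test actually merged in #47627 may differ:

```python
import numpy as np
import pandas as pd
import pandas.testing as pdt


def make_nested_series():
    # Hypothetical helper: object-dtype Series whose last element is itself
    # a NumPy array, as in the original report.
    return pd.Series([0,
                      0.0131142231938,
                      1.77774652865e-05,
                      np.array([0.4722720840328748, 0.4216929783681722])])


def test_assert_series_equal_nested_arrays_check_exact():
    x = make_nested_series()
    y = make_nested_series()
    # Separately constructed but identical Series should compare equal,
    # both with the default comparison and with check_exact=True.
    pdt.assert_series_equal(x, x)
    pdt.assert_series_equal(x, x, check_exact=True)
    pdt.assert_series_equal(x, y)
    pdt.assert_series_equal(x, y, check_exact=True)


if __name__ == "__main__":
    test_assert_series_equal_nested_arrays_check_exact()
    print("ok")
```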

@mroeschke mroeschke added the good first issue and Needs Tests (Unit test(s) needed to prevent regressions) labels and removed the Bug and Testing (pandas testing functions or related to the test suite) labels Jun 22, 2021
@srotondo
Contributor

srotondo commented Jul 6, 2022

take

srotondo pushed a commit to srotondo/pandas that referenced this issue Jul 7, 2022
srotondo pushed a commit to srotondo/pandas that referenced this issue Jul 7, 2022
mroeschke pushed a commit that referenced this issue Jul 7, 2022
* TST: Added test for consistent type with unique agg #22558

* TST: Added test for consistent type with unique agg #22558

* TST: Moved and restructured test #22558

* TST: Added test for nested series #22400

* TST: Added equality test for nested series #22400

Co-authored-by: Steven Rotondo <[email protected]>
yehoshuadimarsky pushed a commit to yehoshuadimarsky/pandas that referenced this issue Jul 13, 2022
* TST: Added test for consistent type with unique agg pandas-dev#22558

* TST: Added test for consistent type with unique agg pandas-dev#22558

* TST: Moved and restructured test pandas-dev#22558

* TST: Added test for nested series pandas-dev#22400

* TST: Added equality test for nested series pandas-dev#22400

Co-authored-by: Steven Rotondo <[email protected]>
5 participants