tm.assert_series_equal checking on Integer EA should be exact by default? #30347

Closed
simonjayhawkins opened this issue Dec 19, 2019 · 5 comments
Labels: Bug; ExtensionArray (extending pandas with custom dtypes or arrays); Testing (pandas testing functions or related to the test suite)

Comments

@simonjayhawkins
Member

Code Sample, a copy-pastable example if possible

>>>
>>> import pandas as pd
>>> pd.__version__
'0.26.0.dev0+1328.g03dffb1eb'
>>>
>>> import pandas.testing as tm
>>> s1 = pd.Series([10000000000000000], dtype="Int64")
>>> s1
0    10000000000000000
dtype: Int64
>>>
>>> s2 = pd.Series([9999999999999999], dtype="Int64")
>>> s2
0    9999999999999999
dtype: Int64
>>>
>>> tm.assert_series_equal(s1,s2)
>>>

Problem description

tm.assert_series_equal(s1, s2) does not raise, even though s1 and s2 hold different integer values.

Expected Output

AssertionError: ExtensionArray are different

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 03dffb1
python : 3.7.3.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 69 Stepping 1, GenuineIntel
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : None.None

pandas : 0.26.0.dev0+1328.g03dffb1eb
numpy : 1.17.3
pytz : 2019.3
dateutil : 2.8.0
pip : 19.3.1
setuptools : 42.0.2.post20191201
Cython : 0.29.14
pytest : 5.3.2
hypothesis : 4.54.1
sphinx : 2.3.0
blosc : None
feather : None
xlsxwriter : 1.2.6
lxml.etree : 4.4.2
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.10.2
pandas_datareader: None
bs4 : 4.8.1
bottleneck : 1.3.1
fastparquet : 0.3.2
gcsfs : None
lxml.etree : 4.4.2
matplotlib : 3.1.2
numexpr : 2.7.0
odfpy : None
openpyxl : 3.0.1
pandas_gbq : None
pyarrow : 0.15.1
pytables : None
pytest : 5.3.2
s3fs : 0.4.0
scipy : 1.3.1
sqlalchemy : 1.3.11
tables : 3.6.1
xarray : 0.14.1
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.6

Discovered while investigating #30268.

@simonjayhawkins changed the title from "tm.assert_series_equal checking on Integer EA should be exact." to "tm.assert_series_equal checking on Integer EA should be exact by default?" on Dec 19, 2019
@simonjayhawkins simonjayhawkins added the Testing pandas testing functions or related to the test suite label Dec 19, 2019
@TomAugspurger
Contributor

What do we do for a NumPy int64-dtype array?

@simonjayhawkins
Member Author

simonjayhawkins commented Dec 19, 2019

That behaves the same, so perhaps the issue title should be less restrictive? I see some previous discussion on this (e.g. #26272) and concerns about it being a breaking API change.

For the Integer EA, maybe that concern won't apply?

IMO integers should be compared exactly; they could be identifiers, e.g. invoice numbers. If our tests are not checking for exactness by default, then we could have latent bugs.

With check_exact=True it works fine.

@TomAugspurger
Contributor

Agreed. +1 for sure on IntegerArray, and probably on int64, though I don't recall the details on why it's not exact right now.
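As a rough illustration of why a tolerance-based comparison would let these two values through, here is a sketch using only the standard library. The relative tolerance of 1e-5 is an assumption chosen for the example, not a statement about what pandas used internally at the time of this issue.

```python
import math

a = 10000000000000000
b = 9999999999999999

# Exact integer comparison tells the two values apart.
exact_equal = a == b

# Assumed tolerance for illustration only.
RTOL = 1e-5
approx_equal = math.isclose(a, b, rel_tol=RTOL)

# Both integers are above 2**53, so they also collapse to the same float64.
float_equal = float(a) == float(b)

print(exact_equal)   # False
print(approx_equal)  # True
print(float_equal)   # True
```

The relative difference between the values is about 1e-16, far below any practical tolerance, so an approximate check cannot distinguish them.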

@jbrockmendel
Member

Instead of making the default behavior dtype-specific, should we just make check_exact=True the default in assert_extension_array_equal? I think we also hardcode check_exact=False in tm.assert_equal, which should change.
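A small sketch of what the suggested default would amount to at the array level, written here by passing check_exact=True explicitly to the public pandas.testing.assert_extension_array_equal. The variable names are illustrative, and the default itself depends on the pandas version installed.

```python
import pandas as pd
import pandas.testing as tm

# Same values as in the report, built directly as Int64 extension arrays.
left = pd.array([10000000000000000], dtype="Int64")
right = pd.array([9999999999999999], dtype="Int64")

try:
    # Exact comparison requested explicitly rather than relying on the default.
    tm.assert_extension_array_equal(left, right, check_exact=True)
    arrays_differ = False
except AssertionError:
    arrays_differ = True

print(arrays_differ)  # True
```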

@mroeschke
Member

Closing in favor of #55882
