assert_frame_equal not differentiating NaN and None within object dtype #18463
Comments
Generally, None and np.nan are considered equivalent in pandas; both are treated as missing values. For other dtypes, None is converted to NaN on construction, so you don't have this issue in such a case:
and for example the
That said, to be able to test that we are actually returning NaN and not None in the issue you referenced, I agree it is annoying that you don't have a way to discriminate between the two. For the PR, as a workaround for now, you can get the values of that column and assert that those are equal. |
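To make the two points above concrete, here is a minimal sketch. It assumes a small helper, `missing_kinds`, which is hypothetical and not part of pandas; it labels each element so that None and NaN can be told apart when comparing raw values.

```python
import numpy as np
import pandas as pd

# In a numeric column, None is converted to NaN when the Series is built.
numeric = pd.Series([1.0, None])
print(numeric.dtype)

# With an explicit object dtype, None and NaN are both stored unchanged.
with_none = pd.Series(["a", None], dtype=object)
with_nan = pd.Series(["a", np.nan], dtype=object)


def missing_kinds(series):
    """Label each element as 'None', 'NaN', or 'value' (hypothetical helper)."""
    labels = []
    for item in series.to_numpy():
        if item is None:
            labels.append("None")
        elif isinstance(item, float) and np.isnan(item):
            labels.append("NaN")
        else:
            labels.append("value")
    return labels


# Comparing the labels distinguishes the two object columns.
print(missing_kinds(with_none))
print(missing_kinds(with_nan))
```

Asserting on `missing_kinds(result) == missing_kinds(expected)` is one way to check the kind of missing value in a test without relying on the frame-level assertion.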
I would add a kwarg to
I don't think working around this is a good idea; let's actually fix the testing. |
@jreback do you know why The reason I'm asking is because while we could add a kwarg to the FWIW if we wanted to modify the behavior of the pandas/pandas/core/internals.py Line 1532 in aec3347
The |
so for example, |
from above, this is actually wrong and we should strictly check this.
|
I think this also affects pd.NA.
In [45]: a = pd.Series([1, np.nan], dtype=object)
In [46]: b = pd.Series([1, pd.NA], dtype=object)
In [47]: pd.testing.assert_series_equal(a, b)
A keyword to control this would be great. Based on the recent experience with
def assert_series_equal(..., check_na=None):
    if check_na is None:
        check_na = pandas.config.get("testing.check_na")
    if check_na is None:
        warnings.warn("default to false, changing to strict in the future")
        check_na = False
|
In _libs.testing.assert_almost_equal we have
Changing that to
breaks 431 tests. 224 of those are in tests/strings/ and 76 in tests/arithmetic/.
is_matching_na has a
I definitely think we should become stricter here. As Tom mentioned when we made check_freq stricter, doing this probably needs a deprecation cycle. |
Code Sample, a copy-pastable example if possible
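A minimal sketch of the kind of comparison this issue is about. This is an assumption about the setup rather than the reporter's exact code: two object-dtype frames that differ only in whether the missing entry is None or np.nan, plus the all-NaN versus all-None case. The helper `raises_assertion` is hypothetical.

```python
import warnings

import numpy as np
import pandas as pd
from pandas.testing import assert_frame_equal

# Object-dtype columns: None and NaN are both stored unchanged.
left = pd.DataFrame({"col": pd.Series(["x", None], dtype=object)})
right = pd.DataFrame({"col": pd.Series(["x", np.nan], dtype=object)})


def raises_assertion(a, b):
    """Return True if assert_frame_equal(a, b) raises AssertionError."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        try:
            assert_frame_equal(a, b)
        except AssertionError:
            return True
    return False


# An all-NaN column is float64 and an all-None column is object,
# so the dtype check catches this pair.
only_nan = pd.DataFrame({"col": [np.nan, np.nan]})
only_none = pd.DataFrame({"col": [None, None]})
print("dtype mismatch detected:", raises_assertion(only_nan, only_none))

# The mixed object-dtype case: whether this is detected depends on the
# installed pandas version, so no particular outcome is assumed here.
print("None vs NaN detected:", raises_assertion(left, right))
```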
Problem description
No AssertionError gets raised in this case, even though I would not expect None and np.nan to be considered equal values (I found this while testing #18450).
Note that if you simply compared a DataFrame with only np.nan with another containing only None, you would get an AssertionError that the dtypes are different (float64 vs object), but because the missing values are mixed into object dtypes here I think that differentiation gets lost.
Expected Output
Output of pd.show_versions()
INSTALLED VERSIONS
commit: d64995a
python: 3.6.2.final.0
python-bits: 64
OS: Darwin
OS-release: 17.2.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.22.0.dev0+205.gd64995a4f
pytest: 3.2.5
pip: 9.0.1
setuptools: 36.4.0
Cython: 0.26
numpy: 1.13.1
scipy: None
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.6.3
patsy: None
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None