
~ operator on Series with BooleanDtype casts to object and Series with object dtype gives TypeError comparing to Series with BooleanDtype #31489

Closed
gerritholl opened this issue Jan 31, 2020 · 4 comments
Labels: Bug; NA - MaskedArrays (Related to pd.NA and nullable extension arrays)
Milestone: 1.0.1

gerritholl commented Jan 31, 2020

Code Sample, a copy-pastable example if possible

Using pandas 1.0.0:

import pandas

# Invert a Series that uses the nullable boolean extension dtype
~pandas.Series([True, False], dtype=pandas.BooleanDtype())

Or:

import pandas

s1 = pandas.Series([True, False], dtype=pandas.BooleanDtype())
s2 = pandas.Series([False, True], dtype=pandas.BooleanDtype())
# Each Series should be the element-wise negation of the other
(~s1).equals(s2) and (~s2).equals(s1)

Problem description

Using the pandas.BooleanDtype() dtype, the ~ operator unexpectedly:

  • results in a Series with dtype object instead of pandas.BooleanDtype()
  • operates on the boolean values as if they were integers (resulting in -2 for ~True and -1 for ~False), rather than booleans (resulting in False for ~True and True for ~False).

This is unexpected and undesirable, because:

  • On numpy arrays or Series with dtype np.bool_, the same operation behaves differently:
    • The dtype remains np.bool_
    • ~True results in False and ~False results in True
  • When having a Series with a boolean dtype, the ~ operator is a convenient way to negate the value
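To make the two behaviours concrete, here is a minimal sketch contrasting integer inversion with NumPy's boolean negation. It goes through `int(True)` rather than writing `~True` directly, only because newer Python versions emit a deprecation warning for `~` on `bool`; the variable names are illustrative and not part of the original report.

```python
import numpy as np

# bool is a subclass of int, so True behaves as 1 and False as 0.
# Bitwise inversion of an integer n is -(n + 1).
python_results = [~int(True), ~int(False)]

# NumPy boolean arrays define ~ as element-wise logical negation
# and keep the boolean dtype.
arr = np.array([True, False])
numpy_result = ~arr

print(python_results)                    # [-2, -1]
print(numpy_result, numpy_result.dtype)  # [False  True] bool
```

The `-2` and `-1` values are the ones reported above for the `BooleanDtype` Series in pandas 1.0.0.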

As a workaround, I'm using numpy.logical_not(s1), which preserves the dtype and gives the expected result.
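A minimal sketch of that workaround, assuming a pandas version in which `numpy.logical_not` dispatches to the nullable boolean array (as described in this report); the names `s1` and `negated` are illustrative:

```python
import numpy as np
import pandas as pd

s1 = pd.Series([True, False], dtype=pd.BooleanDtype())

# Logical negation through the NumPy ufunc instead of the ~ operator.
negated = np.logical_not(s1)

print(negated.dtype)
print(negated.tolist())
```

On releases that include the fix, `~s1` should give the same result, so the workaround is only needed on affected versions.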

The second example raises TypeError: data type not understood, because (~s1), which now has object dtype, cannot be compared against s2 with its pandas.BooleanDtype() dtype.

Expected Output

I would expect that:

  • the first example results in pandas.Series([False, True], dtype=pandas.BooleanDtype())
  • the second example results in True
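The expectation can be written as a small self-check. This is a sketch that should pass only on a pandas version containing the fix (later in the thread, master and the planned 1.0.1 backport); on 1.0.0 it is expected to fail as described. The names `dtype_preserved` and `round_trip` are illustrative.

```python
import pandas as pd

s1 = pd.Series([True, False], dtype=pd.BooleanDtype())
s2 = pd.Series([False, True], dtype=pd.BooleanDtype())

inverted = ~s1

# First expectation: the nullable boolean dtype survives the inversion.
dtype_preserved = inverted.dtype == pd.BooleanDtype()

# Second expectation: each Series is the negation of the other.
round_trip = (~s1).equals(s2) and (~s2).equals(s1)

print(dtype_preserved, round_trip)
```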

Output of pd.show_versions()

In [35]: pandas.show_versions()
/media/nas/x21324/miniconda3/envs/py37e/lib/python3.7/site-packages/fastparquet/dataframe.py:5: FutureWarning: pandas.core.index is deprecated and will be removed in a future version. The public classes are available in the top-level namespace.
from pandas.core.index import CategoricalIndex, RangeIndex, Index, MultiIndex

INSTALLED VERSIONS

commit : None
python : 3.7.6.final.0
python-bits : 64
OS : Linux
OS-release : 4.12.14-lp150.12.82-default
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8

pandas : 1.0.0
numpy : 1.17.5
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 45.1.0.post20200119
Cython : 0.29.14
pytest : 5.3.5
hypothesis : None
sphinx : 2.3.1
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.5.0
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.0
IPython : 7.11.1
pandas_datareader: None
bs4 : 4.8.2
bottleneck : None
fastparquet : 0.3.2
gcsfs : None
lxml.etree : 4.5.0
matplotlib : 3.1.2
numexpr : 2.7.1
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.15.1
pytables : None
pytest : 5.3.5
pyxlsb : None
s3fs : 0.4.0
scipy : 1.4.1
sqlalchemy : 1.3.13
tables : 3.6.1
tabulate : None
xarray : 0.14.1
xlrd : None
xlwt : None
xlsxwriter : None
numba : 0.48.0

@jorisvandenbossche
Member

I thought we recently fixed this, and on master it also works correctly:

In [9]: ~pandas.Series([True, False], dtype=pandas.BooleanDtype())   
Out[9]: 
0    False
1     True
dtype: boolean

so maybe something went wrong with backporting it

jorisvandenbossche added the Bug and NA - MaskedArrays (Related to pd.NA and nullable extension arrays) labels on Jan 31, 2020
jorisvandenbossche added this to the 1.0.1 milestone on Jan 31, 2020
@jorisvandenbossche
Member

Yep, #31183 was never backported. It seems the bot was down for a moment ...

So we will backport the fix for 1.0.1

@gerritholl
Author

Is it useful to report bugs that exist in v1.0.0 but not in master, or should I always check with the latest git master before reporting? Normally I think I should check with the latest git master but it appears that the report "backport missing" was still useful in this case?

@jorisvandenbossche
Member

Indeed, in this case it was certainly useful that you reported it for 1.0.0, since that led us to discover we forgot to include the existing fix in 1.0.0.

In general, testing on master is certainly welcome, but if you are not able to do so directly, it's no problem to report against the released version (the majority of bug reports are based on the latest released version).
