~ operator on Series with BooleanDtype casts to object and Series with object dtype gives TypeError comparing to Series with BooleanDtype #31489

gerritholl · 2020-01-31T10:42:26Z

Code Sample, a copy-pastable example if possible

Using pandas 1.0.0:

import pandas

~pandas.Series([True, False], dtype=pandas.BooleanDtype())

Or:

s1 = pandas.Series([True, False], dtype=pandas.BooleanDtype())
s2 = pandas.Series([False, True], dtype=pandas.BooleanDtype())
(~s1).equals(s2) and (~s2).equals(s1)

Problem description

Using the pandas.BooleanDtype() dtype, the ~ operator unexpectedly:

- results in a Series with dtype object instead of pandas.BooleanDtype()
- operates on the boolean values as if they were integers (resulting in -2 for ~True and -1 for ~False), rather than booleans (resulting in False for ~True and True for ~False).
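To illustrate where the values -2 and -1 come from, here is a minimal sketch that does not involve BooleanDtype at all. It assumes only that Python's bool is a subclass of int, so bitwise inversion follows integer rules, whereas NumPy's boolean arrays treat ~ as logical negation. The inversion is written via int(...) so the sketch does not depend on how a given Python version treats ~ applied directly to a bool.

```python
import numpy as np

# Python's bool is a subclass of int: True == 1 and False == 0.
# Bitwise inversion of an int x is -x - 1, hence -2 and -1.
int_results = (~int(True), ~int(False))
print(int_results)

# On a NumPy boolean array, ~ is element-wise logical negation
# and the dtype stays bool.
negated = ~np.array([True, False])
print(negated, negated.dtype)
```

The first pair matches the values reported in the object-dtype result, and the second shows the behaviour the report expects from ~ on a boolean Series.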

This is unexpected and undesirable, because:

- On numpy arrays or Series with dtype np.bool_, the same operation behaves differently:
  - The dtype remains np.bool_
  - ~True results in False and ~False results in True
- For a Series with a boolean dtype, the ~ operator is a convenient way to negate the values

As a workaround, I'm using numpy.logical_not(s1), which does preserve the dtype and gives the expected result.
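A minimal sketch of that workaround, assuming a pandas version in which BooleanDtype arrays support NumPy ufuncs (the report above says this holds on 1.0.0):

```python
import numpy
import pandas

s1 = pandas.Series([True, False], dtype=pandas.BooleanDtype())

# numpy.logical_not is dispatched to the nullable boolean array,
# so the result should keep the "boolean" extension dtype.
negated = numpy.logical_not(s1)
print(negated.dtype)
print(negated.tolist())
```

The same call should also work on releases where ~ is fixed, so code using the workaround should not need to change after upgrading.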

The second example results in an exception TypeError: data type not understood, because (~s1) with its object dtype doesn't understand the comparison against s2 with its pandas.BooleanDtype() dtype.

Expected Output

I would expect that:

- the first example results in pandas.Series([False, True], dtype=pandas.BooleanDtype())
- the second example results in True
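The expectations above can be written as a small check. This is only a sketch: it assumes a pandas version in which ~ on a BooleanDtype Series preserves the dtype, and it would not pass on the 1.0.0 release described in this report.

```python
import pandas

s1 = pandas.Series([True, False], dtype=pandas.BooleanDtype())
s2 = pandas.Series([False, True], dtype=pandas.BooleanDtype())

# First example: inversion should keep the nullable boolean dtype.
inverted = ~s1
print(inverted.dtype)

# Second example: each Series should equal the inverse of the other.
result = (~s1).equals(s2) and (~s2).equals(s1)
print(result)
```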

Output of `pd.show_versions()`

In [35]: pandas.show_versions()
/media/nas/x21324/miniconda3/envs/py37e/lib/python3.7/site-packages/fastparquet/dataframe.py:5: FutureWarning: pandas.core.index is deprecated and will be removed in a future version. The public classes are available in the top-level namespace.
from pandas.core.index import CategoricalIndex, RangeIndex, Index, MultiIndex

INSTALLED VERSIONS

commit : None
python : 3.7.6.final.0
python-bits : 64
OS : Linux
OS-release : 4.12.14-lp150.12.82-default
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8

pandas : 1.0.0
numpy : 1.17.5
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 45.1.0.post20200119
Cython : 0.29.14
pytest : 5.3.5
hypothesis : None
sphinx : 2.3.1
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.5.0
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.0
IPython : 7.11.1
pandas_datareader: None
bs4 : 4.8.2
bottleneck : None
fastparquet : 0.3.2
gcsfs : None
lxml.etree : 4.5.0
matplotlib : 3.1.2
numexpr : 2.7.1
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.15.1
pytables : None
pytest : 5.3.5
pyxlsb : None
s3fs : 0.4.0
scipy : 1.4.1
sqlalchemy : 1.3.13
tables : 3.6.1
tabulate : None
xarray : 0.14.1
xlrd : None
xlwt : None
xlsxwriter : None
numba : 0.48.0


jorisvandenbossche · 2020-01-31T12:16:30Z

I thought we recently fixed this, and on master it also works correctly:

In [9]: ~pandas.Series([True, False], dtype=pandas.BooleanDtype())   
Out[9]: 
0    False
1     True
dtype: boolean

so maybe something went wrong with backporting it

jorisvandenbossche · 2020-01-31T12:20:13Z

Yep, #31183 was never backported. It seems the bot was down for a moment ...

So we will backport the fix for 1.0.1

gerritholl · 2020-01-31T12:22:32Z

Is it useful to report bugs that exist in v1.0.0 but not in master, or should I always check with the latest git master before reporting? Normally I think I should check with the latest git master but it appears that the report "backport missing" was still useful in this case?

jorisvandenbossche · 2020-01-31T12:32:34Z

Indeed, in this case it was certainly useful that you reported it for 1.0.0, since that led us to discover we forgot to include the existing fix in 1.0.0.

In general, testing on master is certainly welcome, but if you are not directly able to, it's no problem to report it with the released version (the majority of bug reports are based on the latest released version).

jorisvandenbossche added the Bug and NA - MaskedArrays (Related to pd.NA and nullable extension arrays) labels Jan 31, 2020

jorisvandenbossche added this to the 1.0.1 milestone Jan 31, 2020

jorisvandenbossche mentioned this issue Jan 31, 2020

Backport PR #31183: BUG: Series/Frame invert dtypes' #31493

Merged

jorisvandenbossche closed this as completed Jan 31, 2020
