import pandas

s1 = pandas.Series([True, False], dtype=pandas.BooleanDtype())
s2 = pandas.Series([False, True], dtype=pandas.BooleanDtype())
# Second example: negating each Series should yield the other one.
(~s1).equals(s2) and (~s2).equals(s1)
Problem description
Using the pandas.BooleanDtype() dtype, the ~ operator unexpectedly:
- results in a Series with dtype object instead of pandas.BooleanDtype()
- operates on the boolean values as if they were integers (resulting in -2 for ~True and -1 for ~False), rather than as booleans (resulting in False for ~True and True for ~False)
This is unexpected and undesirable, because:
- On numpy arrays or Series with dtype np.bool_, the same operation behaves differently:
  - The dtype remains np.bool_
  - ~True results in False and ~False results in True
- For a Series with a boolean dtype, the ~ operator is a convenient way to negate the values
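The contrast described above can be reproduced with NumPy-backed booleans. This is a minimal sketch, assuming numpy and pandas are importable as np and pd; the variable names are illustrative only. The last part shows where -2 and -1 come from: they are the bitwise inversions of the integers 1 and 0.

```python
import numpy as np
import pandas as pd

# Plain NumPy boolean array: ~ acts as a logical NOT and the dtype stays bool.
arr = np.array([True, False])
inverted_arr = ~arr

# Series backed by np.bool_: same behaviour as the NumPy array.
ser = pd.Series([True, False], dtype=np.bool_)
inverted_ser = ~ser

# Integer inversion follows ~x == -x - 1, which matches the values
# reported for the BooleanDtype case on pandas 1.0.0.
int_results = [~int(True), ~int(False)]

print(inverted_arr.dtype, inverted_arr.tolist())
print(inverted_ser.dtype, inverted_ser.tolist())
print(int_results)
```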
As a workaround, I'm using numpy.logical_not(s1), which preserves the dtype and gives the expected result.
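The workaround can be sketched as follows, assuming numpy and pandas are importable as np and pd. The name negated is chosen here for illustration; whether the dtype is preserved on every pandas version is not verified beyond what the report states.

```python
import numpy as np
import pandas as pd

s1 = pd.Series([True, False], dtype=pd.BooleanDtype())

# Workaround from the report: call the NumPy ufunc instead of using ~.
negated = np.logical_not(s1)

print(negated.dtype)
print(negated.tolist())
```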
The second example results in an exception TypeError: data type not understood, because (~s1) with its object dtype doesn't understand the comparison against s2 with its pandas.BooleanDtype() dtype.
Expected Output
I would expect that:
- the first example results in pandas.Series([False, True], dtype=pandas.BooleanDtype())
- the second example results in True
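These expectations can be turned into a small check. It is a sketch only: inversion_matches is a hypothetical helper written for this illustration, not part of pandas. On a pandas version that includes the fix, the dtype of ~s1 should be the nullable boolean dtype and the check should succeed. On 1.0.0, per the report, the dtype is object and the comparison raises TypeError, which the helper turns into False.

```python
import pandas as pd

s1 = pd.Series([True, False], dtype=pd.BooleanDtype())
s2 = pd.Series([False, True], dtype=pd.BooleanDtype())


def inversion_matches(a, b):
    """Return True if ~a equals b, False if they differ or the comparison raises TypeError."""
    try:
        return bool((~a).equals(b))
    except TypeError:
        # Reported on pandas 1.0.0 as "data type not understood".
        return False


first = ~s1
ok = inversion_matches(s1, s2) and inversion_matches(s2, s1)

print(pd.__version__, first.dtype, ok)
```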
Is it useful to report bugs that exist in v1.0.0 but not in master, or should I always check with the latest git master before reporting? Normally I think I should check with the latest git master but it appears that the report "backport missing" was still useful in this case?
Indeed, in this case it was certainly useful that you reported it for 1.0.0, since that led us to discover that we forgot to include the existing fix in 1.0.0.
In general, testing on master is certainly welcome, but if you are not directly able to, it's no problem to report against the released version (the majority of bug reports are based on the latest released version).
Code Sample, a copy-pastable example if possible
Using pandas 1.0.0:
Or:
Output of pd.show_versions()
In [35]: pandas.show_versions()
/media/nas/x21324/miniconda3/envs/py37e/lib/python3.7/site-packages/fastparquet/dataframe.py:5: FutureWarning: pandas.core.index is deprecated and will be removed in a future version. The public classes are available in the top-level namespace.
from pandas.core.index import CategoricalIndex, RangeIndex, Index, MultiIndex
INSTALLED VERSIONS
commit : None
python : 3.7.6.final.0
python-bits : 64
OS : Linux
OS-release : 4.12.14-lp150.12.82-default
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8
pandas : 1.0.0
numpy : 1.17.5
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 45.1.0.post20200119
Cython : 0.29.14
pytest : 5.3.5
hypothesis : None
sphinx : 2.3.1
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.5.0
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.0
IPython : 7.11.1
pandas_datareader: None
bs4 : 4.8.2
bottleneck : None
fastparquet : 0.3.2
gcsfs : None
lxml.etree : 4.5.0
matplotlib : 3.1.2
numexpr : 2.7.1
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.15.1
pytables : None
pytest : 5.3.5
pyxlsb : None
s3fs : 0.4.0
scipy : 1.4.1
sqlalchemy : 1.3.13
tables : 3.6.1
tabulate : None
xarray : 0.14.1
xlrd : None
xlwt : None
xlsxwriter : None
numba : 0.48.0