Yes, I agree this should match the integer behavior. Thanks for the report, PR welcome!
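import pandas as pd
from io import StringIO
# `data` holds the CSV text from the issue's code sample (not reproduced here)
df = pd.read_csv(StringIO(data), header=None, names=['a', 'b'], dtype={'a': 'bool', 'b': 'int64'}, engine='c')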
ValueError Traceback (most recent call last)
<snip>
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.read()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_low_memory()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_rows()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._convert_column_data()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._convert_tokens()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._convert_with_dtype()
ValueError: Integer column has NA values in column 1
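The consistency being agreed on here can be illustrated with a small pure-Python sketch. It is not pandas' parser code: `convert_column`, `NA_TOKENS` and `BOOL_TOKENS` are hypothetical names, and the NA set is an assumed subset of pandas' default NA strings. Both the int64 and bool paths run the same NA check and raise ValueError, using the message wording from this issue's expected output.

```python
# Hypothetical sketch, not pandas' parser code: integer and boolean
# conversions treat missing values the same way, by raising ValueError.
NA_TOKENS = {"", "NA", "NaN", "nan"}  # assumed subset of pandas' default NA strings
BOOL_TOKENS = {"True": True, "False": False}


def convert_column(tokens, dtype, col):
    """Convert raw CSV tokens for column `col` to int64 or bool, strictly."""
    labels = {"int64": "Integer", "bool": "Boolean"}
    if dtype not in labels:
        raise ValueError(f"unsupported dtype {dtype!r}")
    # Same NA check for both dtypes: neither can represent a missing value.
    if any(t in NA_TOKENS for t in tokens):
        raise ValueError(f"{labels[dtype]} column has NA values in column {col}")
    if dtype == "int64":
        return [int(t) for t in tokens]
    unknown = [t for t in tokens if t not in BOOL_TOKENS]
    if unknown:
        raise ValueError(f"cannot parse {unknown[0]!r} as bool in column {col}")
    return [BOOL_TOKENS[t] for t in tokens]


print(convert_column(["False", "True"], "bool", 0))  # [False, True]
try:
    convert_column(["1", "1", ""], "int64", 1)
except ValueError as err:
    print(err)  # Integer column has NA values in column 1
```

The design choice is simply that the NA check runs before any dtype-specific parsing, so neither dtype can fall back to object or silently coerce.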
Code Sample, a copy-pastable example if possible
outputs (engine='c'):
object
       a    b
0  False  1.0
1    NaN  1.0
2   True  NaN
outputs (engine='python'):
bool
       a    b
0  False  1.0
1   True  1.0
2   True  NaN
Problem description
Missing values cannot be coerced into a column of dtype bool. To be consistent with the analogous case for integers, the expected behaviour is to raise a ValueError. The user is then aware of the issue and can specify a converter or parse the boolean column as a float (subject to #16698).
Instead, the C engine silently ignores the requested dtype and returns a column of dtype object. This is especially confusing on large datasets with low_memory=True, because you get the warning: Column 1 has mixed types. Specify dtype option on import or set low_memory=False.
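The read_csv documentation describes low_memory=True as processing the file internally in chunks, which can lead to mixed type inference. The toy model below is a deliberately simplified stand-in for the C parser, with hypothetical function names. It shows how a single chunk containing a missing value can disagree with the other chunks and trigger a mixed-types warning.

```python
import warnings


def infer_chunk_dtype(tokens):
    """Toy inference: a chunk of only True/False tokens is bool, else object."""
    if all(t in ("True", "False") for t in tokens):
        return "bool"
    return "object"  # e.g. an empty (missing) field falls back to object


def infer_column_dtype(tokens, chunksize, col):
    """Infer each chunk separately; warn and fall back to object on disagreement."""
    kinds = {
        infer_chunk_dtype(tokens[i:i + chunksize])
        for i in range(0, len(tokens), chunksize)
    }
    if len(kinds) > 1:
        warnings.warn(f"Column {col} has mixed types", stacklevel=2)
        return "object"
    return kinds.pop() if kinds else "object"


# Only the second chunk contains a missing value, so the chunks disagree.
print(infer_column_dtype(["True", "False", "", "True"], chunksize=2, col=1))
```

On a small file everything fits in one chunk, so the fallback to object happens without the warning. This matches the report that the warning mainly shows up on large datasets.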
The Python engine silently converts all missing values to True and returns a column of dtype bool.
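One plausible explanation for the Python engine's output is that missing values are held as NaN before the cast to bool, and NaN is truthy. This is an assumption, not traced through pandas' source. The stdlib sketch below reproduces the False, True, True pattern from the second output. NumPy's astype(bool) treats NaN the same way, because NaN is nonzero.

```python
import math

# Column 'a' as the C engine returned it: an object column holding a NaN.
values = [False, float("nan"), True]

# NaN is truthy in Python, so an element-wise bool() cast turns it into True.
as_bool = [bool(v) for v in values]
print(as_bool)  # [False, True, True]

# A strict cast would check for missing values first and refuse to continue.
has_na = any(isinstance(v, float) and math.isnan(v) for v in values)
print(has_na)  # True
```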
Expected Output
ValueError('Boolean column has NA values in column 1')
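Until read_csv raises this error itself, the problem description suggests a converter as the user-side remedy. The following is a minimal sketch. It assumes the function is passed as converters={'a': strict_bool} to pd.read_csv and receives the raw field text, with an empty string for a missing field. `strict_bool` is a hypothetical name, and the pandas call itself is not exercised here.

```python
def strict_bool(token):
    """Hypothetical converter: parse true/false (any case), reject everything else."""
    text = token.strip().lower()
    if text == "true":
        return True
    if text == "false":
        return False
    if text == "":
        # A missing field: refuse it, mirroring the expected ValueError.
        raise ValueError("Boolean column has NA values")
    raise ValueError(f"cannot interpret {token!r} as a boolean")


print([strict_bool(t) for t in ["False", "TRUE"]])  # [False, True]
```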
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Darwin
OS-release: 14.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
LOCALE: en_US.UTF-8
pandas: 0.23.0.dev0+725.gf67c6fa80
pytest: 3.5.0
pip: 9.0.3
setuptools: 39.0.1
Cython: 0.28.1
numpy: 1.14.2
scipy: 1.0.1
pyarrow: 0.9.0
xarray: 0.10.2
IPython: 6.3.0
sphinx: 1.7.2
patsy: 0.5.0
dateutil: 2.7.2
pytz: 2018.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: 0.4.0
matplotlib: 2.2.2
openpyxl: 2.5.1
xlrd: 1.1.0
xlwt: 1.2.0
xlsxwriter: 1.0.2
lxml: 4.2.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.6
pymysql: 0.8.0
psycopg2: None
jinja2: 2.10
s3fs: 0.1.4
fastparquet: 0.1.5
pandas_gbq: None
pandas_datareader: None