GH20591 read_csv raise ValueError for bool columns with missing values (C engine) #23968

JustinZhengBC · 2018-11-28T08:11:46Z

closes #20591 (read_csv ignores dtype for bool columns with missing values)
tests added / passed
passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
whatsnew entry

As explained in the referenced issue, casting missing values to bool dtype should raise a ValueError, just as casting them to an int dtype does.
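
To make the intended behaviour concrete, here is a minimal, stdlib-only sketch of a strict bool conversion. It is not the pandas implementation: `convert_bool_column`, `TRUE_VALUES`, `FALSE_VALUES`, and `NA_VALUES` are hypothetical names chosen for illustration, and the wording of the error messages is an assumption.

```python
import csv
import io

# Hypothetical token sets for illustration; pandas' real defaults are broader.
TRUE_VALUES = {"True", "TRUE", "true"}
FALSE_VALUES = {"False", "FALSE", "false"}
NA_VALUES = {"", "NA", "NaN"}


def convert_bool_column(text, column):
    """Parse one CSV column as bool, refusing missing values."""
    reader = csv.DictReader(io.StringIO(text))
    result = []
    for row_number, row in enumerate(reader):
        token = row[column]
        if token in NA_VALUES:
            # bool has no representation for "missing", so fail loudly
            # instead of silently picking True or False.
            raise ValueError(
                f"Bool column {column!r} has NA value in row {row_number}"
            )
        if token in TRUE_VALUES:
            result.append(True)
        elif token in FALSE_VALUES:
            result.append(False)
        else:
            raise ValueError(f"Cannot parse {token!r} as bool")
    return result


clean = "a,b\nTrue,1\nFalse,2\n"
dirty = "a,b\nTrue,1\n,2\nFalse,3\n"  # second row has no value for "a"

print(convert_bool_column(clean, "a"))  # [True, False]
try:
    convert_bool_column(dirty, "a")
except ValueError as exc:
    print("ValueError:", exc)
```

The clean input converts normally, while the input with an empty cell is rejected rather than coerced.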

pep8speaks · 2018-11-28T08:11:49Z

Hello @JustinZhengBC! Thanks for updating the PR.

There are no PEP8 issues in the file pandas/io/parsers.py !
There are no PEP8 issues in the file pandas/tests/io/parser/test_na_values.py !

Comment last updated on November 29, 2018 at 07:14 Hours UTC

codecov · 2018-11-28T08:59:56Z

Codecov Report

Merging #23968 into master will decrease coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #23968      +/-   ##
==========================================
- Coverage   92.31%   92.31%   -0.01%     
==========================================
  Files         161      161              
  Lines       51513    51554      +41     
==========================================
+ Hits        47554    47591      +37     
- Misses       3959     3963       +4

Flag	Coverage Δ
#multiple	`90.71% <100%> (-0.01%)`	⬇️
#single	`42.44% <0%> (+0.01%)`	⬆️

Impacted Files	Coverage Δ
pandas/io/parsers.py	`95.38% <100%> (+0.01%)`	⬆️
pandas/core/arrays/timedeltas.py	`96% <0%> (-0.26%)`	⬇️
pandas/core/indexes/base.py	`96.32% <0%> (-0.17%)`	⬇️
pandas/core/config.py	`87.04% <0%> (-0.13%)`	⬇️
pandas/io/sas/sas_xport.py	`90.14% <0%> (-0.1%)`	⬇️
pandas/io/formats/printing.py	`93.01% <0%> (-0.08%)`	⬇️
pandas/plotting/_core.py	`83.58% <0%> (-0.05%)`	⬇️
pandas/core/computation/align.py	`97.84% <0%> (-0.05%)`	⬇️
pandas/core/reshape/melt.py	`97.5% <0%> (-0.05%)`	⬇️
pandas/core/dtypes/concat.py	`96.63% <0%> (-0.04%)`	⬇️
... and 15 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 43b2dab...b8bca36.

pandas/tests/io/parser/test_na_values.py

JustinZhengBC · 2018-11-28T19:45:53Z

Currently the python engine delegates type conversion to np.ndarray.astype in this case. Should it be overridden in this case for consistency with the C engine, or left as is so numpy users don't get unexpected behaviour?

jreback · 2018-11-28T22:13:32Z

> Currently the python engine delegates type conversion to np.ndarray.astype in this case. Should it be overridden in this case for consistency with the C engine, or left as is so numpy users don't get unexpected behaviour?

@gfyoung ?

gfyoung · 2018-11-28T22:51:40Z

> Currently the python engine delegates type conversion to np.ndarray.astype in this case. Should it be overridden in this case for consistency with the C engine, or left as is so numpy users don't get unexpected behaviour?

I would consider current behavior to be a bug. Let's override so that we have consistent behavior.
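
One way to see why delegating to a plain cast is a problem: a missing value is typically held as NaN, and NaN is truthy in Python. The stdlib-only sketch below assumes that an object-to-bool cast follows the same truthiness rule. It does not call NumPy or pandas, and `naive_cast` and `strict_cast` are hypothetical helpers.

```python
import math

# A parsed column in which the middle entry was missing (held as NaN).
parsed = [True, float("nan"), False]


def naive_cast(values):
    """Cast by truthiness, the way a plain bool() conversion would."""
    return [bool(v) for v in values]


def strict_cast(values):
    """Refuse to cast when any entry is NaN."""
    for position, v in enumerate(values):
        if isinstance(v, float) and math.isnan(v):
            raise ValueError(f"Bool column has NA value at position {position}")
    return [bool(v) for v in values]


print(naive_cast(parsed))  # [True, True, False]: the missing value became True

try:
    strict_cast(parsed)
except ValueError as exc:
    print("ValueError:", exc)
```

The naive version silently turns the missing entry into `True`, which is the inconsistency with the C engine that the override is meant to remove.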

JustinZhengBC · 2018-11-29T00:58:53Z

@gfyoung I managed to do it in _convert_data. It requires iterating over the columns of data, though, which happens again in _convert_to_ndarrays. I needed the try/except block because is_bool_dtype raises a ValueError on the line dtype._is_boolean when given inputs like "category" and "foo". Is that intended behaviour?
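
The guard described above can be sketched roughly as follows. Everything here is a stand-in: `strict_is_bool_dtype` merely simulates a checker that raises `ValueError` for dtype strings it cannot interpret (as reported for "category" and "foo"), and `wants_bool` is a hypothetical wrapper, not the code in this PR.

```python
# Stand-in table for a dtype checker; NOT pandas' is_bool_dtype.
KNOWN_DTYPES = {"bool": True, "int64": False, "float64": False, "object": False}


def strict_is_bool_dtype(dtype):
    """Return True for bool dtypes; raise ValueError on unknown names."""
    name = "bool" if dtype is bool else str(dtype)
    if name not in KNOWN_DTYPES:
        raise ValueError(f"cannot interpret dtype {dtype!r}")
    return KNOWN_DTYPES[name]


def wants_bool(dtype):
    """Treat any dtype the checker cannot interpret as 'not bool'."""
    try:
        return strict_is_bool_dtype(dtype)
    except ValueError:
        return False


for candidate in (bool, "bool", "int64", "category", "foo"):
    print(candidate, wants_bool(candidate))
```

Swallowing the exception keeps the per-column loop going for dtypes the checker cannot interpret, at the cost of hiding a possibly unintended error in the checker itself.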

gfyoung

cc @jreback

jreback · 2018-12-02T16:22:04Z

thanks @JustinZhengBC

jreback added the Bug, Dtype Conversions, and IO CSV labels Nov 28, 2018

jreback requested changes Nov 28, 2018

pandas/tests/io/parser/test_na_values.py (outdated, resolved)

JustinZhengBC force-pushed the BUG-20591 branch 2 times, most recently from ea1ec6f to b176a72, November 28, 2018 19:42

JustinZhengBC changed the title from "GH20591 read_csv raise ValueError for bool columns with missing values" to "GH20591 read_csv raise ValueError for bool columns with missing values (C engine)" Nov 28, 2018

jreback added this to the 0.24.0 milestone Nov 28, 2018

jreback reviewed Nov 28, 2018

pandas/tests/io/parser/test_na_values.py (outdated, resolved)

gfyoung reviewed Nov 28, 2018

pandas/tests/io/parser/test_na_values.py (outdated, resolved)

gfyoung reviewed Nov 28, 2018

pandas/io/parsers.py (outdated, resolved)

gfyoung reviewed Nov 28, 2018

doc/source/whatsnew/v0.24.0.rst (outdated, resolved)

JustinZhengBC force-pushed the BUG-20591 branch 2 times, most recently from 26804d5 to 6c67491, November 29, 2018 00:53

gfyoung reviewed Nov 29, 2018

pandas/io/parsers.py (outdated, resolved)

gfyoung reviewed Nov 29, 2018

doc/source/whatsnew/v0.24.0.rst (outdated, resolved)

gfyoung reviewed Nov 29, 2018

pandas/io/parsers.py (outdated, resolved)

gfyoung reviewed Nov 29, 2018

pandas/tests/io/parser/test_na_values.py (resolved)

JustinZhengBC added 6 commits November 29, 2018 02:36

8cdbcdb BUG-20591 read_csv raises ValueError for bool columns with missing values
c428287 BUG-20591 specify error in test
0e9f23d BUG-20591 modify python parser as well
0d67f22 fix typo
b4a1780 BUG-20591 move logic to _convert_to_ndarrays
b9a0f13 fix lint

JustinZhengBC added 2 commits November 29, 2018 02:36

92b0cc3 BUG-20591 test custom na values
b038137 fix typo

JustinZhengBC force-pushed the BUG-20591 branch from b14f542 to b038137, November 29, 2018 10:36

gfyoung reviewed Nov 29, 2018

pandas/io/parsers.py (outdated, resolved)

b8bca36 BUG-20591 fix na_count

gfyoung approved these changes Dec 2, 2018

jreback approved these changes Dec 2, 2018

jreback merged commit fe2969e into pandas-dev:master Dec 2, 2018

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

027ca9b GH20591 read_csv raise ValueError for bool columns with missing values (C engine) (pandas-dev#23968)

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

1444b33 GH20591 read_csv raise ValueError for bool columns with missing values (C engine) (pandas-dev#23968)
