Inconsistent behaviour of read_csv() between python and c engine with null values. #23056
Labels: Duplicate Report, IO CSV, Missing-data
Code Sample
Using pandas v0.23.4
test.csv file
The c engine is used here.
Problem description
When dtype is used to read a column as string, empty values are not reported as null by df.isnull() if read_csv() is run with the python engine.
This is inconsistent with the c engine, which I believe has the correct behaviour of identifying these values as null. It also causes problems when working with null values, e.g. dropna() does not drop these rows.
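A minimal sketch of how the two engines could be compared side by side. The CSV text and the helper name null_mask_by_engine are hypothetical stand-ins, since the original test.csv and code sample are not shown here; only read_csv(), its dtype and engine parameters, and isnull() come from the report.

```python
import io

import pandas as pd

# Hypothetical stand-in for test.csv: column "b" has one empty value.
CSV_TEXT = "a,b\n1,\n2,x\n"


def null_mask_by_engine(csv_text, column):
    """Read the same CSV with each engine and return the isnull() flags
    of one column that is forced to string via dtype."""
    masks = {}
    for engine in ("c", "python"):
        df = pd.read_csv(
            io.StringIO(csv_text),
            dtype={column: str},  # same dtype conversion as in the report
            engine=engine,
        )
        masks[engine] = df[column].isnull().tolist()
    return masks


masks = null_mask_by_engine(CSV_TEXT, "b")
print(masks)
```

If the behaviour described above is present in the installed pandas version, the two lists in masks should differ for the row with the empty value; if they match, the engines agree for this input.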
Expected Output
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Darwin
OS-release: 18.0.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
LOCALE: en_GB.UTF-8
pandas: 0.23.4
pytest: 3.8.0
pip: 9.0.1
setuptools: 38.2.4
Cython: None
numpy: 1.15.2
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: 2.4.9
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: 1.1.15
pymysql: None
psycopg2: 2.7.3.2 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None