read_csv: silently skips null values from single column CSV #12443

joristork · 2016-02-25T08:24:24Z

pd.read_csv()'s skip_blank_lines parameter defaults to True.

I'm not sure that is a good idea.

If I write a dataframe that has one column and includes N null values to a CSV without the index (index=False), I do not expect the result to be N rows shorter than the original when I read it back from that CSV.

For example, I was puzzled when I wrote a column with 1000 rows to CSV, then read that column back from the CSV to find it only had 37 rows.
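The report does not include a reproduction. A minimal sketch of the reading side of the behavior described might look like the following; the CSV text is a hand-written stand-in (an assumption, not the reporter's file) for a single-column file in which missing values appear as blank lines.

```python
import io

import pandas as pd

# Hypothetical single-column CSV: header "a", then five rows,
# two of which are blank lines standing in for missing values.
csv_text = "a\n1\n\n3\n\n5\n"

# Default: skip_blank_lines=True, so the blank lines are dropped.
skipped = pd.read_csv(io.StringIO(csv_text))

# With skip_blank_lines=False the blank lines are kept as NaN rows.
kept = pd.read_csv(io.StringIO(csv_text), skip_blank_lines=False)

print(len(skipped), len(kept))
print(kept)
```

Comparing the two lengths shows how many rows the default setting dropped.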

In [2]: pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Linux
OS-release: 4.2.0-1-amd64
machine: x86_64
processor: 
byteorder: little
LC_ALL: None
LANG: en_ZA.UTF-8

pandas: 0.17.1
nose: 1.3.7
pip: 1.5.6
setuptools: 20.1.1
Cython: None
numpy: 1.11.0b3
scipy: 0.17.0
statsmodels: None
IPython: 4.1.1
sphinx: None
patsy: None
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: None
tables: 3.2.2
numexpr: 2.5
matplotlib: 1.5.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.4.1
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: 2.6.1 (dt dec pq3 ext lo64)
Jinja2: None


jreback · 2016-02-25T13:03:53Z

You would have to show an example.

gfyoung · 2016-08-21T01:52:00Z

@jreback: I don't quite understand this issue because "null value" is extremely vague. Also, pandas does generally respect null values and doesn't skip them. I would vote to close this unless an example can be provided (none has been, and this issue is almost 6 months old).

jreback · 2016-08-21T02:41:02Z

I think a single column was written without an index. It has mostly nulls, so blank lines got written.

This is user error, as that is the point of the index.
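As a rough illustration of the point about the index, the sketch below round-trips a single-column frame in two ways that keep every row non-blank: writing the index, or writing an explicit missing-value marker with `na_rep`. The frame contents and variable names are made up for the example, and this is only one way to avoid the problem, not a fix proposed in the thread.

```python
import io

import pandas as pd

# Hypothetical single-column frame with two missing values.
df = pd.DataFrame({"a": [1.0, None, 3.0, None]})

# Option 1: keep the index, so each line contains at least the index label.
buf_index = io.StringIO()
df.to_csv(buf_index)
buf_index.seek(0)
with_index = pd.read_csv(buf_index, index_col=0)

# Option 2: drop the index but write an explicit marker for missing values.
# "NA" is one of the strings read_csv treats as missing by default.
buf_marker = io.StringIO()
df.to_csv(buf_marker, index=False, na_rep="NA")
buf_marker.seek(0)
with_marker = pd.read_csv(buf_marker)

print(len(df), len(with_index), len(with_marker))
```

In both cases the frame read back has the same number of rows as the original, with the missing values preserved as NaN.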

joristork changed the title ~~skip_blank_lines~~ pd.read_csv() skip_blank_lines defaults to True Feb 25, 2016

joristork changed the title ~~pd.read_csv() skip_blank_lines defaults to True~~ pd.read_csv(): silently skips null values from single column CSV Feb 25, 2016

jorisvandenbossche changed the title ~~pd.read_csv(): silently skips null values from single column CSV~~ read_csv: silently skips null values from single column CSV Feb 25, 2016

jreback added the IO CSV (read_csv, to_csv) and Needs Info (Clarification about behavior needed to assess issue) labels Feb 25, 2016

jreback closed this as completed Aug 21, 2016

ahawryluk mentioned this issue Feb 24, 2021

Add back skip_blank_lines to read_excel in pandas v>1.1.4 #39808

Closed
