read_csv: silently skips null values from single column CSV #12443

Closed
joristork opened this issue Feb 25, 2016 · 3 comments
Labels
IO CSV read_csv, to_csv Needs Info Clarification about behavior needed to assess issue

Comments

@joristork
pd.read_csv()'s skip_blank_lines parameter defaults to True

I'm not sure if that is a good idea?

If I write a DataFrame that has one column and includes N null values to a CSV without the index (index=False), I do not expect the result of reading that CSV back to be N rows shorter than the original.

For example, I was puzzled when I wrote a column with 1000 rows to CSV, then read that column back from the CSV and found it had only 37 rows.
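
A minimal sketch of the skip_blank_lines behavior described above, using a hand-written single-column CSV string rather than the original data (which is not shown in this issue). The column name "a" and the values are assumptions for illustration only. It shows how a blank line is dropped by default and kept as NaN when skip_blank_lines=False.

```python
import io

import pandas as pd

# Hypothetical single-column CSV in which the middle row is a blank line.
csv_text = "a\n1\n\n3\n"

# Default behavior: skip_blank_lines=True, so the blank line is dropped.
default_read = pd.read_csv(io.StringIO(csv_text))

# With skip_blank_lines=False the blank line is parsed as a NaN row.
keep_blank = pd.read_csv(io.StringIO(csv_text), skip_blank_lines=False)

print(len(default_read))  # number of rows with the default setting
print(len(keep_blank))    # number of rows when blank lines are kept
```

Whether DataFrame.to_csv actually emits blank lines for nulls in a single-column frame may depend on the pandas and Python versions in use, so this only demonstrates the reading side.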

In [2]: pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Linux
OS-release: 4.2.0-1-amd64
machine: x86_64
processor: 
byteorder: little
LC_ALL: None
LANG: en_ZA.UTF-8

pandas: 0.17.1
nose: 1.3.7
pip: 1.5.6
setuptools: 20.1.1
Cython: None
numpy: 1.11.0b3
scipy: 0.17.0
statsmodels: None
IPython: 4.1.1
sphinx: None
patsy: None
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: None
tables: 3.2.2
numexpr: 2.5
matplotlib: 1.5.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.4.1
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: 2.6.1 (dt dec pq3 ext lo64)
Jinja2: None
@joristork joristork changed the title skip_blank_lines pd.read_csv() skip_blank_lines defaults to True Feb 25, 2016
@joristork joristork changed the title pd.read_csv() skip_blank_lines defaults to True pd.read_csv(): silently skips null values from single column CSV Feb 25, 2016
@jorisvandenbossche jorisvandenbossche changed the title pd.read_csv(): silently skips null values from single column CSV read_csv: silently skips null values from single column CSV Feb 25, 2016
@jreback
Contributor

jreback commented Feb 25, 2016

you would have to show an example

@jreback jreback added IO CSV read_csv, to_csv Needs Info Clarification about behavior needed to assess issue labels Feb 25, 2016
@gfyoung
Member

gfyoung commented Aug 21, 2016

@jreback : I don't quite understand this issue because "null value" is extremely vague. Also, pandas generally respects null values and doesn't skip them. I would vote to close this unless an example can be provided (none has been in the almost 6 months since this was opened).

@jreback
Contributor

jreback commented Aug 21, 2016

I think a single column was written w/o an index.
It has mostly nulls, so blank lines got written.

This is user error, as that is the point of the index.
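
A rough sketch of the point about the index, assuming a small made-up single-column frame (the column name "a" and the values are hypothetical, not from the reporter's data). When the index is written, every row contains at least the index value, so no line is blank and the row count should survive the round trip.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical single-column frame with one null value.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0]})

# Write with the index (the to_csv default), so the null row is "1," not blank.
buf = io.StringIO()
df.to_csv(buf)
buf.seek(0)

# Read back, using the first column as the index again.
roundtrip = pd.read_csv(buf, index_col=0)

print(len(df), len(roundtrip))
```

If the index cannot be written, passing skip_blank_lines=False to read_csv is the other option for keeping blank lines as NaN rows.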


3 participants