BUG: read_csv() incorrectly parses data in case '\n' is used as a separator #43528

Closed
2 of 3 tasks
romanov-o opened this issue Sep 12, 2021 · 5 comments · Fixed by #44899
Labels: Enhancement, Error Reporting, IO CSV

@romanov-o

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the master branch of pandas.

Reproducible Example

import pandas as pd
import io

df = pd.read_csv(io.StringIO('A\nB\nC'), sep='\n', header=None, names=range(3))

Issue Description

pd.read_csv() incorrectly parses data when '\n' is used as a separator.

df = pd.read_csv(io.StringIO('A\nB\nC'), sep='\n', header=None, names=range(3))

produces:

   0   1   2
0  A NaN NaN
1  B NaN NaN
2  C NaN NaN

Expected Behavior

Expected:

   0   1   2
0  A   B   C
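
The expected single-row result can be obtained without read_csv by splitting the text directly. This is a minimal sketch rather than a pandas-recommended recipe: the variable names are illustrative, and it assumes the input contains no quoting and no other line terminators.

```python
import pandas as pd

text = "A\nB\nC"

# Treat every newline-delimited token as one field of a single record.
fields = text.split("\n")

# Wrap the field list in an outer list so pandas builds one row, not one column.
df = pd.DataFrame([fields], columns=range(3))
print(df)
```

Splitting by hand keeps the field boundary explicit, so there is no ambiguity between field separators and record terminators.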

Installed Versions

INSTALLED VERSIONS

commit : 5f648bf
python : 3.8.10.final.0
python-bits : 64
OS : Darwin
OS-release : 20.6.0
Version : Darwin Kernel Version 20.6.0: Wed Jun 23 00:26:31 PDT 2021; root:xnu-7195.141.2~5/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.UTF-8

pandas : 1.3.2
numpy : 1.19.2
pytz : 2021.1
dateutil : 2.8.2
pip : 21.2.4
setuptools : 58.0.4
Cython : 0.29.24
pytest : 6.2.5
hypothesis : None
sphinx : 4.1.2
blosc : None
feather : None
xlsxwriter : 3.0.1
lxml.etree : 4.6.3
html5lib : 1.1
pymysql : None
psycopg2 : 2.9.1 (dt dec pq3 ext lo64)
jinja2 : 3.0.1
IPython : 7.27.0
pandas_datareader: None
bs4 : 4.10.0
bottleneck : 1.3.2
fsspec : 2021.08.1
fastparquet : None
gcsfs : None
matplotlib : 3.4.3
numexpr : 2.7.3
odfpy : None
openpyxl : 3.0.8
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.6.2
sqlalchemy : 1.4.23
tables : 3.6.1
tabulate : None
xarray : None
xlrd : 2.0.1
xlwt : 1.3.0
numba : 0.54.0

romanov-o added the Bug and Needs Triage labels on Sep 12, 2021
@leonarduschen
Contributor

How would it detect newlines though?
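
The ambiguity behind that question can be illustrated with a toy parser. This is a deliberately simplified sketch, not pandas' tokenizer: `toy_parse` and its parameters are hypothetical, and it assumes rows are split on the terminator before fields are split on the separator.

```python
def toy_parse(text, sep, lineterminator="\n"):
    """Split text into rows first, then split each row into fields."""
    # Rows are cut on the terminator; empty trailing rows are dropped.
    rows = [row for row in text.split(lineterminator) if row]
    # Each row is then cut on the separator.
    return [row.split(sep) for row in rows]


# Separator equals terminator: every token becomes its own one-field row.
same = toy_parse("A\nB\nC", sep="\n")

# A different terminator leaves the newlines available as field separators.
distinct = toy_parse("A\nB\nC", sep="\n", lineterminator="\r")

print(same)      # [['A'], ['B'], ['C']]
print(distinct)  # [['A', 'B', 'C']]
```

In this model, once the terminator has consumed every newline there is nothing left for the separator to match, which mirrors the three-row output reported in the issue.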

@phofl
Member

phofl commented Sep 12, 2021

Yeah agreed. I think we should raise if \n is provided as separator.

@leonarduschen
Contributor

I'll go ahead and make a PR for that, should be a quick one

@leonarduschen
Contributor

Wait - I realized we have the parameter lineterminator - which is \n by default.

import pandas as pd
import io

df = pd.read_csv(io.StringIO('A\nB\nC'), sep='\n', header=None, names=range(3), lineterminator='\r')

gives

   0  1  2
0  A  B  C

Maybe we raise when sep is the same as lineterminator?
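
A rough sketch of what such a check could look like, written as a standalone function. `check_separator` is a hypothetical helper, not pandas code, and the error message is made up. Treating an unset terminator as a newline is an assumption of this sketch.

```python
def check_separator(sep, lineterminator=None):
    """Raise ValueError if the separator collides with the line terminator."""
    # Assumption for this sketch: an unset terminator behaves like a newline.
    effective = "\n" if lineterminator is None else lineterminator
    if sep == effective:
        raise ValueError(
            f"sep={sep!r} is the same as the line terminator; "
            "fields and rows cannot be told apart"
        )


check_separator(",")                          # passes silently
check_separator("\n", lineterminator="\r")    # passes silently

try:
    check_separator("\n")
except ValueError as exc:
    message = str(exc)
    print(message)
```

Failing early like this would replace the silently wrong three-row result with an explicit error.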

@phofl
Member

phofl commented Sep 13, 2021

A separator with more than one character forces the python engine, which does not accept a lineterminator.

mroeschke added the Error Reporting, IO CSV, and Enhancement labels and removed the Needs Triage and Bug labels on Sep 30, 2021
jreback added this to the 1.4 milestone on Dec 16, 2021