BUG: read_csv() incorrectly parses data in case '\n' is used as a separator #43528

romanov-o · 2021-09-12T05:47:11Z

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the master branch of pandas.

Reproducible Example

import pandas as pd
import io

df = pd.read_csv(io.StringIO('A\nB\nC'), sep='\n', header=None, names=range(3))

Issue Description

pd.read_csv() incorrectly parses data when '\n' is used as the separator.

df = pd.read_csv(io.StringIO('A\nB\nC'), sep='\n', header=None, names=range(3))

produces:

   0   1   2
0  A NaN NaN
1  B NaN NaN
2  C NaN NaN

Expected Behavior

Expected:

   0   1   2
0  A   B   C
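
If the goal is a single row with one column per input line, one possible workaround is to split the text manually and build the frame directly. This is a minimal sketch based on an assumption about the reporter's intent; it is not something proposed in this thread, and the variable names are illustrative only.

```python
import pandas as pd

text = "A\nB\nC"

# Split on newlines ourselves, then wrap the values in an outer list
# so that they become one row instead of three.
df = pd.DataFrame([text.splitlines()])
print(df)
```

This sidesteps read_csv entirely, so none of its options (dtype inference, quoting, and so on) apply.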

Installed Versions

INSTALLED VERSIONS

commit : 5f648bf
python : 3.8.10.final.0
python-bits : 64
OS : Darwin
OS-release : 20.6.0
Version : Darwin Kernel Version 20.6.0: Wed Jun 23 00:26:31 PDT 2021; root:xnu-7195.141.2~5/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.UTF-8

pandas : 1.3.2
numpy : 1.19.2
pytz : 2021.1
dateutil : 2.8.2
pip : 21.2.4
setuptools : 58.0.4
Cython : 0.29.24
pytest : 6.2.5
hypothesis : None
sphinx : 4.1.2
blosc : None
feather : None
xlsxwriter : 3.0.1
lxml.etree : 4.6.3
html5lib : 1.1
pymysql : None
psycopg2 : 2.9.1 (dt dec pq3 ext lo64)
jinja2 : 3.0.1
IPython : 7.27.0
pandas_datareader: None
bs4 : 4.10.0
bottleneck : 1.3.2
fsspec : 2021.08.1
fastparquet : None
gcsfs : None
matplotlib : 3.4.3
numexpr : 2.7.3
odfpy : None
openpyxl : 3.0.8
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.6.2
sqlalchemy : 1.4.23
tables : 3.6.1
tabulate : None
xarray : None
xlrd : 2.0.1
xlwt : 1.3.0
numba : 0.54.0

leonarduschen · 2021-09-12T12:17:52Z

How would it detect newlines though?
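
One way to picture the question, as a deliberately simplified toy model and not pandas's actual tokenizer: if rows are cut at the line terminator before fields are cut at the separator, a separator equal to the line terminator never has anything left to match. `toy_parse` and its parameters are hypothetical names used only for this illustration.

```python
def toy_parse(text, sep, lineterminator="\n"):
    """Toy model only: split into rows first, then split each row into fields."""
    rows = text.split(lineterminator)
    return [row.split(sep) for row in rows]


# The separator equals the row delimiter: every row ends up with one field.
same = toy_parse("A\nB\nC", sep="\n")

# A separator that differs from the row delimiter splits fields as usual.
comma = toy_parse("A,B,C", sep=",")

print(same)
print(comma)
```

In this model the three single-field rows mirror the output in the report, where the remaining named columns are filled with NaN.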

phofl · 2021-09-12T21:26:22Z

Yeah agreed. I think we should raise if \n is provided as separator.

leonarduschen · 2021-09-13T01:03:51Z

I'll go ahead and make a PR for that, should be a quick one

leonarduschen · 2021-09-13T01:10:31Z

Wait - I realized we have the parameter lineterminator - which is \n by default.

import pandas as pd
import io

df = pd.read_csv(io.StringIO('A\nB\nC'), sep='\n', header=None, names=range(3), lineterminator='\r')

gives

   0  1  2
0  A  B  C

Maybe we should raise when sep is the same as lineterminator?
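
A rough sketch of what such a check could look like, written as a standalone function. `check_sep` is a hypothetical helper, and neither its name nor its error message is taken from pandas; the fix that was eventually merged may be structured differently.

```python
def check_sep(sep, lineterminator="\n"):
    """Hypothetical pre-parse check; not pandas's implementation."""
    if sep == lineterminator:
        raise ValueError(
            f"sep {sep!r} is the same as lineterminator {lineterminator!r}"
        )
    return sep


check_sep(",")                          # accepted
check_sep("\n", lineterminator="\r")    # accepted: the two differ

try:
    check_sep("\n")                     # rejected: same as the default terminator
except ValueError as exc:
    print(f"rejected: {exc}")
```

Comparing against lineterminator rather than hard-coding '\n' keeps the case shown above, where lineterminator='\r' is passed explicitly, working.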

phofl · 2021-09-13T08:14:39Z

A separator with more than one character forces the python engine, which does not accept a lineterminator

romanov-o added the Bug and Needs Triage labels Sep 12, 2021

mroeschke added the Error Reporting, IO CSV, and Enhancement labels and removed the Needs Triage and Bug labels Sep 30, 2021

phofl mentioned this issue Dec 15, 2021

BUG: read_csv not raising when \n as sep #44899

Merged

jreback added this to the 1.4 milestone Dec 16, 2021

jreback closed this as completed in #44899 Dec 16, 2021
