BUG: When csv has 1 line, pandas cannot read csv. `ParserError` occurred. #51924

yuji38kwmt · 2023-03-13T04:47:45Z

Pandas version checks

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas
from io import StringIO

data1="""
1,Alice
2,Bob
3,Chris
"""
df1 = pandas.read_csv(StringIO(data1), header=None, names=["id","name","country"])
print(df1)
#    id   name  country
# 0   1  Alice      NaN
# 1   2    Bob      NaN
# 2   3  Chris      NaN


data2="""
1,Alice
"""
df2 = pandas.read_csv(StringIO(data2), header=None, names=["id","name","country"])
# ParserError: Too many columns specified: expected 3 and found 2

Issue Description

I want to read a CSV that does not contain a header line.

When the CSV has 3 lines, pandas can read it.
But when the CSV has only 1 line, pandas cannot read it, and the following error occurs.

ParserError                               Traceback (most recent call last)
Cell In[33], line 4
      1 data2="""
      2 1,Alice
      3 """
----> 4 df2 = pandas.read_csv(StringIO(data2), header=None, names=["id","name","country"])

File ~/.pyenv/versions/3.11.2/lib/python3.11/site-packages/pandas/util/_decorators.py:211, in deprecate_kwarg.<locals>._deprecate_kwarg.<locals>.wrapper(*args, **kwargs)
    209     else:
    210         kwargs[new_arg_name] = new_arg_value
--> 211 return func(*args, **kwargs)

File ~/.pyenv/versions/3.11.2/lib/python3.11/site-packages/pandas/util/_decorators.py:331, in deprecate_nonkeyword_arguments.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
    325 if len(args) > num_allow_args:
    326     warnings.warn(
    327         msg.format(arguments=_format_argument_list(allow_args)),
    328         FutureWarning,
    329         stacklevel=find_stack_level(),
    330     )
--> 331 return func(*args, **kwargs)

File ~/.pyenv/versions/3.11.2/lib/python3.11/site-packages/pandas/io/parsers/readers.py:950, in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, encoding_errors, dialect, error_bad_lines, warn_bad_lines, on_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options)
    935 kwds_defaults = _refine_defaults_read(
    936     dialect,
    937     delimiter,
   (...)
    946     defaults={"delimiter": ","},
    947 )
    948 kwds.update(kwds_defaults)
--> 950 return _read(filepath_or_buffer, kwds)

File ~/.pyenv/versions/3.11.2/lib/python3.11/site-packages/pandas/io/parsers/readers.py:611, in _read(filepath_or_buffer, kwds)
    608     return parser
    610 with parser:
--> 611     return parser.read(nrows)

File ~/.pyenv/versions/3.11.2/lib/python3.11/site-packages/pandas/io/parsers/readers.py:1778, in TextFileReader.read(self, nrows)
   1771 nrows = validate_integer("nrows", nrows)
   1772 try:
   1773     # error: "ParserBase" has no attribute "read"
   1774     (
   1775         index,
   1776         columns,
   1777         col_dict,
-> 1778     ) = self._engine.read(  # type: ignore[attr-defined]
   1779         nrows
   1780     )
   1781 except Exception:
   1782     self.close()

File ~/.pyenv/versions/3.11.2/lib/python3.11/site-packages/pandas/io/parsers/c_parser_wrapper.py:230, in CParserWrapper.read(self, nrows)
    228 try:
    229     if self.low_memory:
--> 230         chunks = self._reader.read_low_memory(nrows)
    231         # destructive to chunks
    232         data = _concatenate_chunks(chunks)

File ~/.pyenv/versions/3.11.2/lib/python3.11/site-packages/pandas/_libs/parsers.pyx:808, in pandas._libs.parsers.TextReader.read_low_memory()

File ~/.pyenv/versions/3.11.2/lib/python3.11/site-packages/pandas/_libs/parsers.pyx:890, in pandas._libs.parsers.TextReader._read_rows()

File ~/.pyenv/versions/3.11.2/lib/python3.11/site-packages/pandas/_libs/parsers.pyx:952, in pandas._libs.parsers.TextReader._convert_column_data()

ParserError: Too many columns specified: expected 3 and found 2

Expected Behavior

pandas should be able to read the CSV even if it has only 1 line.

Installed Versions

INSTALLED VERSIONS

commit : 2e218d1
python : 3.11.2.final.0
python-bits : 64
OS : Linux
OS-release : 5.15.0-60-generic
Version : #66-Ubuntu SMP Fri Jan 20 14:29:49 UTC 2023
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : ja_JP.UTF-8
LOCALE : ja_JP.UTF-8

pandas : 1.5.3
numpy : 1.24.2
pytz : 2022.7.1
dateutil : 2.8.2
setuptools : 65.5.0
pip : 23.0.1
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.2
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.11.0
pandas_datareader: None
bs4 : None
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None
tzdata : None


asishm · 2023-03-13T13:53:37Z

Can't reproduce on main

yuji38kwmt · 2023-03-14T01:24:29Z

Sorry, I had not confirmed on the main branch.

Which pull request solved this bug?

PKNaveen · 2023-03-15T01:51:53Z

In reference to this, see the Stack Overflow question "Pandas Parse-errors" and the pandas.read_csv documentation (Pandas DOC).

I was able to replicate this error. Changing the engine to 'python' seems to correct the issue. If no engine is specified, the error occurs, and if you specify the C engine or pyarrow, the same error occurs. I am not sure why this is so. Is the C engine used by default for this read?

yuji38kwmt · 2023-03-18T15:20:21Z

Thanks!

When I specified engine="python", the error did not occur.

In [18]: data2="""
    ...: 1,Alice
    ...: """

In [25]: pandas.read_csv(StringIO(data2), header=None, names=["id","name","country"], engine="python")
Out[25]: 
   id   name  country
0   1  Alice      NaN

In [26]: pandas.read_csv(StringIO(data2), header=None, names=["id","name","country"], engine="c")
---------------------------------------------------------------------------
...
ParserError: Too many columns specified: expected 3 and found 2


In [30]: pandas.read_csv(StringIO(data2), header=None, names=["id","name","country"], engine="pyarrow")
---------------------------------------------------------------------------
...
ValueError: Length mismatch: Expected axis has 2 elements, new values have 3 elements

*  pyarrow v11.0.0

> I am not sure why this is so, is the C engine running on default for this read?

I'm sorry, but I don't know what the default engine is. How can I confirm the default engine?

PKNaveen · 2023-03-22T17:16:20Z

According to the documentation, there is no information about which engine is the default, but the Stack Overflow link posted earlier says the C engine is used by default. So my assumption is that the default is C.
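One way to check which engine the default call behaves like is to run the same input through each engine explicitly and compare the outcomes. The following is only a minimal sketch: the helper name `try_engine` is hypothetical, and the pyarrow engine is left out so the snippet does not depend on an optional package. The result for the default and C engines depends on the installed pandas version.

```python
from io import StringIO

import pandas

# Same one-line, header-less CSV as in the report above.
DATA = "\n1,Alice\n"
NAMES = ["id", "name", "country"]


def try_engine(engine):
    """Hypothetical helper: return ("ok", DataFrame) or ("error", exception name)."""
    # engine=None means "do not pass engine at all", i.e. the default call.
    kwargs = {} if engine is None else {"engine": engine}
    try:
        df = pandas.read_csv(StringIO(DATA), header=None, names=NAMES, **kwargs)
    except Exception as exc:  # broad on purpose: we only want to report the outcome
        return ("error", type(exc).__name__)
    return ("ok", df)


results = {
    label: try_engine(engine)
    for label, engine in [("default", None), ("c", "c"), ("python", "python")]
}

for label, (status, value) in results.items():
    detail = value if status == "error" else value.shape
    print(pandas.__version__, label, status, detail)
```

If the "default" row always matches the "c" row, that is consistent with the assumption that the C engine is what runs when no engine is given.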

lithomas1 · 2023-05-05T10:15:40Z

Going to close, because I can't reproduce on main (currently 2.1 dev for me). You might want to try 2.0/2.0.1 to see if that fixes it.

yuji38kwmt · 2023-05-06T22:39:59Z

I have confirmed that there is no problem in pandas v2.0.1.

In [82]: import pandas
    ...: from io import StringIO
    ...: 
    ...: data1="""
    ...: 1,Alice
    ...: 2,Bob
    ...: 3,Chris
    ...: """
    ...: df1 = pandas.read_csv(StringIO(data1), header=None, names=["id","name","country"])
    ...: print(df1)
    ...: #    id   name  country
    ...: # 0   1  Alice      NaN
    ...: # 1   2    Bob      NaN
    ...: # 2   3  Chris      NaN
    ...: 
    ...: 
    ...: data2="""
    ...: 1,Alice
    ...: """
    ...: df2 = pandas.read_csv(StringIO(data2), header=None, names=["id","name","country"])
    ...: # ParserError: Too many columns specified: expected 3 and found 2
   id   name  country
0   1  Alice      NaN
1   2    Bob      NaN
2   3  Chris      NaN

In [83]: pandas.__version__
Out[83]: '2.0.1'
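For code that has to run on both an affected version (1.5.3 here) and a fixed one, the engine="python" result reported earlier in this thread suggests a simple fallback. This is a sketch under that assumption only, not an official pandas recommendation, and the helper name `read_headerless_csv` is hypothetical.

```python
from io import StringIO

import pandas
from pandas.errors import ParserError


def read_headerless_csv(text, names):
    """Hypothetical helper: try the default engine, retry with the python engine.

    The retry only happens when the default engine raises ParserError,
    which is the error shown in this report.
    """
    try:
        return pandas.read_csv(StringIO(text), header=None, names=names)
    except ParserError:
        return pandas.read_csv(
            StringIO(text), header=None, names=names, engine="python"
        )


df = read_headerless_csv("\n1,Alice\n", ["id", "name", "country"])
print(df)
```

The python engine is slower than the C engine, so the retry is kept on the error path rather than used for every read.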

yuji38kwmt added the Bug and Needs Triage labels Mar 13, 2023

yuji38kwmt changed the title from "BUG:" to "BUG: When csv has 1 line, pandas cannot read csv. ParserError occurred." Mar 13, 2023

lithomas1 added the IO CSV label and removed the Needs Triage label May 5, 2023

lithomas1 closed this as completed May 5, 2023
