BUG: When csv has 1 line, pandas cannot read csv. ParserError occurred. #51924

Closed
2 of 3 tasks
yuji38kwmt opened this issue Mar 13, 2023 · 7 comments
Labels
Bug, IO CSV (read_csv, to_csv)

Comments

@yuji38kwmt

yuji38kwmt commented Mar 13, 2023

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas
from io import StringIO

data1="""
1,Alice
2,Bob
3,Chris
"""
df1 = pandas.read_csv(StringIO(data1), header=None, names=["id","name","country"])
print(df1)
#    id   name  country
# 0   1  Alice      NaN
# 1   2    Bob      NaN
# 2   3  Chris      NaN


data2="""
1,Alice
"""
df2 = pandas.read_csv(StringIO(data2), header=None, names=["id","name","country"])
# ParserError: Too many columns specified: expected 3 and found 2

Issue Description

I want to read a CSV file that does not contain a header line.

When the CSV has three lines, pandas can read it.
But when the CSV has only one line, pandas cannot read it, and the following error occurs:

ParserError                               Traceback (most recent call last)
Cell In[33], line 4
      1 data2="""
      2 1,Alice
      3 """
----> 4 df2 = pandas.read_csv(StringIO(data2), header=None, names=["id","name","country"])

File ~/.pyenv/versions/3.11.2/lib/python3.11/site-packages/pandas/util/_decorators.py:211, in deprecate_kwarg.<locals>._deprecate_kwarg.<locals>.wrapper(*args, **kwargs)
    209     else:
    210         kwargs[new_arg_name] = new_arg_value
--> 211 return func(*args, **kwargs)

File ~/.pyenv/versions/3.11.2/lib/python3.11/site-packages/pandas/util/_decorators.py:331, in deprecate_nonkeyword_arguments.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
    325 if len(args) > num_allow_args:
    326     warnings.warn(
    327         msg.format(arguments=_format_argument_list(allow_args)),
    328         FutureWarning,
    329         stacklevel=find_stack_level(),
    330     )
--> 331 return func(*args, **kwargs)

File ~/.pyenv/versions/3.11.2/lib/python3.11/site-packages/pandas/io/parsers/readers.py:950, in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, encoding_errors, dialect, error_bad_lines, warn_bad_lines, on_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options)
    935 kwds_defaults = _refine_defaults_read(
    936     dialect,
    937     delimiter,
   (...)
    946     defaults={"delimiter": ","},
    947 )
    948 kwds.update(kwds_defaults)
--> 950 return _read(filepath_or_buffer, kwds)

File ~/.pyenv/versions/3.11.2/lib/python3.11/site-packages/pandas/io/parsers/readers.py:611, in _read(filepath_or_buffer, kwds)
    608     return parser
    610 with parser:
--> 611     return parser.read(nrows)

File ~/.pyenv/versions/3.11.2/lib/python3.11/site-packages/pandas/io/parsers/readers.py:1778, in TextFileReader.read(self, nrows)
   1771 nrows = validate_integer("nrows", nrows)
   1772 try:
   1773     # error: "ParserBase" has no attribute "read"
   1774     (
   1775         index,
   1776         columns,
   1777         col_dict,
-> 1778     ) = self._engine.read(  # type: ignore[attr-defined]
   1779         nrows
   1780     )
   1781 except Exception:
   1782     self.close()

File ~/.pyenv/versions/3.11.2/lib/python3.11/site-packages/pandas/io/parsers/c_parser_wrapper.py:230, in CParserWrapper.read(self, nrows)
    228 try:
    229     if self.low_memory:
--> 230         chunks = self._reader.read_low_memory(nrows)
    231         # destructive to chunks
    232         data = _concatenate_chunks(chunks)

File ~/.pyenv/versions/3.11.2/lib/python3.11/site-packages/pandas/_libs/parsers.pyx:808, in pandas._libs.parsers.TextReader.read_low_memory()

File ~/.pyenv/versions/3.11.2/lib/python3.11/site-packages/pandas/_libs/parsers.pyx:890, in pandas._libs.parsers.TextReader._read_rows()

File ~/.pyenv/versions/3.11.2/lib/python3.11/site-packages/pandas/_libs/parsers.pyx:952, in pandas._libs.parsers.TextReader._convert_column_data()

ParserError: Too many columns specified: expected 3 and found 2

Expected Behavior

Even if the CSV has only one line, pandas should read it, filling the missing country column with NaN just as it does in the three-line case.
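If upgrading pandas is not an option, one engine-independent workaround is to let pandas infer the column count from the data, name the columns that are present, and then add the missing ones. This is a sketch, not a pandas API: read_headerless is a hypothetical helper, and it assumes every row has the same number of fields and never more fields than there are names.

```python
from io import StringIO

import pandas as pd


def read_headerless(buf, names):
    """Hypothetical helper: read a headerless CSV, then pad it out to `names`.

    Assumes every row has the same number of fields, and never more than len(names).
    """
    # Let pandas infer the column count from the data instead of from `names`.
    df = pd.read_csv(buf, header=None)
    # Name only the columns that are actually present.
    df.columns = names[: df.shape[1]]
    # Append the missing trailing columns; reindex fills them with NaN.
    return df.reindex(columns=names)


df2 = read_headerless(StringIO("\n1,Alice\n"), ["id", "name", "country"])
print(df2)
```

Because names is never passed to read_csv, the row-count-dependent code path that raised the ParserError is avoided entirely.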

Installed Versions

INSTALLED VERSIONS

commit : 2e218d1
python : 3.11.2.final.0
python-bits : 64
OS : Linux
OS-release : 5.15.0-60-generic
Version : #66-Ubuntu SMP Fri Jan 20 14:29:49 UTC 2023
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : ja_JP.UTF-8
LOCALE : ja_JP.UTF-8

pandas : 1.5.3
numpy : 1.24.2
pytz : 2022.7.1
dateutil : 2.8.2
setuptools : 65.5.0
pip : 23.0.1
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.2
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.11.0
pandas_datareader: None
bs4 : None
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None
tzdata : None

@yuji38kwmt yuji38kwmt added the Bug and Needs Triage labels Mar 13, 2023
@yuji38kwmt yuji38kwmt changed the title from "BUG:" to "BUG: When csv has 1 line, pandas cannot read csv. ParserError occurred." Mar 13, 2023
@asishm
Contributor

asishm commented Mar 13, 2023

Can't reproduce on main

@yuji38kwmt
Author

yuji38kwmt commented Mar 14, 2023

Sorry, I had not checked the main branch.

Which pull request solved this bug?

@PKNaveen

PKNaveen commented Mar 15, 2023

For reference, see the Stack Overflow thread on pandas parse errors and the pandas read_csv documentation (Pandas DOC).

I was able to replicate this error. Changing the engine to 'python' seems to fix it. If no engine is specified, the error occurs, and specifying the C or pyarrow engine also produces an error. I am not sure why this is so. Is the C engine running by default for this read?
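To see which engines are affected on a given installation, a small diagnostic can run the one-line example through each engine and report either the resulting shape or the exception. This is a sketch: compare_engines is a hypothetical helper, and the pyarrow entry will simply report an ImportError if pyarrow is not installed.

```python
from io import StringIO

import pandas as pd

DATA2 = "\n1,Alice\n"
NAMES = ["id", "name", "country"]


def compare_engines(data, names, engines=("c", "python", "pyarrow")):
    """Hypothetical helper: map each engine to its DataFrame or the exception it raised."""
    results = {}
    for engine in engines:
        try:
            results[engine] = pd.read_csv(
                StringIO(data), header=None, names=names, engine=engine
            )
        except Exception as exc:  # ParserError, ValueError, ImportError, ...
            results[engine] = exc
    return results


print("pandas", pd.__version__)
for engine, outcome in compare_engines(DATA2, NAMES).items():
    if isinstance(outcome, pd.DataFrame):
        print(f"{engine}: ok, shape={outcome.shape}")
    else:
        print(f"{engine}: {type(outcome).__name__}: {outcome}")
```

Later in this thread, pandas 1.5.3 with pyarrow 11.0.0 reports a ParserError for c, a ValueError for pyarrow, and a one-row, three-column DataFrame for python.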

@yuji38kwmt
Author

yuji38kwmt commented Mar 18, 2023

Thanks!

When I specified engine="python", the error did not occur.

In [18]: data2="""
    ...: 1,Alice
    ...: """

In [25]: pandas.read_csv(StringIO(data2), header=None, names=["id","name","country"], engine="python")
Out[25]: 
   id   name  country
0   1  Alice      NaN

In [26]: pandas.read_csv(StringIO(data2), header=None, names=["id","name","country"], engine="c")
---------------------------------------------------------------------------
...
ParserError: Too many columns specified: expected 3 and found 2


In [30]: pandas.read_csv(StringIO(data2), header=None, names=["id","name","country"], engine="pyarrow")
---------------------------------------------------------------------------
...
ValueError: Length mismatch: Expected axis has 2 elements, new values have 3 elements
* pyarrow v11.0.0

"I am not sure why this is so. Is the C engine running by default for this read?"

I'm sorry, but I don't know what the default engine is. How can I confirm the default engine?

@PKNaveen

According to the documentation, there is no information about which engine is the default, but the Stack Overflow link posted earlier says the C engine is used by default. My assumption is that the default is C.
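One indirect way to confirm the default is to pass an option the C engine does not support and watch for pandas' fallback warning. When engine is not given, read_csv starts with the C engine and, for a separator longer than one character (treated as a regular expression), emits a ParserWarning saying it is falling back to the python engine. This sketch assumes that warning text ("Falling back to the 'python' engine because the 'c' engine ...") is unchanged in your pandas version; fallback_warnings is a hypothetical helper.

```python
import warnings
from io import StringIO

import pandas as pd
from pandas.errors import ParserWarning


def fallback_warnings(data="1 , Alice\n2 , Bob\n"):
    """Hypothetical helper: return the engine-fallback warnings from a default read_csv call."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        # No engine argument. A separator longer than one character is treated
        # as a regular expression, which only the python engine supports.
        pd.read_csv(StringIO(data), sep=r"\s*,\s*", header=None)
    return [
        str(w.message)
        for w in caught
        if issubclass(w.category, ParserWarning) and "Falling back" in str(w.message)
    ]


for message in fallback_warnings():
    print(message)
```

If the default were already the python engine, there would be nothing to fall back from and the list would be empty.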

@lithomas1 lithomas1 added the IO CSV label and removed the Needs Triage label May 5, 2023
@lithomas1
Member

Going to close, because I can't reproduce on main (currently 2.1 dev for me). You might want to try 2.0/2.0.1 to see if that fixes it.
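For code that must run on both pandas 1.x and 2.x, a version-gated call can keep the original names= form while avoiding the C-engine error on 1.x. This is a sketch under an assumption taken from this thread: that the problem is gone in pandas 2.x (the reporter verified 2.0.1, and the fixing pull request is not identified here). read_headerless_compat is a hypothetical name.

```python
from io import StringIO

import pandas as pd

# Assumption from this thread: pandas 2.x no longer raises ParserError here.
PANDAS_MAJOR = int(pd.__version__.split(".")[0])


def read_headerless_compat(buf, names):
    """Hypothetical helper: read a headerless CSV whose rows may be shorter than names."""
    # On pandas 1.x the default C engine raised ParserError for a single short row,
    # while engine="python" padded the missing fields with NaN.
    extra = {"engine": "python"} if PANDAS_MAJOR < 2 else {}
    return pd.read_csv(buf, header=None, names=names, **extra)


df2 = read_headerless_compat(StringIO("\n1,Alice\n"), ["id", "name", "country"])
print(df2)
```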

@yuji38kwmt
Author

I have confirmed that the problem does not occur in pandas v2.0.1.

In [82]: import pandas
    ...: from io import StringIO
    ...: 
    ...: data1="""
    ...: 1,Alice
    ...: 2,Bob
    ...: 3,Chris
    ...: """
    ...: df1 = pandas.read_csv(StringIO(data1), header=None, names=["id","name","country"])
    ...: print(df1)
    ...: #    id   name  country
    ...: # 0   1  Alice      NaN
    ...: # 1   2    Bob      NaN
    ...: # 2   3  Chris      NaN
    ...: 
    ...: 
    ...: data2="""
    ...: 1,Alice
    ...: """
    ...: df2 = pandas.read_csv(StringIO(data2), header=None, names=["id","name","country"])
    ...: # ParserError: Too many columns specified: expected 3 and found 2
   id   name  country
0   1  Alice      NaN
1   2    Bob      NaN
2   3  Chris      NaN

In [83]: pandas.__version__
Out[83]: '2.0.1'

Development

No branches or pull requests

4 participants