read_csv errors when low_memory=True, index_col is not None, and nrows=0 #21141

pschafhalter · 2018-05-20T05:03:39Z

Code Sample

import pandas as pd
from pandas.compat import StringIO

data = """A,B,C,D,E
2000-01-03 00:00:00,0.980268513777,3.68573087906,-0.364216805298,-1.15973806169,foo
2000-01-04 00:00:00,1.04791624281,-0.0412318367011,-0.16181208307,0.212549316967,bar
2000-01-05 00:00:00,0.498580885705,0.731167677815,-0.537677223318,1.34627041952,baz
2000-01-06 00:00:00,1.12020151869,1.56762092543,0.00364077397681,0.67525259227,qux
2000-01-07 00:00:00,-0.487094399463,0.571454623474,-1.6116394093,0.103468562917,foo2"""

df = pd.read_csv(StringIO(data), low_memory=True, index_col=0, nrows=0)

Problem description

The above code raises `TypeError: 'NoneType' object is not iterable`. `read_csv` behaves correctly when any one of the three conditions is changed: `low_memory=False`, `index_col=None`, or `nrows>0`.

Traceback:

Traceback (most recent call last):
  File "/home/peter/workspace/ray_env/lib/python3.6/site-packages/pandas/io/parsers.py", line 1848, in read
    data = self._reader.read(nrows)
  File "pandas/_libs/parsers.pyx", line 876, in pandas._libs.parsers.TextReader.read
  File "pandas/_libs/parsers.pyx", line 916, in pandas._libs.parsers.TextReader._read_low_memory
StopIteration

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "pd_bug.py", line 12, in <module>
    df = pd.read_csv(StringIO(data), low_memory=True, index_col=0, nrows=0)
  File "/home/peter/workspace/ray_env/lib/python3.6/site-packages/pandas/io/parsers.py", line 678, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/home/peter/workspace/ray_env/lib/python3.6/site-packages/pandas/io/parsers.py", line 446, in _read
    data = parser.read(nrows)
  File "/home/peter/workspace/ray_env/lib/python3.6/site-packages/pandas/io/parsers.py", line 1036, in read
    ret = self._engine.read(nrows)
  File "/home/peter/workspace/ray_env/lib/python3.6/site-packages/pandas/io/parsers.py", line 1855, in read
    dtype=self.kwds.get('dtype'))
  File "/home/peter/workspace/ray_env/lib/python3.6/site-packages/pandas/io/parsers.py", line 3215, in _get_empty_meta
    data = [Series([], dtype=dtype[name]) for name in index_names]
TypeError: 'NoneType' object is not iterable
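
The last frame of the traceback fails because a list comprehension iterates over `index_names` while it is `None`. The sketch below reproduces that failure mode in plain Python, without pandas. Both function names and the tuple return shape are hypothetical stand-ins, and the guarded variant is only one possible way to avoid the error, not necessarily what the eventual pandas fix does.

```python
def build_empty_index_columns(index_names, dtype):
    """Hypothetical stand-in for the comprehension in the last traceback frame.

    Returns one (name, dtype) pair per index name. Raises TypeError when
    index_names is None, because None cannot be iterated.
    """
    return [(name, dtype.get(name, object)) for name in index_names]


def build_empty_index_columns_guarded(index_names, dtype):
    """Same as above, but treats a missing index_names as 'no index columns'.

    This guard is an assumption made for illustration only.
    """
    if index_names is None:
        index_names = []
    return [(name, dtype.get(name, object)) for name in index_names]


if __name__ == "__main__":
    try:
        build_empty_index_columns(None, {})
    except TypeError as exc:
        # Same message as the one reported in the issue.
        print("unguarded:", exc)

    print("guarded:", build_empty_index_columns_guarded(None, {}))
```

The unguarded call fails with the same message as the report, while the guarded call returns an empty list.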

Expected Output

No error.

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.13.0-36-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.0
pytest: 3.3.1
pip: 9.0.1
setuptools: 38.4.0
Cython: 0.27.3
numpy: 1.14.3
scipy: 1.0.0
pyarrow: 0.8.0
xarray: None
IPython: 6.2.1
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: 0.4.0
matplotlib: 2.1.1
openpyxl: 2.5.3
xlrd: 1.1.0
xlwt: None
xlsxwriter: None
lxml: 4.2.0
bs4: 4.6.0
html5lib: 0.999999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

chris-b1 · 2018-05-20T14:37:03Z

Sure, this is obviously a bit of a corner case, but PR to fix welcome!

jorisvandenbossche · 2018-06-06T13:23:39Z

(removed the milestone as this is not a priority IMO)

gfyoung · 2018-06-11T21:50:29Z

In light of the conversation happening in #21176, I'm more for just disallowing nrows=0.
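
For illustration, disallowing `nrows=0` could amount to an up-front argument check along the following lines. `validate_nrows` is a hypothetical helper written for this sketch, not a pandas function, and the error message is invented. The timeline of this issue shows it was closed through #21176 ("Handle read_csv corner case"), while the "Nrows cannot be zero" pull request #21431 was closed, so this shows only what the suggestion might have looked like.

```python
def validate_nrows(nrows):
    """Hypothetical check: accept None or a strictly positive integer.

    Rejecting 0 up front would mean the nrows=0 code path is never reached.
    """
    if nrows is None:
        return None
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(nrows, bool) or not isinstance(nrows, int) or nrows < 1:
        raise ValueError("nrows must be a positive integer or None")
    return nrows


if __name__ == "__main__":
    print(validate_nrows(5))
    try:
        validate_nrows(0)
    except ValueError as exc:
        print("rejected:", exc)
```

The trade-off of such a check is that callers who pass `nrows=0` to read only the header would get an error instead of an empty frame.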

chris-b1 added Bug IO CSV read_csv, to_csv Effort Low good first issue labels May 20, 2018

chris-b1 added this to the Next Major Release milestone May 20, 2018

r00ta mentioned this issue May 22, 2018

BUG: read_csv with specified kwargs #21176

Merged

1 task

jreback modified the milestones: Next Major Release, 0.23.1 May 23, 2018

jorisvandenbossche modified the milestones: 0.23.1, Next Major Release Jun 6, 2018

cgopalan pushed a commit to cgopalan/pandas that referenced this issue Jun 11, 2018

BUG: Nrows cannot be zero for read_csv. Fixes pandas-dev#21141

42ed1a8

cgopalan added a commit to cgopalan/pandas that referenced this issue Jun 11, 2018

BUG: Nrows cannot be zero for read_csv. Fixes pandas-dev#21141

c22f6d4

cgopalan mentioned this issue Jun 11, 2018

BUG: Nrows cannot be zero for read_csv. Fixes #21141 #21431

Closed

4 tasks

gfyoung pushed a commit to r00ta/pandas that referenced this issue Jun 16, 2018

BUG: Handle read_csv corner case

a99a122

* nrows = 0 * low_memory=True * index_col != None Closes pandas-devgh-21141

jreback modified the milestones: Next Major Release, 0.23.2 Jun 19, 2018

gfyoung pushed a commit to r00ta/pandas that referenced this issue Jun 19, 2018

BUG: Handle read_csv corner case

eedd45e

* nrows = 0 * low_memory=True * index_col != None Closes pandas-devgh-21141

jreback closed this as completed in #21176 Jun 19, 2018

jreback pushed a commit that referenced this issue Jun 19, 2018

BUG: Handle read_csv corner case (#21176)

c2da06c

Closes gh-21141

jorisvandenbossche pushed a commit that referenced this issue Jun 29, 2018

BUG: Handle read_csv corner case (#21176)

930617c

Closes gh-21141 (cherry picked from commit c2da06c)

jorisvandenbossche pushed a commit that referenced this issue Jul 2, 2018

BUG: Handle read_csv corner case (#21176)

030a058

Closes gh-21141 (cherry picked from commit c2da06c)

Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this issue Oct 1, 2018

BUG: Handle read_csv corner case (pandas-dev#21176)

20a430a

Closes pandas-devgh-21141
