read_excel failed with specific configuration #15835

fortooon · 2017-03-29T13:41:23Z

Code Sample to reproduce bug

from pandas import read_excel

pre_kwargs = {
  "sheetname" : "Sheet1",
  "parse_cols" : [0,3],
  "keep_default_na" : False,
  "header" : 0
}

file = "/path_to_file..../test12.xlsx"
header_v = read_excel(file, **pre_kwargs).columns.values

Use the attached file: test12.xlsx

Problem description

I can't read a row as the header when it contains an empty cell that has number formatting.
The parser simply fails:

Traceback (most recent call last):
  File "/home/alexey.lisicyn/testPand.py", line 22, in <module>
    header_v = read_excel(file, **pre_kwargs).columns.values
  File "/tmp/opt/linux-CentOS_4.4-x64/P7/python-2.7.7-dbg/lib/python2.7/site-packages/pandas/io/excel.py", line 170, in read_excel
    skip_footer=skip_footer, converters=converters, **kwds)
  File "/tmp/opt/linux-CentOS_4.4-x64/P7/python-2.7.7-dbg/lib/python2.7/site-packages/pandas/io/excel.py", line 438, in _parse_excel
    output[asheetname] = parser.read()
  File "/tmp/opt/linux-CentOS_4.4-x64/P7/python-2.7.7-dbg/lib/python2.7/site-packages/pandas/io/parsers.py", line 747, in read
    ret = self._engine.read(nrows)
  File "/tmp/opt/linux-CentOS_4.4-x64/P7/python-2.7.7-dbg/lib/python2.7/site-packages/pandas/io/parsers.py", line 1611, in read
    index, columns = self._make_index(data, alldata, columns, indexnamerow)
  File "/tmp/opt/linux-CentOS_4.4-x64/P7/python-2.7.7-dbg/lib/python2.7/site-packages/pandas/io/parsers.py", line 920, in _make_index
    index = self._agg_index(index)
  File "/tmp/opt/linux-CentOS_4.4-x64/P7/python-2.7.7-dbg/lib/python2.7/site-packages/pandas/io/parsers.py", line 1012, in _agg_index
    arr, _ = self._convert_types(arr, col_na_values | col_na_fvalues)
TypeError: unsupported operand type(s) for |: 'list' and 'set'
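
The last frame applies the set-union operator `|` to a list and a set. A minimal sketch of that behaviour outside pandas, assuming nothing beyond what the error message states (the variable names mirror the traceback; the empty values are placeholders chosen for illustration):

```python
# Stand-ins for the two operands in the failing line of _agg_index.
# The concrete (empty) values are assumptions made for this sketch.
col_na_values = []       # left operand: a list, per the error message
col_na_fvalues = set()   # right operand: a set, per the error message

try:
    col_na_values | col_na_fvalues
    message = None
except TypeError as exc:
    # list does not implement "|", so Python raises TypeError
    message = str(exc)

# The same operator is well defined when both operands are sets.
both_sets = set(col_na_values) | col_na_fvalues

print(message)
print(both_sets)
```

So the failure is consistent with one of the NA-value containers reaching `_agg_index` as a list rather than a set.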

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 2.7.7.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-66-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.17.1
nose: None
pip: 7.0.3
setuptools: 1.1.6
Cython: None
numpy: 1.7.1
scipy: 0.14.0
statsmodels: None
IPython: 0.13.2
sphinx: None
patsy: None
dateutil: 2.3
pytz: 2014.10
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: 2.3.0
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
Jinja2: None


chris-b1 · 2017-03-29T17:51:04Z

For some reason keep_default_na doesn't seem to work with read_excel; I suppose it should. Passing something to na_values is a workaround. A PR to fix this is welcome.

import pandas as pd

pre_kwargs = {
  "sheetname" : "Sheet1",
  "parse_cols" : [0,3],
  "keep_default_na" : False,
  "na_values" : set(), # addition
  "header" : 0
}

f = 'https://github.com/pandas-dev/pandas/files/879107/test12.xlsx'

pd.read_excel(f, **pre_kwargs)
Out[87]: 
   1
0  1
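
If the same keyword dictionary is built in several places, the workaround can be applied in one spot. A rough sketch, assuming a hypothetical helper named `with_na_workaround` (it is not part of pandas) that only adds an empty `na_values` set when `keep_default_na` is `False` and the caller supplied no `na_values`:

```python
def with_na_workaround(read_kwargs):
    """Return a copy of read_kwargs with an explicit empty na_values set.

    Hypothetical helper, not a pandas function. It leaves the input dict
    untouched and only patches the configuration that triggers the error.
    """
    patched = dict(read_kwargs)
    if patched.get("keep_default_na") is False and "na_values" not in patched:
        patched["na_values"] = set()
    return patched


# Keyword arguments from the original report.
pre_kwargs = {
    "sheetname": "Sheet1",
    "parse_cols": [0, 3],
    "keep_default_na": False,
    "header": 0,
}

patched_kwargs = with_na_workaround(pre_kwargs)
print(patched_kwargs)
```

`patched_kwargs` can then be passed to `read_excel` with `**patched_kwargs`, in the same way as the dictionary in the snippet above.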

gfyoung · 2017-04-03T17:44:40Z

@fortooon : Nice catch! You just happened to hit a corner case in parsing:

>>> from pandas import read_csv
>>> from pandas.compat import StringIO
>>>
>>> data = "a,1\nb,2"
>>> read_csv(StringIO(data), keep_default_na=False, index_col=0)
...
TypeError: unsupported operand type(s) for |: 'list' and 'set'

Note that this is using the C engine (also breaks with the Python engine). PR coming soon to patch this.

@jreback : This is not a compat issue, just a plain bug.

@chris-b1 : This is not a read_excel bug. In fact, propagation is just fine.
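
One way to picture a defensive fix is to normalise both NA-value collections to sets before taking their union. The sketch below is only an illustration: `union_na_values` is a hypothetical name, and this is not the patch from the referenced PR.

```python
def union_na_values(na_values, na_fvalues):
    """Union two NA-value collections, tolerating lists, tuples, sets or None.

    Hypothetical helper for illustration; it is not pandas code.
    """
    # Coerce each side to a set so "|" is always defined.
    return set(na_values or ()) | set(na_fvalues or ())


# The list/set combination from the traceback no longer raises.
mixed = union_na_values([], set())

# Non-empty inputs of different container types are merged as expected.
with_values = union_na_values(["NA", ""], {-999.0})

print(mixed)
print(with_values)
```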


chris-b1 added Difficulty Novice IO Excel read_excel, to_excel labels Mar 29, 2017

chris-b1 added this to the Next Major Release milestone Mar 29, 2017

jreback added the Compat pandas objects compatability with Numpy or Python functions label Mar 29, 2017

gfyoung added a commit to forking-repos/pandas that referenced this issue Apr 3, 2017

BUG: Patch handling no NA values in TextFileReader

ff7eb61

Closes pandas-devgh-15835

gfyoung mentioned this issue Apr 3, 2017

BUG: Patch handling no NA values in TextFileReader #15881

Closed

jreback modified the milestones: 0.20.0, Next Major Release Apr 3, 2017

jreback added Bug IO CSV read_csv, to_csv and removed Compat pandas objects compatability with Numpy or Python functions labels Apr 3, 2017

gfyoung added a commit to forking-repos/pandas that referenced this issue Apr 3, 2017

BUG: Patch handling no NA values in TextFileReader

0bb6f64

Closes pandas-devgh-15835

jreback closed this as completed in ff652a5 Apr 3, 2017
