read_excel failed with specific configuration #15835

Closed
fortooon opened this issue Mar 29, 2017 · 2 comments
Labels
Bug IO CSV read_csv, to_csv IO Excel read_excel, to_excel
Milestone

Comments

@fortooon

Code Sample to reproduce bug

from pandas import read_excel

pre_kwargs = {
  "sheetname" : "Sheet1",
  "parse_cols" : [0,3],
  "keep_default_na" : False,
  "header" : 0
}

file = "/path_to_file..../test12.xlsx"
header_v = read_excel(file, **pre_kwargs).columns.values

Use the attached file: test12.xlsx

Problem description

I can't read a row as the header when it contains an empty cell with number formatting.
The parser fails with this traceback:

Traceback (most recent call last):
  File "/home/alexey.lisicyn/testPand.py", line 22, in <module>
    header_v = read_excel(file, **pre_kwargs).columns.values
  File "/tmp/opt/linux-CentOS_4.4-x64/P7/python-2.7.7-dbg/lib/python2.7/site-packages/pandas/io/excel.py", line 170, in read_excel
    skip_footer=skip_footer, converters=converters, **kwds)
  File "/tmp/opt/linux-CentOS_4.4-x64/P7/python-2.7.7-dbg/lib/python2.7/site-packages/pandas/io/excel.py", line 438, in _parse_excel
    output[asheetname] = parser.read()
  File "/tmp/opt/linux-CentOS_4.4-x64/P7/python-2.7.7-dbg/lib/python2.7/site-packages/pandas/io/parsers.py", line 747, in read
    ret = self._engine.read(nrows)
  File "/tmp/opt/linux-CentOS_4.4-x64/P7/python-2.7.7-dbg/lib/python2.7/site-packages/pandas/io/parsers.py", line 1611, in read
    index, columns = self._make_index(data, alldata, columns, indexnamerow)
  File "/tmp/opt/linux-CentOS_4.4-x64/P7/python-2.7.7-dbg/lib/python2.7/site-packages/pandas/io/parsers.py", line 920, in _make_index
    index = self._agg_index(index)
  File "/tmp/opt/linux-CentOS_4.4-x64/P7/python-2.7.7-dbg/lib/python2.7/site-packages/pandas/io/parsers.py", line 1012, in _agg_index
    arr, _ = self._convert_types(arr, col_na_values | col_na_fvalues)
TypeError: unsupported operand type(s) for |: 'list' and 'set'
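
The last frame of the traceback reduces to a plain Python type mismatch: the `|` operator is defined for two sets, but not for a list on the left and a set on the right. The sketch below reproduces just that step outside pandas. The variable names mirror the traceback, but the values are illustrative assumptions, not what pandas actually holds at that point.

```python
# Stand-ins for the operands of `col_na_values | col_na_fvalues`.
# The names come from the traceback; the values are assumed for illustration.
col_na_values = []       # a list, as the error message reports
col_na_fvalues = set()   # a set

try:
    col_na_values | col_na_fvalues
except TypeError as exc:
    message = str(exc)
    print(message)

# Converting the list to a set first makes the union well-defined.
combined = set(col_na_values) | col_na_fvalues
```

This also suggests why passing an explicit set for the NA values can sidestep the error: both operands are then sets.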

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.7.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-66-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.17.1
nose: None
pip: 7.0.3
setuptools: 1.1.6
Cython: None
numpy: 1.7.1
scipy: 0.14.0
statsmodels: None
IPython: 0.13.2
sphinx: None
patsy: None
dateutil: 2.3
pytz: 2014.10
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: 2.3.0
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
Jinja2: None

@chris-b1
Contributor

For some reason keep_default_na doesn't seem to work with read_excel, though it should. Passing something to na_values is a workaround; a PR to fix this is welcome.

import pandas as pd

pre_kwargs = {
  "sheetname" : "Sheet1",
  "parse_cols" : [0,3],
  "keep_default_na" : False,
  "na_values" : set(), # addition
  "header" : 0
}

f = 'https://github.com/pandas-dev/pandas/files/879107/test12.xlsx'

pd.read_excel(f, **pre_kwargs)
Out[87]: 
   1
0  1

@chris-b1 chris-b1 added Difficulty Novice IO Excel read_excel, to_excel labels Mar 29, 2017
@chris-b1 chris-b1 added this to the Next Major Release milestone Mar 29, 2017
@jreback jreback added the Compat pandas objects compatability with Numpy or Python functions label Mar 29, 2017
@gfyoung
Member

gfyoung commented Apr 3, 2017

@fortooon : Nice catch! You just happened to hit a corner case in parsing:

>>> from pandas import read_csv
>>> from pandas.compat import StringIO
>>>
>>> data = "a,1\nb,2"
>>> read_csv(StringIO(data), keep_default_na=False, index_col=0)
...
TypeError: unsupported operand type(s) for |: 'list' and 'set'

Note that this is using the C engine (also breaks with the Python engine). PR coming soon to patch this.

@jreback : This is not a compat issue, just a plain bug.

@chris-b1 : This is not a read_excel bug. In fact, propagation is just fine.
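
Since the failure sits in the shared parser rather than in read_excel, one way to picture a fix is to normalize the NA markers to a set before any union is taken. The sketch below is a hypothetical illustration only: `normalize_na_values` and `DEFAULT_NA` are made-up names, and this is not the patch that was merged into pandas.

```python
# Hypothetical default NA markers; pandas ships a much longer list.
DEFAULT_NA = {"", "NA", "NaN"}


def normalize_na_values(na_values=None, keep_default_na=True):
    """Return the NA markers as a set, whatever the caller passed in."""
    if na_values is None:
        result = set()
    elif isinstance(na_values, str):
        result = {na_values}
    else:
        result = set(na_values)
    if keep_default_na:
        result |= DEFAULT_NA
    return result


# The failing configuration: no na_values, defaults switched off.
user_markers = normalize_na_values(None, keep_default_na=False)
float_markers = set()
combined = user_markers | float_markers  # both operands are sets
```

With every code path returning a set, the union in the index-building step no longer depends on which combination of keep_default_na and na_values the caller chose.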

gfyoung added a commit to forking-repos/pandas that referenced this issue Apr 3, 2017
@jreback jreback modified the milestones: 0.20.0, Next Major Release Apr 3, 2017
@jreback jreback added Bug IO CSV read_csv, to_csv and removed Compat pandas objects compatability with Numpy or Python functions labels Apr 3, 2017
gfyoung added a commit to forking-repos/pandas that referenced this issue Apr 3, 2017
@jreback jreback closed this as completed in ff652a5 Apr 3, 2017

4 participants