Skip to content

BUG: Cannot read XLSX files with xlrd version 2.0.0 #38410

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
keiv-fly opened this issue Dec 11, 2020 · 6 comments
Closed

BUG: Cannot read XLSX files with xlrd version 2.0.0 #38410

keiv-fly opened this issue Dec 11, 2020 · 6 comments
Labels
Bug Needs Triage Issue that has not been reviewed by a pandas team member

Comments

@keiv-fly
Copy link

Code that fails with xlrd==2.0.0:

pd.read_excel("file.xlsx")
raises the error: XLRDError: Excel xlsx file; not supported

The documentation for xlrd==2.0.0 says that it does not support any other files other than "xls"
xlrd updated their version today on 2020-12-11

/local_disk0/.ephemeral_nfs/envs/pythonEnv-1e568db2-af63-4098-8ee9-a8375bd77616/lib/python3.7/site-packages/pandas/io/excel/_base.py in read_excel(io, sheet_name, header, names, index_col, usecols, squeeze, dtype, engine, converters, true_values, false_values, skiprows, nrows, na_values, keep_default_na, verbose, parse_dates, date_parser, thousands, comment, skipfooter, convert_float, mangle_dupe_cols, **kwds)
    302 
    303     if not isinstance(io, ExcelFile):
--> 304         io = ExcelFile(io, engine=engine)
    305     elif engine and engine != io.engine:
    306         raise ValueError(

/local_disk0/.ephemeral_nfs/envs/pythonEnv-1e568db2-af63-4098-8ee9-a8375bd77616/lib/python3.7/site-packages/pandas/io/excel/_base.py in __init__(self, io, engine)
    819         self._io = stringify_path(io)
    820 
--> 821         self._reader = self._engines[engine](self._io)
    822 
    823     def __fspath__(self):

/local_disk0/.ephemeral_nfs/envs/pythonEnv-1e568db2-af63-4098-8ee9-a8375bd77616/lib/python3.7/site-packages/pandas/io/excel/_xlrd.py in __init__(self, filepath_or_buffer)
     19         err_msg = "Install xlrd >= 1.0.0 for Excel support"
     20         import_optional_dependency("xlrd", extra=err_msg)
---> 21         super().__init__(filepath_or_buffer)
     22 
     23     @property

/local_disk0/.ephemeral_nfs/envs/pythonEnv-1e568db2-af63-4098-8ee9-a8375bd77616/lib/python3.7/site-packages/pandas/io/excel/_base.py in __init__(self, filepath_or_buffer)
    351             self.book = self.load_workbook(filepath_or_buffer)
    352         elif isinstance(filepath_or_buffer, str):
--> 353             self.book = self.load_workbook(filepath_or_buffer)
    354         elif isinstance(filepath_or_buffer, bytes):
    355             self.book = self.load_workbook(BytesIO(filepath_or_buffer))

/local_disk0/.ephemeral_nfs/envs/pythonEnv-1e568db2-af63-4098-8ee9-a8375bd77616/lib/python3.7/site-packages/pandas/io/excel/_xlrd.py in load_workbook(self, filepath_or_buffer)
     34             return open_workbook(file_contents=data)
     35         else:
---> 36             return open_workbook(filepath_or_buffer)
     37 
     38     @property

/local_disk0/.ephemeral_nfs/envs/pythonEnv-1e568db2-af63-4098-8ee9-a8375bd77616/lib/python3.7/site-packages/xlrd/__init__.py in open_workbook(filename, logfile, verbosity, use_mmap, file_contents, encoding_override, formatting_info, on_demand, ragged_rows, ignore_workbook_corruption)
    168     # files that xlrd can parse don't start with the expected signature.
    169     if file_format and file_format != 'xls':
--> 170         raise XLRDError(FILE_FORMAT_DESCRIPTIONS[file_format]+'; not supported')
    171 
    172     bk = open_workbook_xls(

XLRDError: Excel xlsx file; not supported
@keiv-fly keiv-fly added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Dec 11, 2020
@jreback
Copy link
Contributor

jreback commented Dec 11, 2020

as the erorr message indicates xlrd no longer supports this, you can downgrade.

@hansgilead
Copy link

jreback unfortunately didn't really address the issue here, which is that pd.read_excel("file.xlsx") is supposed to use openpyxl for files that are identified as .xlsx but in this instance it is incorrectly trying to use xlrd. this looks like it really is a bug in pd.read_excel for which a workaround is to specify engine="openpyxl" in the call. adding that parameter solved this issue for me.

try this (though it should not be necessary it might fix it for you):

pd.read_excel("file.xlsx", engine="openpyxl")

from: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html

engine : str, default None
If io is not a buffer or path, this must be set to identify io. Supported engines: “xlrd”, “openpyxl”, “odf”, “pyxlsb”. Engine compatibility :

  • “xlrd” supports old-style Excel files (.xls).
  • “openpyxl” supports newer Excel file formats.
  • “odf” supports OpenDocument file formats (.odf, .ods, .odt).
  • “pyxlsb” supports Binary Excel files.

Changed in version 1.2.0: The engine xlrd now only supports old-style .xls files. When engine=None, the following logic will be used to determine the engine:

  • If path_or_buffer is an OpenDocument format (.odf, .ods, .odt), then odf will be used.
  • Otherwise if path_or_buffer is an xls format, xlrd will be used.
  • Otherwise if openpyxl is installed, then openpyxl will be used.
  • Otherwise if xlrd >= 2.0 is installed, a ValueError will be raised.
  • Otherwise xlrd will be used and a FutureWarning will be raised. This case will raise a ValueError in a future version of pandas.

@jreback
Copy link
Contributor

jreback commented Dec 31, 2020

@keiv-fly you would have to open with a reproducible example and version info

@jreback
Copy link
Contributor

jreback commented Dec 31, 2020

sorry @hansgilead

@keiv-fly
Copy link
Author

@keiv-fly you would have to open with a reproducible example and version info

A reproducible example with the latest stable pandas and xlrd:

import pandas as pd
pd.read_excel("file.xlsx")

file.xlsx is a new completely empty excel sheet.

Error:

---------------------------------------------------------------------------
XLRDError                                 Traceback (most recent call last)
<ipython-input-1-4c247bba163b> in <module>
      1 import pandas as pd
----> 2 pd.read_excel("file.xlsx")

~\anaconda3\envs\py38\lib\site-packages\pandas\util\_decorators.py in wrapper(*args, **kwargs)
    294                 )
    295                 warnings.warn(msg, FutureWarning, stacklevel=stacklevel)
--> 296             return func(*args, **kwargs)
    297 
    298         return wrapper

~\anaconda3\envs\py38\lib\site-packages\pandas\io\excel\_base.py in read_excel(io, sheet_name, header, names, index_col, usecols, squeeze, dtype, engine, converters, true_values, false_values, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, parse_dates, date_parser, thousands, comment, skipfooter, convert_float, mangle_dupe_cols)
    302 
    303     if not isinstance(io, ExcelFile):
--> 304         io = ExcelFile(io, engine=engine)
    305     elif engine and engine != io.engine:
    306         raise ValueError(

~\anaconda3\envs\py38\lib\site-packages\pandas\io\excel\_base.py in __init__(self, path_or_buffer, engine)
    865         self._io = stringify_path(path_or_buffer)
    866 
--> 867         self._reader = self._engines[engine](self._io)
    868 
    869     def __fspath__(self):

~\anaconda3\envs\py38\lib\site-packages\pandas\io\excel\_xlrd.py in __init__(self, filepath_or_buffer)
     20         err_msg = "Install xlrd >= 1.0.0 for Excel support"
     21         import_optional_dependency("xlrd", extra=err_msg)
---> 22         super().__init__(filepath_or_buffer)
     23 
     24     @property

~\anaconda3\envs\py38\lib\site-packages\pandas\io\excel\_base.py in __init__(self, filepath_or_buffer)
    351             self.book = self.load_workbook(filepath_or_buffer)
    352         elif isinstance(filepath_or_buffer, str):
--> 353             self.book = self.load_workbook(filepath_or_buffer)
    354         elif isinstance(filepath_or_buffer, bytes):
    355             self.book = self.load_workbook(BytesIO(filepath_or_buffer))

~\anaconda3\envs\py38\lib\site-packages\pandas\io\excel\_xlrd.py in load_workbook(self, filepath_or_buffer)
     35             return open_workbook(file_contents=data)
     36         else:
---> 37             return open_workbook(filepath_or_buffer)
     38 
     39     @property

~\anaconda3\envs\py38\lib\site-packages\xlrd\__init__.py in open_workbook(filename, logfile, verbosity, use_mmap, file_contents, encoding_override, formatting_info, on_demand, ragged_rows, ignore_workbook_corruption)
    168     # files that xlrd can parse don't start with the expected signature.
    169     if file_format and file_format != 'xls':
--> 170         raise XLRDError(FILE_FORMAT_DESCRIPTIONS[file_format]+'; not supported')
    171 
    172     bk = open_workbook_xls(

XLRDError: Excel xlsx file; not supported

pip freeze:

The most important:

pandas==1.2.0
xlrd==2.0.1 (pip freeze gives a directory, but the version is this one)
openpyxl==3.0.5 (pip freeze gives a directory, but the version is this one)

All output:

alabaster==0.7.12
anaconda-client==1.7.2
anaconda-project==0.8.3
appdirs==1.4.4
argh==0.26.2
argon2-cffi @ file:///C:/ci/argon2-cffi_1596828585465/work
asn1crypto @ file:///tmp/build/80754af9/asn1crypto_1596577642040/work
astroid @ file:///C:/ci/astroid_1592487315634/work
astropy @ file:///C:/ci/astropy_1606904903785/work
async-generator==1.10
atomicwrites==1.4.0
attrs @ file:///tmp/build/80754af9/attrs_1604765588209/work
autopep8 @ file:///tmp/build/80754af9/autopep8_1596578164842/work
Babel @ file:///tmp/build/80754af9/babel_1607110387436/work
backcall==0.2.0
backports.shutil-get-terminal-size @ file:///tmp/build/80754af9/backports.shutil_get_terminal_size_1608222128777/work
bcrypt @ file:///C:/ci/bcrypt_1597936263757/work
beautifulsoup4 @ file:///tmp/build/80754af9/beautifulsoup4_1601924105527/work
bitarray @ file:///C:/ci/bitarray_1605065210072/work
bkcharts==0.2
black==19.10b0
bleach @ file:///tmp/build/80754af9/bleach_1600439572647/work
bokeh @ file:///C:/ci/bokeh_1603297934731/work
boto==2.49.0
Bottleneck==1.3.2
brotlipy==0.7.0
certifi==2020.12.5
cffi @ file:///C:/ci/cffi_1606255208697/work
chardet @ file:///C:/ci/chardet_1607690654534/work
click==7.1.2
cloudpickle @ file:///tmp/build/80754af9/cloudpickle_1598884132938/work
clyent==1.2.2
colorama @ file:///tmp/build/80754af9/colorama_1607707115595/work
comtypes==1.1.8
contextlib2==0.6.0.post1
cryptography @ file:///C:/ci/cryptography_1607637849569/work
cycler==0.10.0
Cython @ file:///C:/ci/cython_1605457704149/work
cytoolz==0.11.0
dask @ file:///tmp/build/80754af9/dask-core_1607706933335/work
decorator==4.4.2
defusedxml==0.6.0
diff-match-patch @ file:///tmp/build/80754af9/diff-match-patch_1594828741838/work
distributed @ file:///C:/ci/distributed_1607714134337/work
docutils==0.16
entrypoints==0.3
et-xmlfile==1.0.1
fastcache==1.1.0
filelock==3.0.12
flake8 @ file:///tmp/build/80754af9/flake8_1601911421857/work
Flask==1.1.2
fsspec @ file:///tmp/build/80754af9/fsspec_1602684995936/work
future==0.18.2
gevent @ file:///C:/ci/gevent_1609093188435/work
glob2==0.7
gmpy2==2.0.8
greenlet @ file:///C:/ci/greenlet_1600885448389/work
h5py==2.10.0
HeapDict==1.0.1
html5lib @ file:///tmp/build/80754af9/html5lib_1593446221756/work
idna @ file:///tmp/build/80754af9/idna_1593446292537/work
imagecodecs @ file:///C:/ci/imagecodecs_1603271628972/work
imageio @ file:///tmp/build/80754af9/imageio_1594161405741/work
imagesize==1.2.0
importlib-metadata @ file:///tmp/build/80754af9/importlib-metadata_1602276842396/work
iniconfig @ file:///tmp/build/80754af9/iniconfig_1602780191262/work
intervaltree @ file:///tmp/build/80754af9/intervaltree_1598376443606/work
ipykernel @ file:///C:/ci/ipykernel_1596190155316/work/dist/ipykernel-5.3.4-py3-none-any.whl
ipython @ file:///C:/ci/ipython_1604083276484/work
ipython-genutils @ file:///tmp/build/80754af9/ipython_genutils_1606773439826/work
ipywidgets @ file:///tmp/build/80754af9/ipywidgets_1608834360725/work
isort @ file:///tmp/build/80754af9/isort_1602603989581/work
itsdangerous==1.1.0
jdcal==1.4.1
jedi @ file:///C:/ci/jedi_1606914528444/work
Jinja2==2.11.2
joblib @ file:///tmp/build/80754af9/joblib_1607970656719/work
json5==0.9.5
jsonschema @ file:///tmp/build/80754af9/jsonschema_1602607155483/work
jupyter==1.0.0
jupyter-client @ file:///tmp/build/80754af9/jupyter_client_1601311786391/work
jupyter-console @ file:///tmp/build/80754af9/jupyter_console_1598884538475/work
jupyter-core @ file:///C:/ci/jupyter_core_1606149073702/work
jupyterlab==2.2.6
jupyterlab-pygments @ file:///tmp/build/80754af9/jupyterlab_pygments_1601490720602/work
jupyterlab-server @ file:///tmp/build/80754af9/jupyterlab_server_1594164409481/work
keyring @ file:///C:/ci/keyring_1608756176625/work
kiwisolver @ file:///C:/ci/kiwisolver_1604014703538/work
lazy-object-proxy @ file:///C:/b/work
libarchive-c==2.9
line-profiler==2.1.2
llvmlite==0.34.0
locket==0.2.0
lxml @ file:///C:/ci/lxml_1606498900999/work
MarkupSafe==1.1.1
matplotlib==3.3.3
mccabe==0.6.1
menuinst==1.4.16
mistune==0.8.4
mkl-fft==1.2.0
mkl-random==1.1.1
mkl-service==2.3.0
mock @ file:///tmp/build/80754af9/mock_1607622725907/work
more-itertools @ file:///tmp/build/80754af9/more-itertools_1605111547926/work
mpmath==1.1.0
msgpack @ file:///C:/ci/msgpack-python_1607965861637/work
multipledispatch==0.6.0
mypy-extensions==0.4.3
nbclient @ file:///tmp/build/80754af9/nbclient_1602783176460/work
nbconvert @ file:///C:/ci/nbconvert_1601914925608/work
nbformat @ file:///tmp/build/80754af9/nbformat_1602783287752/work
nest-asyncio @ file:///tmp/build/80754af9/nest-asyncio_1606153767164/work
networkx @ file:///tmp/build/80754af9/networkx_1598376031484/work
nltk @ file:///tmp/build/80754af9/nltk_1592496090529/work
nose @ file:///tmp/build/80754af9/nose_1606773131901/work
notebook @ file:///C:/ci/notebook_1608817539935/work
numba==0.51.2
numexpr @ file:///C:/ci/numexpr_1607675674463/work
numpy @ file:///C:/ci/numpy_and_numpy_base_1603466732592/work
numpydoc @ file:///tmp/build/80754af9/numpydoc_1605117425582/work
olefile==0.46
openpyxl @ file:///tmp/build/80754af9/openpyxl_1598113097404/work
packaging @ file:///tmp/build/80754af9/packaging_1607971725249/work
pandas==1.2.0
pandocfilters @ file:///C:/ci/pandocfilters_1605102497129/work
paramiko @ file:///tmp/build/80754af9/paramiko_1598886428689/work
parso==0.7.0
partd==1.1.0
path @ file:///C:/ci/path_1607537314965/work
pathlib2 @ file:///C:/ci/pathlib2_1607025069150/work
pathspec==0.7.0
pathtools==0.1.2
patsy==0.5.1
pep8==1.7.1
pexpect @ file:///tmp/build/80754af9/pexpect_1605563209008/work
pickleshare @ file:///tmp/build/80754af9/pickleshare_1606932040724/work
Pillow @ file:///C:/ci/pillow_1603823068645/work
pkginfo==1.6.1
plotly==4.14.1
pluggy==0.13.1
ply==3.11
prometheus-client @ file:///tmp/build/80754af9/prometheus_client_1606344362066/work
prompt-toolkit @ file:///tmp/build/80754af9/prompt-toolkit_1602688806899/work
psutil @ file:///C:/ci/psutil_1598370330503/work
ptyprocess @ file:///tmp/build/80754af9/ptyprocess_1605560620615/work/dist/ptyprocess-0.6.0-py2.py3-none-any.whl
py @ file:///tmp/build/80754af9/py_1607971587848/work
pyarrow==1.0.1
pycodestyle==2.6.0
pycosat==0.6.3
pycparser @ file:///tmp/build/80754af9/pycparser_1594388511720/work
pycurl==7.43.0.6
pydocstyle @ file:///tmp/build/80754af9/pydocstyle_1598885001695/work
pyerfa @ file:///C:/ci/pyerfa_1606860299350/work
pyflakes==2.2.0
Pygments @ file:///tmp/build/80754af9/pygments_1607368905949/work
pylint @ file:///C:/ci/pylint_1598617153160/work
pyls-black @ file:///tmp/build/80754af9/pyls-black_1607553132291/work
pyls-spyder @ file:///tmp/build/80754af9/pyls-spyder_1608134179673/work
PyNaCl @ file:///C:/ci/pynacl_1595000047588/work
pyodbc===4.0.0-unsupported
pyOpenSSL @ file:///tmp/build/80754af9/pyopenssl_1608057966937/work
pyparsing==2.4.7
pyreadline==2.1
pyrsistent @ file:///C:/ci/pyrsistent_1600141795814/work
PySocks @ file:///C:/ci/pysocks_1605287845585/work
pytest==0.0.0
python-dateutil==2.8.1
python-jsonrpc-server @ file:///tmp/build/80754af9/python-jsonrpc-server_1600278539111/work
python-language-server @ file:///tmp/build/80754af9/python-language-server_1607972495879/work
pytz @ file:///tmp/build/80754af9/pytz_1608922264688/work
PyWavelets @ file:///C:/ci/pywavelets_1601658407916/work
pywin32==227
pywin32-ctypes==0.2.0
pywinpty==0.5.7
PyYAML==5.3.1
pyzmq==20.0.0
QDarkStyle==2.8.1
QtAwesome @ file:///tmp/build/80754af9/qtawesome_1602272867890/work
qtconsole @ file:///tmp/build/80754af9/qtconsole_1600870028330/work
QtPy==1.9.0
regex @ file:///C:/ci/regex_1606691183008/work
requests @ file:///tmp/build/80754af9/requests_1608241421344/work
retrying==1.3.3
rope @ file:///tmp/build/80754af9/rope_1602264064449/work
rstcheck==3.3.1
Rtree==0.9.4
ruamel-yaml==0.15.87
scikit-image==0.17.2
scikit-learn @ file:///C:/ci/scikit-learn_1598377018496/work
scipy @ file:///C:/ci/scipy_1604596260408/work
seaborn @ file:///tmp/build/80754af9/seaborn_1608578541026/work
selenium==3.141.0
Send2Trash @ file:///tmp/build/80754af9/send2trash_1607525499227/work
simplegeneric==0.8.1
singledispatch @ file:///tmp/build/80754af9/singledispatch_1602523705405/work
sip==4.19.13
six @ file:///C:/ci/six_1605187374963/work
snowballstemmer==2.0.0
sortedcollections==1.2.1
sortedcontainers @ file:///tmp/build/80754af9/sortedcontainers_1606865132123/work
soupsieve @ file:///tmp/build/80754af9/soupsieve_1607965878077/work
Sphinx @ file:///tmp/build/80754af9/sphinx_1608836605859/work
sphinxcontrib-applehelp==1.0.2
sphinxcontrib-devhelp==1.0.2
sphinxcontrib-htmlhelp==1.0.3
sphinxcontrib-jsmath==1.0.1
sphinxcontrib-qthelp==1.0.3
sphinxcontrib-serializinghtml==1.1.4
sphinxcontrib-websupport @ file:///tmp/build/80754af9/sphinxcontrib-websupport_1597081412696/work
spyder @ file:///C:/b/work
spyder-kernels @ file:///C:/ci/spyder-kernels_1608560699887/work
SQLAlchemy @ file:///C:/ci/sqlalchemy_1608243831000/work
statsmodels==0.12.1
sympy @ file:///C:/ci/sympy_1608119646897/work
tables==3.6.1
tblib @ file:///tmp/build/80754af9/tblib_1597928476713/work
terminado==0.9.1
testpath==0.4.4
threadpoolctl @ file:///tmp/tmp9twdgx9k/threadpoolctl-2.1.0-py3-none-any.whl
three-merge @ file:///tmp/build/80754af9/three-merge_1607553261110/work
tifffile @ file:///tmp/build/80754af9/tifffile_1607624018440/work
toml @ file:///tmp/build/80754af9/toml_1592853716807/work
toolz @ file:///tmp/build/80754af9/toolz_1601054250827/work
tornado @ file:///C:/ci/tornado_1606942392901/work
tqdm @ file:///tmp/build/80754af9/tqdm_1608922753628/work
traitlets @ file:///tmp/build/80754af9/traitlets_1602787416690/work
typed-ast==1.4.1
typing-extensions @ file:///tmp/build/80754af9/typing_extensions_1598376058250/work
ujson @ file:///C:/ci/ujson_1602505266082/work
unicodecsv==0.14.1
urllib3 @ file:///tmp/build/80754af9/urllib3_1606938623459/work
watchdog @ file:///C:/ci/watchdog_1606939226617/work
wcwidth @ file:///tmp/build/80754af9/wcwidth_1593447189090/work
webencodings==0.5.1
Werkzeug==1.0.1
widgetsnbextension==3.5.1
win-inet-pton @ file:///C:/ci/win_inet_pton_1605306167264/work
win-unicode-console==0.5
wincertstore==0.2
wrapt==1.11.2
xlrd @ file:///tmp/build/80754af9/xlrd_1608072521494/work
XlsxWriter @ file:///tmp/build/80754af9/xlsxwriter_1602692860603/work
xlwings==0.21.4
xlwt==1.3.0
yapf @ file:///tmp/build/80754af9/yapf_1593528177422/work
zict==2.0.0
zipp @ file:///tmp/build/80754af9/zipp_1604001098328/work
zope.event==4.5.0
zope.interface @ file:///C:/ci/zope.interface_1606940376312/work

Output of the python command:

Python 3.8.5 (default, Sep  3 2020, 21:29:08) [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.

@LinqLover
Copy link

Since xlrd does not support xlsx files any longer, wouldn't it be straightforward to use openpyxl instead as the default engine for xlsx files?

https://stackoverflow.com/a/65266270/13994294

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Needs Triage Issue that has not been reviewed by a pandas team member
Projects
None yet
Development

No branches or pull requests

4 participants