Backport PR #38456 on branch 1.2.x (DOC: update wording about when xlrd engine can be used) #38660

Merged
31 changes: 28 additions & 3 deletions doc/source/user_guide/io.rst
@@ -2820,15 +2820,40 @@ parse HTML tables in the top-level pandas io function ``read_html``.
Excel files
-----------

-The :func:`~pandas.read_excel` method can read Excel 2003 (``.xls``)
-files using the ``xlrd`` Python module. Excel 2007+ (``.xlsx``) files
-can be read using either ``xlrd`` or ``openpyxl``. Binary Excel (``.xlsb``)
+The :func:`~pandas.read_excel` method can read Excel 2007+ (``.xlsx``) files
+using the ``openpyxl`` Python module. Excel 2003 (``.xls``) files
+can be read using ``xlrd``. Binary Excel (``.xlsb``)
files can be read using ``pyxlsb``.
The :meth:`~DataFrame.to_excel` instance method is used for
saving a ``DataFrame`` to Excel. Generally the semantics are
similar to working with :ref:`csv<io.read_csv_table>` data.
See the :ref:`cookbook<cookbook.excel>` for some advanced strategies.
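
As an illustration of the format-to-engine pairing described above (not part of this PR), the following minimal sketch maps a file suffix to the engine named for it and passes that engine explicitly to ``read_excel``. The names ``ENGINE_BY_SUFFIX``, ``engine_for`` and ``read_excel_explicit`` are hypothetical, and the sketch assumes the chosen engine package is installed when the file is actually read.

```python
from pathlib import Path

# Suffix-to-engine pairs taken from the paragraph above.
# The dictionary and function names are illustrative, not pandas API.
ENGINE_BY_SUFFIX = {
    ".xlsx": "openpyxl",  # Excel 2007+
    ".xls": "xlrd",       # Excel 2003
    ".xlsb": "pyxlsb",    # Binary Excel
}


def engine_for(path):
    """Return the engine name listed for the file's suffix."""
    suffix = Path(path).suffix.lower()
    try:
        return ENGINE_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"no engine listed for suffix {suffix!r}") from None


def read_excel_explicit(path, **kwargs):
    """Read an Excel file, always naming the engine explicitly."""
    import pandas as pd  # deferred so the mapping works without pandas

    return pd.read_excel(path, engine=engine_for(path), **kwargs)
```

Naming the engine explicitly avoids depending on what ``engine=None`` resolves to in a given environment.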

+.. warning::
+
+The `xlwt <https://xlwt.readthedocs.io/en/latest/>`__ package for writing old-style ``.xls``
+excel files is no longer maintained.
+The `xlrd <https://xlrd.readthedocs.io/en/latest/>`__ package is now only for reading
+old-style ``.xls`` files.
+
+Previously, the default argument ``engine=None`` to :func:`~pandas.read_excel`
+would result in using the ``xlrd`` engine in many cases, including new
+Excel 2007+ (``.xlsx``) files.
+If `openpyxl <https://openpyxl.readthedocs.io/en/stable/>`__ is installed,
+many of these cases will now default to using the ``openpyxl`` engine.
+See the :func:`read_excel` documentation for more details.
+
+Thus, it is strongly encouraged to install ``openpyxl`` to read Excel 2007+
+(``.xlsx``) files.
+**Please do not report issues when using ``xlrd`` to read ``.xlsx`` files.**
+This is no longer supported; switch to using ``openpyxl`` instead.
+
+Attempting to use the ``xlwt`` engine will raise a ``FutureWarning``
+unless the option :attr:`io.excel.xls.writer` is set to ``"xlwt"``.
+While this option is now deprecated and will also raise a ``FutureWarning``,
+it can be globally set and the warning suppressed. Users are recommended to
+write ``.xlsx`` files using the ``openpyxl`` engine instead.
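
The default-engine behaviour described in the warning above, together with the rule list in the ``read_excel`` docstring, can be sketched roughly as follows. This is a simplified illustration and not the pandas implementation: the function name ``resolve_default_engine`` and the ``openpyxl_installed`` parameter are hypothetical, the sketch inspects only the file suffix, and treating ``.xls`` as an ``xlrd`` case is an assumption drawn from the prose above.

```python
import importlib.util
import warnings
from pathlib import Path

OPENDOCUMENT_SUFFIXES = {".odf", ".ods", ".odt"}


def resolve_default_engine(path, openpyxl_installed=None):
    """Approximate which engine ``engine=None`` would select for ``path``.

    Hypothetical helper; pandas itself may check more than the suffix.
    """
    if openpyxl_installed is None:
        # Detect openpyxl without importing it.
        openpyxl_installed = importlib.util.find_spec("openpyxl") is not None

    suffix = Path(path).suffix.lower()
    if suffix in OPENDOCUMENT_SUFFIXES:
        return "odf"
    if suffix == ".xls":
        # Assumption: old-style files stay with xlrd.
        return "xlrd"
    if openpyxl_installed:
        return "openpyxl"
    # Fallback described in the docstring: xlrd plus a FutureWarning.
    warnings.warn(
        "falling back to xlrd; install openpyxl to read .xlsx files",
        FutureWarning,
        stacklevel=2,
    )
    return "xlrd"
```

Passing ``openpyxl_installed`` explicitly makes both branches easy to exercise without changing the environment.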

.. _io.excel_reader:

Reading Excel files
29 changes: 15 additions & 14 deletions doc/source/whatsnew/v1.2.0.rst
@@ -10,21 +10,22 @@ including other versions of pandas.

.. warning::

-The packages `xlrd <https://xlrd.readthedocs.io/en/latest/>`_ for reading excel
-files and `xlwt <https://xlwt.readthedocs.io/en/latest/>`_ for
-writing excel files are no longer maintained. These are the only engines in pandas
-that support the xls format.
-
-Previously, the default argument ``engine=None`` to ``pd.read_excel``
-would result in using the ``xlrd`` engine in many cases. If
-`openpyxl <https://openpyxl.readthedocs.io/en/stable/>`_ is installed,
+The `xlwt <https://xlwt.readthedocs.io/en/latest/>`_ package for writing old-style ``.xls``
+excel files is no longer maintained.
+The `xlrd <https://xlrd.readthedocs.io/en/latest/>`_ package is now only for reading
+old-style ``.xls`` files.
+
+Previously, the default argument ``engine=None`` to :func:`~pandas.read_excel`
+would result in using the ``xlrd`` engine in many cases, including new
+Excel 2007+ (``.xlsx``) files.
+If `openpyxl <https://openpyxl.readthedocs.io/en/stable/>`_ is installed,
many of these cases will now default to using the ``openpyxl`` engine.
-See the :func:`read_excel` documentation for more details. Attempting to read
-``.xls`` files or specifying ``engine="xlrd"`` to ``pd.read_excel`` will not
-raise a warning. However users should be aware that ``xlrd`` is already
-broken with certain package configurations, for example with Python 3.9
-when `defusedxml <https://github.com/tiran/defusedxml/>`_ is installed, and
-is anticipated to be unusable in the future.
+See the :func:`read_excel` documentation for more details.
+
+Thus, it is strongly encouraged to install ``openpyxl`` to read Excel 2007+
+(``.xlsx``) files.
+**Please do not report issues when using ``xlrd`` to read ``.xlsx`` files.**
+This is no longer supported; switch to using ``openpyxl`` instead.

Attempting to use the ``xlwt`` engine will raise a ``FutureWarning``
unless the option :attr:`io.excel.xls.writer` is set to ``"xlwt"``.
24 changes: 13 additions & 11 deletions pandas/io/excel/_base.py
@@ -105,16 +105,16 @@
Supported engines: "xlrd", "openpyxl", "odf", "pyxlsb".
Engine compatibility :

-- "xlrd" supports most old/new Excel file formats.
+- "xlrd" supports old-style Excel files (.xls).
- "openpyxl" supports newer Excel file formats.
- "odf" supports OpenDocument file formats (.odf, .ods, .odt).
- "pyxlsb" supports Binary Excel files.

.. versionchanged:: 1.2.0
The engine `xlrd <https://xlrd.readthedocs.io/en/latest/>`_
-is no longer maintained, and is not supported with
-python >= 3.9. When ``engine=None``, the following logic will be
-used to determine the engine.
+now only supports old-style ``.xls`` files.
+When ``engine=None``, the following logic will be
+used to determine the engine:

- If ``path_or_buffer`` is an OpenDocument format (.odf, .ods, .odt),
then `odf <https://pypi.org/project/odfpy/>`_ will be used.
@@ -920,7 +920,7 @@ class ExcelFile:
"""
Class for parsing tabular excel sheets into DataFrame objects.

-Uses xlrd engine by default. See read_excel for more documentation
+See read_excel for more documentation

Parameters
----------
@@ -933,17 +933,17 @@ class ExcelFile:
Supported engines: ``xlrd``, ``openpyxl``, ``odf``, ``pyxlsb``
Engine compatibility :

-- ``xlrd`` supports most old/new Excel file formats.
+- ``xlrd`` supports old-style Excel files (.xls).
- ``openpyxl`` supports newer Excel file formats.
- ``odf`` supports OpenDocument file formats (.odf, .ods, .odt).
- ``pyxlsb`` supports Binary Excel files.

.. versionchanged:: 1.2.0

The engine `xlrd <https://xlrd.readthedocs.io/en/latest/>`_
-is no longer maintained, and is not supported with
-python >= 3.9. When ``engine=None``, the following logic will be
-used to determine the engine.
+now only supports old-style ``.xls`` files.
+When ``engine=None``, the following logic will be
+used to determine the engine:

- If ``path_or_buffer`` is an OpenDocument format (.odf, .ods, .odt),
then `odf <https://pypi.org/project/odfpy/>`_ will be used.
@@ -954,8 +954,10 @@ class ExcelFile:
then ``openpyxl`` will be used.
- Otherwise ``xlrd`` will be used and a ``FutureWarning`` will be raised.

-Specifying ``engine="xlrd"`` will continue to be allowed for the
-indefinite future.
+.. warning::
+
+Please do not report issues when using ``xlrd`` to read ``.xlsx`` files.
+This is not supported; switch to using ``openpyxl`` instead.
"""

from pandas.io.excel._odfreader import ODFReader