-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
DEPR: Deprecate using xlrd
engine for read_excel
#35029
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 3 commits
3a76a36
101aa97
081ecf8
ada4354
3233381
0f4c8a1
499f9a0
825c61c
88093f6
44f157b
fffbacb
bb53725
d8dcb04
f9876dd
bc3ec47
fe10a89
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -4,6 +4,7 @@ | |
import os | ||
from textwrap import fill | ||
from typing import Any, Mapping, Union | ||
import warnings | ||
|
||
from pandas._config import config | ||
|
||
|
@@ -837,7 +838,7 @@ class ExcelFile: | |
engine : str, default None | ||
If io is not a buffer or path, this must be set to identify io. | ||
Supported engines: ``xlrd``, ``openpyxl``, ``odf``, ``pyxlsb``, | ||
default ``xlrd``. | ||
default ``xlrd`` for .xls* files, ``odf`` for .ods files. | ||
Engine compatibility : | ||
- ``xlrd`` supports most old/new Excel file formats. | ||
- ``openpyxl`` supports newer Excel file formats. | ||
|
@@ -860,15 +861,26 @@ class ExcelFile: | |
def __init__( | ||
self, path_or_buffer, engine=None, storage_options: StorageOptions = None | ||
): | ||
ext = None | ||
if not isinstance(path_or_buffer, (BufferedIOBase, RawIOBase)): | ||
ext = os.path.splitext(str(path_or_buffer))[-1][1:] | ||
|
||
if engine is None: | ||
engine = "xlrd" | ||
if isinstance(path_or_buffer, (BufferedIOBase, RawIOBase)): | ||
if _is_ods_stream(path_or_buffer): | ||
engine = "odf" | ||
else: | ||
ext = os.path.splitext(str(path_or_buffer))[-1] | ||
if ext == ".ods": | ||
if ext == "ods": | ||
engine = "odf" | ||
|
||
WillAyd marked this conversation as resolved.
Show resolved
Hide resolved
|
||
elif engine == "xlrd" and ext in ("xlsx", "xlsm"): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This warning should be in the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Are you sure? This will mean that that people also get a warning when they ask for the default (which is still xlrd), instead of when they explicitly ask for xlrd. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yea so the point of it is that people who want to suppress the warnings will get a head start and explicitly request |
||
warnings.warn( | ||
'The Excel reader engine "xlrd" is deprecated, use "openpyxl" instead. ' | ||
'Specify engine="openpyxl" to suppress this warning.', | ||
FutureWarning, | ||
stacklevel=2, | ||
) | ||
if engine not in self._engines: | ||
raise ValueError(f"Unknown engine: {engine}") | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,4 @@ | ||
from datetime import datetime | ||
from typing import List | ||
|
||
import numpy as np | ||
|
@@ -517,7 +518,11 @@ def _convert_cell(self, cell, convert_float: bool) -> Scalar: | |
|
||
# TODO: replace with openpyxl constants | ||
if cell.is_date: | ||
return cell.value | ||
try: | ||
# workaround for inaccurate timestamp notation in excel | ||
return datetime.fromtimestamp(round(cell.value.timestamp())) | ||
except (AttributeError, OSError): | ||
return cell.value | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. why is this changing? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This was a work-around for a bug in openpyxl (https://foss.heptapod.net/openpyxl/openpyxl/-/issues/1493), but is only apparent when you do a round trip save to xlsx and read back xlsx using openpyxl. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If adding code we should have a test. so either need to add test or can remove. my preference would be to have this in a separate PR, so should raise pandas issue for this if removing. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Fair enough, remove it here and make a separate PR that includes a regression test. |
||
elif cell.data_type == "e": | ||
return np.nan | ||
elif cell.data_type == "b": | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for updating. it's the ``read_excel
"xlrd" engine is deprecated
bit that I wanted removedIIUC the xlrd is not deprecated. it's only that that default engine used will change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it deprecated for use with xlsx files? IOW in the future, will only openpyxl be supported for xlsx? Sounds reasonable to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, that's not my understanding of the discussion in #28547
from #28547 (comment)
Indeed, the discussion has not explicitly mentioned disallowing
xlrd
for formats thatopenpyxl
supports. but if the xlrd engine is not removed, we should decide now whether we would restrict it's use.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My read of this is to only keep xlrd for xls where it is required, and to deprecate where it is not. In the long run, if xlrd breaks and no one takes over its maintenance, then we will either have to vendor xlrd or remove support for xls. In either case minimizing use of xlrd seems like a good idea to me.
I suppose to me the only point of deprecating is to start a path to removal of a feature. If there is not going to be removal in the future, then why bother with deprecation?
One could always discourage xlrd + xlsx with a noisy FutureWarning telling users that xlrd is unmaintained and they should install openpyxl for reading xlsx files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@WillAyd is very specific regarding this issue. He states that we should warn first, then change the default (and maybe even remove the xlrd engine for xlsx and xlsm files altogether).
But not at this point, only a deprecation warning is asked, to notify users that this engine is no longer the preferred engine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to clarify we should only be changing the default reader to openpyxl. I think it's fine to keep xlrd around as a YMMV situation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I just reverted the changes that made openpyxl the default. I am now very confused. I thought that this change was just about warning people about pending deprecation of the xlrd reader, not to switch them over already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See your comments of July 1 and 3.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see where the wording is confusing, but yes we only warn now and change in the future. We always manage user-facing changes that way