Pandas read_excel sometimes using xlrd which has deprecated code in python 3.8.1 #30851
Comments
read_excel uses xlrd by default. We have a PR to deprecate that default, but it is not yet merged: #29375. You can explicitly request
Closing, since we always use xlrd by default right now. We could use help deprecating it in the mentioned PR, if that is something you are interested in, @mhooreman.
Thanks @WillAyd . I'll pass openpyxl as argument.
while the code of io.excel._base.ExcelFile gives: _engines = {"xlrd": _XlrdReader, "openpyxl": _OpenpyxlReader, "odf": _ODFReader}
Sorry to come back to this, guys, but I also have xls files (old format) in my data source, so xlrd will have to be used in that case. Any idea for a workaround?
The doc issue you mention should already be fixed in dev. With regard to the files, none of the other Excel engines support reading .xls files. You can continue to use xlrd (I don't think we will remove it outright, just move the default to openpyxl), but that project is unmaintained, so there are no guarantees on how it will work in the long run.
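For a data source that mixes .xls and .xlsx files, one way to stop relying on the default is to pick the engine from the file extension and pass it explicitly. The following is only a sketch under stated assumptions: `ENGINE_BY_SUFFIX`, `pick_engine`, and `read_any_excel` are hypothetical names invented for illustration, not part of pandas. The only pandas API used is the `engine` argument of `read_excel`, and both xlrd and openpyxl are assumed to be installed.

```python
from pathlib import Path

# Hypothetical mapping for illustration: file extension -> read_excel engine.
# xlrd is kept for the old .xls format, as discussed in this thread.
ENGINE_BY_SUFFIX = {
    ".xls": "xlrd",
    ".xlsx": "openpyxl",
    ".xlsm": "openpyxl",
}


def pick_engine(path):
    """Return the engine name configured for the extension of `path`."""
    suffix = Path(path).suffix.lower()
    try:
        return ENGINE_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"no engine configured for {suffix!r}") from None


def read_any_excel(path, **kwargs):
    """Call pandas.read_excel with an explicitly chosen engine."""
    # Imported here so that pick_engine stays usable without pandas installed.
    import pandas as pd

    return pd.read_excel(path, engine=pick_engine(path), **kwargs)
```

With this, `read_any_excel("report.xlsx")` would go through openpyxl and `read_any_excel("legacy.xls")` through xlrd, regardless of what the installed pandas version uses by default.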
Python 3.10, pandas==1.3.5, 2021/12: the problem still continues with xml.parsers.expat.ExpatError: mismatched tag:

Planillas_EXCEL\variable_tiempo.xlsx dimi_fecha
  File "C:\Python310\lib\xml\etree\ElementTree.py", line 1718, in feed
    self.parser.Parse(data, False)
xml.parsers.expat.ExpatError: mismatched tag: line 2, column 313904

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
    df = pd.read_excel(file_, engine='openpyxl') or df = pd.read_excel(file_)
  File "C:\Python310\lib\site-packages\pandas\util\_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "C:\Python310\lib\site-packages\pandas\io\excel\_base.py", line 372, in read_excel
    data = io.parse(
  File "C:\Python310\lib\site-packages\pandas\io\excel\_base.py", line 1272, in parse
    return self._reader.parse(
  File "C:\Python310\lib\site-packages\pandas\io\excel\_base.py", line 539, in parse
    data = self.get_sheet_data(sheet, convert_float)
Hello,
Under some circumstances that I'm unable to systematically reproduce, pd.read_excel still uses xlrd.
With python 3.8, I get a DeprecationWarning, which I can't fix.
Since xlrd no longer has a maintainer, I must come back to you for some advice. Would you be so kind as to help me?
I'm using pd.read_excel within joblib parallel subprocesses, using the default backend. No specific option is given to read_excel, and I have a mix of xls and xlsx files. The "interesting part" of the exception is shown below.
Unfortunately, when I try it manually using ipython interactive shell, I have no issue, even with the parallel joblib processing.
Thanks a lot.
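If the DeprecationWarning itself is the obstacle (for example because warnings are turned into errors), one possible stopgap is to silence it around the call. This is a minimal sketch, not a fix for xlrd: `call_quietly` is a hypothetical helper name, and it relies only on the standard-library `warnings` module. The filter is set inside the function on the assumption that warning filters configured in the parent process may not carry over to joblib worker processes, so the wrapper has to run inside each worker.

```python
import warnings


def call_quietly(func, *args, **kwargs):
    """Call func(*args, **kwargs), ignoring DeprecationWarning raised inside it."""
    with warnings.catch_warnings():
        # The filter only applies inside this block; it is restored on exit.
        warnings.simplefilter("ignore", DeprecationWarning)
        return func(*args, **kwargs)
```

Inside a joblib job this could be used as `delayed(call_quietly)(pd.read_excel, path)`, so that the filter is installed in whichever process actually runs the read.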