BadZipFile error when using read_excel on .xlsx #26813

OD1995 · 2019-06-12T16:06:14Z

Code Sample, a copy-pastable example if possible

UKregions = pd.read_excel(r"K:\Sport\Sponsors\Lookups.xlsx",sheet_name="UKregions")

Problem description

Traceback (most recent call last):

  File "<ipython-input-3-acf52be7bb80>", line 1, in <module>
    UKregions0 = pd.read_excel(r"K:\Sport\Sponsors\Adidas\2018 - PSOV\Lookups.xlsx",sheet_name="UKregions")

  File "E:\ANACONDA\lib\site-packages\pandas\util\_decorators.py", line 188, in wrapper
    return func(*args, **kwargs)

  File "E:\ANACONDA\lib\site-packages\pandas\util\_decorators.py", line 188, in wrapper
    return func(*args, **kwargs)

  File "E:\ANACONDA\lib\site-packages\pandas\io\excel.py", line 350, in read_excel
    io = ExcelFile(io, engine=engine)

  File "E:\ANACONDA\lib\site-packages\pandas\io\excel.py", line 653, in __init__
    self._reader = self._engines[engine](self._io)

  File "E:\ANACONDA\lib\site-packages\pandas\io\excel.py", line 424, in __init__
    self.book = xlrd.open_workbook(filepath_or_buffer)

  File "E:\ANACONDA\lib\site-packages\xlrd\__init__.py", line 117, in open_workbook
    zf = zipfile.ZipFile(filename)

  File "E:\ANACONDA\lib\zipfile.py", line 1131, in __init__
    self._RealGetContents()

  File "E:\ANACONDA\lib\zipfile.py", line 1198, in _RealGetContents
    raise BadZipFile("File is not a zip file")

BadZipFile: File is not a zip file

Expected Output

A pandas DataFrame

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.6.8.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 63 Stepping 2, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en
LOCALE: None.None

pandas: 0.24.2
pytest: 4.6.2
pip: 19.1.1
setuptools: 41.0.1
Cython: 0.29.10
numpy: 1.16.4
scipy: 1.2.1
pyarrow: None
xarray: None
IPython: 7.5.0
sphinx: 2.1.0
patsy: 0.5.1
dateutil: 2.8.0
pytz: 2019.1
blosc: None
bottleneck: 1.2.1
tables: 3.5.2
numexpr: 2.6.9
feather: None
matplotlib: 3.1.0
openpyxl: 2.6.2
xlrd: 1.2.0
xlwt: 1.3.0
xlsxwriter: 1.1.8
lxml.etree: 4.3.3
bs4: 4.7.1
html5lib: 1.0.1
sqlalchemy: 1.3.4
pymysql: None
psycopg2: None
jinja2: 2.10.1
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None


TomAugspurger · 2019-06-12T16:42:28Z

Can you open the file with xlrd directly?


OD1995 · 2019-06-13T07:47:57Z

@TomAugspurger No, code and error below:

Code:

A = xlrd.open_workbook(r"K:\Sport\Sponsors\Lookups.xlsx")

Error:

Traceback (most recent call last):

  File "<ipython-input-7-fd902e6b151e>", line 1, in <module>
    odf = xlrd.open_workbook(lookupsPath)

  File "E:\ANACONDA\lib\site-packages\xlrd\__init__.py", line 117, in open_workbook
    zf = zipfile.ZipFile(filename)

  File "E:\ANACONDA\lib\zipfile.py", line 1131, in __init__
    self._RealGetContents()

  File "E:\ANACONDA\lib\zipfile.py", line 1198, in _RealGetContents
    raise BadZipFile("File is not a zip file")

BadZipFile: File is not a zip file

WillAyd · 2019-06-13T10:27:03Z

Unfortunately this isn’t a pandas issue - you would have to open an issue with xlrd.

cjw296 · 2019-06-13T14:09:21Z

@WillAyd - please can you stop offloading your project's problems onto me?

I've made the position with xlrd pretty clear:
#26487 (the issue you closed)
#11499

Thanks.

cjw296 · 2019-06-13T14:12:58Z

@TomAugspurger / @WillAyd - more frustratingly, this is exactly the kind of non-issue that I'd like to avoid having dumped on the xlrd project: the error's pretty clear - if the file can't even be unzipped there is zero chance it's a valid xlsx, so why just say "well, this must be an xlrd problem, please go complain there"?
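As a rough illustration of the point above: an .xlsx file is a ZIP container, so the same check can be reproduced with the standard library alone, before pandas or xlrd is involved. The helper name `looks_like_xlsx` is hypothetical, and testing for the `[Content_Types].xml` member is an assumption based on the Open Packaging Conventions layout, not something xlrd or pandas is known to do in this form.

```python
import zipfile


def looks_like_xlsx(path):
    """Return True if `path` is a ZIP archive that has the package manifest.

    Hypothetical helper for illustration; it does not validate the workbook.
    """
    # zipfile.is_zipfile returns False (rather than raising) for files that
    # are not ZIP archives, including empty files.
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as zf:
        # Assumption: an OOXML package carries this manifest at its root.
        return "[Content_Types].xml" in zf.namelist()
```

A `False` result here means the problem is in the file itself, whichever reader is used afterwards.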

TomAugspurger · 2019-06-13T14:14:35Z

Didn't look through the whole traceback. Agreed it looks like an issue with the file.


cjw296 · 2019-06-13T14:18:12Z

If you’re going to offer opinions which create work for other maintainers, perhaps you could read the whole traceback first in future?


TomAugspurger · 2019-06-13T14:28:36Z

Sure.

WillAyd · 2019-06-13T15:00:27Z

Hey @cjw296! Just want to clarify my intention here - I'm not trying to "offload problems" or create more work. I'm just trying to route issues to where they can be definitively addressed. Sure, I had an inclination that this is probably just an issue with the user's file, but at the same time I don't know xlrd's code base, nor am I familiar enough with the structure of Excel files to say for sure.

With regards to your position on xlrd, there's been a lot of work recently to decouple that dependency and get openpyxl in as a valid reader (you can check the label IO Excel in our PR tracker if you care to see). A lot of the code here has been contributed by a variety of users over the course of years, so it's one of those things where it is probably more work than you would think or hope for to get openpyxl into place.

Getting something else besides xlrd for reading isn't a request we are ignoring, but like anything else it's just taking a little bit of time to get there. Your patience while we work through that is certainly appreciated (from the PR you linked you should see we are getting closer), and obviously if you have any particular contributions you'd like to make via PRs or reviews we would love that as well.

cjw296 · 2019-06-13T15:09:15Z

@WillAyd - the problem with xlrd, and it's one of the things that has burned me and John out, is dealing with careless users who can't be bothered to read exceptions or even check they have a valid Excel file before complaining. After that, it becomes people who don't want it to be their problem that their source data is corrupted and invalid, and because some other library or program happens to be able to deal with their corrupt data, they demand it be fixed at our expense in xlrd.

So, high level:

- If people have xlsx, tell them to use openpyxl. Do not complain on the xlrd tracker about xlsx issues; many more and it will persuade me to actually rip out the xlsx support completely and do a new release.
- If it's xls, assume it's a data issue. Even if it's not, the chances of it being fixed in xlrd without the OP supplying a PR with a simple, sanitized example file along with unit tests are pretty much zero.
- Please try to dissuade people who complain on this tracker from opening issues against xlrd. If they're not clued up enough to go to the xlrd codebase in the first place, they're unlikely to get any joy by you directing them there.

I'm sorry to have to be so blunt about this, but John and I have tried the subtle approach over the years and it hasn't worked.

goforaditya · 2020-06-26T07:41:11Z

For me, switching to the newest version of Python and Anaconda resolved the issue. I got the error while working on Python 2.7 and have now updated to 3.7.4.

SharnamK · 2021-04-27T16:25:07Z

Perform these quick sanity checks:

- Open the Excel file. Is the data appearing correctly?
- Are you able to see the file size in the file's details in Windows Explorer?

In my case, I manually checked the Excel file's content and it turned out to be empty because I was not storing the file correctly. Once I fixed this, the "File is not a zip file" error was resolved.
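These manual checks can also be scripted. The following is a minimal sketch using only the standard library; the function name `describe_file` and its return labels are made up for this example. The signatures tested are the well-known leading bytes of ZIP, gzip, and OLE2 compound files (OLE2 being the container used by legacy .xls and, as far as I know, by password-protected workbooks).

```python
import os


def describe_file(path):
    """Classify a file by size and leading bytes (illustrative labels only)."""
    if os.path.getsize(path) == 0:
        return "empty"
    with open(path, "rb") as fh:
        head = fh.read(8)
    if head.startswith(b"PK\x03\x04"):
        return "zip"   # what a normal .xlsx is expected to start with
    if head.startswith(b"\x1f\x8b"):
        return "gzip"  # a compressed wrapper around the real file
    if head == b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1":
        return "ole2"  # legacy .xls or an encrypted workbook
    return "unknown"
```

Anything other than `"zip"` for a file named `.xlsx` suggests the file was saved, transferred, or renamed incorrectly, which is a data problem rather than a reader problem.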

ggprod · 2021-12-23T18:14:20Z

I recently ran into a similar issue. I had uploaded an .xlsx file to Google Cloud Storage and used the pandas.read_excel method to read it, passing the Google Cloud Storage location. This works fine when the file is uploaded normally to GCS but fails if the same file is uploaded to GCS gzip-compressed.
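One way to guard against this, sketched under the assumption that the raw bytes of the object have already been downloaded (the download step is not shown, and `unwrap_gzip` is a hypothetical helper name): check for the gzip signature and decompress before handing the data to a reader.

```python
import gzip
import io


def unwrap_gzip(raw):
    """Return a BytesIO over `raw`, gunzipped first if it has the gzip magic.

    Hypothetical helper; `raw` is the object's content as bytes.
    """
    # Every gzip stream begins with the two bytes 0x1f 0x8b.
    if raw[:2] == b"\x1f\x8b":
        raw = gzip.decompress(raw)
    return io.BytesIO(raw)
```

The resulting buffer can then be passed to `pandas.read_excel`, which accepts file-like objects as well as paths.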

WillAyd closed this as completed Jun 13, 2019

cjw296 mentioned this issue Oct 29, 2022

CLN: Remove xlrd < 2.0 code #49376

Merged
