BadZipFile error when using read_excel on .xlsx #26813
Comments
Can you open the file with xlrd directly?
On Wed, Jun 12, 2019 at 11:06 AM OD1995 wrote:
Code Sample, a copy-pastable example if possible
UKregions = pd.read_excel(r"K:\Sport\Sponsors\Lookups.xlsx",sheet_name="UKregions")
Problem description
Traceback (most recent call last):
File "<ipython-input-3-acf52be7bb80>", line 1, in <module>
UKregions0 = pd.read_excel(r"K:\Sport\Sponsors\Adidas\2018 - PSOV\Lookups.xlsx",sheet_name="UKregions")
File "E:\ANACONDA\lib\site-packages\pandas\util\_decorators.py", line 188, in wrapper
return func(*args, **kwargs)
File "E:\ANACONDA\lib\site-packages\pandas\util\_decorators.py", line 188, in wrapper
return func(*args, **kwargs)
File "E:\ANACONDA\lib\site-packages\pandas\io\excel.py", line 350, in read_excel
io = ExcelFile(io, engine=engine)
File "E:\ANACONDA\lib\site-packages\pandas\io\excel.py", line 653, in __init__
self._reader = self._engines[engine](self._io)
File "E:\ANACONDA\lib\site-packages\pandas\io\excel.py", line 424, in __init__
self.book = xlrd.open_workbook(filepath_or_buffer)
File "E:\ANACONDA\lib\site-packages\xlrd\__init__.py", line 117, in open_workbook
zf = zipfile.ZipFile(filename)
File "E:\ANACONDA\lib\zipfile.py", line 1131, in __init__
self._RealGetContents()
File "E:\ANACONDA\lib\zipfile.py", line 1198, in _RealGetContents
raise BadZipFile("File is not a zip file")
BadZipFile: File is not a zip file
Expected Output
A pandas DataFrame
Output of pd.show_versions() INSTALLED VERSIONS
commit: None
python: 3.6.8.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 63 Stepping 2, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en
LOCALE: None.None
pandas: 0.24.2
pytest: 4.6.2
pip: 19.1.1
setuptools: 41.0.1
Cython: 0.29.10
numpy: 1.16.4
scipy: 1.2.1
pyarrow: None
xarray: None
IPython: 7.5.0
sphinx: 2.1.0
patsy: 0.5.1
dateutil: 2.8.0
pytz: 2019.1
blosc: None
bottleneck: 1.2.1
tables: 3.5.2
numexpr: 2.6.9
feather: None
matplotlib: 3.1.0
openpyxl: 2.6.2
xlrd: 1.2.0
xlwt: 1.3.0
xlsxwriter: 1.1.8
lxml.etree: 4.3.3
bs4: 4.7.1
html5lib: 1.0.1
sqlalchemy: 1.3.4
pymysql: None
psycopg2: None
jinja2: 2.10.1
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None
|
@TomAugspurger No, code and error below: Code:
Error:
|
Unfortunately this isn’t a pandas issue - you would have to open with xlrd |
@TomAugspurger / @WillAyd - more frustratingly, this is exactly the kind of non-issue that I'd like to avoid having dumped on the xlrd project: the error's pretty clear - if the file can't even be unzipped there is zero chance it's a valid xlsx, so why just say "well, this must be an xlrd problem, please go complain there"? |
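The point about unzipping can be checked before pandas or xlrd is involved at all. An .xlsx file is a ZIP container, so the standard library's `zipfile.is_zipfile` gives a quick first answer. This is a minimal sketch: the helper name `looks_like_xlsx` is made up for illustration, and passing the check only shows that the file is a ZIP archive, not that the workbook inside is valid.

```python
import io
import zipfile


def looks_like_xlsx(source):
    """Return True if `source` (a path or a seekable binary file object) is a ZIP archive.

    Hypothetical helper for illustration. A True result only means the
    container can be opened as a ZIP; the workbook inside may still be invalid.
    """
    return zipfile.is_zipfile(source)


# An in-memory ZIP archive (a stand-in, not a real workbook) passes the check.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("xl/workbook.xml", "<workbook/>")
buf.seek(0)
print(looks_like_xlsx(buf))

# Arbitrary bytes do not pass, and opening them raises the error from the traceback.
not_a_zip = io.BytesIO(b"this is not a spreadsheet")
print(looks_like_xlsx(not_a_zip))
try:
    zipfile.ZipFile(not_a_zip)
except zipfile.BadZipFile as exc:
    print("BadZipFile:", exc)
```

Running a check like this on the path from the report would show whether the file itself is the problem, independent of which reader library is used.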
Didn't look through the whole traceback. Agreed it looks like an issue with
the file.
|
If you’re going to offer opinions which create work for other maintainers, perhaps you could in future?
|
Sure.
|
Hey @cjw296! Just want to clarify my intention here: I'm not trying to "offload problems" or create more work. I'm just trying to route issues to where they can be definitively addressed. Sure, I had an inclination that this is probably just an issue with the user's file, but at the same time I don't know xlrd's code base, nor am I familiar enough with the structure of Excel files to say for sure.
With regard to your position on xlrd, there's been a lot of work recently to decouple that dependency and get openpyxl in as a valid reader (you can check the label). Getting something else besides xlrd for reading isn't a request we are ignoring, but like anything else it's just taking a little bit of time to get there.
Your patience while we work through that is certainly appreciated (from the PR you linked you should see we are getting closer), and obviously if you have any particular contributions you'd like to make via PRs or reviews we would love that as well. |
@WillAyd - the problem with xlrd, and it's one of the things that has burned me and John out, is dealing with careless users who can't be bothered to read exceptions or even check they have a valid excel file before complaining. After that, it becomes people who want it not to be their problem that their source data is corrupted and invalid, and because some other library or program happens to be able to deal with their corrupt data, they demand it be fixed at our expense in xlrd. So, high level:
I'm sorry to have to be so blunt about this, but John and I have tried the subtle approach over the years and it hasn't worked. |
For me, switching to the newest version of Python and Anaconda resolved the issue. I got the error while working on Python 2.7 and have now updated to 3.7.4. |
Perform these quick sanity checks:
In my case, I manually checked the Excel file's content and it turned out to be empty because I was not storing the file correctly. Once I fixed this, the "File is not a zip file" error was resolved. |
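The manual check described above can also be done in a few lines before calling `read_excel`. The sketch below is only an illustration: `describe_file` is a hypothetical helper, and the signatures are the well-known leading bytes of ZIP, OLE2, and gzip files. It reports whether a path is missing, empty, or starts like one of those containers; it does not validate the workbook.

```python
import os
import tempfile

# Leading bytes of some container formats (well-known file signatures).
ZIP_MAGIC = b"PK\x03\x04"          # ZIP archive, the container used by .xlsx
OLE2_MAGIC = b"\xd0\xcf\x11\xe0"   # OLE2 compound file, used by legacy .xls
GZIP_MAGIC = b"\x1f\x8b"           # gzip stream


def describe_file(path):
    """Hypothetical helper: give a rough description of what `path` contains."""
    if not os.path.isfile(path):
        return "missing"
    if os.path.getsize(path) == 0:
        return "empty"
    with open(path, "rb") as fh:
        head = fh.read(4)
    if head.startswith(ZIP_MAGIC):
        return "zip container (plausibly .xlsx)"
    if head.startswith(OLE2_MAGIC):
        return "OLE2 container (plausibly legacy .xls)"
    if head.startswith(GZIP_MAGIC):
        return "gzip stream"
    return "unknown"


# Demo: a zero-byte file with an .xlsx name, like the one described above.
with tempfile.TemporaryDirectory() as tmp:
    empty = os.path.join(tmp, "Lookups.xlsx")
    open(empty, "wb").close()
    print(describe_file(empty))  # prints: empty
```

An "empty" or "unknown" result points at how the file was written or transferred rather than at the reader.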
I recently ran into a similar issue. I had uploaded an .xlsx file to Google Cloud Storage and used the pandas.read_excel method to read it, passing the Google Cloud Storage location. This works fine when the file is uploaded normally to GCS but fails if the same file is uploaded to GCS gzip-compressed. |
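One possible workaround for the gzip case, assuming the bytes that reach your code are still gzip-compressed, is to decompress them before handing them to the reader. The sketch below uses only the standard library. `as_excel_buffer` is a hypothetical helper, and how the raw bytes are downloaded from storage is left out on purpose.

```python
import gzip
import io
import zipfile

GZIP_MAGIC = b"\x1f\x8b"  # leading bytes of a gzip stream


def as_excel_buffer(raw):
    """Hypothetical helper: return a BytesIO holding ZIP (.xlsx) bytes.

    `raw` is the object content as bytes, however it was downloaded.
    If it is gzip-compressed, it is decompressed first.
    """
    if raw.startswith(GZIP_MAGIC):
        raw = gzip.decompress(raw)
    buf = io.BytesIO(raw)
    if not zipfile.is_zipfile(buf):
        raise ValueError("content is not a ZIP container, so it cannot be an .xlsx file")
    buf.seek(0)
    return buf


# Demo with a stand-in archive (not a real workbook), stored gzip-compressed.
inner = io.BytesIO()
with zipfile.ZipFile(inner, "w") as zf:
    zf.writestr("xl/workbook.xml", "<workbook/>")
compressed = gzip.compress(inner.getvalue())

buf = as_excel_buffer(compressed)
print(buf.read(2))  # prints: b'PK'
```

Since `pandas.read_excel` accepts file-like objects, a buffer like this (rewound with `buf.seek(0)`) could be passed in place of the storage location.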