Skip to content

BUG: AttributeError raised when reading from excel file containing chart sheet #41448

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Qrox opened this issue May 13, 2021 · 2 comments · Fixed by #41698
Closed

BUG: AttributeError raised when reading from excel file containing chart sheet #41448

Qrox opened this issue May 13, 2021 · 2 comments · Fixed by #41698
Assignees
Labels
Bug IO Excel read_excel, to_excel
Milestone

Comments

@Qrox
Copy link

Qrox commented May 13, 2021

  • [Y] I have checked that this issue has not already been reported.

  • [Y] I have confirmed this bug exists on the latest version of pandas.

  • [N] (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

import pandas
pandas.read_excel('chartsheet.xlsx', sheet_name=None)

Example sheet file:
chartsheet.xlsx

Problem description

When reading an excel file that contains a chart sheet (a sheet that contains a chart and no table), the following exception was raised:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-9-89c95e3cb7b1> in <module>
      1 import pandas
----> 2 pandas.read_excel('chartsheet.xlsx', sheet_name=None)

D:\Softwares\Anaconda\lib\site-packages\pandas\util\_decorators.py in wrapper(*args, **kwargs)
    297                 )
    298                 warnings.warn(msg, FutureWarning, stacklevel=stacklevel)
--> 299             return func(*args, **kwargs)
    300 
    301         return wrapper

D:\Softwares\Anaconda\lib\site-packages\pandas\io\excel\_base.py in read_excel(io, sheet_name, header, names, index_col, usecols, squeeze, dtype, engine, converters, true_values, false_values, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, parse_dates, date_parser, thousands, comment, skipfooter, convert_float, mangle_dupe_cols, storage_options)
    365             skipfooter=skipfooter,
    366             convert_float=convert_float,
--> 367             mangle_dupe_cols=mangle_dupe_cols,
    368         )
    369     finally:

D:\Softwares\Anaconda\lib\site-packages\pandas\io\excel\_base.py in parse(self, sheet_name, header, names, index_col, usecols, squeeze, converters, true_values, false_values, skiprows, nrows, na_values, parse_dates, date_parser, thousands, comment, skipfooter, convert_float, mangle_dupe_cols, **kwds)
   1188             convert_float=convert_float,
   1189             mangle_dupe_cols=mangle_dupe_cols,
-> 1190             **kwds,
   1191         )
   1192 

D:\Softwares\Anaconda\lib\site-packages\pandas\io\excel\_base.py in parse(self, sheet_name, header, names, index_col, usecols, squeeze, dtype, true_values, false_values, skiprows, nrows, na_values, verbose, parse_dates, date_parser, thousands, comment, skipfooter, convert_float, mangle_dupe_cols, **kwds)
    490                 sheet = self.get_sheet_by_index(asheetname)
    491 
--> 492             data = self.get_sheet_data(sheet, convert_float)
    493             usecols = maybe_convert_usecols(usecols)
    494 

D:\Softwares\Anaconda\lib\site-packages\pandas\io\excel\_openpyxl.py in get_sheet_data(self, sheet, convert_float)
    546         data: List[List[Scalar]] = []
    547         last_row_with_data = -1
--> 548         for row_number, row in enumerate(sheet.rows):
    549             converted_row = [self._convert_cell(cell, convert_float) for cell in row]
    550             if not all(cell == "" for cell in converted_row):

AttributeError: 'Chartsheet' object has no attribute 'rows'

Expected Output

I expect that either the chart sheet is ignored, returned as some appropriate object, or a warning is shown that the chart sheet cannot be read, instead of raising the exception.

In my case I'm reading from many zipped archives of excel files, so manually removing the charts from the files is not an option.

Output of pd.show_versions()

INSTALLED VERSIONS ------------------ commit : 2cb9652 python : 3.7.4.final.0 python-bits : 64 OS : Windows OS-release : 10 Version : 10.0.19041 machine : AMD64 processor : Intel64 Family 6 Model 158 Stepping 9, GenuineIntel byteorder : little LC_ALL : None LANG : zh_CN.UTF-8 LOCALE : None.None

pandas : 1.2.4
numpy : 1.19.4
pytz : 2019.3
dateutil : 2.8.1
pip : 21.0.1
setuptools : 51.0.0
Cython : 0.29.14
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.22.0
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.4.1
numexpr : None
odfpy : None
openpyxl : 3.0.7
pandas_gbq : None
pyarrow : None
pyxlsb : 1.0.8
s3fs : None
scipy : 1.5.4
sqlalchemy : None
tables : None
tabulate : 0.8.7
xarray : None
xlrd : 2.0.1
xlwt : None
numba : None

@Qrox Qrox added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels May 13, 2021
@ahawryluk
Copy link
Contributor

take

@ahawryluk
Copy link
Contributor

I expect that either the chart sheet is ignored, returned as some appropriate object, or a warning is shown that the chart sheet cannot be read, instead of raising the exception.

For consistency between readers I believe we should always ignore chartsheets. Currently, we have the following ExcelFile behaviours:

xls xlsx/m xlsb
chart sheets in sheet_names n y y
chart sheets counted in index n n y

Changing our xlsx engine to ignore chart sheets is easy. Changing our xlsb engine will have to wait until this bug is fixed: willtrnr/pyxlsb#33. This bug doesn't apply to to .ods files because the OpenDocument specification doesn't support chart sheets. Saving an excel file with a chart sheet as .ods produces an ordinary worksheet with a large graph and no data.

@jbrockmendel jbrockmendel added the IO Excel read_excel, to_excel label Jun 6, 2021
@rhshadrach rhshadrach added this to the 1.4 milestone Jun 25, 2021
@rhshadrach rhshadrach removed the Needs Triage Issue that has not been reviewed by a pandas team member label Jun 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug IO Excel read_excel, to_excel
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants