ERR: better error message on too large excel sheet #26080

Merged on Jun 1, 2019 (17 commits)
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.25.0.rst
@@ -39,6 +39,7 @@ Other Enhancements
- :class:`RangeIndex` has gained :attr:`~RangeIndex.start`, :attr:`~RangeIndex.stop`, and :attr:`~RangeIndex.step` attributes (:issue:`25710`)
- :class:`datetime.timezone` objects are now supported as arguments to timezone methods and constructors (:issue:`25065`)
- :meth:`DataFrame.query` and :meth:`DataFrame.eval` now supports quoting column names with backticks to refer to names with spaces (:issue:`6508`)
- :meth:`DataFrame.to_excel` now raises a ``ValueError`` when the caller's dimensions exceed the limitations of Excel (:issue:`26051`)
- :func:`merge_asof` now gives a more clear error message when merge keys are categoricals that are not equal (:issue:`26136`)
- :meth:`pandas.core.window.Rolling` supports exponential (or Poisson) window type (:issue:`21303`)
-
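For illustration, the behaviour described in the new whatsnew entry might look like the following from the caller's side. This is a sketch, not part of the PR: it assumes NumPy and pandas 0.25 or later are installed, and the output path `too_wide.xlsx` is hypothetical. Judging by the diff, the size check runs before any writer is created, so no file should be produced.

```python
import numpy as np
import pandas as pd

# One row, and one column more than the 2**14 column limit set in this PR.
wide = pd.DataFrame(np.zeros(shape=(1, 2**14 + 1)))

try:
    # "too_wide.xlsx" is a hypothetical output path used only for this sketch.
    wide.to_excel("too_wide.xlsx")
except ValueError as err:
    # The message reports both the frame's shape and the maximum sheet size.
    print("to_excel refused the frame:", err)
```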
10 changes: 10 additions & 0 deletions pandas/io/formats/excel.py
@@ -341,6 +341,9 @@ class ExcelFormatter:
        This is only called for body cells.
    """

    max_rows = 2**20
    max_cols = 2**14

    def __init__(self, df, na_rep='', float_format=None, cols=None,
                 header=True, index=True, index_label=None, merge_cells=False,
                 inf_rep='inf', style_converter=None):
@@ -648,6 +651,13 @@ def write(self, writer, sheet_name='Sheet1', startrow=0,
        from pandas.io.excel import ExcelWriter
        from pandas.io.common import _stringify_path

        num_rows, num_cols = self.df.shape
        if num_rows > self.max_rows or num_cols > self.max_cols:
            raise ValueError("This sheet is too large! Your sheet size is: " +
                             "{}, {} ".format(num_rows, num_cols) +
                             "Max sheet size is: {}, {}".
                             format(self.max_rows, self.max_cols))

        if isinstance(writer, ExcelWriter):
            need_save = False
        else:
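The check added to `write` can be exercised without pandas. Below is a minimal standalone sketch of the same logic; `check_sheet_size`, `MAX_ROWS`, and `MAX_COLS` are hypothetical names introduced here for illustration, while the limits and the message text are copied from the diff.

```python
# Limits copied from the diff: 2**20 rows and 2**14 columns.
MAX_ROWS = 2**20
MAX_COLS = 2**14


def check_sheet_size(num_rows, num_cols, max_rows=MAX_ROWS, max_cols=MAX_COLS):
    """Raise ValueError if the shape exceeds the limits (hypothetical helper)."""
    if num_rows > max_rows or num_cols > max_cols:
        raise ValueError(
            "This sheet is too large! Your sheet size is: "
            "{}, {} Max sheet size is: {}, {}".format(
                num_rows, num_cols, max_rows, max_cols
            )
        )


# A shape exactly at the limit passes, because the comparison is strict (>).
check_sheet_size(2**20, 2**14)

try:
    check_sheet_size(2**20 + 1, 1)
except ValueError as err:
    print(err)
    # This sheet is too large! Your sheet size is: 1048577, 1 Max sheet size is: 1048576, 16384
```

Note that the comparison uses the DataFrame's shape only, so a frame at exactly 2**20 rows or 2**14 columns is accepted by this check.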
18 changes: 18 additions & 0 deletions pandas/tests/io/test_excel.py
@@ -1185,6 +1185,24 @@ class and any subclasses, on account of the `autouse=True`
class TestExcelWriter(_WriterBase):
    # Base class for test cases to run with different Excel writers.

    def test_excel_sheet_size(self):

        # GH 26080
        breaking_row_count = 2**20 + 1
        breaking_col_count = 2**14 + 1
        # purposely using two arrays to prevent memory issues while testing
        row_arr = np.zeros(shape=(breaking_row_count, 1))
        col_arr = np.zeros(shape=(1, breaking_col_count))
        row_df = pd.DataFrame(row_arr)
        col_df = pd.DataFrame(col_arr)

        msg = "sheet is too large"
        with pytest.raises(ValueError, match=msg):
            row_df.to_excel(self.path)

        with pytest.raises(ValueError, match=msg):
            col_df.to_excel(self.path)
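
The comment about using two arrays can be made concrete with some back-of-the-envelope arithmetic. The sketch below assumes 8 bytes per element, since `np.zeros` defaults to `float64`; the variable names other than `breaking_row_count` and `breaking_col_count` are introduced here for illustration.

```python
BYTES_PER_FLOAT64 = 8  # np.zeros defaults to float64

breaking_row_count = 2**20 + 1
breaking_col_count = 2**14 + 1

# The two thin arrays the test actually allocates.
row_bytes = breaking_row_count * 1 * BYTES_PER_FLOAT64
col_bytes = 1 * breaking_col_count * BYTES_PER_FLOAT64

# A single array that breaks both limits at once (never allocated here).
full_bytes = breaking_row_count * breaking_col_count * BYTES_PER_FLOAT64

print("row_arr: {:.1f} MiB".format(row_bytes / 2**20))
print("col_arr: {:.1f} KiB".format(col_bytes / 2**10))
print("single oversized array: {:.1f} GiB".format(full_bytes / 2**30))
```

The two thin arrays come to roughly 8 MiB and 128 KiB, whereas one array exceeding both limits would need about 128 GiB, which is why the test checks rows and columns separately.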

    def test_excel_sheet_by_name_raise(self, *_):
        import xlrd
