Skip to content

ENH: Add check for character limmit #57103

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Feb 1, 2024
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v3.0.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -28,7 +28,7 @@ enhancement2

Other enhancements
^^^^^^^^^^^^^^^^^^
-
- :func:`DataFrame.to_excel` now raises an ``UserWarning`` when the character count in a cell exceeds Excel's limitation of 32767 characters (:issue:`56954`)
-

.. ---------------------------------------------------------------------------
Expand Down
4 changes: 4 additions & 0 deletions pandas/core/generic.py
Original file line number Diff line number Diff line change
Expand Up @@ -2363,6 +2363,10 @@ def to_excel(
Once a workbook has been saved it is not possible to write further
data without rewriting the whole workbook.

pandas will check the number of rows, columns,
and cell character count does not exceed Excel's limitations.
All other limitations must be checked by the user.

Examples
--------

Expand Down
10 changes: 9 additions & 1 deletion pandas/io/excel/_base.py
Original file line number Diff line number Diff line change
Expand Up @@ -1326,7 +1326,15 @@ def _value_with_fmt(
fmt = "0"
else:
val = str(val)

# GH#56954
# Excel's limitation on cell contents is 32767 characters
# xref https://support.microsoft.com/en-au/office/excel-specifications-and-limits-1672b34d-7043-467e-8e27-269d656771c3
if len(val) > 32767:
warnings.warn(
"Cell contents too long, truncated to 32767 characters",
UserWarning,
stacklevel=find_stack_level(),
)
return val, fmt

@classmethod
Expand Down
12 changes: 12 additions & 0 deletions pandas/tests/io/excel/test_writers.py
Original file line number Diff line number Diff line change
Expand Up @@ -1385,6 +1385,18 @@ def test_to_excel_empty_frame(self, engine, ext):
expected = DataFrame()
tm.assert_frame_equal(result, expected)

def test_to_excel_raising_warning_when_cell_character_exceed_limit(
self, path, engine
):
# GH#56954
df = DataFrame({"A": ["a" * 32768]})
msg = "Cell contents too long, truncated to 32767 characters"
with tm.assert_produces_warning(
UserWarning, match=msg, raise_on_extra_warnings=False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What extra warning is raised here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From commit babdf29,

the check https://github.com/pandas-dev/pandas/actions/runs/7678882560/job/20929191559 rise pandas/tests/io/excel/test_writers.py::TestExcelWriter::test_to_excel_raising_warning_when_cell_character_exceed_limit[openpyxl-.xlsx] - AssertionError: Caused unexpected warning(s): [('EncodingWarning', EncodingWarning("'encoding' argument not specified"), '/home/runner/micromamba/envs/test/lib/python3.10/site-packages/openpyxl/writer/excel.py', 199)]

and check https://github.com/pandas-dev/pandas/actions/runs/7678882560/job/20929192274 rasie pandas/tests/io/excel/test_writers.py::TestExcelWriter::test_to_excel_raising_warning_when_cell_character_exceed_limit[openpyxl-.xlsm] - AssertionError: Caused unexpected warning(s): [('DeprecationWarning', DeprecationWarning('datetime.datetime.utcnow() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.now(datetime.UTC).'), '/home/runner/micromamba/envs/test/lib/python3.12/site-packages/openpyxl/packaging/core.py', 99), ('DeprecationWarning', DeprecationWarning('datetime.datetime.utcnow() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.now(datetime.UTC).'), '/home/runner/micromamba/envs/test/lib/python3.12/site-packages/openpyxl/writer/excel.py', 292)]

pandas/tests/io/excel/test_writers.py::TestExcelWriter::test_to_excel_raising_warning_when_cell_character_exceed_limit is what we added in this pr. I just wrote the test following the other's in same file. If there is a better way to avoid these extra warnings, I am very happy to change codes!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha. Yeah since these are external warnings this is fine for now

):
buf = BytesIO()
df.to_excel(buf)


class TestExcelWriterEngineTests:
@pytest.mark.parametrize(
Expand Down