Skip to content

ENH: Add check for character limmit #57103

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Feb 1, 2024
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v3.0.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -28,7 +28,7 @@ enhancement2

Other enhancements
^^^^^^^^^^^^^^^^^^
-
- :func:`to_excel` now raises a ``ValueError`` when the character count in a cell exceeds Excel's limitation of 32767 characters (:issue:`56954`)
-

.. ---------------------------------------------------------------------------
Expand Down
4 changes: 4 additions & 0 deletions pandas/core/generic.py
Original file line number Diff line number Diff line change
Expand Up @@ -2363,6 +2363,10 @@ def to_excel(
Once a workbook has been saved it is not possible to write further
data without rewriting the whole workbook.

pandas will check the number of rows, columns,
and cell character count does not exceed Excel's limitations.
All other limitations must be checked by the user.

Examples
--------

Expand Down
5 changes: 5 additions & 0 deletions pandas/io/excel/_base.py
Original file line number Diff line number Diff line change
Expand Up @@ -1326,6 +1326,11 @@ def _value_with_fmt(
fmt = "0"
else:
val = str(val)
# GH#56954
if len(val) > 32767:
raise ValueError(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a reference where 32767 comes from?

Also I think this should be a UserWarning and not a ValueError. We shouldn't completely stop writing the excel file due to excel's limitation

Copy link
Member

@rhshadrach rhshadrach Jan 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had suggested an error over a warning in #57103 (comment) primarily due to the fact that we raise an error if the frame is too long or wide already.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see. The reason I suggested warning was due to a similar scenario I remember a while back where we used to raise an error if we detected whether a stack/unstack operation would create "too many elements" and users wanted to bypass that limitation. We ended up changing it to a PerformanceWarning #45084

I slightly prefer to still raise a UserWarning to make pandas less opinionated here (the Excel output is still valid and maybe a truncated value is OK for some users). But if you feel strongly about a ValueError, it's not a blocker for me

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually - raising here would need a deprecation too, right? That makes me think we should raise a warning. @luke396 - can you turn this back into a warning?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually - raising here would need a deprecation too, right?

Conservatively yes, but it would "fix" an undesired (external) behavior ("bug").

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Truth be told, I didn't initially utilize 'UserWarning' to the extent you both recommended when I first started out. However, I gained valuable insights from the discussions you provided.

"Cell contents too long, truncated to 32767 characters"
)

return val, fmt

Expand Down
9 changes: 9 additions & 0 deletions pandas/tests/io/excel/test_writers.py
Original file line number Diff line number Diff line change
Expand Up @@ -1385,6 +1385,15 @@ def test_to_excel_empty_frame(self, engine, ext):
expected = DataFrame()
tm.assert_frame_equal(result, expected)

def test_to_excel_raising_error_when_cell_character_exceed_limit(
self, path, engine
):
# GH#56954
df = DataFrame({"A": ["a" * 32768]})
msg = "Cell contents too long, truncated to 32767 characters"
with pytest.raises(ValueError, match=msg):
df.to_excel(path, engine=engine)


class TestExcelWriterEngineTests:
@pytest.mark.parametrize(
Expand Down