Commit fbe9511

Backport PR #39605: REGR: appending to existing excel file created corrupt files (#39659)
Co-authored-by: Torsten Wörtwein <[email protected]>
1 parent 3ebbc77 commit fbe9511

3 files changed

+26
-0
lines changed

doc/source/whatsnew/v1.2.2.rst

+1
@@ -21,6 +21,7 @@ Fixed regressions
 - Fixed regression in :meth:`~DataFrame.to_pickle` failing to create bz2/xz compressed pickle files with ``protocol=5`` (:issue:`39002`)
 - Fixed regression in :func:`pandas.testing.assert_series_equal` and :func:`pandas.testing.assert_frame_equal` always raising ``AssertionError`` when comparing extension dtypes (:issue:`39410`)
 - Fixed regression in :meth:`~DataFrame.to_csv` opening ``codecs.StreamWriter`` in binary mode instead of in text mode and ignoring user-provided ``mode`` (:issue:`39247`)
+- Fixed regression in :meth:`~DataFrame.to_excel` creating corrupt files when appending (``mode="a"``) to an existing file (:issue:`39576`)
 - Fixed regression in :meth:`DataFrame.transform` failing in case of an empty DataFrame or Series (:issue:`39636`)
 - Fixed regression in :meth:`core.window.rolling.Rolling.count` where the ``min_periods`` argument would be set to ``0`` after the operation (:issue:`39554`)
 - Fixed regression in :func:`read_excel` that incorrectly raised when the argument ``io`` was a non-path and non-buffer and the ``engine`` argument was specified (:issue:`39528`)

pandas/io/excel/_openpyxl.py

+5
@@ -1,4 +1,5 @@
 from distutils.version import LooseVersion
+import mmap
 from typing import TYPE_CHECKING, Dict, List, Optional

 import numpy as np
@@ -38,6 +39,7 @@ def __init__(
             from openpyxl import load_workbook

             self.book = load_workbook(self.handles.handle)
+            self.handles.handle.seek(0)
         else:
             # Create workbook object with default optimized_write=True.
             self.book = Workbook()
@@ -50,6 +52,9 @@ def save(self):
         Save workbook to disk.
         """
         self.book.save(self.handles.handle)
+        if "r+" in self.mode and not isinstance(self.handles.handle, mmap.mmap):
+            # truncate file to the written content
+            self.handles.handle.truncate()

     @classmethod
     def _convert_to_style_kwargs(cls, style_dict: dict) -> Dict[str, "Serialisable"]:
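
The two additions in this file deal with a handle opened in `r+` mode that is first read and then rewritten: `seek(0)` rewinds it after loading, and `truncate()` drops whatever remains of the old content after saving. A minimal sketch of that behaviour on a plain temporary file, without pandas or openpyxl; the helper names `rewrite_in_place` and `demo` and the `truncate` flag are hypothetical and exist only for illustration:

```python
import os
import tempfile


def rewrite_in_place(path: str, new_content: bytes, truncate: bool) -> bytes:
    """Read a file opened in "r+b" mode, overwrite it from the start, return the result."""
    with open(path, "r+b") as handle:
        handle.read()          # reading leaves the position at the end of the file
        handle.seek(0)         # rewind so the write replaces instead of appending
        handle.write(new_content)
        if truncate:
            handle.truncate()  # cut the file at the current position
    with open(path, "rb") as handle:
        return handle.read()


def demo() -> tuple:
    """Rewrite a 10-byte file with 3 bytes, once without and once with truncation."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as handle:
            handle.write(b"0123456789")
        without_truncate = rewrite_in_place(path, b"abc", truncate=False)
        with open(path, "wb") as handle:
            handle.write(b"0123456789")
        with_truncate = rewrite_in_place(path, b"abc", truncate=True)
    finally:
        os.remove(path)
    return without_truncate, with_truncate
```

Without truncation the tail of the old content survives after the new bytes, which is the kind of leftover data that can make a rewritten file unreadable when the new content is shorter than the old.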

pandas/tests/io/excel/test_openpyxl.py

+20
@@ -1,4 +1,5 @@
 from distutils.version import LooseVersion
+from pathlib import Path

 import numpy as np
 import pytest
@@ -169,3 +170,22 @@ def test_read_with_bad_dimension(
     wb.close()
     expected = DataFrame(expected_data)
     tm.assert_frame_equal(result, expected)
+
+
+def test_append_mode_file(ext):
+    # GH 39576
+    df = DataFrame()
+
+    with tm.ensure_clean(ext) as f:
+        df.to_excel(f, engine="openpyxl")
+
+        with ExcelWriter(f, mode="a", engine="openpyxl") as writer:
+            df.to_excel(writer)
+
+        # make sure that zip files are not concatenated by making sure that
+        # "docProps/app.xml" only occurs twice in the file
+        data = Path(f).read_bytes()
+        first = data.find(b"docProps/app.xml")
+        second = data.find(b"docProps/app.xml", first + 1)
+        third = data.find(b"docProps/app.xml", second + 1)
+        assert second != -1 and third == -1
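
The test expects exactly two occurrences because a zip archive stores each member name once in the member's local file header and once in the central directory. A small sketch of that property using only the standard-library `zipfile` module; the helper `make_zip` and the placeholder member content are assumptions for illustration, not part of the pandas test suite:

```python
import io
import zipfile

MARKER = b"docProps/app.xml"


def make_zip() -> bytes:
    """Build a small in-memory zip archive with one member named like MARKER."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # The member content is a placeholder and does not contain the marker.
        archive.writestr(MARKER.decode(), b"<Properties/>")
    return buffer.getvalue()


single = make_zip()
# Two archives glued together, as in the regression the test guards against.
concatenated = single + make_zip()
counts = (single.count(MARKER), concatenated.count(MARKER))
```

A single well-formed archive yields two matches, while two concatenated archives yield four, so a third match is a cheap signal that old and new file contents were joined.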
