Commit 098b970

REGR: appending to existing excel file created corrupt files (#39605)
1 parent 82cda0d commit 098b970

3 files changed: +26 -0 lines changed

doc/source/whatsnew/v1.2.2.rst (+1)

@@ -21,6 +21,7 @@ Fixed regressions
 - Fixed regression in :meth:`~DataFrame.to_pickle` failing to create bz2/xz compressed pickle files with ``protocol=5`` (:issue:`39002`)
 - Fixed regression in :func:`pandas.testing.assert_series_equal` and :func:`pandas.testing.assert_frame_equal` always raising ``AssertionError`` when comparing extension dtypes (:issue:`39410`)
 - Fixed regression in :meth:`~DataFrame.to_csv` opening ``codecs.StreamWriter`` in binary mode instead of in text mode and ignoring user-provided ``mode`` (:issue:`39247`)
+- Fixed regression in :meth:`~DataFrame.to_excel` creating corrupt files when appending (``mode="a"``) to an existing file (:issue:`39576`)
 - Fixed regression in :meth:`DataFrame.transform` failing in case of an empty DataFrame or Series (:issue:`39636`)
 - Fixed regression in :meth:`core.window.rolling.Rolling.count` where the ``min_periods`` argument would be set to ``0`` after the operation (:issue:`39554`)
 - Fixed regression in :func:`read_excel` that incorrectly raised when the argument ``io`` was a non-path and non-buffer and the ``engine`` argument was specified (:issue:`39528`)

pandas/io/excel/_openpyxl.py (+5)

@@ -1,6 +1,7 @@
 from __future__ import annotations
 
 from distutils.version import LooseVersion
+import mmap
 from typing import TYPE_CHECKING, Dict, List, Optional
 
 import numpy as np
@@ -40,6 +41,7 @@ def __init__(
             from openpyxl import load_workbook
 
             self.book = load_workbook(self.handles.handle)
+            self.handles.handle.seek(0)
         else:
             # Create workbook object with default optimized_write=True.
             self.book = Workbook()
@@ -52,6 +54,9 @@ def save(self):
         Save workbook to disk.
         """
         self.book.save(self.handles.handle)
+        if "r+" in self.mode and not isinstance(self.handles.handle, mmap.mmap):
+            # truncate file to the written content
+            self.handles.handle.truncate()
 
     @classmethod
     def _convert_to_style_kwargs(cls, style_dict: dict) -> Dict[str, Serialisable]:
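
The two changes in this file (rewinding the handle after loading, then truncating after saving) can be illustrated with a minimal standard-library sketch. It does not use pandas or openpyxl: `handle.read()` stands in for `load_workbook` consuming the handle, `handle.write()` stands in for `book.save`, and the helper names `rewrite_in_place` and `demo` are hypothetical, introduced only for this illustration.

```python
import os
import tempfile
from typing import Tuple

OLD = b"OLD-CONTENT-THAT-IS-LONG"  # stand-in for the existing file's bytes
NEW = b"new"                       # stand-in for the re-saved (shorter) content


def rewrite_in_place(path: str, payload: bytes, rewind: bool, truncate: bool) -> bytes:
    """Rewrite `path` through an "r+b" handle and return the resulting bytes."""
    with open(path, "r+b") as handle:
        handle.read()              # reading leaves the position at end of file
        if rewind:
            handle.seek(0)         # start the new content at offset 0
        handle.write(payload)
        if truncate:
            handle.truncate()      # drop stale bytes after the current position
    with open(path, "rb") as handle:
        return handle.read()


def demo() -> Tuple[bytes, bytes, bytes]:
    """Return the file bytes for: no fix, rewind only, rewind plus truncate."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        results = []
        for rewind, truncate in [(False, False), (True, False), (True, True)]:
            with open(path, "wb") as handle:
                handle.write(OLD)  # reset the file before each scenario
            results.append(rewrite_in_place(path, NEW, rewind, truncate))
        return tuple(results)
    finally:
        os.remove(path)
```

Under these assumptions, writing without rewinding appends the new bytes after the old ones, rewinding alone leaves a stale tail whenever the new content is shorter, and only rewinding plus truncating yields exactly the new content.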

pandas/tests/io/excel/test_openpyxl.py (+20)

@@ -1,4 +1,5 @@
 from distutils.version import LooseVersion
+from pathlib import Path
 
 import numpy as np
 import pytest
@@ -169,3 +170,22 @@ def test_read_with_bad_dimension(
         wb.close()
     expected = DataFrame(expected_data)
     tm.assert_frame_equal(result, expected)
+
+
+def test_append_mode_file(ext):
+    # GH 39576
+    df = DataFrame()
+
+    with tm.ensure_clean(ext) as f:
+        df.to_excel(f, engine="openpyxl")
+
+        with ExcelWriter(f, mode="a", engine="openpyxl") as writer:
+            df.to_excel(writer)
+
+        # make sure that zip files are not concatenated by making sure that
+        # "docProps/app.xml" only occurs twice in the file
+        data = Path(f).read_bytes()
+        first = data.find(b"docProps/app.xml")
+        second = data.find(b"docProps/app.xml", first + 1)
+        third = data.find(b"docProps/app.xml", second + 1)
+        assert second != -1 and third == -1
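
The test relies on a property of the zip format: a member's name is stored once in its local file header and once in the central directory, so a single well-formed archive contains the name exactly twice. The following standard-library sketch shows this with `zipfile` rather than a real workbook. The helper names `make_archive` and `count_name` and the placeholder member content are assumptions made for the illustration.

```python
import io
import zipfile

MEMBER = "docProps/app.xml"


def make_archive() -> bytes:
    """Build a one-member zip archive in memory and return its raw bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, mode="w") as archive:
        # Placeholder content; it deliberately does not contain the member name.
        archive.writestr(MEMBER, b"<Properties/>")
    return buffer.getvalue()


def count_name(data: bytes) -> int:
    """Count raw occurrences of the member name in the archive bytes."""
    return data.count(MEMBER.encode("ascii"))


single = make_archive()
concatenated = single + single  # mimics two archives written back to back
```

Here `count_name(single)` is 2, while the concatenated bytes contain the name four times, which is the condition the `third == -1` assertion is designed to reject.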
