ENH: Allow overwriting existing sheets when appending to excel files #40230

mirober · 2021-03-04T19:16:52Z

Is your feature request related to a problem?

As far as I can tell there is no way to write a DataFrame to a specific (named) sheet in an existing Excel file, creating the sheet if it doesn't exist or overwriting if it does. The closest thing to this that exists is the openpyxl append mode, but if you try to write to sheet_which_already_exists with this, the data will be written to sheet_which_already_exists1. Using write mode overwrites the entire book.

Being able to overwrite existing sheets would be useful for populating "raw data" sheets in existing reports/publications on a regular basis, with pivot tables etc updating automatically.

This seems to be a very common issue:
https://stackoverflow.com/questions/62618680/overwrite-an-excel-sheet-with-pandas-dataframe-without-affecting-other-sheets
https://stackoverflow.com/questions/47602533/append-a-pandas-dataframe-to-an-existing-excel-table?noredirect=1&lq=1
https://stackoverflow.com/questions/34744863/python-how-to-use-excelwriter-to-write-into-an-existing-worksheet?noredirect=1&lq=1
#28653
#29990

Describe the solution you'd like

There are a few potential behaviours someone might want when they write to an existing sheet in append mode:

the current behaviour, which creates a new sheet with a different name
to delete the current contents of the sheet and write their data
to write the data without deleting anything, for example to add columns or rows to an existing table, or a new table in a different position

Originally I considered adding a flag like overwrite=True to pd.ExcelWriter but I don't think this would be sufficient to cover all the potential behaviours, so would suggest an enum along the lines of the if_exists option in df.to_sql, with the following options:

new_sheet: Create a new sheet with a different name.
overwrite_sheet: Delete the contents of the sheet, then write to it.
overwrite_cells: Write directly to the named sheet without deleting the previous contents.

This option would only be relevant when using mode="a" with the openpyxl engine.

API breaking implications

The new_sheet option would replicate the current behaviour, so this could be made the default.

Additional context

Demonstration of current behaviour:

import pandas as pd
from openpyxl import load_workbook
import io

df1 = pd.DataFrame({
    "col1": ["apple", "banana", "carrot"],
    "col2": [1, 2, 3],
})
df2 = pd.DataFrame({"col1": [5]})

excel = io.BytesIO()

with pd.ExcelWriter(excel, engine="openpyxl", mode="w") as writer:
    # write two sheets
    df1.to_excel(writer, sheet_name="sheet1", index=False)
    df1.to_excel(writer, sheet_name="sheet2", index=False)

with pd.ExcelWriter(excel, engine="openpyxl", mode="a") as writer:
    # try to overwrite one of them
    df2.to_excel(writer, sheet_name="sheet1", index=False)

wb = load_workbook(excel)
# sheet1, sheet2, sheet11
print(wb.sheetnames)

Why does it work this way? At the moment, when the openpyxl ExcelWriter is initialised in append mode it loads the workbook, but does not update self.sheets to reflect the sheets in the loaded file.
https://github.com/pandas-dev/pandas/blob/master/pandas/io/excel/_openpyxl.py#L54-L58

This seems unintentional and means that when write_cells is called with a sheet name that already exists in the file, it is not found in self.sheets:
https://github.com/pandas-dev/pandas/blob/master/pandas/io/excel/_openpyxl.py#L414-L419
so we follow the else branch, and when assigning to wks.title openpyxl suffixes the sheet name to avoid a naming conflict.

The text was updated successfully, but these errors were encountered:

jreback · 2021-03-05T00:36:05Z

I sympathize, but why is this a pandas issue when the underlying writer's don't support this?

jreback · 2021-03-05T00:36:23Z

cc @rhshadrach

rhshadrach · 2021-03-05T01:47:21Z

IIUC, pandas is not updating self.sheets when mode is append, so the check if sheet_name in self.sheets: is False, which makes us run wks = self.book.create_sheet(). On top of this, I expect it to be easy to remove the content of the existing sheet if the user would like to replace it entirely.

Assuming all that needs to be done here is updating self.sheets appropriately and possibly removing the content of an existing sheet, I'm generally positive toward the idea. My only remaining question is if this can be implemented for all excel engines, or if not, which ones (I've only looked at openpyxl).

I think the choosing a name similar to DataFrame.to_sql makes sense, but if_exists sounds like it is applying to the entire workbook to me. Would if_sheet_exists be more clear here? Also some alternative suggestions to the values:

if_sheet_exists='fail'
if_sheet_exists='replace'
if_sheet_exists='overwrite'
if_sheet_exists='avoid'

mirober · 2021-03-05T17:20:14Z

I sympathize, but why is this a pandas issue when the underlying writer's don't support this?

My only remaining question is if this can be implemented for all excel engines, or if not, which ones (I've only looked at openpyxl).

I think this option only really makes sense in append mode, and therefore only with the openpyxl engine. In write mode the user is in complete control so there is no real "existing sheet" problem. Openpyxl supports all of the options I've suggested, the reason pandas does what it does at the moment seems to be just a quirk of the OpenpyxlWriter initialisation code.

I think the choosing a name similar to DataFrame.to_sql makes sense, but if_exists sounds like it is applying to the entire workbook to me. Would if_sheet_exists be more clear here? Also some alternative suggestions to the values:

Those seem reasonable names to me. An alternative to 'avoid' might be 'create' or 'new'? I'm fairly indifferent

rhshadrach · 2021-03-05T23:27:05Z

Makes sense - while I don't like adding engine-specific arguments, IMO this is still a clear enough use-case that I'm +1. cc @jreback for any thoughts.

I do like 'new' instead of 'avoid'. Also, maybe if_sheet_exists='clear' is better than if_sheet_exists='replace'.

mirober · 2021-03-06T18:15:08Z

I agree it's not ideal to be adding a bunch of engine-specific options, but I think the functionality is worth it. I can come up with a more detailed motivating example if it's not obvious what kind of thing it would allow.

I think replace is clearer (heh) than clear. It also matches the if_exists="replace" argument to df.to_sql and does more or less the same thing: truncate the target location and then write the dataframe.

mirober added Enhancement Needs Triage Issue that has not been reviewed by a pandas team member labels Mar 4, 2021

mirober mentioned this issue Mar 4, 2021

ENH: Add if_sheet_exists parameter to ExcelWriter #40231

Merged

4 tasks

jreback added the IO Excel read_excel, to_excel label Mar 5, 2021

lithomas1 removed the Needs Triage Issue that has not been reviewed by a pandas team member label Mar 7, 2021

rhshadrach mentioned this issue Mar 7, 2021

EHN: Refactor pd.read_excel/DataFrame.to_excel like DataFrame.plot #40291

Closed

jreback added this to the 1.3 milestone Apr 21, 2021

rhshadrach closed this as completed in #40231 Apr 22, 2021

feefladder mentioned this issue Jun 25, 2021

ENH: #40231 does not allow writing multiple dataframes to a single sheet #42221

Closed

fdzuJ mentioned this issue Jul 16, 2021

ENH: write to existing sheet of excel file #42559

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ENH: Allow overwriting existing sheets when appending to excel files #40230

ENH: Allow overwriting existing sheets when appending to excel files #40230

mirober commented Mar 4, 2021

jreback commented Mar 5, 2021

jreback commented Mar 5, 2021

rhshadrach commented Mar 5, 2021 •

edited

Loading

mirober commented Mar 5, 2021

rhshadrach commented Mar 5, 2021

mirober commented Mar 6, 2021

ENH: Allow overwriting existing sheets when appending to excel files #40230

ENH: Allow overwriting existing sheets when appending to excel files #40230

Comments

mirober commented Mar 4, 2021

Is your feature request related to a problem?

Describe the solution you'd like

API breaking implications

Additional context

jreback commented Mar 5, 2021

jreback commented Mar 5, 2021

rhshadrach commented Mar 5, 2021 • edited Loading

mirober commented Mar 5, 2021

rhshadrach commented Mar 5, 2021

mirober commented Mar 6, 2021

rhshadrach commented Mar 5, 2021 •

edited

Loading