ENH: Adding engine_kwargs to DataFrame.to_excel #53220

Merged
merged 25 commits into from
May 20, 2023
Changes from 20 commits
31af5a2
Implementing initial logic to add engine_kwargs to DataFrame.to_excel
rmhowe425 Apr 9, 2023
5943856
Implementing logic to add engine_kwards to to_excel. Adding unit tests
rmhowe425 May 13, 2023
4a7994f
Documenting the enhancement for this GH issue
rmhowe425 May 14, 2023
1f2ef68
Merge branch 'main' into dev/to_excel
rmhowe425 May 14, 2023
72c5f80
Fixing formatting errors
rmhowe425 May 14, 2023
5ad0e10
Fixing formatting issues
rmhowe425 May 14, 2023
a1799a1
Fixing documentation errors and fixing unit test errors.
rmhowe425 May 14, 2023
1d6677f
Fixing unit test errors
rmhowe425 May 14, 2023
cca1f6c
Updating discrepancies in documentation, restricting except statement…
rmhowe425 May 15, 2023
7d42a15
Adding blank line after my contribution in the whatsnew file
rmhowe425 May 16, 2023
0e86804
Fixing a discrepancy with my unit tests
rmhowe425 May 16, 2023
b016edd
Fixing discrepancy with unit test
rmhowe425 May 16, 2023
7c63fc9
Fixing discrepancy with unit test
rmhowe425 May 16, 2023
954a727
Fixing discrepancy with unit test
rmhowe425 May 16, 2023
9938978
Fixing discrepancy with unit test
rmhowe425 May 16, 2023
b095e10
Fixing discrepancy with unit test
rmhowe425 May 16, 2023
f3dd277
Updating documentation and syntactical discrepancies based on feedback
rmhowe425 May 17, 2023
bbe1d3b
Removing finally statement in _xlsxwriter.py to troubleshoot failing …
rmhowe425 May 17, 2023
010e224
Fixing discrepancies with unit tests. All local unit tests are passing.
rmhowe425 May 17, 2023
7e023ad
Fixing discrepancies with unit tests. All local unit tests are passing.
rmhowe425 May 17, 2023
5b76c06
Updating error messages in unit tests to ensure that tests are failin…
rmhowe425 May 17, 2023
c014e4f
Updating error messages in unit tests to ensure that tests are failin…
rmhowe425 May 18, 2023
e31ec6a
Updating error messages in unit tests to ensure that tests are failin…
rmhowe425 May 19, 2023
50ef9f8
Updating error messages in unit tests to ensure that tests are failin…
rmhowe425 May 19, 2023
cbad00f
Updating error messages in unit tests to ensure that tests are failin…
rmhowe425 May 19, 2023
9 changes: 9 additions & 0 deletions doc/source/user_guide/io.rst
@@ -3785,6 +3785,15 @@ one can pass an :class:`~pandas.io.excel.ExcelWriter`.

 .. _io.excel_writing_buffer:

+When using the ``engine_kwargs`` parameter, pandas will pass these arguments to the
+engine. For this, it is important to know which function pandas is using internally.
+
+* For the engine openpyxl, pandas is using :func:`openpyxl.Workbook` to create a new sheet and :func:`openpyxl.load_workbook` to append data to an existing sheet. The openpyxl engine writes to (``.xlsx``) and (``.xlsm``) files.
+
+* For the engine xlsxwriter, pandas is using :func:`xlsxwriter.Workbook` to write to (``.xlsx``) files.
+
+* For the engine odf, pandas is using :func:`odf.opendocument.OpenDocumentSpreadsheet` to write to (``.ods``) files.
+
 Writing Excel files to memory
 +++++++++++++++++++++++++++++

2 changes: 2 additions & 0 deletions doc/source/whatsnew/v2.1.0.rst
@@ -97,8 +97,10 @@ Other enhancements
 - Let :meth:`DataFrame.to_feather` accept a non-default :class:`Index` and non-string column names (:issue:`51787`)
 - Performance improvement in :func:`read_csv` (:issue:`52632`) with ``engine="c"``
 - :meth:`Categorical.from_codes` has gotten a ``validate`` parameter (:issue:`50975`)
+- Added ``engine_kwargs`` parameter to :meth:`DataFrame.to_excel` (:issue:`53220`)
 - Performance improvement in :func:`concat` with homogeneous ``np.float64`` or ``np.float32`` dtypes (:issue:`52685`)
 - Performance improvement in :meth:`DataFrame.filter` when ``items`` is given (:issue:`52941`)
+-

 .. ---------------------------------------------------------------------------
 .. _whatsnew_210.notable_bug_fixes:
6 changes: 6 additions & 0 deletions pandas/core/generic.py
@@ -2155,6 +2155,7 @@ def to_excel(
         inf_rep: str = "inf",
         freeze_panes: tuple[int, int] | None = None,
         storage_options: StorageOptions = None,
+        engine_kwargs: dict[str, Any] | None = None,
     ) -> None:
         """
         Write {klass} to an Excel sheet.
@@ -2211,6 +2212,8 @@ def to_excel(
         {storage_options}

             .. versionadded:: {storage_options_versionadded}
+        engine_kwargs : dict, optional
+            Arbitrary keyword arguments passed to excel engine.

         See Also
         --------
@@ -2263,6 +2266,8 @@ def to_excel(

         >>> df1.to_excel('output1.xlsx', engine='xlsxwriter')  # doctest: +SKIP
         """
+        if engine_kwargs is None:
+            engine_kwargs = {}

         df = self if isinstance(self, ABCDataFrame) else self.to_frame()

@@ -2287,6 +2292,7 @@ def to_excel(
             freeze_panes=freeze_panes,
             engine=engine,
             storage_options=storage_options,
+            engine_kwargs=engine_kwargs,
         )

     @final
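The `None` default followed by `engine_kwargs = {}` in `to_excel` is the usual way to avoid a shared mutable default while still allowing `**` unpacking further down the call chain. The sketch below illustrates only that pattern; `make_book` and `to_excel_sketch` are hypothetical stand-ins, not pandas functions, and the `write_only` key is an arbitrary example.

```python
from typing import Any, Dict, Optional


def make_book(**kwargs: Any) -> Dict[str, Any]:
    """Hypothetical stand-in for an engine constructor; returns what it received."""
    return dict(kwargs)


def to_excel_sketch(engine_kwargs: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    # Normalise None to a fresh dict so the ** unpacking below always works
    # and no dict object is shared between calls.
    if engine_kwargs is None:
        engine_kwargs = {}
    return make_book(**engine_kwargs)


no_kwargs = to_excel_sketch()
with_kwargs = to_excel_sketch({"write_only": True})
```

Calling the function without arguments forwards nothing, and an explicit dict is forwarded unchanged.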
6 changes: 5 additions & 1 deletion pandas/io/excel/_xlsxwriter.py
@@ -212,7 +212,11 @@ def __init__(
             engine_kwargs=engine_kwargs,
         )

-        self._book = Workbook(self._handles.handle, **engine_kwargs)
+        try:
+            self._book = Workbook(self._handles.handle, **engine_kwargs)
+        except TypeError:
+            self._handles.handle.close()
+            raise

     @property
     def book(self):
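The `try`/`except TypeError` added around `Workbook(...)` closes the already-opened handle before re-raising, presumably so that a rejected keyword does not leave the file open. A minimal sketch of that cleanup-then-re-raise shape follows; `FakeWorkbook` and `open_book` are hypothetical names, and an in-memory buffer stands in for the real file handle.

```python
import io


class FakeWorkbook:
    """Hypothetical engine class that accepts only an `options` keyword."""

    def __init__(self, handle, options=None):
        self.handle = handle
        self.options = options or {}


def open_book(handle, engine_kwargs):
    try:
        return FakeWorkbook(handle, **engine_kwargs)
    except TypeError:
        # The constructor rejected a keyword: release the handle, then
        # re-raise so the caller still sees the original error.
        handle.close()
        raise


handle = io.BytesIO()
try:
    open_book(handle, {"foo": "bar"})
except TypeError as err:
    message = str(err)

handle_closed = handle.closed
```

The bare `raise` keeps the original exception and traceback, so callers (and tests) still observe the engine's own error message.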
11 changes: 10 additions & 1 deletion pandas/io/formats/excel.py
@@ -901,6 +901,7 @@ def write(
         freeze_panes: tuple[int, int] | None = None,
         engine: str | None = None,
         storage_options: StorageOptions = None,
+        engine_kwargs: dict | None = None,
     ) -> None:
         """
         writer : path-like, file-like, or ExcelWriter object
@@ -922,6 +923,8 @@ def write(
         {storage_options}

             .. versionadded:: 1.2.0
+        engine_kwargs: dict, optional
+            Arbitrary keyword arguments passed to excel engine.
         """
         from pandas.io.excel import ExcelWriter

@@ -932,14 +935,20 @@ def write(
                 f"Max sheet size is: {self.max_rows}, {self.max_cols}"
             )

+        if engine_kwargs is None:
+            engine_kwargs = {}
+
         formatted_cells = self.get_formatted_cells()
         if isinstance(writer, ExcelWriter):
             need_save = False
         else:
             # error: Cannot instantiate abstract class 'ExcelWriter' with abstract
             # attributes 'engine', 'save', 'supported_extensions' and 'write_cells'
             writer = ExcelWriter(  # type: ignore[abstract]
-                writer, engine=engine, storage_options=storage_options
+                writer,
+                engine=engine,
+                storage_options=storage_options,
+                engine_kwargs=engine_kwargs,
             )
             need_save = True

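In the `write` hunk, `engine_kwargs` is forwarded only in the branch that constructs a new `ExcelWriter`. Reading the diff, a writer that the caller has already built is reused as is, so keyword arguments passed alongside it would not reach the engine. The sketch below mirrors that branching with hypothetical stand-ins (`SketchWriter`, `write_sketch`); it is not the pandas implementation.

```python
class SketchWriter:
    """Hypothetical stand-in for ExcelWriter; records the kwargs it was built with."""

    def __init__(self, target, engine_kwargs=None):
        self.target = target
        self.engine_kwargs = engine_kwargs or {}


def write_sketch(writer, engine_kwargs=None):
    if engine_kwargs is None:
        engine_kwargs = {}
    if isinstance(writer, SketchWriter):
        # The caller owns the writer: it is already constructed, so the
        # engine_kwargs passed here are not applied to it.
        need_save = False
    else:
        # A path was given: build a writer and forward engine_kwargs.
        writer = SketchWriter(writer, engine_kwargs=engine_kwargs)
        need_save = True
    return writer, need_save


built, built_needs_save = write_sketch("out.xlsx", {"foo": "bar"})
existing = SketchWriter("out.xlsx")
reused, reused_needs_save = write_sketch(existing, {"foo": "bar"})
```

Under this reading, engine options for a shared writer belong on the writer's own constructor rather than on each write call.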
21 changes: 21 additions & 0 deletions pandas/tests/io/excel/test_writers.py
@@ -1115,6 +1115,27 @@ def test_bytes_io(self, engine):
             reread_df = pd.read_excel(bio, index_col=0)
             tm.assert_frame_equal(df, reread_df)

+    def test_engine_kwargs(self, engine, path):
+        # GH#52368
+        df = DataFrame([{"A": 1, "B": 2}, {"A": 3, "B": 4}])
+
+        msgs = {
+            "odf": r"OpenDocumentSpreadsheet() got an unexpected keyword "
+            r"argument 'foo'",
+            "openpyxl": r"load_workbook() got an unexpected keyword argument 'foo'",
+            "xlsxwriter": r"__init__() got an unexpected keyword argument 'foo'",
+        }
+
+        # Handle change in error message for openpyxl (write and append mode)
+        if engine == "openpyxl" and os.path.exists(path):
+            msgs["openpyxl"] = r"__init__() got an unexpected keyword argument 'foo'"

Member

For xlsxwriter and the append mode of openpyxl, I would not expect this error message. It appears that foo is being passed to OpenpyxlWriter.__init__ rather than stored in engine_kwargs. Is that the case?

Contributor Author

rmhowe425 May 17, 2023

@rhshadrach For openpyxl.load_workbook, my unit test gets the correct error message: load_workbook() got an unexpected keyword argument 'foo'

For xlsxwriter (there is no append mode) and the write mode of openpyxl, the keyword arguments are passed to <engine>.Workbook() in both cases.

[Attached image]

Contributor Author

@rhshadrach Yeah, looking at the Workbook class being instantiated in both _xlsxwriter.py and _openpyxl.py, the error message handling should be correct.

Member

I see - the __init__ that raises is part of the engine. Can you check the message for e.g. Workbook.__init__() to make sure we're not raising in the wrong place?

Contributor Author

Sure!

Member

Looks like the traceback was changed in Python 3.10; in Python 3.9 you only get __init__().... Can you import PY310 from pandas.compat._constants and check for Workbook.__init__ when true? Note this will be true whenever the Python version is 3.10 or greater.

Contributor Author

Thanks @rhshadrach! I was going down a rabbit hole last night troubleshooting this. I was beginning to think that the issue had something to do with how we were raising the TypeError in _openpyxl.py and _xlsxwriter.py without an Exception clause.


+
+        with pytest.raises(TypeError, match=re.escape(msgs[engine])):
+            df.to_excel(
+                path,
+                engine_kwargs={"foo": "bar"},
+            )
+
     def test_write_lists_dict(self, path):
         # see gh-8188.
         df = DataFrame(
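The review thread above turns on the fact that, from Python 3.10, the `TypeError` raised for an unexpected keyword names the callable by its qualified name (`Workbook.__init__()`), whereas earlier versions report only `__init__()`. The following sketch shows one way a test could build the expected message for either case. The local `Workbook` class is a hypothetical stand-in for an engine class, and the local `PY310` flag is assumed to mirror the constant of the same name mentioned in the thread.

```python
import re
import sys


class Workbook:
    """Hypothetical engine class with no extra keyword parameters."""

    def __init__(self, filename=None):
        self.filename = filename


# Python 3.10+ reports the qualified name; earlier versions only the function name.
PY310 = sys.version_info >= (3, 10)
expected = "__init__() got an unexpected keyword argument 'foo'"
if PY310:
    expected = "Workbook." + expected

try:
    Workbook(foo="bar")
except TypeError as err:
    message = str(err)

# re.escape keeps the parentheses literal, as in pytest.raises(match=...).
matched = re.search(re.escape(expected), message) is not None
```

Matching on the class-qualified message where available also addresses the reviewer's concern: it confirms the error comes from the engine's constructor and not from a pandas writer class.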