ENH: Add ods writer #32911

roberthdevries · 2020-03-22T19:42:30Z

closes Wish: Write support for Open Document Spreadsheet (ODS) #27222
tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry
add support for startrow and startcol parameters

WillAyd

Cool, thanks for taking a stab at this. I realize a lot of this is copy / paste; we may need to think about defining a better base class (not sure yet whether as a precursor or a follow-up)

WillAyd · 2020-03-23T02:41:49Z

pandas/tests/io/excel/test_writers.py

@@ -1166,7 +1173,9 @@ def test_bytes_io(self, engine):
        writer.save()

        bio.seek(0)
-        reread_df = pd.read_excel(bio, index_col=0)
+        if engine != "odf":
+            engine = None


Are you setting to None because this test doesn't work for odf? If so you should just pytest.xfail() if not implemented or pytest.skip() if not applicable

No, when you set engine to None, it uses the default engine (xlrd IIRC)

Yea so my point is we shouldn't be changing the fixture like this. Should either be an xfail or skip depending on comment above

No tests are skipped, only the right engine is chosen. With a byte I/O stream you have to pass the engine explicitly; if you pass None, it uses the default engine (which is xlrd IIRC).

Can you take a look at test_readers to see how we handle this? Can ideally ensure we parametrize with the appropriate writer / extension combinations up front; see engine_and_read_ext in test_readers

I have now implemented an automatic file type recognizer for byte streams. Currently the only choice to be made is between 'xlrd' and 'odf', so it is quite easy.
For the other test I have improved the extension recognition to more types of path.
Plus I have fixed the parameterization of the tests that depend on the various external Excel read and write packages.
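
A byte-stream recognizer of the kind described above could look roughly like the following sketch. It assumes that an .ods file is a ZIP archive with a `mimetype` entry containing `application/vnd.oasis.opendocument.spreadsheet`, as the OpenDocument packaging convention describes. The names `looks_like_ods` and `pick_engine` are hypothetical, and this is not necessarily how the PR implements `_is_ods_stream`.

```python
import io
import zipfile

ODS_MIMETYPE = b"application/vnd.oasis.opendocument.spreadsheet"


def looks_like_ods(stream) -> bool:
    """Return True if a seekable binary stream looks like an ODS package."""
    start = stream.tell()
    try:
        stream.seek(0)
        if not zipfile.is_zipfile(stream):
            return False
        stream.seek(0)
        # ZipFile does not close a file object that was passed in by the caller.
        with zipfile.ZipFile(stream) as archive:
            if "mimetype" not in archive.namelist():
                return False
            return archive.read("mimetype").strip() == ODS_MIMETYPE
    finally:
        # Leave the stream where we found it so the real reader can use it.
        stream.seek(start)


def pick_engine(stream) -> str:
    """Hypothetical chooser: 'odf' for ODS streams, otherwise the default."""
    return "odf" if looks_like_ods(stream) else "xlrd"
```

Restoring the stream position matters here because the same buffer is handed to the reader afterwards.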

WillAyd · 2020-03-23T02:45:35Z

pandas/io/excel/_odswriter.py

+        rows: DefaultDict = defaultdict(TableRow)
+        col_count: DefaultDict = defaultdict(int)
+
+        for cell in sorted(cells, key=lambda cell: (cell.row, cell.col)):


Is there a reason for the sorted call here? We don't do this for other writers?

In order to add cells and rows to the odf file we have to add them in order. I am basically building up an XML file here. Adding cells/rows out of order would produce a sheet with data in the wrong cells.

Right but aren't these already in the order you need? Noted on comment but still not clear on why this needs to be done differently for odf

Most of the time yes, but there were a couple of tests where this was not the case (I forgot which one, but I think this had something to do with adding an index)

Hmm, still not clear on why this is needed, and it is slightly different from the other writers; can you try to remove it?

Without this the following tests fail:

FAILED pandas/tests/io/excel/test_writers.py::TestRoundTrip::test_excel_multindex_roundtrip[1-1-True-False-.ods]
FAILED pandas/tests/io/excel/test_writers.py::TestRoundTrip::test_excel_multindex_roundtrip[3-1-True-False-.ods]
FAILED pandas/tests/io/excel/test_writers.py::TestExcelWriter::test_roundtrip_indexlabels[True-odf-.ods]
FAILED pandas/tests/io/excel/test_writers.py::TestExcelWriter::test_roundtrip_indexlabels[False-odf-.ods]
FAILED pandas/tests/io/excel/test_writers.py::TestExcelWriter::test_excel_roundtrip_indexname[True-odf-.ods]
FAILED pandas/tests/io/excel/test_writers.py::TestExcelWriter::test_excel_roundtrip_indexname[False-odf-.ods]
FAILED pandas/tests/io/excel/test_writers.py::TestExcelWriter::test_to_excel_multiindex[True-odf-.ods]
FAILED pandas/tests/io/excel/test_writers.py::TestExcelWriter::test_to_excel_multiindex[False-odf-.ods]
FAILED pandas/tests/io/excel/test_writers.py::TestExcelWriter::test_to_excel_multiindex_nan_label[True-odf-.ods]
FAILED pandas/tests/io/excel/test_writers.py::TestExcelWriter::test_to_excel_multiindex_nan_label[False-odf-.ods]
FAILED pandas/tests/io/excel/test_writers.py::TestExcelWriter::test_to_excel_multiindex_cols[False-odf-.ods]
FAILED pandas/tests/io/excel/test_writers.py::TestExcelWriter::test_to_excel_multiindex_dates[True-odf-.ods]
FAILED pandas/tests/io/excel/test_writers.py::TestExcelWriter::test_to_excel_multiindex_dates[False-odf-.ods]
FAILED pandas/tests/io/excel/test_writers.py::TestExcelWriter::test_excel_010_hemstring[True-1-1-True-odf-.ods]
FAILED pandas/tests/io/excel/test_writers.py::TestExcelWriter::test_excel_010_hemstring[True-1-2-True-odf-.ods]
FAILED pandas/tests/io/excel/test_writers.py::TestExcelWriter::test_excel_010_hemstring[True-1-3-True-odf-.ods]
FAILED pandas/tests/io/excel/test_writers.py::TestExcelWriter::test_excel_010_hemstring[False-1-1-True-odf-.ods]
FAILED pandas/tests/io/excel/test_writers.py::TestExcelWriter::test_excel_010_hemstring[False-1-2-True-odf-.ods]
FAILED pandas/tests/io/excel/test_writers.py::TestExcelWriter::test_excel_010_hemstring[False-1-3-True-odf-.ods]
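
To illustrate why the ordering matters when rows are built purely by appending, here is a minimal sketch. `Cell` is a hypothetical stand-in with only `row`, `col` and `val`, and plain lists stand in for odfpy's `TableRow`; it is not the PR's actual code.

```python
from collections import defaultdict
from typing import NamedTuple


class Cell(NamedTuple):
    row: int
    col: int
    val: object


def build_rows(cells):
    """Group cells into rows, padding gaps with None as empty cells."""
    rows = defaultdict(list)
    col_count = defaultdict(int)
    # Appending only works if cells arrive in (row, col) order.
    for cell in sorted(cells, key=lambda c: (c.row, c.col)):
        # Pad with empty cells so the value lands in the intended column.
        for _ in range(cell.col - col_count[cell.row]):
            rows[cell.row].append(None)
            col_count[cell.row] += 1
        rows[cell.row].append(cell.val)
        col_count[cell.row] += 1
    if not rows:
        return []
    # Rows that received no cells are emitted as empty rows.
    return [rows[r] for r in range(max(rows) + 1)]
```

Without the `sorted` call, a cell for column 1 arriving before the cell for column 0 of the same row would be padded first, and the later cell would then be appended behind it, in the wrong column.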

roberthdevries · 2020-03-25T21:15:50Z

@WillAyd I am not sure where "a lot of this is copy / paste" comes from. It is true that I started off from _xlwt.py, but only 20 of the 321 lines are unmodified, and most of those are the standard methods required for each engine.

pep8speaks · 2020-03-28T15:01:52Z

Hello @roberthdevries! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-06-24 12:34:37 UTC

alimcmaster1 · 2020-04-15T00:01:22Z

Can you merge master - thanks!

roberthdevries · 2020-04-19T12:25:12Z

Rebased to master

WillAyd

OK looking pretty good - thanks

WillAyd · 2020-04-22T03:41:45Z

pandas/io/excel/_base.py

        if engine is None:
            engine = "xlrd"
+            if isinstance(path_or_io, IOBase):


I don’t think we need to do this - are we doing this for xls vs .xlsx? Maybe better as a separate PR if you feel strongly and I’m wrong about the existing file types

I agree can you revert this

No, I do this because of .xls* vs. .ods, and no, it cannot be reverted without tons of tests failing.

WillAyd · 2020-04-22T03:45:36Z

pandas/io/excel/_base.py

+                    engine = "odf"
+            else:
+                ext = os.path.splitext(str(path_or_io))[-1]
+                if ext == ".ods":


Rather than this, we should register the writer globally

pandas/pandas/core/config_init.py

Line 576 in 120e9d9

with cf.config_prefix("io.excel.xlsx"):

The weird thing is that this stuff does not seem to be used anywhere. Correct me if I'm wrong. I added a similar bit for the OpenOffice file format, but it did not seem to be called/tested anywhere.
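
For illustration, the extension check in the hunk above can be expressed as a small lookup table. This is only a sketch: `EXT_TO_ENGINE`, `DEFAULT_ENGINE` and `engine_for_path` are hypothetical names, and it does not model the options machinery in config_init.py that the review comment refers to.

```python
import os

# Hypothetical mapping; only the .ods -> "odf" pair comes from the diff above.
EXT_TO_ENGINE = {".ods": "odf"}
DEFAULT_ENGINE = "xlrd"


def engine_for_path(path) -> str:
    """Pick a reader engine from the file extension of a str or path object."""
    # Lower-casing the extension is an assumption added for this sketch.
    ext = os.path.splitext(str(path))[-1].lower()
    return EXT_TO_ENGINE.get(ext, DEFAULT_ENGINE)
```

A table like this keeps the default in one place, so adding another format would mean adding one entry rather than another `if` branch.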

jreback · 2020-04-26T20:47:01Z

pandas/io/excel/_base.py

 class ExcelFile:
    """
    Class for parsing tabular excel sheets into DataFrame objects.
    Uses xlrd. See read_excel for more documentation

    Parameters
    ----------
-    io : str, path object (pathlib.Path or py._path.local.LocalPath),
+    path_or_io : str, path object (pathlib.Path or py._path.local.LocalPath),


This can be a path or a file-like object. The current doc-string is wrong.

kk, rename to path_or_buffer

jreback · 2020-04-26T20:48:48Z

pandas/io/excel/_odswriter.py

+        return self.book.save(self.path)
+
+    def write_cells(
+        self, cells, sheet_name=None, startrow=0, startcol=0, freeze_panes=None


can you type signatures wherever possible & provide full doc-strings

Added type signatures + doc strings

jreback · 2020-04-26T20:49:13Z

pandas/io/excel/_odswriter.py

+        for row_nr in range(max(rows.keys()) + 1):
+            wks.addElement(rows[row_nr])
+
+    def _make_table_cell_attributes(self, cell):


type & doc-string

added type info + docstrings

jreback · 2020-05-25T22:40:37Z

@roberthdevries can you merge master and will look again.

jreback

if you can type and doc-string as much as possible (could type as a follow up if @WillAyd ok with this)

jreback · 2020-05-28T23:22:15Z

doc/source/whatsnew/v1.1.0.rst

@@ -288,6 +288,7 @@ Other enhancements
 - :meth:`HDFStore.put` now accepts `track_times` parameter. Parameter is passed to ``create_table`` method of ``PyTables`` (:issue:`32682`).
 - Make :class:`pandas.core.window.Rolling` and :class:`pandas.core.window.Expanding` iterable（:issue:`11704`)
 - Make ``option_context`` a :class:`contextlib.ContextDecorator`, which allows it to be used as a decorator over an entire function (:issue:`34253`).
+- :meth:`DataFrame.to_excel` can now also generate OpenOffice spreadsheet (.ods) files (:issue:`27222`)


generate -> write

Changed according to your suggestion

jreback · 2020-05-28T23:24:00Z

pandas/io/excel/_base.py

@@ -809,17 +819,24 @@ class ExcelFile:
        "pyxlsb": _PyxlsbReader,
    }

-    def __init__(self, io, engine=None):
+    def __init__(self, path_or_io, engine=None):


use path_or_buffer

changed as you suggested

jreback · 2020-05-28T23:24:45Z

pandas/io/excel/_odswriter.py

+    engine = "odf"
+    supported_extensions = (".ods",)
+
+    def __init__(self, path, engine=None, encoding=None, mode="w", **engine_kwargs):


can you type as much as possible here

added type info

jreback · 2020-05-28T23:25:05Z

pandas/io/excel/_odswriter.py

+        self.book = OpenDocumentSpreadsheet()
+        self._style_dict: Dict[str, str] = {}
+
+    def save(self):


can you type as much as possible (e.g. -> None)

added type info

jreback · 2020-06-08T23:28:54Z

can you rebase and update

jreback · 2020-06-14T20:53:36Z

pandas/io/excel/_base.py

@@ -781,6 +778,19 @@ def close(self):
        return self.save()


+def _is_ods_stream(stream):


can you type this function inputs & outputs & add a doc-string

jreback · 2020-06-14T20:54:46Z

pandas/io/excel/_odswriter.py

+        for row_nr in range(max(rows.keys()) + 1):
+            wks.addElement(rows[row_nr])
+
+    def _make_table_cell_attributes(self, cell) -> Dict[str, object]:


can you type here

jreback · 2020-06-14T20:54:56Z

pandas/io/excel/_odswriter.py

+            attributes["numbercolumnsspanned"] = cell.mergeend
+        return attributes
+
+    def _make_table_cell(self, cell) -> Tuple[str, object]:


and here, etc

This one is a bit more difficult, as the type is an odf.table.TableCell. As the odf package is not installed on most test environments, this type cannot be added or the tests will fail.
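
One common way to annotate with a type from an optional dependency is a `typing.TYPE_CHECKING` guard combined with a string annotation. The sketch below assumes that pattern; it is not necessarily what this PR ended up doing, and `make_table_cell` is a hypothetical stand-in for the writer's method. Whether it satisfies a given CI type check still depends on how the checker is configured to treat missing imports.

```python
from typing import TYPE_CHECKING, Tuple

if TYPE_CHECKING:
    # Only evaluated by static type checkers, never at runtime,
    # so odfpy does not need to be installed to import this module.
    from odf.table import TableCell


def make_table_cell(value: str) -> "Tuple[str, TableCell]":
    """Hypothetical stand-in: return the display value and an odf cell."""
    # The runtime import is deferred until the function is actually called.
    from odf.table import TableCell

    return value, TableCell(valuetype="string")
```

Importing this module works without odfpy installed; only calling the function requires it.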

jreback

lgtm. can always open an issue for followups if needed. cc @WillAyd

WillAyd · 2020-06-24T15:14:06Z

Great thanks @roberthdevries

roberthdevries · 2020-06-24T17:06:36Z

You're welcome

WillAyd requested changes Mar 23, 2020

WillAyd added the IO Excel read_excel, to_excel label Mar 23, 2020

roberthdevries force-pushed the add-ods-writer branch from fbb2068 to 6014c9f Compare March 24, 2020 20:48

roberthdevries requested a review from WillAyd March 24, 2020 21:24

roberthdevries force-pushed the add-ods-writer branch from e6354b6 to 57dcac7 Compare March 25, 2020 21:02

WillAyd requested changes Mar 26, 2020

roberthdevries force-pushed the add-ods-writer branch from 57dcac7 to 35a6524 Compare March 28, 2020 15:01

roberthdevries force-pushed the add-ods-writer branch from 9a1e153 to cf4aba0 Compare April 2, 2020 12:26

roberthdevries force-pushed the add-ods-writer branch from cf4aba0 to 602a591 Compare April 19, 2020 12:23

WillAyd requested changes Apr 22, 2020

jreback requested changes Apr 26, 2020

jreback added this to the 1.1 milestone May 2, 2020

roberthdevries force-pushed the add-ods-writer branch from 602a591 to b345a28 Compare May 28, 2020 21:47

jreback requested changes May 28, 2020

roberthdevries force-pushed the add-ods-writer branch 2 times, most recently from 5eb1cc2 to bb931c1 Compare June 14, 2020 20:37

jreback requested changes Jun 14, 2020

roberthdevries added 7 commits June 24, 2020 14:12

WIP: unit tests are still failing for ods write to ods read loop back test

526d756

Create empty cells where needed

165d887

Add support for dates

024cb2d

More date/datetime fixes

341b77c

Make sure the cells and columns are sorted before writing them out

1ead9f0

Pass explicit engine for reading ods files

df321b6

Only check extensions when there is a file with an extension

fbc5b3e

roberthdevries added 14 commits June 24, 2020 14:13

Reformatting fixes (black)

635dd84

Rename parameter path_or_io to path_or_buffer

2de7755

Add doc-strings and type annotations

19f0a5c

Update whatsnew according to suggestion by jreback

89f742f

Black reformatting

171fc61

Fix some type annotations

0d15a20

Some type fixes

336c231

Revert some of the typing fixes as they break some of the builds

3edfbd8

More mypy typing fixes

97707b8

Add more typing info

45467d2

And yet more typing fixes

b14847d

Add doc-string and type info to _is_ods_stream

d4d3a7c

Fix import order

f82f4d4

Add test to check exception when writing in append mode

f20e2cc

roberthdevries force-pushed the add-ods-writer branch from d897985 to f20e2cc Compare June 24, 2020 12:13

Add whatsnew entry for extra bug fix in read_excel for 0.0 values in odf files

9e2684f

roberthdevries requested review from WillAyd and jreback June 24, 2020 12:36

jreback approved these changes Jun 24, 2020

WillAyd approved these changes Jun 24, 2020

WillAyd merged commit e8dcaf9 into pandas-dev:master Jun 24, 2020

roberthdevries deleted the add-ods-writer branch June 27, 2020 14:01

simonjayhawkins added a commit to simonjayhawkins/pandas that referenced this pull request Aug 28, 2020

TYP: misc typing cleanups for pandas-dev#32911

2a1010a

rhshadrach pushed a commit that referenced this pull request Aug 29, 2020

TYP: misc typing cleanups for #32911 (#35954)

92e4bd5

AlexKirko pushed a commit to AlexKirko/pandas that referenced this pull request Aug 31, 2020

TYP: misc typing cleanups for pandas-dev#32911 (pandas-dev#35954)

977b3da

kesmit13 pushed a commit to kesmit13/pandas that referenced this pull request Nov 2, 2020

TYP: misc typing cleanups for pandas-dev#32911 (pandas-dev#35954)

c3e0e39

rhshadrach mentioned this pull request Nov 19, 2022

BUG: Merging cells doesn't work when writting MultiIndex to .ods file #49779

Open

3 tasks

vamsi-verma-s mentioned this pull request Nov 19, 2022

DOC: to_excel OpenDocument Spreadsheets (ODS) write support not updated #49790

Closed

1 task
