Skip to content

ENH: Add ods writer #32911

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 40 commits into from
Jun 24, 2020
Merged

ENH: Add ods writer #32911

merged 40 commits into from
Jun 24, 2020

Conversation

roberthdevries
Copy link
Contributor

@roberthdevries roberthdevries commented Mar 22, 2020

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool thanks for taking a stab at this. I realize a lot of this is copy / paste; we may need to think about defining a better base class (not sure yet if a pre-cursor or follow up)

@@ -1166,7 +1173,9 @@ def test_bytes_io(self, engine):
writer.save()

bio.seek(0)
reread_df = pd.read_excel(bio, index_col=0)
if engine != "odf":
engine = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you setting to None because this test doesn't work for odf? If so you should just pytest.xfail() if not implemented or pytest.skip() if not applicable

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, when you set engine to None, it uses the default engine (xlrd IIRC)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea so my point is we shouldn't be changing the fixture like this. Should either be an xfail or skip depending on comment above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No tests are skipped, only the right engine is chosen. When having a byte iostream you have to explicitly pass the engine, or if you pass None it uses the default engine (which is xlrd IIRC).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you take a look at test_readers to see how we handle this? Can ideally ensure we parametrize with the appropriate writer / extension combinations up front; see engine_and_read_ext in test_readers

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have now implemented an automatic file type recognizer for byte streams. Currently the only choice to be made is between 'xlrd' and 'odf', so it is quite easy.
For the other test I have improved the extension recognition to more types of path.
Plus I have fixed the parameterization of the tests that depend on the various external Excel read and write packages.

rows: DefaultDict = defaultdict(TableRow)
col_count: DefaultDict = defaultdict(int)

for cell in sorted(cells, key=lambda cell: (cell.row, cell.col)):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason for the sorted call here? We don't do this for other writers?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to add cells and rows to the odf file we have to add stuff in order. I am basically building up an XML file here. Adding cells/rows out of order would cause a sheet with data in the wrong cells

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right but aren't these already in the order you need? Noted on comment but still not clear on why this needs to be done differently for odf

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the time yes, but there were a couple of tests where this was not the case (I forgot which one, but I think this had something to do with adding an index)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm still not clear on why this is needed and it slightly different from the other writers; can you try to remove?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without this the following tests fail:

FAILED pandas/tests/io/excel/test_writers.py::TestRoundTrip::test_excel_multindex_roundtrip[1-1-True-False-.ods]
FAILED pandas/tests/io/excel/test_writers.py::TestRoundTrip::test_excel_multindex_roundtrip[3-1-True-False-.ods]
FAILED pandas/tests/io/excel/test_writers.py::TestExcelWriter::test_roundtrip_indexlabels[True-odf-.ods]
FAILED pandas/tests/io/excel/test_writers.py::TestExcelWriter::test_roundtrip_indexlabels[False-odf-.ods]
FAILED pandas/tests/io/excel/test_writers.py::TestExcelWriter::test_excel_roundtrip_indexname[True-odf-.ods]
FAILED pandas/tests/io/excel/test_writers.py::TestExcelWriter::test_excel_roundtrip_indexname[False-odf-.ods]
FAILED pandas/tests/io/excel/test_writers.py::TestExcelWriter::test_to_excel_multiindex[True-odf-.ods]
FAILED pandas/tests/io/excel/test_writers.py::TestExcelWriter::test_to_excel_multiindex[False-odf-.ods]
FAILED pandas/tests/io/excel/test_writers.py::TestExcelWriter::test_to_excel_multiindex_nan_label[True-odf-.ods]
FAILED pandas/tests/io/excel/test_writers.py::TestExcelWriter::test_to_excel_multiindex_nan_label[False-odf-.ods]
FAILED pandas/tests/io/excel/test_writers.py::TestExcelWriter::test_to_excel_multiindex_cols[False-odf-.ods]
FAILED pandas/tests/io/excel/test_writers.py::TestExcelWriter::test_to_excel_multiindex_dates[True-odf-.ods]
FAILED pandas/tests/io/excel/test_writers.py::TestExcelWriter::test_to_excel_multiindex_dates[False-odf-.ods]
FAILED pandas/tests/io/excel/test_writers.py::TestExcelWriter::test_excel_010_hemstring[True-1-1-True-odf-.ods]
FAILED pandas/tests/io/excel/test_writers.py::TestExcelWriter::test_excel_010_hemstring[True-1-2-True-odf-.ods]
FAILED pandas/tests/io/excel/test_writers.py::TestExcelWriter::test_excel_010_hemstring[True-1-3-True-odf-.ods]
FAILED pandas/tests/io/excel/test_writers.py::TestExcelWriter::test_excel_010_hemstring[False-1-1-True-odf-.ods]
FAILED pandas/tests/io/excel/test_writers.py::TestExcelWriter::test_excel_010_hemstring[False-1-2-True-odf-.ods]
FAILED pandas/tests/io/excel/test_writers.py::TestExcelWriter::test_excel_010_hemstring[False-1-3-True-odf-.ods]

@WillAyd WillAyd added the IO Excel read_excel, to_excel label Mar 23, 2020
@roberthdevries roberthdevries requested a review from WillAyd March 24, 2020 21:24
@roberthdevries
Copy link
Contributor Author

@WillAyd I am not sure where "the lot of copy paste" comes from. It is true that I started off from _xlwt.py, but there are just 20 unmodified lines of the 321. Most of which are the standard methods required for each engine.

@@ -1166,7 +1173,9 @@ def test_bytes_io(self, engine):
writer.save()

bio.seek(0)
reread_df = pd.read_excel(bio, index_col=0)
if engine != "odf":
engine = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you take a look at test_readers to see how we handle this? Can ideally ensure we parametrize with the appropriate writer / extension combinations up front; see engine_and_read_ext in test_readers

rows: DefaultDict = defaultdict(TableRow)
col_count: DefaultDict = defaultdict(int)

for cell in sorted(cells, key=lambda cell: (cell.row, cell.col)):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm still not clear on why this is needed and it slightly different from the other writers; can you try to remove?

@pep8speaks
Copy link

pep8speaks commented Mar 28, 2020

Hello @roberthdevries! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-06-24 12:34:37 UTC

@alimcmaster1
Copy link
Member

Can you merge master - thanks!

@roberthdevries
Copy link
Contributor Author

Rebased to master

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK looking pretty good - thanks

if engine is None:
engine = "xlrd"
if isinstance(path_or_io, IOBase):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t think we need to do this - are we doing this for xls vs .xlsx? Maybe better as a separate PR if you feel strongly and I’m wrong about the existing file types

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree can you revert this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I do this because of .xls* vs. .ods, and no it cannot be reverted unless tons of tests fail.

engine = "odf"
else:
ext = os.path.splitext(str(path_or_io))[-1]
if ext == ".ods":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than this should register the writer globally

with cf.config_prefix("io.excel.xlsx"):

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The weird thing is that this stuff does not seem to be used anywhere. Correct me if I'm wrong. I added a similar bit for the OpenOffice file format, but it did not seem to be called/tested anywhere.

class ExcelFile:
"""
Class for parsing tabular excel sheets into DataFrame objects.
Uses xlrd. See read_excel for more documentation

Parameters
----------
io : str, path object (pathlib.Path or py._path.local.LocalPath),
path_or_io : str, path object (pathlib.Path or py._path.local.LocalPath),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

path

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be a path or a file-like object. The current doc-string is wrong.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kk, rename to path_or_buffer

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed

if engine is None:
engine = "xlrd"
if isinstance(path_or_io, IOBase):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree can you revert this

return self.book.save(self.path)

def write_cells(
self, cells, sheet_name=None, startrow=0, startcol=0, freeze_panes=None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you type signatures whereever possible & provide full doc-strings

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added type signatures + doc strings

for row_nr in range(max(rows.keys()) + 1):
wks.addElement(rows[row_nr])

def _make_table_cell_attributes(self, cell):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type & doc-string

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added type info + docstrings

@jreback jreback added this to the 1.1 milestone May 2, 2020
@jreback
Copy link
Contributor

jreback commented May 25, 2020

@roberthdevries can you merge master and will look again.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you can type and doc-string as much as possible (could type as a follow up if @WillAyd ok with this)

@@ -288,6 +288,7 @@ Other enhancements
- :meth:`HDFStore.put` now accepts `track_times` parameter. Parameter is passed to ``create_table`` method of ``PyTables`` (:issue:`32682`).
- Make :class:`pandas.core.window.Rolling` and :class:`pandas.core.window.Expanding` iterable(:issue:`11704`)
- Make ``option_context`` a :class:`contextlib.ContextDecorator`, which allows it to be used as a decorator over an entire function (:issue:`34253`).
- :meth:`DataFrame.to_excel` can now also generate OpenOffice spreadsheet (.ods) files (:issue:`27222`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generate -> write

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed according to your suggestion

class ExcelFile:
"""
Class for parsing tabular excel sheets into DataFrame objects.
Uses xlrd. See read_excel for more documentation

Parameters
----------
io : str, path object (pathlib.Path or py._path.local.LocalPath),
path_or_io : str, path object (pathlib.Path or py._path.local.LocalPath),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kk, rename to path_or_buffer

@@ -809,17 +819,24 @@ class ExcelFile:
"pyxlsb": _PyxlsbReader,
}

def __init__(self, io, engine=None):
def __init__(self, path_or_io, engine=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use path_or_buffer

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed as you suggested

engine = "odf"
supported_extensions = (".ods",)

def __init__(self, path, engine=None, encoding=None, mode="w", **engine_kwargs):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you type as much as possible here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added type info

self.book = OpenDocumentSpreadsheet()
self._style_dict: Dict[str, str] = {}

def save(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you type as much as possible (e.g. -> None)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added type info

@jreback
Copy link
Contributor

jreback commented Jun 8, 2020

can you rebase and update

@roberthdevries roberthdevries force-pushed the add-ods-writer branch 2 times, most recently from 5eb1cc2 to bb931c1 Compare June 14, 2020 20:37
@@ -781,6 +778,19 @@ def close(self):
return self.save()


def _is_ods_stream(stream):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you type this function inputs & outputs & add a doc-string

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

for row_nr in range(max(rows.keys()) + 1):
wks.addElement(rows[row_nr])

def _make_table_cell_attributes(self, cell) -> Dict[str, object]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you type here

Copy link
Contributor Author

@roberthdevries roberthdevries Jun 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

attributes["numbercolumnsspanned"] = cell.mergeend
return attributes

def _make_table_cell(self, cell) -> Tuple[str, object]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and here, etc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is a bit more difficult as the type is an odf.table.TableCell. As the odf package is is not installed on most test environments, this type cannot be added or the test will fail.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. can always open an issue for followups if needed. cc @WillAyd

@WillAyd WillAyd merged commit e8dcaf9 into pandas-dev:master Jun 24, 2020
@WillAyd
Copy link
Member

WillAyd commented Jun 24, 2020

Great thanks @roberthdevries

@roberthdevries
Copy link
Contributor Author

You're welcome

@roberthdevries roberthdevries deleted the add-ods-writer branch June 27, 2020 14:01
simonjayhawkins added a commit to simonjayhawkins/pandas that referenced this pull request Aug 28, 2020
rhshadrach pushed a commit that referenced this pull request Aug 29, 2020
AlexKirko pushed a commit to AlexKirko/pandas that referenced this pull request Aug 31, 2020
kesmit13 pushed a commit to kesmit13/pandas that referenced this pull request Nov 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO Excel read_excel, to_excel
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Wish: Write support for Open Document Spreadsheet (ODS)
5 participants