-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: Add ods writer #32911
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Add ods writer #32911
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool thanks for taking a stab at this. I realize a lot of this is copy / paste; we may need to think about defining a better base class (not sure yet if a pre-cursor or follow up)
@@ -1166,7 +1173,9 @@ def test_bytes_io(self, engine): | |||
writer.save() | |||
|
|||
bio.seek(0) | |||
reread_df = pd.read_excel(bio, index_col=0) | |||
if engine != "odf": | |||
engine = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you setting to None because this test doesn't work for odf? If so you should just pytest.xfail()
if not implemented or pytest.skip()
if not applicable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, when you set engine to None, it uses the default engine (xlrd IIRC)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea so my point is we shouldn't be changing the fixture like this. Should either be an xfail or skip depending on comment above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No tests are skipped, only the right engine is chosen. When having a byte iostream you have to explicitly pass the engine, or if you pass None
it uses the default engine (which is xlrd
IIRC).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you take a look at test_readers to see how we handle this? Can ideally ensure we parametrize with the appropriate writer / extension combinations up front; see engine_and_read_ext
in test_readers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have now implemented an automatic file type recognizer for byte streams. Currently the only choice to be made is between 'xlrd' and 'odf', so it is quite easy.
For the other test I have improved the extension recognition to more types of path
.
Plus I have fixed the parameterization of the tests that depend on the various external Excel read and write packages.
rows: DefaultDict = defaultdict(TableRow) | ||
col_count: DefaultDict = defaultdict(int) | ||
|
||
for cell in sorted(cells, key=lambda cell: (cell.row, cell.col)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason for the sorted call here? We don't do this for other writers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In order to add cells and rows to the odf file we have to add stuff in order. I am basically building up an XML file here. Adding cells/rows out of order would cause a sheet with data in the wrong cells
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right but aren't these already in the order you need? Noted on comment but still not clear on why this needs to be done differently for odf
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of the time yes, but there were a couple of tests where this was not the case (I forgot which one, but I think this had something to do with adding an index)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm still not clear on why this is needed and it slightly different from the other writers; can you try to remove?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without this the following tests fail:
FAILED pandas/tests/io/excel/test_writers.py::TestRoundTrip::test_excel_multindex_roundtrip[1-1-True-False-.ods]
FAILED pandas/tests/io/excel/test_writers.py::TestRoundTrip::test_excel_multindex_roundtrip[3-1-True-False-.ods]
FAILED pandas/tests/io/excel/test_writers.py::TestExcelWriter::test_roundtrip_indexlabels[True-odf-.ods]
FAILED pandas/tests/io/excel/test_writers.py::TestExcelWriter::test_roundtrip_indexlabels[False-odf-.ods]
FAILED pandas/tests/io/excel/test_writers.py::TestExcelWriter::test_excel_roundtrip_indexname[True-odf-.ods]
FAILED pandas/tests/io/excel/test_writers.py::TestExcelWriter::test_excel_roundtrip_indexname[False-odf-.ods]
FAILED pandas/tests/io/excel/test_writers.py::TestExcelWriter::test_to_excel_multiindex[True-odf-.ods]
FAILED pandas/tests/io/excel/test_writers.py::TestExcelWriter::test_to_excel_multiindex[False-odf-.ods]
FAILED pandas/tests/io/excel/test_writers.py::TestExcelWriter::test_to_excel_multiindex_nan_label[True-odf-.ods]
FAILED pandas/tests/io/excel/test_writers.py::TestExcelWriter::test_to_excel_multiindex_nan_label[False-odf-.ods]
FAILED pandas/tests/io/excel/test_writers.py::TestExcelWriter::test_to_excel_multiindex_cols[False-odf-.ods]
FAILED pandas/tests/io/excel/test_writers.py::TestExcelWriter::test_to_excel_multiindex_dates[True-odf-.ods]
FAILED pandas/tests/io/excel/test_writers.py::TestExcelWriter::test_to_excel_multiindex_dates[False-odf-.ods]
FAILED pandas/tests/io/excel/test_writers.py::TestExcelWriter::test_excel_010_hemstring[True-1-1-True-odf-.ods]
FAILED pandas/tests/io/excel/test_writers.py::TestExcelWriter::test_excel_010_hemstring[True-1-2-True-odf-.ods]
FAILED pandas/tests/io/excel/test_writers.py::TestExcelWriter::test_excel_010_hemstring[True-1-3-True-odf-.ods]
FAILED pandas/tests/io/excel/test_writers.py::TestExcelWriter::test_excel_010_hemstring[False-1-1-True-odf-.ods]
FAILED pandas/tests/io/excel/test_writers.py::TestExcelWriter::test_excel_010_hemstring[False-1-2-True-odf-.ods]
FAILED pandas/tests/io/excel/test_writers.py::TestExcelWriter::test_excel_010_hemstring[False-1-3-True-odf-.ods]
fbb2068
to
6014c9f
Compare
e6354b6
to
57dcac7
Compare
@WillAyd I am not sure where "the lot of copy paste" comes from. It is true that I started off from |
@@ -1166,7 +1173,9 @@ def test_bytes_io(self, engine): | |||
writer.save() | |||
|
|||
bio.seek(0) | |||
reread_df = pd.read_excel(bio, index_col=0) | |||
if engine != "odf": | |||
engine = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you take a look at test_readers to see how we handle this? Can ideally ensure we parametrize with the appropriate writer / extension combinations up front; see engine_and_read_ext
in test_readers
rows: DefaultDict = defaultdict(TableRow) | ||
col_count: DefaultDict = defaultdict(int) | ||
|
||
for cell in sorted(cells, key=lambda cell: (cell.row, cell.col)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm still not clear on why this is needed and it slightly different from the other writers; can you try to remove?
57dcac7
to
35a6524
Compare
Hello @roberthdevries! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2020-06-24 12:34:37 UTC |
9a1e153
to
cf4aba0
Compare
Can you merge master - thanks! |
cf4aba0
to
602a591
Compare
Rebased to master |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK looking pretty good - thanks
pandas/io/excel/_base.py
Outdated
if engine is None: | ||
engine = "xlrd" | ||
if isinstance(path_or_io, IOBase): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don’t think we need to do this - are we doing this for xls vs .xlsx? Maybe better as a separate PR if you feel strongly and I’m wrong about the existing file types
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree can you revert this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I do this because of .xls* vs. .ods, and no it cannot be reverted unless tons of tests fail.
engine = "odf" | ||
else: | ||
ext = os.path.splitext(str(path_or_io))[-1] | ||
if ext == ".ods": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than this should register the writer globally
pandas/pandas/core/config_init.py
Line 576 in 120e9d9
with cf.config_prefix("io.excel.xlsx"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The weird thing is that this stuff does not seem to be used anywhere. Correct me if I'm wrong. I added a similar bit for the OpenOffice file format, but it did not seem to be called/tested anywhere.
pandas/io/excel/_base.py
Outdated
class ExcelFile: | ||
""" | ||
Class for parsing tabular excel sheets into DataFrame objects. | ||
Uses xlrd. See read_excel for more documentation | ||
|
||
Parameters | ||
---------- | ||
io : str, path object (pathlib.Path or py._path.local.LocalPath), | ||
path_or_io : str, path object (pathlib.Path or py._path.local.LocalPath), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
path
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be a path or a file-like object. The current doc-string is wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kk, rename to path_or_buffer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed
pandas/io/excel/_base.py
Outdated
if engine is None: | ||
engine = "xlrd" | ||
if isinstance(path_or_io, IOBase): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree can you revert this
pandas/io/excel/_odswriter.py
Outdated
return self.book.save(self.path) | ||
|
||
def write_cells( | ||
self, cells, sheet_name=None, startrow=0, startcol=0, freeze_panes=None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you type signatures whereever possible & provide full doc-strings
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added type signatures + doc strings
pandas/io/excel/_odswriter.py
Outdated
for row_nr in range(max(rows.keys()) + 1): | ||
wks.addElement(rows[row_nr]) | ||
|
||
def _make_table_cell_attributes(self, cell): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
type & doc-string
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added type info + docstrings
@roberthdevries can you merge master and will look again. |
602a591
to
b345a28
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you can type and doc-string as much as possible (could type as a follow up if @WillAyd ok with this)
doc/source/whatsnew/v1.1.0.rst
Outdated
@@ -288,6 +288,7 @@ Other enhancements | |||
- :meth:`HDFStore.put` now accepts `track_times` parameter. Parameter is passed to ``create_table`` method of ``PyTables`` (:issue:`32682`). | |||
- Make :class:`pandas.core.window.Rolling` and :class:`pandas.core.window.Expanding` iterable(:issue:`11704`) | |||
- Make ``option_context`` a :class:`contextlib.ContextDecorator`, which allows it to be used as a decorator over an entire function (:issue:`34253`). | |||
- :meth:`DataFrame.to_excel` can now also generate OpenOffice spreadsheet (.ods) files (:issue:`27222`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
generate -> write
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed according to your suggestion
pandas/io/excel/_base.py
Outdated
class ExcelFile: | ||
""" | ||
Class for parsing tabular excel sheets into DataFrame objects. | ||
Uses xlrd. See read_excel for more documentation | ||
|
||
Parameters | ||
---------- | ||
io : str, path object (pathlib.Path or py._path.local.LocalPath), | ||
path_or_io : str, path object (pathlib.Path or py._path.local.LocalPath), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kk, rename to path_or_buffer
pandas/io/excel/_base.py
Outdated
@@ -809,17 +819,24 @@ class ExcelFile: | |||
"pyxlsb": _PyxlsbReader, | |||
} | |||
|
|||
def __init__(self, io, engine=None): | |||
def __init__(self, path_or_io, engine=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use path_or_buffer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed as you suggested
pandas/io/excel/_odswriter.py
Outdated
engine = "odf" | ||
supported_extensions = (".ods",) | ||
|
||
def __init__(self, path, engine=None, encoding=None, mode="w", **engine_kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you type as much as possible here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added type info
pandas/io/excel/_odswriter.py
Outdated
self.book = OpenDocumentSpreadsheet() | ||
self._style_dict: Dict[str, str] = {} | ||
|
||
def save(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you type as much as possible (e.g. -> None)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added type info
can you rebase and update |
5eb1cc2
to
bb931c1
Compare
pandas/io/excel/_base.py
Outdated
@@ -781,6 +778,19 @@ def close(self): | |||
return self.save() | |||
|
|||
|
|||
def _is_ods_stream(stream): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you type this function inputs & outputs & add a doc-string
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
pandas/io/excel/_odswriter.py
Outdated
for row_nr in range(max(rows.keys()) + 1): | ||
wks.addElement(rows[row_nr]) | ||
|
||
def _make_table_cell_attributes(self, cell) -> Dict[str, object]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you type here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
pandas/io/excel/_odswriter.py
Outdated
attributes["numbercolumnsspanned"] = cell.mergeend | ||
return attributes | ||
|
||
def _make_table_cell(self, cell) -> Tuple[str, object]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and here, etc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one is a bit more difficult as the type is an odf.table.TableCell. As the odf package is is not installed on most test environments, this type cannot be added or the test will fail.
d897985
to
f20e2cc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. can always open an issue for followups if needed. cc @WillAyd
Great thanks @roberthdevries |
You're welcome |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff