ENH: Add ods writer #32911

Merged Jun 24, 2020 (40 commits)

Commits
526d756
WIP: unit tests are still failing for ods write to ods read loop back…
roberthdevries Mar 8, 2020
165d887
Create empty cells where needed
roberthdevries Mar 21, 2020
024cb2d
Add support for dates
roberthdevries Mar 21, 2020
341b77c
More date/datetime fixes
roberthdevries Mar 21, 2020
1ead9f0
Make sure the cells and columns are sorted before writing them out
roberthdevries Mar 21, 2020
df321b6
Pass explicit engine for reading ods files
roberthdevries Mar 21, 2020
fbc5b3e
Only check extensions when there is a file with an extension
roberthdevries Mar 21, 2020
1303b85
Fix #N/A handling
roberthdevries Mar 21, 2020
4cae564
Add support for merged cells and skipped rows
roberthdevries Mar 22, 2020
bd78fae
Clean up code
roberthdevries Mar 22, 2020
5b9427f
Implement styling
roberthdevries Mar 22, 2020
3a8a06b
Refactor a bit to make a bit more readable
roberthdevries Mar 22, 2020
4d6ca30
black reformatting
roberthdevries Mar 22, 2020
736ec57
flake8 and isort fixes
roberthdevries Mar 22, 2020
9458d4f
Typing validation fixes
roberthdevries Mar 22, 2020
ac0c96f
Fix typo in type annotation
roberthdevries Mar 22, 2020
1fdabc6
Remove commented out debug code
roberthdevries Mar 22, 2020
149e1c5
Move imports inside methods
roberthdevries Mar 22, 2020
defb5c1
Move skip into test
roberthdevries Mar 22, 2020
af530a4
mypy fix
roberthdevries Mar 24, 2020
febd3ba
Add whatsnew entry
roberthdevries Mar 24, 2020
dac7cb6
Simplify datetime formatting by removing useless check
roberthdevries Mar 24, 2020
d64fb96
Add support for startrow and startcol arguments
roberthdevries Mar 24, 2020
54fbbf8
Add automatic OpenDocument Spreadsheet recognition to ExcelFile class
roberthdevries Mar 28, 2020
d6e48fb
Improve import dependency parameterization
roberthdevries Mar 28, 2020
635dd84
Reformatting fixes (black)
roberthdevries Mar 28, 2020
2de7755
Rename parameter path_or_io to path_or_buffer
roberthdevries Jun 9, 2020
19f0a5c
Add doc-strings and type annotations
roberthdevries Jun 9, 2020
89f742f
Update whatsnew according to suggestion by jreback
roberthdevries Jun 9, 2020
171fc61
Black reformatting
roberthdevries Jun 9, 2020
0d15a20
Fix some type annotations
roberthdevries Jun 9, 2020
336c231
Some type fixes
roberthdevries Jun 14, 2020
3edfbd8
Revert some of the typing fixes as they break some of the builds
roberthdevries Jun 14, 2020
97707b8
More mypy typing fixes
roberthdevries Jun 14, 2020
45467d2
Add more typing info
roberthdevries Jun 14, 2020
b14847d
And yet more typing fixes
roberthdevries Jun 14, 2020
d4d3a7c
Add doc-string and type info to _is_ods_stream
roberthdevries Jun 14, 2020
f82f4d4
Fix import order
roberthdevries Jun 16, 2020
f20e2cc
Add test to check exception when writing in append mode
roberthdevries Jun 16, 2020
9e2684f
Add whatsnew entry for extra bug fix in read_excel for 0.0 values in …
roberthdevries Jun 24, 2020
2 changes: 2 additions & 0 deletions doc/source/whatsnew/v1.1.0.rst
@@ -316,6 +316,7 @@ Other enhancements
 - :meth:`~pandas.io.gbq.read_gbq` now supports the ``max_results`` kwarg from ``pandas-gbq`` (:issue:`34639`).
 - :meth:`Dataframe.cov` and :meth:`Series.cov` now support a new parameter ddof to support delta degrees of freedom as in the corresponding numpy methods (:issue:`34611`).
 - :meth:`DataFrame.to_html` and :meth:`DataFrame.to_string`'s ``col_space`` parameter now accepts a list of dict to change only some specific columns' width (:issue:`28917`).
+- :meth:`DataFrame.to_excel` can now also write OpenOffice spreadsheet (.ods) files (:issue:`27222`)

 .. ---------------------------------------------------------------------------

@@ -1018,6 +1019,7 @@ I/O
 - Bug in :meth:`~SQLDatabase.execute` was raising a ``ProgrammingError`` for some DB-API drivers when the SQL statement contained the `%` character and no parameters were present (:issue:`34211`)
 - Bug in :meth:`~pandas.io.stata.StataReader` which resulted in categorical variables with difference dtypes when reading data using an iterator. (:issue:`31544`)
 - :meth:`HDFStore.keys` has now an optional `include` parameter that allows the retrieval of all native HDF5 table names (:issue:`29916`)
+- Bug in :meth:`read_excel` for ODS files removes 0.0 values (:issue:`27222`)

 Plotting
 ^^^^^^^^
10 changes: 10 additions & 0 deletions pandas/core/config_init.py
@@ -553,6 +553,7 @@ def use_inf_as_na_cb(key):
 _xls_options = ["xlwt"]
 _xlsm_options = ["openpyxl"]
 _xlsx_options = ["openpyxl", "xlsxwriter"]
+_ods_options = ["odf"]


 with cf.config_prefix("io.excel.xls"):
@@ -581,6 +582,15 @@ def use_inf_as_na_cb(key):
     )


+with cf.config_prefix("io.excel.ods"):
+    cf.register_option(
+        "writer",
+        "auto",
+        writer_engine_doc.format(ext="ods", others=", ".join(_ods_options)),
+        validator=str,
+    )
+
+
 # Set up the io.parquet specific configuration.
 parquet_engine_doc = """
 : string
4 changes: 4 additions & 0 deletions pandas/io/excel/__init__.py
@@ -1,4 +1,5 @@
 from pandas.io.excel._base import ExcelFile, ExcelWriter, read_excel
+from pandas.io.excel._odswriter import _ODSWriter
 from pandas.io.excel._openpyxl import _OpenpyxlWriter
 from pandas.io.excel._util import register_writer
 from pandas.io.excel._xlsxwriter import _XlsxWriter
@@ -14,3 +15,6 @@


 register_writer(_XlsxWriter)
+
+
+register_writer(_ODSWriter)
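
To illustrate what the `register_writer(_ODSWriter)` call and the new `io.excel.ods.writer` option are meant to achieve together, here is a minimal, self-contained sketch of an engine registry. It is not the pandas implementation: `BaseWriter`, `OdsWriter`, `WRITERS`, `DEFAULT_WRITER_BY_EXT`, and `get_writer` are hypothetical names chosen for illustration, and this `register_writer` is a stand-in that only shares its name with the pandas helper. The diff registers the option with the value `"auto"`; the sketch simplifies this by hard-coding the engine name per extension. Only the engine name `"odf"` and the `.ods` extension come from the diff.

```python
from typing import Dict, Optional, Tuple, Type


class BaseWriter:
    """Hypothetical stand-in for a writer class; not the pandas ExcelWriter."""

    engine: str = ""
    supported_extensions: Tuple[str, ...] = ()


class OdsWriter(BaseWriter):
    # Engine name and extension mirror the values used in the diff.
    engine = "odf"
    supported_extensions = (".ods",)


# Registry: engine name -> writer class (assumed structure).
WRITERS: Dict[str, Type[BaseWriter]] = {}

# Per-extension default, playing the role of an "io.excel.<ext>.writer" option.
DEFAULT_WRITER_BY_EXT: Dict[str, str] = {"ods": "odf"}


def register_writer(klass: Type[BaseWriter]) -> None:
    """Make a writer class discoverable under its engine name."""
    if not klass.engine:
        raise ValueError("writer class must define a non-empty engine name")
    WRITERS[klass.engine] = klass


def get_writer(ext: str, engine: Optional[str] = None) -> Type[BaseWriter]:
    """Pick a writer: an explicit engine wins, else the default for the extension."""
    ext = ext.lstrip(".")
    if engine is None:
        try:
            engine = DEFAULT_WRITER_BY_EXT[ext]
        except KeyError:
            raise ValueError(f"No default writer for extension: {ext!r}") from None
    try:
        return WRITERS[engine]
    except KeyError:
        raise ValueError(f"No writer registered for engine: {engine!r}") from None


register_writer(OdsWriter)
```

With this layout, adding a format takes one class, one registry call, and one default entry, which is the same three-part shape the PR follows (`_ODSWriter`, `register_writer`, `_ods_options`).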
57 changes: 45 additions & 12 deletions pandas/io/excel/_base.py
@@ -1,8 +1,9 @@
 import abc
 import datetime
-from io import BytesIO
+from io import BufferedIOBase, BytesIO, RawIOBase
 import os
 from textwrap import fill
+from typing import Union

 from pandas._config import config

@@ -533,13 +534,13 @@ class ExcelWriter(metaclass=abc.ABCMeta):
     """
     Class for writing DataFrame objects into excel sheets.

-    Default is to use xlwt for xls, openpyxl for xlsx.
+    Default is to use xlwt for xls, openpyxl for xlsx, odf for ods.
     See DataFrame.to_excel for typical usage.

     Parameters
     ----------
     path : str
-        Path to xls or xlsx file.
+        Path to xls or xlsx or ods file.
     engine : str (optional)
         Engine to use for writing. If None, defaults to
         ``io.excel.<extension>.writer``. NOTE: can only be passed as a keyword
@@ -692,10 +693,7 @@ def __init__(
         # validate that this engine can handle the extension
         if isinstance(path, str):
             ext = os.path.splitext(path)[-1]
-        else:
-            ext = "xls" if engine == "xlwt" else "xlsx"
-
-        self.check_extension(ext)
+            self.check_extension(ext)

         self.path = path
         self.sheets = {}
@@ -781,6 +779,34 @@ def close(self):
         return self.save()


+def _is_ods_stream(stream: Union[BufferedIOBase, RawIOBase]) -> bool:
+    """
+    Check if the stream is an OpenDocument Spreadsheet (.ods) file
+
+    It uses magic values inside the stream
+
+    Parameters
+    ----------
+    stream : Union[BufferedIOBase, RawIOBase]
+        IO stream with data which might be an ODS file
+
+    Returns
+    -------
+    is_ods : bool
+        Boolean indication that this is indeed an ODS file or not
+    """
+    stream.seek(0)
+    is_ods = False
+    if stream.read(4) == b"PK\003\004":
+        stream.seek(30)
+        is_ods = (
+            stream.read(54) == b"mimetype"
+            b"application/vnd.oasis.opendocument.spreadsheet"
+        )
+    stream.seek(0)
+    return is_ods
+
+
 class ExcelFile:
     """
     Class for parsing tabular excel sheets into DataFrame objects.
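
The magic numbers in `_is_ods_stream` follow from the ZIP layout: a local file header starts with the signature `PK\x03\x04` and has a 30-byte fixed part, after which come the entry name and, for an uncompressed entry with no extra field, the raw payload. The name `mimetype` (8 bytes) plus the ODS media type (46 bytes) gives the 54 bytes that are compared. The sketch below reproduces the check against archives built in memory with the standard `zipfile` module. `make_zip` and `looks_like_ods` are hypothetical helper names, and the `content.xml` payload is a placeholder, not a valid ODS document.

```python
import io
import zipfile

ODS_MIMETYPE = b"application/vnd.oasis.opendocument.spreadsheet"


def make_zip(first_name: str, first_payload: bytes) -> io.BytesIO:
    """Build an in-memory ZIP whose first entry is stored uncompressed."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(first_name, first_payload, compress_type=zipfile.ZIP_STORED)
        archive.writestr("content.xml", b"<placeholder/>")
    buffer.seek(0)
    return buffer


def looks_like_ods(stream: io.BufferedIOBase) -> bool:
    """Same byte-level check as in the diff, written for any seekable stream."""
    stream.seek(0)
    is_ods = False
    if stream.read(4) == b"PK\x03\x04":  # ZIP local file header signature
        stream.seek(30)  # skip the 30-byte fixed part of the header
        # 8 bytes of entry name + 46 bytes of payload = 54 bytes
        is_ods = stream.read(54) == b"mimetype" + ODS_MIMETYPE
    stream.seek(0)  # leave the stream rewound for the actual reader
    return is_ods


ods_like = make_zip("mimetype", ODS_MIMETYPE)
other_zip = make_zip("[Content_Types].xml", b"<Types/>")
not_a_zip = io.BytesIO(b"plain text, not an archive")

print(looks_like_ods(ods_like), looks_like_ods(other_zip), looks_like_ods(not_a_zip))
```

The check depends on the `mimetype` entry being the first one in the archive and stored without compression or an extra field. An archive that places it elsewhere would not be recognized by this test, so passing `engine="odf"` explicitly remains the fallback.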
@@ -789,8 +815,8 @@ class ExcelFile:

     Parameters
     ----------
-    io : str, path object (pathlib.Path or py._path.local.LocalPath),
-        a file-like object, xlrd workbook or openpypl workbook.
+    path_or_buffer : str, path object (pathlib.Path or py._path.local.LocalPath),
+        a file-like object, xlrd workbook or openpypl workbook.
         If a string or path object, expected to be a path to a
         .xls, .xlsx, .xlsb, .xlsm, .odf, .ods, or .odt file.
     engine : str, default None
@@ -816,18 +842,25 @@ class ExcelFile:
         "pyxlsb": _PyxlsbReader,
     }

-    def __init__(self, io, engine=None):
+    def __init__(self, path_or_buffer, engine=None):
         if engine is None:
             engine = "xlrd"
+            if isinstance(path_or_buffer, (BufferedIOBase, RawIOBase)):
+                if _is_ods_stream(path_or_buffer):
+                    engine = "odf"
+            else:
+                ext = os.path.splitext(str(path_or_buffer))[-1]
+                if ext == ".ods":
Review comment (Member):

Rather than this, we should register the writer globally:

with cf.config_prefix("io.excel.xlsx"):

Reply (Contributor Author):

The weird thing is that this stuff does not seem to be used anywhere. Correct me if I'm wrong. I added a similar bit for the OpenOffice file format, but it did not seem to be called/tested anywhere.

+                    engine = "odf"
         if engine not in self._engines:
             raise ValueError(f"Unknown engine: {engine}")

         self.engine = engine

         # Could be a str, ExcelFile, Book, etc.
-        self.io = io
+        self.io = path_or_buffer
         # Always a string
-        self._io = stringify_path(io)
+        self._io = stringify_path(path_or_buffer)

         self._reader = self._engines[engine](self._io)

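
For inputs that are not binary streams, the new `ExcelFile.__init__` logic picks the engine from the file extension. The following sketch isolates that branch using only the standard library. `engine_for_path` is a hypothetical helper name; the `"xlrd"` default and the `".ods"` comparison are taken from the diff.

```python
import os
from pathlib import Path
from typing import Union


def engine_for_path(path: Union[str, Path], default: str = "xlrd") -> str:
    """Choose a reader engine from the extension, as the path branch in the diff does."""
    # splitext keeps the leading dot and does not change case.
    ext = os.path.splitext(str(path))[-1]
    return "odf" if ext == ".ods" else default


for candidate in ("report.ods", Path("data") / "report.ods", "report.xlsx", "REPORT.ODS"):
    print(candidate, "->", engine_for_path(candidate))
```

Because `os.path.splitext` preserves case and the comparison is against the literal `".ods"`, a name such as `REPORT.ODS` falls through to the default engine in this sketch. Whether that matters in practice depends on how callers name their files.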
9 changes: 5 additions & 4 deletions pandas/io/excel/_odfreader.py
@@ -1,5 +1,7 @@
 from typing import List, cast

+import numpy as np
+
 from pandas._typing import FilePathOrBuffer, Scalar
 from pandas.compat._optional import import_optional_dependency

@@ -148,6 +150,9 @@ def _is_empty_row(self, row) -> bool:
     def _get_cell_value(self, cell, convert_float: bool) -> Scalar:
         from odf.namespaces import OFFICENS

+        if str(cell) == "#N/A":
+            return np.nan
+
         cell_type = cell.attributes.get((OFFICENS, "value-type"))
         if cell_type == "boolean":
             if str(cell) == "TRUE":
@@ -158,10 +163,6 @@ def _get_cell_value(self, cell, convert_float: bool) -> Scalar:
         elif cell_type == "float":
             # GH5394
             cell_value = float(cell.attributes.get((OFFICENS, "value")))
-
-            if cell_value == 0.0:  # NA handling
-                return str(cell)
-
             if convert_float:
                 val = int(cell_value)
                 if val == cell_value:
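
The effect of the `_get_cell_value` change on float cells can be sketched without odfpy. In the removed code, any float cell whose value was `0.0` was returned as its text, so genuine zeros came back as strings; the new code tests for the `#N/A` text first and lets zeros through as numbers. The function below is a simplified stand-in, not the reader's code: `convert_float_cell`, `text`, and `raw_value` are hypothetical names, `math.nan` replaces `np.nan` to stay within the standard library, and the final two `return` statements are an assumption about the part of the branch that the diff does not show.

```python
import math
from typing import Union

Scalar = Union[str, int, float, bool]


def convert_float_cell(text: str, raw_value: str, convert_float: bool) -> Scalar:
    """Simplified stand-in for the float branch of ``_get_cell_value``.

    ``text`` plays the role of ``str(cell)`` and ``raw_value`` the role of the
    cell's ``office:value`` attribute; both names are hypothetical.
    """
    if text == "#N/A":
        # New behaviour: the NA marker is recognised before any type handling.
        return math.nan
    cell_value = float(raw_value)
    # The removed code returned ``text`` at this point whenever cell_value == 0.0.
    if convert_float:
        val = int(cell_value)
        if val == cell_value:
            return val  # assumed: integral floats are returned as int
    return cell_value


print(convert_float_cell("0", "0", convert_float=True))
print(convert_float_cell("#N/A", "0", convert_float=True))
print(convert_float_cell("1.5", "1.5", convert_float=True))
```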