ENH: Upgrade Excel I/O module to use OpenPyXL 2 #7177

neirbowj · 2014-05-19T18:55:16Z

OpenPyXL v2.0 was released on May 13th, breaking backwards compatibility in some parts of its API, including those upon which pandas depends. Notably, v2 changes the style API and makes Style objects immutable. This incompatibility is evident at 0.14.0-rc1 in pandas.io.excel._OpenpyxlWriter._convert_to_style and the associated test case, pandas.io.tests.test_excel.OpenyxlTests.test_to_excel_styleconverter.

In addition to teaching pandas how to use the new API, the project may wish to retain support for the OpenPyXL v1 API, necessitating the design and development of a compatibility shim.

The text was updated successfully, but these errors were encountered:

jreback · 2014-05-19T19:17:00Z

dupe of #7169
would love to have u work on it!

py3.4 build already has 2.02 (and so fails)

neirbowj · 2014-05-24T01:18:21Z

Could you re-open this issue and reassign it to the 0.14.1 milestone?

jreback · 2014-06-10T15:43:11Z

any thoughts on this? (getting it to work with >= 2.0)

neirbowj · 2014-06-10T17:14:29Z

I've been spending my pandas time allocation on this.

My thoughts, in case anybody else is driven to work on this before I can get to it, are as follows (caveat: I haven't thought about this for a couple weeks, so my recollection about some of these details may betray me).

There appear to be two main pieces to this feature. One is simply the conversion of a style dict to a native openpyxl Style object. This should be fairly rote, and by implementing to the new-style Style protocol, which is the only option in v2, backwards compatibility with v1 will be quite tractable, because it offers both old- and new-style protocols for forwards compatibility.

There are some differences in the structure of the Style class from v1 to v2, which pandas will have to handle. My preference would be to try, but not very hard, to map whatever style information the user passes onto the underlying representation, and to attach commentary about the openpyxl limitations we know of to whatever exceptions bubble up. However, I don't know how people tend to use this, and I haven't yet found a clear spec in the pandas docs for an engine-agnostic, user-facing style API (maybe there isn't and shouldn't be one).

The other main piece is the merge behavior. I've had more trouble wrapping my head around how this should work. The new-style Style class has a copy method that accepts coarse substitutions. For example, conceptually if not correct syntactically, consider this snippet.

from openpyxl.style import *
s1 = Style(font=Font(name='Helvetica'), color=Color(rgb='00FF0000'))
s2 = s1.copy(font=Font(sz=12))

s2 will get the red Color object from s1 but its font attribute will have been overridden with a completely new Font object that has the default face, 'Calibri', at a non-default size, 12 pt. Make sense?

I think this is a fine set of semantics to adopt for the internal style merging behavior of pandas, because it keeps the padding between the user and the engine implementation thin. I'm not sure what that implies for the openpyxl v1 case. At the least, it may be convenient to model a compatibility shim as a backport of the copy method applied to v1 Style objects.

neirbowj · 2014-06-21T21:05:48Z

It occurred to me belatedly to wonder, since I have never used pandas.io.excel or its main consumer, the to_excel method, whether the styling capability of ExcelWriter is really intended to be user facing? The documentation does not mention it, though the docstring for the ExcelWriter.write_cells method clearly indicates acceptance of formatted (i.e. styled) cells. In what follows, I will be assuming that this is supposed to be a user-facing capability that is simply not yet documented as such. If it isn't, please say so because it might provide some useful wiggle room to address the other areas where I'm getting stuck.

The other thing I finally realized I didn't really know for sure, and need to, is whether the style dict those formatted cells carry around is required to be engine-agnostic, or if it can vary from engine to engine. I guessed that the intention was the former, and the uniformity of the usage examples --- the only internal user in pandas.core.format and the applicable test cases for the openpyxl and xlwt engines --- seem to substantiate my guess. However, the style dicts in the test cases are insufficient for me to tell what I should write if I were to write a style dict spec/doc in the spirit of earlier authors. For example, I can tell that the spec has to allow the font to be set to bold or not, but I cannot tell whether font colors are to be supported, and, if so, how they are to be encoded.

If it were up to me, I think I would choose a very limited subset of all possible styling available in the underlying Excel, and call that the engine-agnostic, pandas dialect. By making the subset small and simple, the task gets easier of mapping it onto the supported versions of the existing engine libraries and any future versions or new engines. As I said, I don't know how users tend to use pandas.io.excel so I'm basing this judgement on the notion that somebody who doesn't wish to specify a particular engine, probably won't wish to specify a great deal of formatting information either, if any.

To accommodate users who do want to exert a great deal of control over cell styling, an additional, separable feature could be to teach ExcelWriter.write_cells to accept cells that come with attached style objects that are native to the selected engine. That is, if you want access to the full styling capability of the underlying library, then you have to build the style objects yourself instead of describing them to pandas as dicts.

A related question. Does pandas have some sort of policy, roadmap, or procedure for determining version support? That is, how will we know when we can drop support for the old version of openpyxl?

I welcome discussion or even just direction.

jreback · 2014-06-21T21:44:46Z

to your last points

we try maintain support for as long as possible with dep libraries until it becomes too painful :)

if it's easy enough then no biggie

openpyxl 1.6 I think is the min version but only. 1 year old I think

it's becoming pretty easy IMHO to upgrade individual deps easily

so if their is a recommendation to drop support for something we can discuss

note that pandas does still supports numpy 1.6.1 (and this is painful!)

neirbowj · 2014-06-21T21:56:28Z

@jreback I'm not suggesting that we drop support for anything at the moment. I'm just trying to get a sense of how long the life of an openpyxl v1/v2 compatibility shim might be.

jreback · 2014-06-21T22:00:46Z

one way to do this might be to make a openpyxl_v2 engine (and then depending on the installation choose one or the other)
that way u can rebuild the API as needed

neirbowj · 2014-06-21T23:10:42Z

Yes, that's a good point. Splitting the implementation at the engine level would probably be better. I still think that we need a spec and documentation for the style dict, both of which would be informed by whether it is intended for internal use only, or is an external feature that we also use internally.

neirbowj · 2014-06-25T04:09:40Z

The PR I just posted still has some rough edges (see, for example, pandas.io.excel._Openpyxl2Writer._convert_to_fill), but should give us a leg up on this issue.

I find that openpyxl is maddeningly full of corner cases and reasons why I can't do something simple and uniform. If anybody sees a missed opportunity to refactor and consolidate this code, I welcome suggestions or even alternative PRs.

jreback added Dupe labels May 19, 2014

jreback added this to the 0.14.0 milestone May 19, 2014

jreback closed this as completed May 19, 2014

jreback reopened this May 24, 2014

jreback modified the milestones: 0.14.1, 0.14.0 May 24, 2014

jreback removed the Dupe label May 24, 2014

jreback modified the milestones: 0.15.0, 0.14.1 Jun 22, 2014

neirbowj mentioned this issue Jun 25, 2014

ENH: Implement _Openpyxl2Writer for pandas.io.excel #7565

Merged

jreback modified the milestones: 0.15.0, 0.15.1 Jul 6, 2014

jreback closed this as completed in #7565 Sep 20, 2014

neirbowj mentioned this issue Dec 14, 2014

Python3: No module named 'openpyxl.style' #9060

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ENH: Upgrade Excel I/O module to use OpenPyXL 2 #7177

ENH: Upgrade Excel I/O module to use OpenPyXL 2 #7177

neirbowj commented May 19, 2014

jreback commented May 19, 2014

neirbowj commented May 24, 2014

jreback commented Jun 10, 2014

neirbowj commented Jun 10, 2014

neirbowj commented Jun 21, 2014

jreback commented Jun 21, 2014

neirbowj commented Jun 21, 2014

jreback commented Jun 21, 2014

neirbowj commented Jun 21, 2014

neirbowj commented Jun 25, 2014

ENH: Upgrade Excel I/O module to use OpenPyXL 2 #7177

ENH: Upgrade Excel I/O module to use OpenPyXL 2 #7177

Comments

neirbowj commented May 19, 2014

jreback commented May 19, 2014

neirbowj commented May 24, 2014

jreback commented Jun 10, 2014

neirbowj commented Jun 10, 2014

neirbowj commented Jun 21, 2014

jreback commented Jun 21, 2014

neirbowj commented Jun 21, 2014

jreback commented Jun 21, 2014

neirbowj commented Jun 21, 2014

neirbowj commented Jun 25, 2014