API: integrate with openpyxl 2.0-2.1 changes #8342

jtratner · 2014-09-21T19:33:47Z

Builds are failing on Travis due to some issue with Excel (weirdly passed on merge but then started failing a few commits later). Currently investigating. C.f. https://travis-ci.org/pydata/pandas/jobs/35873580

@neirbowj - any idea why these tests stopped passing?

It's all some variation on the following:

======================================================================
ERROR: test_tsframe (pandas.io.tests.test_excel.OpenpyxlTests_NoMerge)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/virtualenv/python3.4.1/lib/python3.4/site-packages/pandas/io/tests/test_excel.py", line 562, in test_tsframe
    df.to_excel(path, 'test1')
  File "/home/travis/virtualenv/python3.4.1/lib/python3.4/site-packages/pandas/util/decorators.py", line 88, in wrapper
    return func(*args, **kwargs)
  File "/home/travis/virtualenv/python3.4.1/lib/python3.4/site-packages/pandas/core/frame.py", line 1228, in to_excel
    startrow=startrow, startcol=startcol)
  File "/home/travis/virtualenv/python3.4.1/lib/python3.4/site-packages/pandas/io/excel.py", line 687, in write_cells
    'number_format':{'format_code': self.datetime_format}}))
  File "/home/travis/virtualenv/python3.4.1/lib/python3.4/site-packages/pandas/io/excel.py", line 757, in _convert_to_style_kwargs
    new_v = _conv_to_x(v)
  File "/home/travis/virtualenv/python3.4.1/lib/python3.4/site-packages/pandas/io/excel.py", line 1043, in _convert_to_number_format
    from openpyxl.styles import NumberFormat
ImportError: cannot import name 'NumberFormat'
----------------------------------------------------------------------

The text was updated successfully, but these errors were encountered:

jreback · 2014-09-21T19:43:33Z

Ha, we just had this discussion!: #8340

apparently openpyxl released 2.0.5 (today 9/21). and made backward incompatible changes (no suprise their). The 3.4 build tests with the latest so picked it up.

So whatever fix is needed, prob should change say the 3.3 build to use the last good version as well (2.04?), though 2.6 build uses 2.03 (to make sure that WE maintain compat).

jtratner · 2014-09-21T19:53:32Z

it looks like a really minor change - maybe for now just switch the import
to openpyxl.styles.number.NumberFormat?

On Sun, Sep 21, 2014 at 12:43 PM, jreback [email protected] wrote:

Ha, we just had this discussion!: #8340
#8340

apparently openpyxl released 2.0.5 (today 9/21). and made backward
incompatible changes (no suprise their). The 3.4 build tests with the
latest so picked it up.

So whatever fix is needed, prob should change say the 3.3 build to use the
last good version as well (2.04?), though 2.6 build uses 2.03 (to make sure
that WE maintain compat).

—
Reply to this email directly or view it on GitHub
#8342 (comment).

jreback · 2014-09-21T19:56:33Z

@jtratner prob, just need to do it in a try: except: because have to keep the existing one around. Its really annoying when people change the API's around like that

jreback · 2014-09-21T20:03:00Z

hmm, might be something else as 2.0.5 seems to have been working for last few days (though its possible that what was called 2.0.5 changed), which is a no-no, but its possible

jorisvandenbossche · 2014-09-21T20:12:17Z

It is version 2.1.0 that is released, not 2.0.5 I think: http://openpyxl.readthedocs.org/en/latest/changes.html

jtratner · 2014-09-21T20:13:42Z

I just pushed a potential fix. If it works, let's just stick with the minor
changes rather than contorting ourselves to make things work with openpyxl
v2. good thing you said to mark it as experimental :)

On Sun, Sep 21, 2014 at 1:12 PM, Joris Van den Bossche <
[email protected]> wrote:

It is version 2.1.0 that is released, not 2.0.5 I think.

—
Reply to this email directly or view it on GitHub
#8342 (comment).

jreback · 2014-09-21T20:17:11Z

@jorisvandenbossche right

neirbowj · 2014-09-22T00:44:20Z

Grr. The OpenPyxl devs are making me regret getting roped into this.

It's not clear that using the deprecated NumberFormat class is a viable approach. It's been deprecated pretty emphatically:

>>> from openpyxl.styles import Style
>>> from openpyxl.styles.numbers import NumberFormat
>>> nf = NumberFormat(format_code='0.00')
openpyxl.styles.numbers:1: UserWarning: Call to deprecated function or class NumberFormat (Number formats are strings. Use module functions).
>>> s = Style(number_format=nf)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "openpyxl/styles/__init__.py", line 45, in __init__
    self.number_format = number_format
  File "openpyxl/styles/hashable.py", line 54, in __setattr__
    return object.__setattr__(self, *args, **kwargs)
  File "openpyxl/styles/numbers.py", line 194, in __set__
    super(NumberFormatDescriptor, self).__set__(instance, value)
  File "openpyxl/descriptors/__init__.py", line 32, in __set__
    raise TypeError('expected ' + str(self.expected_type))
TypeError: expected <type 'basestring'>

It looks like the main alternatives for us are to A) build in separate compatibility for the two flavors of number formats, or B) declare 2.x.y prior to 2.1.0 unsupported, and treat "number_format is a string" as the one true interface. Either way, it seems like this kind of breakage is going to be a fact of life with OpenPyxl going forward.

Thoughts?

jreback · 2014-09-22T00:47:43Z

can u ping their mailing list / issues
and see what the heck is up with all if these API changes

if they come back with - this is the API and it's stable - then if u can fix up
we can just make 2.1 min for 2.x series

otherwise maybe just revert and wait for stability

neirbowj · 2014-09-22T00:49:11Z

Yeah, I'll see what I can find out.

neirbowj · 2014-09-22T01:23:25Z

I've opened the hailing frequencies.

jreback · 2014-09-22T01:28:07Z

perfect - of course this problem of crazy API changes happened in v2 so not surprised

jreback · 2014-09-22T13:39:41Z

@neirbowj what do you think is best to do in short-run

revert your original PR
skip tests for openpyxl2
comment something out

?

want to do a quick for one of these?

neirbowj · 2014-09-22T14:45:52Z

I'll be in transit most of the day without access to the Internet (because I'll be damned if I ever knowingly buy GoGo Inflight Internet again). If this doesn't answer the immediate need, the next least worst option is probably to skip tests for openpyxl2, then next to revert. I'll be back online this evening.

jreback · 2014-09-22T16:40:37Z

@neirbowj ok I merged in your changes (needed a couple of more for compat). Will leave this issue open for you to come back at some point and validate / fix for the continued openpyxl API changes.

thanks!

stancikcom · 2015-03-17T09:45:10Z

Guys, I believe IO modules are essential for Pandas usability (besides Numpy ;-) ).
I have fallen recently into dependency hell as another module (PETL) requires a newer version of Openpyxl. Please, consider to implement an adapter pattern that will support (reasonable) legacy versions.

I think Pandas is a great package and for a production usage it is worth to invest more into reliability (that also means modules).

khatriindu · 2015-03-23T03:08:09Z

Hi, I am trying to create an Excel File(showing the differences of two Excel files). I am getting this error:TypeError: expected <class 'openpyxl.styles.fonts.Font'>.
I have already imported : >>> import openpyxl.styles

from openpyxl.styles import fonts

Any solutions please?

jreback · 2015-03-23T03:21:45Z

@stancikcom and @Anon3
pandas has been compat with all released versions of openpyxl for as far back as I can remember
there were a couple of hiccups iirc 0.15.0
0.15.2 has been out for quite some time and 0.16.0 just released
if u are having difficulties thrn pls pd.show_versions() in your report

khatriindu · 2015-03-23T04:52:24Z

@jreback The version is pandas: 0.15.2. Would you suggest me to upgrade it to 0.16.0 ?

jreback · 2015-03-23T10:52:08Z

@Anon3 well pls report pd.show_versions() as well as as showing what is causing the error

stancikcom · 2015-03-23T12:33:15Z

Hi all,

after upgrade, pandas works fine with new version of openpyxl.
my import pandas; pandas.show_versions() dump:

INSTALLED VERSIONS

commit: None
python: 2.7.6.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-46-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.16.0
nose: 1.3.4
Cython: 0.21.2
numpy: 1.9.2
scipy: 0.15.0
statsmodels: 0.6.1
IPython: 2.3.1
sphinx: 1.2.3
patsy: 0.2.1
dateutil: 2.4.1
pytz: 2015.2
bottleneck: 1.0.0
tables: 3.1.1
numexpr: 2.2.2
matplotlib: 1.4.2
openpyxl: 2.2.0
xlrd: 0.9.2
xlwt: 0.7.5
xlsxwriter: 0.6.7
lxml: 3.3.3
bs4: 4.2.1
html5lib: 0.999
httplib2: 0.8
apiclient: None
sqlalchemy: 0.9.9
pymysql: 0.6.6.None
psycopg2: 2.6 (dt dec pq3 ext lo64)

khatriindu · 2015-03-23T15:52:07Z

@jreback this is what I used:

import pandas as pd
pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.4.2.final.0
python-bits: 32
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 42 Stepping 7, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.16.0
nose: None
Cython: None
numpy: 1.9.2
scipy: None
statsmodels: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.4.0
pytz: 2014.10
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.4.3
openpyxl: 2.2.0
xlrd: 0.9.3
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None

khatriindu · 2015-03-23T15:53:04Z

@jreback and @stancikcom even with 0.16.0 i get this error: TypeError: expected <class 'openpyxl.styles.fonts.Font'>

stancikcom · 2015-03-23T17:03:21Z

@Anon3 try to share a snippet of your code which makes you a trouble. We can try to replicate / debug it. Cheers. (ms)

khatriindu · 2015-03-24T00:31:05Z

@stancikcom : here is the code that I found (and I am using for official purpose) to compare two excels:

import pandas as pd
import numpy as np
import xlrd

Define the diff function to show the changes in each field

def report_diff(x):
return x[0] if x[0] == x[1] else '{} ---> {}'.format(*x)

We want to be able to easily tell which rows have changes

def has_change(row):
if "--->" in row.to_string():
return "Y"
else:
return "N"

Read in both excel files

df1 = pd.read_excel('Excel1.xlsx', 'Sheet2', na_values=['NA'])
df2 = pd.read_excel('Excel2.xlsx', 'Sheet2', na_values=['NA'])

Make sure we order by account number so the comparisons work

df1.sort(columns="Col1")
df1=df1.reindex()
df2.sort(columns="Col1")
df2=df2.reindex()

Create a panel of the two dataframes

diff_panel = pd.Panel(dict(df1=df1,df2=df2))

#Apply the diff function
diff_output = diff_panel.apply(report_diff, axis=0)

Flag all the changes

diff_output['has_change'] = diff_output.apply(has_change, axis=1)

#Save the changes to excel but only include the columns we care about
diff_output[(diff_output.has_change == 'Y')].to_excel('my-diff-1.xlsx',index=False,columns=["Col1",
"Col2","Col3","Col4","Col5"])

stancikcom · 2015-03-24T09:51:15Z

@Anon3 I have tried your code on my data sets (of course I do have different data than you) and I can confirme that the pure IO from/to excel worked without any problem in my python environment.

However, I think the issues are in your code logic. Currently it won't work, if the data sets are different size, it won't work if the account order is altered e.g. due to added/deleted account in one of the set. It is a total overkill in this case using a 3d pandas.Panel() ;-)

If I were you I would refactor the logic as follows:
-assuming account is a unique primary key for each row, then even without Pandas you can write your function using built_in hasattr() function. Accounts could be a hash key and its value the account attributes (a tuple where each member is taken from one cell from excel). So then

data_set = { Account : (Col1,Col2,Col3,...Coln), ... }

Consider to address this to stackoverflow which is more focused to optimize algorythms.

khatriindu · 2015-03-24T16:52:58Z

@stancikcom thanks for your valuable comments. I am a newbie .. never used hasattr() function.
Let me try learning and using hasattr()

stancikcom · 2015-03-25T15:58:42Z

@Anon3 with #python and #pandas you are in the right track... :-)
bdw. if you want to still solve your problem with pandas, merge df1 and df2 on the account column, and then write short function to compare 'left' and 'right' side of the table.

pd.merge(df1,df2,on='account',how='left')

khatriindu · 2015-03-25T17:06:48Z

@stancikcom ,
Yes. I wrote something something and I get : ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

This is the additional piece of code:
while(df1 and df2):
if(df1[0:len(df1)]==df2[0:len(df2)]):
continue
else:
# something

PS: I am sure there is something wrong in comparison.I'll try to use either of these:a.empty, a.bool(), a.item(), a.any() or a.all()
I'll keep you updated with my final code

khatriindu · 2015-03-26T04:15:50Z

@stancikcom ,

how to compare the left and right column from the merged dataframe?

I was trying to find some reference over the stackoverflow, didn't find for the merge but found a different way to compare two data frames. Here is the piece of code which works fine:
import pandas as pd
import numpy as np
import xlrd

Read in both excel files

df1 = pd.read_excel('Excel1.xlsx', 'Sheet2', na_values=['NA'])
df2 = pd.read_excel('Excel2.xlsx', 'Sheet2', na_values=['NA'])

Make sure we order by account number so the comparisons work

df1.sort(columns="Col1")
df1=df1.reindex()
df2.sort(columns="Col1")
df2=df2.reindex()

ne_stacked = (df1 != df2).stack()

changed = ne_stacked[ne_stacked]

changed.index.names = ['id', 'col']

difference_locations = np.where(df1 != df2)

changed_from = df1.values[difference_locations]

changed_to = df2.values[difference_locations]

print(pd.DataFrame({'from': changed_from, 'to': changed_to}, index=changed.index))

jreback · 2015-09-21T15:35:47Z

closed in favor of #10125

jreback added Compat pandas objects compatability with Numpy or Python functions IO Excel read_excel, to_excel labels Sep 21, 2014

jreback added this to the 0.15.0 milestone Sep 21, 2014

jreback mentioned this issue Sep 21, 2014

CLN: pep8 clean up of sql.py #8340

Merged

neirbowj mentioned this issue Sep 22, 2014

BUG: Work-around openpyxl-2.1.0 NumberFormat removal #8352

Closed

jreback mentioned this issue Sep 22, 2014

BUG: Work-around openpyxl-2.1.0 NumberFormat removal #8356

Merged

jreback modified the milestones: 0.15.1, 0.15.0 Sep 22, 2014

jreback changed the title ~~Failed builds due to Excel issue~~ API: integrate with openpyxl 2.0-2.1 changes Sep 22, 2014

neirbowj mentioned this issue Dec 14, 2014

Python3: No module named 'openpyxl.style' #9060

Closed

jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015

jlec mentioned this issue Sep 11, 2015

COMPAT: openpyxl >= 2.2 failing #10125

Closed

jreback modified the milestones: 0.17.0, Next Major Release Sep 21, 2015

jreback closed this as completed Sep 21, 2015

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

API: integrate with openpyxl 2.0-2.1 changes #8342

API: integrate with openpyxl 2.0-2.1 changes #8342

jtratner commented Sep 21, 2014

jreback commented Sep 21, 2014

jtratner commented Sep 21, 2014

jreback commented Sep 21, 2014

jreback commented Sep 21, 2014

jorisvandenbossche commented Sep 21, 2014

jtratner commented Sep 21, 2014

jreback commented Sep 21, 2014

neirbowj commented Sep 22, 2014

jreback commented Sep 22, 2014

neirbowj commented Sep 22, 2014

neirbowj commented Sep 22, 2014

jreback commented Sep 22, 2014

jreback commented Sep 22, 2014

neirbowj commented Sep 22, 2014

jreback commented Sep 22, 2014

stancikcom commented Mar 17, 2015

khatriindu commented Mar 23, 2015

jreback commented Mar 23, 2015

khatriindu commented Mar 23, 2015

jreback commented Mar 23, 2015

stancikcom commented Mar 23, 2015

khatriindu commented Mar 23, 2015

INSTALLED VERSIONS

khatriindu commented Mar 23, 2015

stancikcom commented Mar 23, 2015

khatriindu commented Mar 24, 2015

stancikcom commented Mar 24, 2015

khatriindu commented Mar 24, 2015

stancikcom commented Mar 25, 2015

khatriindu commented Mar 25, 2015

khatriindu commented Mar 26, 2015

jreback commented Sep 21, 2015

API: integrate with openpyxl 2.0-2.1 changes #8342

API: integrate with openpyxl 2.0-2.1 changes #8342

Comments

jtratner commented Sep 21, 2014

jreback commented Sep 21, 2014

jtratner commented Sep 21, 2014

jreback commented Sep 21, 2014

jreback commented Sep 21, 2014

jorisvandenbossche commented Sep 21, 2014

jtratner commented Sep 21, 2014

jreback commented Sep 21, 2014

neirbowj commented Sep 22, 2014

jreback commented Sep 22, 2014

neirbowj commented Sep 22, 2014

neirbowj commented Sep 22, 2014

jreback commented Sep 22, 2014

jreback commented Sep 22, 2014

neirbowj commented Sep 22, 2014

jreback commented Sep 22, 2014

stancikcom commented Mar 17, 2015

khatriindu commented Mar 23, 2015

jreback commented Mar 23, 2015

khatriindu commented Mar 23, 2015

jreback commented Mar 23, 2015

stancikcom commented Mar 23, 2015

INSTALLED VERSIONS

khatriindu commented Mar 23, 2015

INSTALLED VERSIONS

khatriindu commented Mar 23, 2015

stancikcom commented Mar 23, 2015

khatriindu commented Mar 24, 2015

Define the diff function to show the changes in each field

We want to be able to easily tell which rows have changes

Read in both excel files

Make sure we order by account number so the comparisons work

Create a panel of the two dataframes

Flag all the changes

stancikcom commented Mar 24, 2015

khatriindu commented Mar 24, 2015

stancikcom commented Mar 25, 2015

khatriindu commented Mar 25, 2015

khatriindu commented Mar 26, 2015

Read in both excel files

Make sure we order by account number so the comparisons work

jreback commented Sep 21, 2015