ExcelWriter fails with columns with int32 dtype #3122

lbeltrame · 2013-03-21T14:32:52Z

Test case

In [21]: df = pandas.DataFrame({"foo": [1,2,3]})

In [22]: df["foo"] = df["foo"].astype("int32")

In [23]: df.to_excel("foo.xls")
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-23-a8d956752923> in <module>()
----> 1 df.to_excel("foo.xls")

/usr/lib64/python2.7/site-packages/pandas-0.11.0.dev_4e0465d-py2.7-linux-x86_64.egg/pandas/core/frame.pyc in to_excel(self, excel_writer, sheet_name, na_rep, float_format, cols, header, index, index_label, startrow, startcol)
   1406         formatted_cells = formatter.get_formatted_cells()
   1407         excel_writer.write_cells(formatted_cells, sheet_name,
-> 1408                                  startrow=startrow, startcol=startcol)
   1409         if need_save:
   1410             excel_writer.save()

/usr/lib64/python2.7/site-packages/pandas-0.11.0.dev_4e0465d-py2.7-linux-x86_64.egg/pandas/io/parsers.pyc in write_cells(self, cells, sheet_name, startrow, startcol)
   2194             self._writecells_xlsx(cells, sheet_name, startrow, startcol)
   2195         else:
-> 2196             self._writecells_xls(cells, sheet_name, startrow, startcol)
   2197
   2198     def _writecells_xlsx(self, cells, sheet_name, startrow, startcol):

/usr/lib64/python2.7/site-packages/pandas-0.11.0.dev_4e0465d-py2.7-linux-x86_64.egg/pandas/io/parsers.pyc in _writecells_xls(self, cells, sheet_name, startrow, startcol)
   2270                 wks.write(startrow + cell.row,
   2271                           startcol + cell.col,
-> 2272                           val, style)

/usr/lib/python2.7/site-packages/xlwt/Worksheet.pyc in write(self, r, c, label, style)
   1030
   1031     def write(self, r, c, label="", style=Style.default_style):
-> 1032         self.row(r).write(c, label, style)
   1033
   1034     def write_rich_text(self, r, c, rich_text_list, style=Style.default_style):

/usr/lib/python2.7/site-packages/xlwt/Row.pyc in write(self, col, label, style)
    257             self.__rich_text_helper(col, label, style, style_index)
    258         else:
--> 259             raise Exception("Unexpected data type %r" % type(label))
    260
    261     def set_cell_rich_text(self, col, rich_text_list, style=Style.default_style):

Exception: Unexpected data type <type 'numpy.int32'>

jreback · 2013-03-21T14:39:57Z

I think Excel writer needs to do some type conversions as xlwt only seems to deal with pure python types
see

https://github.com/python-excel/xlwt/issues/15

jreback · 2013-03-21T14:41:06Z

prob can get done in 0.12

lbeltrame · 2013-03-21T14:43:14Z

In data giovedì 21 marzo 2013 07:41:18, jreback ha scritto:

prob can get done in 0.12

No big deal. I noticed it seems to work if I upcast to int64.

Luca Beltrame - KDE Forums team
KDE Science supporter
GPG key ID: 6E1A4E79

jreback · 2013-03-21T14:47:25Z

FYI...this actually wasn't a bug in 0.10.1 because the astype that you are doing, while converting the series, upcasted again when put back in the data frame (but in 0.11 the dtype in the frame is correct)

so going to leave open for fixing later

In [2]: import pandas

In [3]: df = pandas.DataFrame({"foo": [1,2,3]})

In [5]: df.dtypes
Out[5]: foo    int64

In [7]: df['foo'].astype('int32').dtype
Out[7]: dtype('int32')

In [8]: df['foo'] = df['foo'].astype('int32')

In [9]: df.dtypes
Out[9]: foo    int64

jtratner · 2013-09-15T13:29:55Z

I'm thinking we create a function that, based on numpy version/environment, converts columns to things that are picked up as int/float/long and that will be used by the cell formatter.

Does int64 need to be special-cased to long in 32 bit python?

jreback · 2013-09-15T13:34:21Z

this I think is an excel conversion issue in that it doesn't know how to deal with int32
(not sure if it can at all; if it cannot then the excel converters should ensure 64bit dtypes for float/int)

jreback · 2013-09-15T13:35:24Z

@jtratner I think it's the other way around - since we normally always use int64 wrt the python version
unless its explicitly converted

jtratner · 2013-09-15T14:19:17Z

@jreback well, it just needs to be converted to a format that's a subclass of int/float/whatever.

jtratner · 2013-09-15T14:19:51Z

so in 32bit python int32 is a subclass of int, in 64bit python int64 is a subclass of int.

jreback · 2013-09-15T14:27:04Z

ok so this IS platform dep on the windows side then? weird

jtratner · 2013-09-15T14:54:34Z

@jreback not sure...I'm just going off of what that guy showed was happening in the linked issue

jtratner · 2013-09-22T22:05:56Z

@lbeltrame - very soon you can use the xlsxwriter engine to handle this.

jreback · 2013-09-30T13:21:32Z

cc @jmcnamara, does xlxswriter handle int32 dtypes? (or is this an excel issue)

jmcnamara · 2013-09-30T14:12:56Z

@jreback

Xlsxwriter will handle int32 and any numpy/pandas numeric data types that can be mapped to an IEE754 float (which is the only numeric type that Excel has). Running the failing test case with xlsxwriter doesn't raise an exception:

$ python
Python 2.7.3 (default, Jul  5 2013, 08:39:51)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas
>>> df = pandas.DataFrame({"foo": [1,2,3]})
>>> df["foo"] = df["foo"].astype("int32")
>>> df.to_excel("foo.xlsx", engine='xlsxwriter')
>>> # No exception

So, xlsxwriter or openpyxl don't have this issue.

The reason that it fails with a .xls output file is that xlwt uses a whitelist for supported types and int32 isn't identified as a int, long or float. This may be an pandas/numpy type issue, see:

https://github.com/python-excel/xlwt/issues/15

jreback · 2013-09-30T15:30:01Z

ok...leaving open as their needs to be a conversion done for xlwt

jmcnamara · 2013-09-30T16:11:27Z

@jreback

needs to be a conversion done for xlwt

In io/excel.py there is the following function that I think was included to solve this issue (or similar issues).

def _conv_value(val):
    # convert value for excel dump
    if isinstance(val, np.int64):
        val = int(val)
    elif isinstance(val, np.bool8):
        val = bool(val)
    elif isinstance(val, Period):
        val = "%s" % val

    return val

Changing it to something like would fix this issue (although there may be other cases to account for as well):

$ git diff
diff --git a/pandas/io/excel.py b/pandas/io/excel.py
index 6b83fad..97af5be 100644
--- a/pandas/io/excel.py
+++ b/pandas/io/excel.py
@@ -301,7 +301,7 @@ def _trim_excel_header(row):

 def _conv_value(val):
     # convert value for excel dump
-    if isinstance(val, np.int64):
+    if isinstance(val, (np.int64, np.int32)):
         val = int(val)
     elif isinstance(val, np.bool8):
         val = bool(val)

jreback · 2013-09-30T16:21:41Z

@jmcnamara absolutely......

do a PR for that? (maybe use com.is_integer_dtype and not isinstance(np.bool_)) (which will select integer with no datetime/timedelta and not bool)

jmcnamara · 2013-09-30T16:56:18Z

@jreback Ok. I'll write a test and do a PR.

jreback · 2013-10-14T01:52:20Z

closed by #5122

ghost assigned jtratner Sep 5, 2013

jreback mentioned this issue Sep 22, 2013

ENH: Added xlsxwriter as an ExcelWriter option. #4857

Merged

jmcnamara mentioned this issue Oct 6, 2013

TST: Tests and fix for unhandled data types in Excel writers. #5122

Merged

jreback closed this as completed Oct 14, 2013

wesm unassigned jtratner Oct 12, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ExcelWriter fails with columns with int32 dtype #3122

ExcelWriter fails with columns with int32 dtype #3122

lbeltrame commented Mar 21, 2013

jreback commented Mar 21, 2013

jreback commented Mar 21, 2013

lbeltrame commented Mar 21, 2013

jreback commented Mar 21, 2013

jtratner commented Sep 15, 2013

jreback commented Sep 15, 2013

jreback commented Sep 15, 2013

jtratner commented Sep 15, 2013

jtratner commented Sep 15, 2013

jreback commented Sep 15, 2013

jtratner commented Sep 15, 2013

jtratner commented Sep 22, 2013

jreback commented Sep 30, 2013

jmcnamara commented Sep 30, 2013

jreback commented Sep 30, 2013

jmcnamara commented Sep 30, 2013

jreback commented Sep 30, 2013

jmcnamara commented Sep 30, 2013

jreback commented Oct 14, 2013

ExcelWriter fails with columns with int32 dtype #3122

ExcelWriter fails with columns with int32 dtype #3122

Comments

lbeltrame commented Mar 21, 2013

jreback commented Mar 21, 2013

jreback commented Mar 21, 2013

lbeltrame commented Mar 21, 2013

jreback commented Mar 21, 2013

jtratner commented Sep 15, 2013

jreback commented Sep 15, 2013

jreback commented Sep 15, 2013

jtratner commented Sep 15, 2013

jtratner commented Sep 15, 2013

jreback commented Sep 15, 2013

jtratner commented Sep 15, 2013

jtratner commented Sep 22, 2013

jreback commented Sep 30, 2013

jmcnamara commented Sep 30, 2013

jreback commented Sep 30, 2013

jmcnamara commented Sep 30, 2013

jreback commented Sep 30, 2013

jmcnamara commented Sep 30, 2013

jreback commented Oct 14, 2013