Skip to content

ExcelWriter fails with columns with int32 dtype #3122

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
lbeltrame opened this issue Mar 21, 2013 · 19 comments
Closed

ExcelWriter fails with columns with int32 dtype #3122

lbeltrame opened this issue Mar 21, 2013 · 19 comments
Labels
Bug Dtype Conversions Unexpected or buggy dtype conversions IO Data IO issues that don't fit into a more specific label IO Excel read_excel, to_excel
Milestone

Comments

@lbeltrame
Copy link
Contributor

Test case

In [21]: df = pandas.DataFrame({"foo": [1,2,3]})

In [22]: df["foo"] = df["foo"].astype("int32")

In [23]: df.to_excel("foo.xls")
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-23-a8d956752923> in <module>()
----> 1 df.to_excel("foo.xls")

/usr/lib64/python2.7/site-packages/pandas-0.11.0.dev_4e0465d-py2.7-linux-x86_64.egg/pandas/core/frame.pyc in to_excel(self, excel_writer, sheet_name, na_rep, float_format, cols, header, index, index_label, startrow, startcol)
   1406         formatted_cells = formatter.get_formatted_cells()
   1407         excel_writer.write_cells(formatted_cells, sheet_name,
-> 1408                                  startrow=startrow, startcol=startcol)
   1409         if need_save:
   1410             excel_writer.save()

/usr/lib64/python2.7/site-packages/pandas-0.11.0.dev_4e0465d-py2.7-linux-x86_64.egg/pandas/io/parsers.pyc in write_cells(self, cells, sheet_name, startrow, startcol)
   2194             self._writecells_xlsx(cells, sheet_name, startrow, startcol)
   2195         else:
-> 2196             self._writecells_xls(cells, sheet_name, startrow, startcol)
   2197
   2198     def _writecells_xlsx(self, cells, sheet_name, startrow, startcol):

/usr/lib64/python2.7/site-packages/pandas-0.11.0.dev_4e0465d-py2.7-linux-x86_64.egg/pandas/io/parsers.pyc in _writecells_xls(self, cells, sheet_name, startrow, startcol)
   2270                 wks.write(startrow + cell.row,
   2271                           startcol + cell.col,
-> 2272                           val, style)

/usr/lib/python2.7/site-packages/xlwt/Worksheet.pyc in write(self, r, c, label, style)
   1030
   1031     def write(self, r, c, label="", style=Style.default_style):
-> 1032         self.row(r).write(c, label, style)
   1033
   1034     def write_rich_text(self, r, c, rich_text_list, style=Style.default_style):

/usr/lib/python2.7/site-packages/xlwt/Row.pyc in write(self, col, label, style)
    257             self.__rich_text_helper(col, label, style, style_index)
    258         else:
--> 259             raise Exception("Unexpected data type %r" % type(label))
    260
    261     def set_cell_rich_text(self, col, rich_text_list, style=Style.default_style):

Exception: Unexpected data type <type 'numpy.int32'>
@jreback
Copy link
Contributor

jreback commented Mar 21, 2013

I think Excel writer needs to do some type conversions as xlwt only seems to deal with pure python types
see

https://github.com/python-excel/xlwt/issues/15

@jreback
Copy link
Contributor

jreback commented Mar 21, 2013

prob can get done in 0.12

@lbeltrame
Copy link
Contributor Author

In data giovedì 21 marzo 2013 07:41:18, jreback ha scritto:

prob can get done in 0.12

No big deal. I noticed it seems to work if I upcast to int64.

Luca Beltrame - KDE Forums team
KDE Science supporter
GPG key ID: 6E1A4E79

@jreback
Copy link
Contributor

jreback commented Mar 21, 2013

FYI...this actually wasn't a bug in 0.10.1 because the astype that you are doing, while converting the series, upcasted again when put back in the data frame (but in 0.11 the dtype in the frame is correct)

so going to leave open for fixing later

In [2]: import pandas

In [3]: df = pandas.DataFrame({"foo": [1,2,3]})

In [5]: df.dtypes
Out[5]: foo    int64

In [7]: df['foo'].astype('int32').dtype
Out[7]: dtype('int32')

In [8]: df['foo'] = df['foo'].astype('int32')

In [9]: df.dtypes
Out[9]: foo    int64

@ghost ghost assigned jtratner Sep 5, 2013
@jtratner
Copy link
Contributor

I'm thinking we create a function that, based on numpy version/environment, converts columns to things that are picked up as int/float/long and that will be used by the cell formatter.

Does int64 need to be special-cased to long in 32 bit python?

@jreback
Copy link
Contributor

jreback commented Sep 15, 2013

this I think is an excel conversion issue in that it doesn't know how to deal with int32
(not sure if it can at all; if it cannot then the excel converters should ensure 64bit dtypes for float/int)

@jreback
Copy link
Contributor

jreback commented Sep 15, 2013

@jtratner I think it's the other way around - since we normally always use int64 wrt the python version
unless its explicitly converted

@jtratner
Copy link
Contributor

@jreback well, it just needs to be converted to a format that's a subclass of int/float/whatever.

@jtratner
Copy link
Contributor

so in 32bit python int32 is a subclass of int, in 64bit python int64 is a subclass of int.

@jreback
Copy link
Contributor

jreback commented Sep 15, 2013

ok so this IS platform dep on the windows side then? weird

@jtratner
Copy link
Contributor

@jreback not sure...I'm just going off of what that guy showed was happening in the linked issue

@jtratner
Copy link
Contributor

@lbeltrame - very soon you can use the xlsxwriter engine to handle this.

@jreback
Copy link
Contributor

jreback commented Sep 30, 2013

cc @jmcnamara, does xlxswriter handle int32 dtypes? (or is this an excel issue)

@jmcnamara
Copy link
Contributor

@jreback

Xlsxwriter will handle int32 and any numpy/pandas numeric data types that can be mapped to an IEE754 float (which is the only numeric type that Excel has). Running the failing test case with xlsxwriter doesn't raise an exception:

$ python
Python 2.7.3 (default, Jul  5 2013, 08:39:51)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas
>>> df = pandas.DataFrame({"foo": [1,2,3]})
>>> df["foo"] = df["foo"].astype("int32")
>>> df.to_excel("foo.xlsx", engine='xlsxwriter')
>>> # No exception

So, xlsxwriter or openpyxl don't have this issue.

The reason that it fails with a .xls output file is that xlwt uses a whitelist for supported types and int32 isn't identified as a int, long or float. This may be an pandas/numpy type issue, see:

https://github.com/python-excel/xlwt/issues/15

@jreback
Copy link
Contributor

jreback commented Sep 30, 2013

ok...leaving open as their needs to be a conversion done for xlwt

@jmcnamara
Copy link
Contributor

@jreback

needs to be a conversion done for xlwt

In io/excel.py there is the following function that I think was included to solve this issue (or similar issues).

def _conv_value(val):
    # convert value for excel dump
    if isinstance(val, np.int64):
        val = int(val)
    elif isinstance(val, np.bool8):
        val = bool(val)
    elif isinstance(val, Period):
        val = "%s" % val

    return val

Changing it to something like would fix this issue (although there may be other cases to account for as well):

$ git diff
diff --git a/pandas/io/excel.py b/pandas/io/excel.py
index 6b83fad..97af5be 100644
--- a/pandas/io/excel.py
+++ b/pandas/io/excel.py
@@ -301,7 +301,7 @@ def _trim_excel_header(row):

 def _conv_value(val):
     # convert value for excel dump
-    if isinstance(val, np.int64):
+    if isinstance(val, (np.int64, np.int32)):
         val = int(val)
     elif isinstance(val, np.bool8):
         val = bool(val)

@jreback
Copy link
Contributor

jreback commented Sep 30, 2013

@jmcnamara absolutely......

do a PR for that? (maybe use com.is_integer_dtype and not isinstance(np.bool_)) (which will select integer with no datetime/timedelta and not bool)

@jmcnamara
Copy link
Contributor

@jreback Ok. I'll write a test and do a PR.

@jreback
Copy link
Contributor

jreback commented Oct 14, 2013

closed by #5122

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Dtype Conversions Unexpected or buggy dtype conversions IO Data IO issues that don't fit into a more specific label IO Excel read_excel, to_excel
Projects
None yet
Development

No branches or pull requests

4 participants