BUG: read_excel doesn't respect string data #11331

chris-b1 · 2015-10-15T00:22:32Z

From SO

In [16]: df = pd.DataFrame({'a': ['001','002']})

In [17]: df.to_excel('temp.xlsx')

In [18]: pd.read_excel('temp.xlsx')
Out[18]: 
   a
0  1
1  2

I think it would probably make sense for read_excel to not try and convert strings to numeric, or at least have another keyword argument.

jreback · 2015-10-15T00:27:09Z

xref #8212

this should be done via the dtype kw (and not a converter), as it's more consistent with how the other parsers work

and it should be coercing normally as that is the expected behavior
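
For reference, a minimal sketch of how the dtype keyword behaves in one of the other parsers, read_csv. The in-memory CSV text and the variable names are illustrative assumptions, not taken from the report; it only shows the "coerce by default, opt out with dtype" pattern referred to above.

```python
import io

import pandas as pd

# Illustrative data: the same values as in the report, but as CSV text.
csv_text = "a\n001\n002\n"

# Default behaviour: the parser coerces the column to integers,
# so the leading zeros are lost.
default = pd.read_csv(io.StringIO(csv_text))

# With dtype, the column is kept as strings and the zeros survive.
as_text = pd.read_csv(io.StringIO(csv_text), dtype={"a": str})

print(default["a"].tolist())
print(as_text["a"].tolist())
```

Whether read_excel accepts the same keyword depends on the pandas version in use.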

chris-b1 · 2015-10-15T00:30:13Z

I guess my point is that (unlike csv, etc) Excel numeric data will already be de-serialized as a python numeric type, so the only data that will be strings are those explicitly stored as strings in Excel. So a different default could make sense?

jreback · 2015-10-15T00:32:52Z

zipcodes are usually stored as integers with a format
not sure you can detect this (but maybe it's not being taken into account)

chris-b1 · 2015-10-15T00:41:32Z

Right, if that were the case I think the data should still be read as integers. In this case the zipcodes were stored as Excel text (i.e. if you typed '00500)

In [25]: ws = xlrd.open_workbook('test.xlsx').sheet_by_index(0)

In [26]: ws.cell_value(1, 0)
Out[26]: 55.0

In [27]: ws.cell_value(2, 0)
Out[27]: u'00500'
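
To make the difference concrete, here is a rough sketch in plain Python. It is not pandas' actual implementation: the two helper functions are hypothetical and only contrast eager numeric coercion with leaving already-typed cell values alone, using the two values shown above.

```python
def coerce_numeric(values):
    """Hypothetical helper: try to turn every value into a float."""
    out = []
    for v in values:
        try:
            out.append(float(v))
        except (TypeError, ValueError):
            # Not numeric-looking, so keep the original value.
            out.append(v)
    return out


def respect_cell_type(values):
    """Hypothetical helper: keep values exactly as they were deserialized."""
    return list(values)


# The two cell values from the session above: a numeric cell and a text cell.
cells = [55.0, "00500"]

coerced = coerce_numeric(cells)
preserved = respect_cell_type(cells)

print(coerced)
print(preserved)
```

With coercion the text cell becomes a number and loses its leading zeros; without it, the string stored in Excel comes through unchanged.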

jreback · 2015-10-15T00:43:29Z

ok then might be a bug then

stevemaughan · 2015-10-15T01:17:25Z

Zipcodes are usually stored as strings. There are also ZIP+4 codes, which need to be stored as strings as well, e.g. 32771-5407

jreback · 2015-12-30T15:21:00Z

dupe of #8212

jreback added the IO Data, Dtype Conversions, IO Excel, and Bug labels Oct 15, 2015

jreback closed this as completed Dec 30, 2015

jreback added the Duplicate Report Duplicate issue or pull request label Dec 30, 2015

chris-b1 mentioned this issue Apr 26, 2018

ENH: read_excel respect Excel text type for numbers #20828

Open
