Skip to content

BUG: read_excel doesn't respect string data #11331

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
chris-b1 opened this issue Oct 15, 2015 · 7 comments
Closed

BUG: read_excel doesn't respect string data #11331

chris-b1 opened this issue Oct 15, 2015 · 7 comments
Labels
Bug Dtype Conversions Unexpected or buggy dtype conversions Duplicate Report Duplicate issue or pull request IO Data IO issues that don't fit into a more specific label IO Excel read_excel, to_excel

Comments

@chris-b1
Copy link
Contributor

From SO

In [16]: df = pd.DataFrame({'a': ['001','002']})

In [17]: df.to_excel('temp.xlsx')

In [18]: pd.read_excel('temp.xlsx')
Out[18]: 
   a
0  1
1  2

I think it would probably make sense for read_excel to not try and convert strings to numeric, or at least have another keyword argument.

@jreback
Copy link
Contributor

jreback commented Oct 15, 2015

xref #8212

this should be done via the dtype kw (and not s converter) as it's more consistent with how the other parsers work

and it should be coercing normally as that is the expected behavior

@chris-b1
Copy link
Contributor Author

I guess my point is that (unlike csv, etc) Excel numeric data will already be de-serialized as a python numeric type, so the only data that will be strings are those explicitly stored as strings in Excel. So a different default could make sense?

@jreback
Copy link
Contributor

jreback commented Oct 15, 2015

zipcodes are usually stored as integers with a format
not sure u can detect this (but maybe it's not being taken into account)

@chris-b1
Copy link
Contributor Author

Right, if that were the case I think the data should still be read as integers. In this case the zipcodes were stored as Excel text (i.e. if you typed '00500)

image

In [25]: ws = xlrd.open_workbook('test.xlsx').sheet_by_index(0)

In [26]: ws.cell_value(1, 0)
Out[26]: 55.0

In [27]: ws.cell_value(2, 0)
Out[27]: u'00500'

@jreback
Copy link
Contributor

jreback commented Oct 15, 2015

ok then might be a bug then

@stevemaughan
Copy link

Zipcodes are usually stored as strings. You also have zip+4 which also need to be stored as strings e.g. 32771-5407

@jreback jreback added IO Data IO issues that don't fit into a more specific label Dtype Conversions Unexpected or buggy dtype conversions IO Excel read_excel, to_excel Bug labels Oct 15, 2015
@jreback
Copy link
Contributor

jreback commented Dec 30, 2015

dupe of #8212

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Dtype Conversions Unexpected or buggy dtype conversions Duplicate Report Duplicate issue or pull request IO Data IO issues that don't fit into a more specific label IO Excel read_excel, to_excel
Projects
None yet
Development

No branches or pull requests

3 participants