read_excel: Wrong dtypes for numeric text fields #11927

suokunlong · 2015-12-30T09:41:15Z

In the following example, column a of df is exported as text (string) to the xlsx file, but it is read back into df2 as int.

In [161]: df = pd.DataFrame({'a':['01','02','03','04'], 'b':[5,6,7,8]})

In [162]: df
Out[162]:
    a  b
0  01  5
1  02  6
2  03  7
3  04  8

In [163]: df.dtypes
Out[163]:
a    object
b     int64
dtype: object

In [164]: df.to_excel('tmp.xlsx')

In [165]: df2 = pd.read_excel('tmp.xlsx')

In [166]: df2
Out[166]:
   a  b
0  1  5
1  2  6
2  3  7
3  4  8

In [167]: df2.dtypes
Out[167]:
a    int64
b    int64
dtype: object

Users often format columns that contain numeric-looking data as "text" in Excel on purpose (for example, to keep leading zeros). pandas should keep these columns as str and not convert them back to numeric. At the very least, pd.read_excel() should provide an option to turn this dtype conversion off.
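
A possible workaround, sketched under assumptions that are not part of the report above: it assumes a pandas release whose read_excel accepts a dtype argument (converters={'a': str} is a similar route), an installed Excel engine such as openpyxl, and a temporary file path chosen only for illustration. It repeats the round trip from the example while asking for column a to stay as text.

```python
import os
import tempfile

import pandas as pd  # reading/writing .xlsx also needs an engine such as openpyxl

# Same frame as in the report: 'a' holds numeric-looking strings with leading zeros.
df = pd.DataFrame({'a': ['01', '02', '03', '04'], 'b': [5, 6, 7, 8]})

# Hypothetical scratch location; index=False keeps the sheet to the two data columns.
path = os.path.join(tempfile.mkdtemp(), 'tmp.xlsx')
df.to_excel(path, index=False)

# Assumption: this pandas version supports dtype= in read_excel.
# Column 'a' is requested as str so it is not inferred as a number;
# column 'b' is left to normal inference.
df2 = pd.read_excel(path, dtype={'a': str})

# Alternative under the same assumption about availability:
# df2 = pd.read_excel(path, converters={'a': str})

print(df2)
print(df2.dtypes)
```

This only sidesteps the inference for columns named explicitly; it does not make read_excel follow the cell type stored in the workbook, which is what the report asks for.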

jreback · 2015-12-30T15:22:19Z

this is a dupe of #8212

jreback added the Dtype Conversions, Duplicate Report, and IO Excel labels Dec 30, 2015

jreback closed this as completed Dec 30, 2015
