read_excel: Wrong dtypes for numeric text fields #11927
Labels: Dtype Conversions (unexpected or buggy dtype conversions), Duplicate Report (duplicate issue or pull request), IO Excel (read_excel, to_excel)
In the following example, column a of df is exported as text (string) to the xlsx file, but is imported back into df2 as int.
In [161]: df = pd.DataFrame({'a':['01','02','03','04'], 'b':[5,6,7,8]})
In [162]: df
Out[162]:
    a  b
0  01  5
1  02  6
2  03  7
3  04  8
In [163]: df.dtypes
Out[163]:
a object
b int64
dtype: object
In [164]: df.to_excel('tmp.xlsx')
In [165]: df2 = pd.read_excel('tmp.xlsx')
In [166]: df2
Out[166]:
   a  b
0  1  5
1  2  6
2  3  7
3  4  8
In [167]: df2.dtypes
Out[167]:
a int64
b int64
dtype: object
Users often format columns that contain numeric data as "text" in Excel on purpose (in the example above, to keep the leading zeros). pandas should keep these columns as str and not convert them back to numeric. At the very least, pd.read_excel() should provide an option to switch this dtype conversion off.
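For reference, here is a minimal sketch of one possible workaround. It assumes a pandas version in which read_excel accepts a dtype argument (this was not available in the version used for the session above) and an installed Excel engine such as openpyxl. The temporary directory and the index=False argument are choices made only for this illustration.

```python
import os
import tempfile

import pandas as pd

# Same data as in the report: column 'a' holds numeric-looking text.
df = pd.DataFrame({'a': ['01', '02', '03', '04'], 'b': [5, 6, 7, 8]})

# Write to a throwaway directory (an assumption of this sketch) so the
# example leaves no files behind.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'tmp.xlsx')
    df.to_excel(path, index=False)

    # Ask read_excel to keep column 'a' as strings instead of inferring
    # a numeric dtype; column 'b' is still inferred as usual.
    df2 = pd.read_excel(path, dtype={'a': str})

print(df2['a'].tolist())
print(df2['b'].tolist())
```

With this call the leading zeros in column a should survive the round trip, while column b is still read as integers. The drawback is that the caller has to name the text columns explicitly, which is not the automatic behaviour requested above.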