.loc indexing of heterogeneous dataframe returns different dtype #15220

Closed
wattse opened this issue Jan 25, 2017 · 2 comments
Labels: Bug, Duplicate Report, Indexing

wattse commented Jan 25, 2017

Code Sample

import pandas as pd
import numpy as np

# Working data
names = ['d1', 'd2', 'd3', 'd4', 'd5']
formats = ['u1', '<f8', 'u1', 'u1', 'u1']
dtype = dict(names=names, formats=formats)
data = {'d1':[1,11], 'd2':[2,12], 'd3':[3,13], 'd4':[4,14], 'd5':[5,15]}

# Create a pandas DataFrame whose columns are all uint8, except for a float64 in the d2 slot.
df_create = np.rec.fromarrays(data.values(), dtype=dtype, names=data.keys())
df_create = pd.DataFrame(df_create)
df_create.loc[:,'d2'] *= 0.12345

# Create a pandas DataFrame with all columns uint8.
df_mod = pd.DataFrame.from_dict(data, dtype=np.dtype('u1'))
# Convert d2 to float64 and modify it.
df_mod.loc[:,'d2'] = df_mod.loc[:,'d2'].astype(np.dtype('float64'))
df_mod.loc[:,'d2'] *= 0.12345


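# Note: the iloc[0,0] and ix labels printed below do not match the expressions,
# which evaluate .iloc[0,2] (column d3, also uint8) and use .iloc rather than .ix.
print('type of df_create.loc[0,\'d1\']: {}'.format(type(df_create.loc[0,'d1'])))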
print('type of df_create.loc[0,\'d2\']: {}'.format(type(df_create.loc[0,'d2'])))
print('type of df_create.iloc[0,0]: {}'.format(type(df_create.iloc[0,2])))
print('type of df_create.iloc[0,1]: {}'.format(type(df_create.iloc[0,1])))
print('type of df_create.ix[0,0]: {}'.format(type(df_create.iloc[0,2])))
print('type of df_create.ix[0,1]: {}'.format(type(df_create.iloc[0,1])))
print('')
print('type of df_mod.loc[0,\'d1\']: {}'.format(type(df_mod.loc[0,'d1'])))
print('type of df_mod.loc[0,\'d2\']: {}'.format(type(df_mod.loc[0,'d2'])))
print('type of df_mod.iloc[0,0]: {}'.format(type(df_mod.iloc[0,2])))
print('type of df_mod.iloc[0,1]: {}'.format(type(df_mod.iloc[0,1])))
print('type of df_mod.ix[0,0]: {}'.format(type(df_mod.iloc[0,2])))
print('type of df_mod.ix[0,1]: {}'.format(type(df_mod.iloc[0,1])))
print('')
print('All dtypes for dataframe df_mod:')
print(df_mod.dtypes)
print('')
pd.show_versions()

produces the following output:
type of df_create.loc[0,'d1']: <class 'numpy.float64'>
type of df_create.loc[0,'d2']: <class 'numpy.float64'>
type of df_create.iloc[0,0]: <class 'numpy.float64'>
type of df_create.iloc[0,1]: <class 'numpy.float64'>
type of df_create.ix[0,0]: <class 'numpy.float64'>
type of df_create.ix[0,1]: <class 'numpy.float64'>

type of df_mod.loc[0,'d1']: <class 'numpy.float64'>
type of df_mod.loc[0,'d2']: <class 'numpy.float64'>
type of df_mod.iloc[0,0]: <class 'numpy.float64'>
type of df_mod.iloc[0,1]: <class 'numpy.float64'>
type of df_mod.ix[0,0]: <class 'numpy.float64'>
type of df_mod.ix[0,1]: <class 'numpy.float64'>

All dtypes for dataframe df_mod:
d1 uint8
d2 float64
d3 uint8
d4 uint8
d5 uint8
dtype: object

Problem description

As far as I can see, scalar indexing (for example .loc[0,'d1']) converts non-float64 values to float64. According to df.dtypes, the columns are still stored as uint8, but a uint8 value comes back as numpy.float64 when it is retrieved through the indexers shown above.
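The mismatch described above can be checked by comparing a column's stored dtype with the type of a scalar pulled out of it. The sketch below uses a hypothetical two-column frame rather than the data from the report. It also shows that selecting a whole row of mixed numeric columns yields a single float64 Series; whether that row-wise upcast is the path behind the reported behavior is an assumption here, not something stated in this report.

```python
import numpy as np
import pandas as pd

# Hypothetical frame (not the reporter's data): one uint8 and one float64 column.
df = pd.DataFrame({
    "d1": np.array([1, 11], dtype="u1"),
    "d2": np.array([2.0, 12.0], dtype="f8"),
})

# Each column keeps its own dtype in storage.
column_dtypes = df.dtypes.to_dict()

# A whole row has to fit into one Series with a single dtype,
# so the mixed uint8/float64 row is upcast to float64.
row_dtype = df.loc[0].dtype

# Going through the row first therefore returns a float64 scalar.
via_row_type = type(df.loc[0]["d1"])

# Direct scalar lookup; compare this type against column_dtypes["d1"].
scalar_type = type(df.loc[0, "d1"])

print(column_dtypes)
print(row_dtype, via_row_type)
print(scalar_type)
```

If `scalar_type` differs from the column dtype, the installed pandas shows the behavior reported here.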

Expected Output

type of df_create.loc[0,'d1']: <class 'numpy.uint8'>
type of df_create.loc[0,'d2']: <class 'numpy.float64'>
type of df_create.iloc[0,0]: <class 'numpy.uint8'>
type of df_create.iloc[0,1]: <class 'numpy.float64'>
type of df_create.ix[0,0]: <class 'numpy.uint8'>
type of df_create.ix[0,1]: <class 'numpy.float64'>

type of df_mod.loc[0,'d1']: <class 'numpy.uint8'>
type of df_mod.loc[0,'d2']: <class 'numpy.float64'>
type of df_mod.iloc[0,0]: <class 'numpy.uint8'>
type of df_mod.iloc[0,1]: <class 'numpy.float64'>
type of df_mod.ix[0,0]: <class 'numpy.uint8'>
type of df_mod.ix[0,1]: <class 'numpy.float64'>

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 30 Stepping 5, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.19.2
nose: 1.3.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.23.4
numpy: 1.12.0
scipy: 0.18.0
statsmodels: 0.6.1
xarray: 0.8.2
IPython: 4.0.1
sphinx: 1.3.1
patsy: 0.4.0
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: 1.1.0
tables: 3.2.2
numexpr: 2.6.1
matplotlib: 1.5.1
openpyxl: 2.4.0
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.7.7
lxml: 3.4.4
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.9
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.38.0
pandas_datareader: None

@jorisvandenbossche
Member

@wattse Thanks for the report! This is a duplicate of #11617 and #14205, and it was recently fixed in PR #15120.

On pandas master, I get the following output:

type of df_create.loc[0,'d1']: <class 'numpy.uint8'>
type of df_create.loc[0,'d2']: <class 'numpy.float64'>
type of df_create.iloc[0,0]: <class 'numpy.uint8'>
type of df_create.iloc[0,1]: <class 'numpy.float64'>
type of df_create.ix[0,0]: <class 'numpy.uint8'>
type of df_create.ix[0,1]: <class 'numpy.float64'>

type of df_mod.loc[0,'d1']: <class 'numpy.uint8'>
type of df_mod.loc[0,'d2']: <class 'numpy.float64'>
type of df_mod.iloc[0,0]: <class 'numpy.uint8'>
type of df_mod.iloc[0,1]: <class 'numpy.float64'>
type of df_mod.ix[0,0]: <class 'numpy.uint8'>
type of df_mod.ix[0,1]: <class 'numpy.float64'>
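For code that has to run on a release without the fix, one possible workaround is to select the column first and then the row, since a single column is a Series with exactly one dtype. This is only a sketch and was not proposed in this thread: `get_scalar` is a hypothetical helper, the frame is made-up data, and it is an assumption that column-first access sidesteps the affected code path on 0.19.2.

```python
import numpy as np
import pandas as pd


def get_scalar(df, row_label, column):
    """Hypothetical helper: pick the column first, then the row label."""
    # A single column holds one dtype, so no cross-column upcast can occur.
    return df[column].loc[row_label]


# Hypothetical frame (not the reporter's data).
df = pd.DataFrame({
    "d1": np.array([1, 11], dtype="u1"),
    "d2": np.array([2.0, 12.0], dtype="f8"),
})

value = get_scalar(df, 0, "d1")
print(type(value), df["d1"].dtype)
```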

jorisvandenbossche added the Bug, Duplicate Report, and Indexing labels Jan 25, 2017
jorisvandenbossche added this to the 0.20.0 milestone Jan 25, 2017

wattse commented Jan 27, 2017

Thanks Joris, apologies for the duplicate! My searching was not detailed enough.
