DataFrame.clip_upper does not preserve dtype per column #24162

joneugster · 2018-12-08T13:04:26Z

Code Sample

import pandas as pd
data = pd.DataFrame({'INT': [-1, 0, 10, 9],
                     'FLOAT': [-0.148, 0.2347, 38.237, 12.2233]},
                    index=pd.date_range("20180101 00:00", periods=4))

print('Original data:')
print(data.head())

print('\n(A) This is probably not a bug but my misunderstanding:')
print('(So how would I apply "clip_upper" inplace on parts of the dataframe?)')
data.loc[[True, True, True, False], ['INT']].clip_upper(8, inplace=True)
print(data.head())
# I then used:
# data.loc[[True, True, True, False], ['INT']] = data.loc[[True, True, True, False], ['INT']].clip_upper(8)

print('\n(B) It seems that clip_upper does not preserve the dtypes:')
print(data.clip_upper(8).head())

print('\n(C) Same for inplace:')
data.clip_upper(8, inplace=True)
print(data.head())

Output of this code:

Original data:
            INT    FLOAT
2018-01-01   -1  -0.1480
2018-01-02    0   0.2347
2018-01-03   10  38.2370
2018-01-04    9  12.2233

(A) This is probably not a bug but my misunderstanding:
(So how would I apply "clip_upper" inplace on parts of the dataframe?)
            INT    FLOAT
2018-01-01   -1  -0.1480
2018-01-02    0   0.2347
2018-01-03   10  38.2370
2018-01-04    9  12.2233

(B) It seems that clip_upper does not preserve the dtypes:
            INT   FLOAT
2018-01-01 -1.0 -0.1480
2018-01-02  0.0  0.2347
2018-01-03  8.0  8.0000
2018-01-04  8.0  8.0000

(C) Same for inplace:
            INT   FLOAT
2018-01-01 -1.0 -0.1480
2018-01-02  0.0  0.2347
2018-01-03  8.0  8.0000
2018-01-04  8.0  8.0000

Problem description

clip_upper on a frame with int and float columns converts the int column to float.

Calling data.clip_upper(8) with an integer threshold, I would expect the int column to stay integer and the float column to stay float. However, it converts everything to float (see (B) and (C)).

Moreover, clip_upper with inplace=True has no effect when applied to a .loc selection, but this might well be me misunderstanding the concept (see (A)).

Same for clip_lower.
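
For the question in (A), one possible approach is to assign the clipped selection back through .loc instead of clipping the selection in place. The sketch below uses `clip(upper=...)`, the general form of `clip_upper`, so that it also runs on later pandas releases where `clip_upper` is no longer available. The `rows` name is only an illustrative helper, not part of the original report.

```python
import pandas as pd

data = pd.DataFrame(
    {"INT": [-1, 0, 10, 9], "FLOAT": [-0.148, 0.2347, 38.237, 12.2233]},
    index=pd.date_range("20180101 00:00", periods=4),
)

# Hypothetical helper name: boolean mask selecting the first three rows.
rows = [True, True, True, False]

# Indexing with a boolean list returns a new object, so an inplace clip on it
# would not reach `data`. Assigning the clipped values back through .loc does.
data.loc[rows, "INT"] = data.loc[rows, "INT"].clip(upper=8)

print(data)
print(data.dtypes)
```

Only the selected rows of the INT column are touched here; the last row and the FLOAT column are left as they were.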

Expected Output

For (A):

            INT    FLOAT
2018-01-01   -1  -0.1480
2018-01-02    0   0.2347
2018-01-03    8  38.2370
2018-01-04    9  12.2233

For (B) and (C):

            INT    FLOAT
2018-01-01   -1  -0.1480
2018-01-02    0   0.2347
2018-01-03    8   8.0000
2018-01-04    8   8.0000

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None

pandas: 0.23.4
pytest: 4.0.1
pip: 18.1
setuptools: 40.6.2
Cython: 0.29
numpy: 1.15.4
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 7.2.0
sphinx: 1.8.2
patsy: None
dateutil: 2.7.5
pytz: 2018.7
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 3.0.1
openpyxl: 2.5.11
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.1.2
lxml: 4.2.5
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: 1.2.14
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None


WillAyd · 2018-12-11T07:13:39Z

The example is very confusing and pulls in a lot of unnecessary elements. Please keep reports minimal in the future.

This is much easier to reproduce with a small sample:

In [65]: data = pd.DataFrame([[1, 2], [3, 4]], columns=['int1', 'int2'])
In [66]: data.clip_upper(1)
Out[66]: 
   int1  int2
0     1     1
1     1     1

In [67]: data['float'] = data['int1'].astype(float)                             

In [68]: data.clip_upper(1)                                                     
Out[68]: 
   int1  int2  float
0   1.0   1.0    1.0
1   1.0   1.0    1.0

The dtype should probably be preserved per column, though it appears that the mere presence of a float column casts the entire frame.

Investigation and PRs are always welcome.
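
As a rough illustration of what per-column preservation could look like on an affected version, the sketch below clips each column as its own Series and reassembles the frame. This is only a possible workaround under that assumption, not the fix adopted in pandas; it uses `clip(upper=...)` rather than `clip_upper`, and the `clipped` name is illustrative.

```python
import pandas as pd

data = pd.DataFrame([[1, 2], [3, 4]], columns=["int1", "int2"])
data["float"] = data["int1"].astype(float)

# Clip every column separately so each Series keeps its own dtype,
# then rebuild the frame from the clipped columns.
clipped = pd.DataFrame({name: col.clip(upper=1) for name, col in data.items()})

print(clipped)
print(clipped.dtypes)
```

Because each Series is clipped on its own, the float column never has a chance to influence the dtype of the integer columns.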

minggli · 2018-12-27T20:19:59Z

Hi @WillAyd 👋 ,

Happy to look at this issue 🐞 and will revert with a PR 🚀 .

Thanks,

Ming

cgangwar11 · 2018-12-27T21:01:55Z

In [16]: data
Out[16]:
   int  float
0    1    2.0
1    3    4.0
In [17]: axes_dict = data._construct_axes_dict()
['index', 'columns'] None {'index': RangeIndex(start=0, stop=2, step=1), 'columns': Index(['int', 'float'], dtype='object')}

In [18]: result = data._constructor(data.values, **axes_dict).__finalize__(data)

In [19]: result
Out[19]:
   int  float
0  1.0    2.0
1  3.0    4.0

The underlying problem is the constructor call, which casts the dtype of the int column to float.
I will take a look at how the property decorator works and create a PR.
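
The cast seen above can be traced a little further with public API only. The sketch below assumes that the upcast happens because `.values` has to return a single 2-D NumPy array with one common dtype, so a frame rebuilt from it can no longer tell the columns apart. The `rebuilt` name is illustrative, and the private `_construct_axes_dict` helper is replaced by passing the index and columns explicitly.

```python
import pandas as pd

data = pd.DataFrame({"int": [1, 3], "float": [2.0, 4.0]})

# Per-column dtypes of the original frame.
print(data.dtypes)

# .values collapses the mixed frame into one array with a common dtype.
print(data.values.dtype)

# Rebuilding a frame from that array therefore yields float columns only.
rebuilt = pd.DataFrame(data.values, index=data.index, columns=data.columns)
print(rebuilt.dtypes)
```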

WillAyd changed the title ~~DataFrame.clip_upper does not preserve dtype~~ DataFrame.clip_upper does not preserve dtype per column Dec 11, 2018

WillAyd added the Bug and Algos (Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff) labels Dec 11, 2018

WillAyd added this to the Contributions Welcome milestone Dec 11, 2018

minggli mentioned this issue Dec 28, 2018

BUG: clip doesn't preserve dtype by column #24458

Merged

jreback modified the milestones: Contributions Welcome, 0.24.0 Dec 28, 2018

jreback closed this as completed in #24458 Dec 28, 2018
