Skip to content

BUG: clip doesn't preserve dtype by column #24458

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Dec 28, 2018

Conversation

minggli
Copy link
Contributor

@minggli minggli commented Dec 28, 2018

>>> import pandas as pd
>>> data = pd.DataFrame([[1, 2], [3, 4]], columns=['int1', 'int2'])
>>> data.clip_upper(1)
__main__:1: FutureWarning: clip_upper(threshold) is deprecated, use clip(upper=threshold) instead
   int1  int2
0     1     1
1     1     1

Before

>>> data['float'] = data['int1'].astype(float)
>>> data.clip_upper(1)
   int1  int2  float
0   1.0   1.0    1.0
1   1.0   1.0    1.0

After

>>> data['float'] = data['int1'].astype(float)
>>> data.clip_upper(1)
   int1  int2  float
0     1     1    1.0
1     1     1    1.0

@jreback
Copy link
Contributor

jreback commented Dec 28, 2018

looks pretty good - pls add. whatsnew note
can u ensure that the OP test is covered?

i suspect we have a couple of open issues for clip that might be closed by this as well

if u can have a look

@jreback
Copy link
Contributor

jreback commented Dec 28, 2018

#20911 and #21476 maybe

@minggli
Copy link
Contributor Author

minggli commented Dec 28, 2018

#20911 and #21476 maybe
Thanks @jreback sure let me check them out!

@jreback jreback added Bug Dtype Conversions Unexpected or buggy dtype conversions labels Dec 28, 2018
@jreback jreback added this to the 0.24.0 milestone Dec 28, 2018
subset = self.le(upper, axis=None) | isna(result)
result = result.where(subset, upper, axis=None, inplace=inplace)
if lower is not None:
if inplace:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this inplace check only when lower is not None? It looks like you also assign result = self before the upper and lower check.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @mroeschke, thanks for reviewing!

result = self is to ensure there is a return value in case both lower and upper are None, which they can. when inplace=True, result is None

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think its just better to use the existing idiom

if inplace:
     self._update_inplace(result0
else:
    return self

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actioned. thanks.

@codecov
Copy link

codecov bot commented Dec 28, 2018

Codecov Report

Merging #24458 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #24458      +/-   ##
==========================================
- Coverage   92.29%   92.29%   -0.01%     
==========================================
  Files         163      163              
  Lines       51969    51964       -5     
==========================================
- Hits        47967    47962       -5     
  Misses       4002     4002
Flag Coverage Δ
#multiple 90.7% <100%> (-0.01%) ⬇️
#single 43.02% <0%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/generic.py 96.62% <100%> (-0.01%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 24a50fc...7235940. Read the comment docs.

@codecov
Copy link

codecov bot commented Dec 28, 2018

Codecov Report

Merging #24458 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #24458      +/-   ##
==========================================
- Coverage   92.29%   92.29%   -0.01%     
==========================================
  Files         163      163              
  Lines       51969    51943      -26     
==========================================
- Hits        47967    47940      -27     
- Misses       4002     4003       +1
Flag Coverage Δ
#multiple 90.7% <100%> (-0.01%) ⬇️
#single 43% <0%> (-0.02%) ⬇️
Impacted Files Coverage Δ
pandas/core/generic.py 96.62% <100%> (-0.01%) ⬇️
pandas/core/indexes/datetimelike.py 97.56% <0%> (-0.15%) ⬇️
pandas/core/indexes/datetimes.py 96.2% <0%> (-0.11%) ⬇️
pandas/core/indexes/period.py 92.71% <0%> (-0.05%) ⬇️
pandas/util/testing.py 87.75% <0%> (ø) ⬆️
pandas/core/indexes/accessors.py 91% <0%> (ø) ⬆️
pandas/core/indexes/timedeltas.py 90.28% <0%> (+0.16%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 24a50fc...fab5c99. Read the comment docs.

@minggli
Copy link
Contributor Author

minggli commented Dec 28, 2018

@jreback,

had look at #20911, I think that's not a supported use case when passing a DataFrame for either lower or upper, which is essentially used as a numpy array for creating a mask, having ignored the column orders.

#21476 is to do with different precisions between s and threshold. It works fine downcasting from int to float but appears not from float32 to float64. I suspect it's to do with numpy and need more investigation.

@minggli
Copy link
Contributor Author

minggli commented Dec 28, 2018

>>> df.clip(lower=2, upper=7)
   floats  int
0     2.0    2
1     2.0    2
2     2.2    2
3     3.3    3
4     4.4    4
5     5.5    5
6     6.6    6
7     7.0    7
8     7.0    7
9     7.0    7

>>> df.clip(lower=2, upper=7.5)
   floats  int
0     2.0  2.0
1     2.0  2.0
2     2.2  2.0
3     3.3  3.0
4     4.4  4.0
5     5.5  5.0
6     6.6  6.0
7     7.5  7.0
8     7.5  7.5
9     7.5  7.5

subset = self.le(upper, axis=None) | isna(result)
result = result.where(subset, upper, axis=None, inplace=inplace)
if lower is not None:
if inplace:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think its just better to use the existing idiom

if inplace:
     self._update_inplace(result0
else:
    return self

@minggli
Copy link
Contributor Author

minggli commented Dec 28, 2018

@jreback

review actioned and added test and documentation.

@jreback
Copy link
Contributor

jreback commented Dec 28, 2018

thanks. ping on green.

@minggli
Copy link
Contributor Author

minggli commented Dec 28, 2018

@jreback thanks, it's green.

@jreback jreback merged commit d1b2a52 into pandas-dev:master Dec 28, 2018
@jreback
Copy link
Contributor

jreback commented Dec 28, 2018

thanks @minggli

@minggli
Copy link
Contributor Author

minggli commented Dec 28, 2018

@jreback @mroeschke thanks for reviewing and happy new year!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Dtype Conversions Unexpected or buggy dtype conversions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DataFrame.clip_upper does not preserve dtype per column
3 participants