regression in 0.24: np.clip does not work for some values of a_min #25066

sam-s · 2019-02-01T00:10:20Z

Code Sample

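import pandas as pd
import numpy as np
print(pd.__name__, pd.__version__, np.__name__, np.__version__)
# Integer Series: a_min=1e-7 vs a_min=1e-8
print(np.clip(pd.Series([0, 1, 0, 1]), 1e-7, 1))
print(np.clip(pd.Series([0, 1, 0, 1]), 1e-8, 1))
# Float Series with a_min=1e-8, for comparison
print(np.clip(pd.Series([0.0, 1.0, 0.0, 1.0]), 1e-8, 1))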

Problem description

With pandas 0.23.4, numpy.clip works as expected, producing a Series of type float64 for any min argument.

With pandas 0.24.0, it works for a_min=1e-7 but not for a_min=1e-8: the result stays int64 and the zeros are not clipped.

>>> np.clip(pd.Series([0,1,0,1]), 1e-7, 1)
0    1.000000e-07
1    1.000000e+00
2    1.000000e-07
3    1.000000e+00
dtype: float64
>>> np.clip(pd.Series([0,1,0,1]), 1e-8, 1)
0    0
1    1
2    0
3    1
dtype: int64

This is an unwelcome change from the previous version.

Expected Output

>>> np.clip(pd.Series([0,1,0,1]), 1e-7, 1)
0    1.000000e-07
1    1.000000e+00
2    1.000000e-07
3    1.000000e+00
dtype: float64
>>> np.clip(pd.Series([0,1,0,1]), 1e-8, 1)
0    1.000000e-08
1    1.000000e+00
2    1.000000e-08
3    1.000000e+00
dtype: float64
>>> np.clip(pd.Series([0,1,0,1]), 1e-15, 1)
0    1.000000e-15
1    1.000000e+00
2    1.000000e-15
3    1.000000e+00
dtype: float64

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.2.final.0
python-bits: 64
OS: Darwin
OS-release: 17.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: C
LOCALE: en_US.UTF-8

pandas: 0.24.0
pytest: None
pip: 19.0.1
setuptools: 40.7.0
Cython: None
numpy: 1.16.0
scipy: 1.2.0
pyarrow: None
xarray: None
IPython: 7.2.0
sphinx: None
patsy: None
dateutil: 2.7.5
pytz: 2018.9
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 3.0.2
openpyxl: None
xlrd: 1.2.0
xlwt: None
xlsxwriter: 1.1.2
lxml.etree: 4.3.0
bs4: None
html5lib: None
sqlalchemy: 1.2.17
pymysql: None
psycopg2: 2.7.7 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None


jorisvandenbossche · 2019-02-01T11:40:11Z

@sam-s thanks for the report! I can confirm the issue.
I suppose this is related to trying to preserve the dtype: #24458.

IMO it would make sense to follow numpy's behaviour here and only preserve the dtype if the passed lower/upper bounds match it.
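For reference, a small sketch of the numpy behaviour mentioned above, run on a plain integer array. This reflects what recent numpy releases do; it has not been checked against numpy 1.16 specifically, and the variable names are only illustrative.

```python
import numpy as np

values = np.array([0, 1, 0, 1])  # integer dtype (int64 on most platforms)

# Integer bounds match the integer dtype, so numpy keeps it.
int_bounds = np.clip(values, 0, 1)

# A float lower bound does not match, so numpy promotes the result to float64
# and the zeros really are clipped up to 1e-8.
float_bounds = np.clip(values, 1e-8, 1)

print(int_bounds.dtype, int_bounds)
print(float_bounds.dtype, float_bounds)
```

Note that numpy never rounds the clipped values back to integers; the output dtype is simply determined by the input and bound types.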

gfyoung · 2019-02-02T08:27:32Z

@sam-s @jorisvandenbossche : I tracked down the regression, and it boils down to this:

When tiny floats and integers are involved in .where, numpy works with the exact values and does not round. That's why your examples worked before #24458.

In pandas, in the post-processing portion of .where, we attempt to downcast the result to int if the float values and their integer-cast counterparts are "sufficiently" close to one another (via np.allclose). It turns out that 1e-8 hits the tolerance on the nose (we pass in atol=1e-8). Thus, the result of our internal where functionality is correct; it is the post-processing of that result that causes the issue.
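A minimal sketch of why the tolerance matters, assuming the downcast works roughly like this: cast the float result to int64 (truncation via astype) and keep the integer version if np.allclose says it matches. The helper `downcast_if_close` is hypothetical and is not the actual pandas code in cast.py; the use of rtol=0 is also an assumption (with numpy's default rtol the outcome for these inputs is the same).

```python
import numpy as np


def downcast_if_close(result, atol=1e-8):
    """Hypothetical sketch of a tolerance-based downcast (not the pandas code).

    Casts ``result`` to int64 and returns the integer version whenever
    np.allclose considers it equal to the float original.
    """
    if not np.isfinite(result).all():
        return result  # NaN/inf cannot be represented as int64
    candidate = result.astype(np.int64)  # astype truncates toward zero
    # With rtol=0 the check is |candidate - result| <= atol element-wise.
    if np.allclose(candidate, result, rtol=0, atol=atol):
        return candidate
    return result


# Float results of clipping [0, 1, 0, 1] with two different lower bounds.
clipped_1e7 = np.array([1e-7, 1.0, 1e-7, 1.0])
clipped_1e8 = np.array([1e-8, 1.0, 1e-8, 1.0])

kept = downcast_if_close(clipped_1e7)  # 1e-7 > atol, so it stays float64
lost = downcast_if_close(clipped_1e8)  # 1e-8 <= atol, so it becomes int64 zeros/ones
```

Because |0 - 1e-8| equals atol exactly, the check passes and the clipped values are silently truncated back to 0, which matches the int64 output in the report.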

Relevant files: generic.py, managers.py, blocks.py --> cast.py (downcasting occurs at block level)

gfyoung · 2019-02-02T09:08:37Z

From what I can see in the docs, this rounding/downcasting behavior is not documented in the docstring for .where, and while it won't manifest itself in many situations, it does in this one. There are a few options for how to proceed:

1. Revert "BUG: clip doesn't preserve dtype by column" (#24458). Not sure I really want to do this, though.
2. Don't downcast results in post-processing. It's not documented, but I see the benefits (memory).
3. Use a stricter downcasting procedure so that, like numpy, we never round results. This, however, would require modifying maybe_downcast_to_dtype.
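Option 3 could look roughly like the sketch below, assuming the only change is to replace the tolerance check with exact equality. `downcast_if_exact` is a hypothetical name for illustration; it is not how maybe_downcast_to_dtype is actually written.

```python
import numpy as np


def downcast_if_exact(result):
    """Hypothetical sketch of option 3: downcast only when no rounding is needed."""
    if not np.isfinite(result).all():
        return result  # NaN/inf cannot be represented as int64
    candidate = result.astype(np.int64)
    # Exact element-wise equality instead of a tolerance-based np.allclose check.
    if np.array_equal(candidate, result):
        return candidate
    return result


kept = downcast_if_exact(np.array([1e-8, 1.0, 1e-8, 1.0]))  # stays float64
whole = downcast_if_exact(np.array([0.0, 1.0, 0.0, 1.0]))   # becomes int64
```

This keeps the memory benefit of downcasting when the float values really are whole numbers, while never changing a value.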

Thoughts?

cc @jreback

TomAugspurger · 2019-02-02T12:13:27Z

Pushing to 0.24.2, since this looks a little tricky.

jorisvandenbossche · 2019-02-14T22:35:39Z

@gfyoung I didn't look into it in detail, but from your explanation, I was thinking: couldn't we check the type of the scalar clip value? If the clipping value you provide is a float, then I would say it is OK for the output to be float as well (even if your original column was int).
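A user-level sketch of this suggestion, assuming the check is simply "integer Series plus at least one Python/numpy float bound". The helper `clip_with_bound_types` is hypothetical and not part of the pandas API; it just casts to float64 before calling Series.clip so the float result cannot be downcast.

```python
import numpy as np
import pandas as pd


def clip_with_bound_types(s, lower=None, upper=None):
    """Hypothetical helper (not pandas API): if an integer Series is clipped
    with a float bound, cast it to float64 first so the float result is kept."""
    bounds = [b for b in (lower, upper) if b is not None]
    has_float_bound = any(isinstance(b, (float, np.floating)) for b in bounds)
    if pd.api.types.is_integer_dtype(s.dtype) and has_float_bound:
        s = s.astype(np.float64)
    return s.clip(lower=lower, upper=upper)


s = pd.Series([0, 1, 0, 1])
float_result = clip_with_bound_types(s, 1e-8, 1)  # float bound: float64 result
int_result = clip_with_bound_types(s, 0, 1)       # int bounds: integer dtype kept
```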

On the other hand, in the long term we could also move toward a behaviour where we always preserve the dtype (meaning we would cast the clip value to the dtype of the column, so any float clip value would be converted to int). Then, if you want to clip integers with a clip value like 0.5, you would first need to cast them to floats.
But that would require a deprecation cycle (if we want this, of course).
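A rough sketch of the "always preserve the dtype" idea on plain numpy arrays. `clip_preserving_dtype` is a hypothetical name, and converting float bounds to int by truncation (astype) is an assumption; a real implementation might round or raise instead.

```python
import numpy as np


def clip_preserving_dtype(values, lower, upper):
    """Hypothetical sketch: cast the bounds to the dtype of ``values`` before
    clipping, so the result always keeps that dtype."""
    lo = np.asarray(lower).astype(values.dtype)  # float -> int truncates toward zero
    hi = np.asarray(upper).astype(values.dtype)
    return np.clip(values, lo, hi)


ints = np.array([0, 1, 2])
same_dtype = clip_preserving_dtype(ints, 0.5, 1.5)  # bounds become 0 and 1
# To really clip at 0.5, the user would cast to float first.
as_floats = clip_preserving_dtype(ints.astype(np.float64), 0.5, 1.5)
```

Under this rule the 1e-8 case from the report would return the unchanged int64 values, but that outcome would then be explicit and documented rather than a side effect of a tolerance check.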

jorisvandenbossche · 2019-02-14T22:37:20Z

> a behaviour where we always preserve the dtype

Although, an argument against it: numpy doesn't do this (but numpy does always produce a dtype compatible with the clipping values).

jorisvandenbossche added this to the 0.24.1 milestone Feb 1, 2019

gfyoung added the Regression (Functionality that used to work in a prior pandas version) and Numeric Operations (Arithmetic, Comparison, and Logical operations) labels Feb 2, 2019

TomAugspurger modified the milestones: 0.24.1, 0.24.2 Feb 2, 2019

jorisvandenbossche modified the milestones: 0.24.2, 0.24.3 Mar 11, 2019

jreback modified the milestones: 0.24.3, Contributions Welcome Apr 20, 2019

mroeschke added the Bug label May 11, 2020

mroeschke added the Algos (Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff) label and removed the Numeric Operations (Arithmetic, Comparison, and Logical operations) label Jun 26, 2021

mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
