Skip to content

inplace clipping does not account for mixed precision #21476

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
crepererum opened this issue Jun 14, 2018 · 2 comments
Open

inplace clipping does not account for mixed precision #21476

crepererum opened this issue Jun 14, 2018 · 2 comments
Labels
Bug inplace Relating to inplace parameter or equivalent

Comments

@crepererum
Copy link
Contributor

crepererum commented Jun 14, 2018

Code Sample, a copy-pastable example if possible

This is just one case that can go wrong, but other edge cases are possible (see listing below)

import numpy as np
import pandas as pd

series = pd.Series([0], dtype=np.float32)
upper = pd.Series([-np.finfo(np.float64).tiny], dtype=np.float64)
series.clip(upper, inplace=True)
assert (series <= upper).all()

Problem description

Series.clip, Series.clip_upper and Series.clip_lower may return wrong results if the bound arguments have a higher precision than the series (e.g. float64 VS float32, but others are possible as well) and inplace=True was given. This is due to the fact that pandas cannot change the dtype of the clipped series in that case and seems to do run some additional conversation. The following edge cases are possible:

  1. the upper bound gets larger when converted to the lower precision (this is the example shown above)
  2. the lower bound gets larger when converted to the lower precision
  3. the upper bound is negative and is that "large" that it cannot be represented by the lower precision float
  4. the lower bound is positive and is that "large" that it cannot be represented by the lower precision float
  5. the range between lower and upper is that tiny that there is no lower precision float possible

Expected Output

For 1. and 2.: find the closest lower-precision float that satisfies the bound check

For 3. and 4.: return -/+inf

For 5.: return NaN

Output of pd.show_versions()

``` INSTALLED VERSIONS ------------------ commit: None python: 3.6.4.final.0 python-bits: 64 OS: Linux OS-release: 4.9.49-moby machine: x86_64 processor: byteorder: little LC_ALL: en_US.UTF-8 LANG: en_US.UTF-8 LOCALE: en_US.UTF-8

pandas: 0.23.1
pytest: 3.4.0
pip: 10.0.1
setuptools: 39.1.0
Cython: 0.28.2
numpy: 1.14.3
scipy: 1.0.0
pyarrow: 0.9.0
xarray: None
IPython: 6.1.0
sphinx: 1.6.7
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: 1.2.1
tables: None
numexpr: 2.6.5
feather: None
matplotlib: 2.2.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: 1.2.5
pymysql: None
psycopg2: 2.7.4 (dt dec pq3 ext lo64)
jinja2: 2.8.1
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

</details>
@gfyoung gfyoung added Numeric Operations Arithmetic, Comparison, and Logical operations Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Bug and removed Bug labels Jun 15, 2018
@gfyoung
Copy link
Member

gfyoung commented Jun 15, 2018

@crepererum : Could you post the expected output for series.clip_upper(upper, inplace=True) ?

cc @jreback

@crepererum
Copy link
Contributor Author

"Fun" fact: the same issue is present in numpy:

import numpy as np

array = np.array([0], dtype=np.float32)
out = array.copy()
upper = np.array([-np.finfo(np.float64).tiny], dtype=np.float64)
np.clip(array, None, upper, out)

assert (out <= upper).all()

The following seems to work though, and is also the answer to the expected output:

import numpy as np

def clip_upper_inplace(array, upper):
    np.clip(array, None, upper, array)
    wrong = (array > upper)
    array[wrong] = np.nextafter(array[wrong], -np.inf)

array = np.array([0], dtype=np.float32)
upper = np.array([-np.finfo(np.float64).tiny], dtype=np.float64)
clip_upper_inplace(array, upper)

assert (array <= upper).all()

(array is [-1.e-45] at this point)

@mroeschke mroeschke added the Bug label May 13, 2020
@mroeschke mroeschke removed Numeric Operations Arithmetic, Comparison, and Logical operations Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff labels Jun 20, 2021
@jbrockmendel jbrockmendel added the inplace Relating to inplace parameter or equivalent label Oct 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug inplace Relating to inplace parameter or equivalent
Projects
None yet
Development

No branches or pull requests

4 participants