BUG: comparison warning with NaN when DataFrame/Series broadcasting #16378

Closed
Zhang18 opened this issue May 17, 2017 · 10 comments · Fixed by #16433
Labels
Bug; Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate); Numeric Operations (Arithmetic, Comparison, and Logical operations)
Comments

@Zhang18

Zhang18 commented May 17, 2017

Code Sample

>>> df = pd.DataFrame({'A' : [np.nan, 2.0, np.nan]})
>>> df
     A
0  NaN
1  2.0
2  NaN
>>> x = pd.Series([1, 1, 1])
>>> x
0    1
1    1
2    1
dtype: int64
>>> df.clip_lower(x, axis=0)
/usr/local/lib/python3.5/dist-packages/pandas/core/ops.py:1253: RuntimeWarning: invalid value encountered in greater_equal
  result = op(x, y)
     A
0  NaN
1  2.0
2  NaN

>>> df.clip_lower(x, axis=0)
     A
0  NaN
1  2.0
2  NaN

Problem description

The RuntimeWarning is emitted only the first time clip_lower is called, not on subsequent calls. This is inconsistent.

Expected Output

I propose to suppress this warning altogether as the triggering use cases seem to be completely common and fully legit.

Output of pd.show_versions()

>>> pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-75-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.20.1
pytest: None
pip: 9.0.1
setuptools: 20.7.0
Cython: None
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 6.0.0
sphinx: None
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: 0.7.3
lxml: None
bs4: None
html5lib: 0.999999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
s3fs: None
pandas_gbq: None
pandas_datareader: None

@TomAugspurger
Contributor

This repeatedly shows the warning for me. Have you perhaps set a warning filter somewhere, like

import warnings
warnings.simplefilter('once', RuntimeWarning)

An easy way to test is to call

In [19]: def f():
    ...:     raise RuntimeWarning("test")
    ...:     return 2

a couple of times.
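A note on this test: `raise RuntimeWarning(...)` raises an exception rather than emitting a warning, so the warning filters are never consulted. A minimal sketch of the difference, where `f_raise` and `f_warn` are hypothetical names used only for illustration:

```python
import warnings


def f_raise():
    # Raises an exception; warning filters are never consulted.
    raise RuntimeWarning("test")


def f_warn():
    # Emits a warning; the active filters decide whether it is shown.
    warnings.warn("test", RuntimeWarning)
    return 2


with warnings.catch_warnings():
    warnings.simplefilter("ignore", RuntimeWarning)
    print(f_warn())  # warning is filtered out, function returns normally
    try:
        f_raise()
    except RuntimeWarning as exc:
        # The exception still propagates despite the "ignore" filter.
        print("exception:", exc)
```

Only the `warnings.warn` variant can reveal whether a filter such as `'once'` is active.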

@Zhang18
Author

Zhang18 commented May 17, 2017

Here is the continuous screen output, per your suggestion. Apparently f() keeps raising warnings whereas clip doesn't.

Python 3.5.2 (default, Nov 17 2016, 17:05:23)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> def f():
...     raise RuntimeWarning("test")
...     return 1
...
>>> f()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in f
RuntimeWarning: test
>>> f()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in f
RuntimeWarning: test
>>> f()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in f
RuntimeWarning: test
>>> df = pd.DataFrame({'A' : [np.nan, 2.0, np.nan]})
>>> x = pd.Series([1, 1, 1])
>>> df.clip_lower(x, axis=0)
/usr/local/lib/python3.5/dist-packages/pandas/core/ops.py:1253: RuntimeWarning: invalid value encountered in greater_equal
  result = op(x, y)
     A
0  NaN
1  2.0
2  NaN
>>> df.clip_lower(x, axis=0)
     A
0  NaN
1  2.0
2  NaN
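One plausible reading of this output, not confirmed in the thread: Python's `"default"` filter action shows a given warning only the first time it is issued from a particular location (module and line number), whereas an exception raised with `raise` is never filtered. The sketch below counts how many warnings get through under two filter actions; `emit` and `count_shown` are illustrative names, not pandas or NumPy functions.

```python
import warnings


def emit():
    # Always issued from the same source line, like the comparison in ops.py.
    warnings.warn("invalid value encountered", RuntimeWarning)


def count_shown(action, calls=3):
    """Call emit() `calls` times under `action`; return how many warnings got through."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter(action, RuntimeWarning)
        for _ in range(calls):
            emit()
    return len(caught)


print(count_shown("default"))  # 1: shown once per location
print(count_shown("always"))   # 3: shown on every call
```

A session whose filters include an `"always"` entry matching RuntimeWarning would therefore show the warning on every call, while one that falls back to the default action would show it once.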

@TomAugspurger
Contributor

Can you post the output of warnings.filters? It seems unlikely, but it's possible to filter warnings from specific libraries.

@Zhang18
Author

Zhang18 commented May 17, 2017

Here you go:

>>> pprint.pprint(warnings.filters)
[('ignore',
  re.compile('numpy.ndarray size changed', re.IGNORECASE),
  <class 'Warning'>,
  re.compile(''),
  0),
 ('ignore',
  re.compile('numpy.ufunc size changed', re.IGNORECASE),
  <class 'Warning'>,
  re.compile(''),
  0),
 ('ignore',
  re.compile('numpy.dtype size changed', re.IGNORECASE),
  <class 'Warning'>,
  re.compile(''),
  0),
 ('always', None, <class 'numpy.lib.polynomial.RankWarning'>, None, 0),
 ('default', None, <class 'urllib3.exceptions.SNIMissingWarning'>, None, 0),
 ('default', None, <class 'urllib3.exceptions.SubjectAltNameWarning'>, None, 0),
 ('ignore', None, <class 'DeprecationWarning'>, None, 0),
 ('ignore', None, <class 'PendingDeprecationWarning'>, None, 0),
 ('ignore', None, <class 'ImportWarning'>, None, 0),
 ('ignore', None, <class 'BytesWarning'>, None, 0),
 ('ignore', None, <class 'ResourceWarning'>, None, 0),
 ('always', None, <class 'urllib3.exceptions.SecurityWarning'>, None, 0),
 ('default',
  None,
  <class 'urllib3.exceptions.InsecurePlatformWarning'>,
  None,
  0),
 ('default', None, <class 'requests.exceptions.FileModeWarning'>, None, 0)]

@Zhang18
Author

Zhang18 commented May 17, 2017

But inconsistency aside, do you agree this warning is unnecessary and should be removed?

@TomAugspurger
Contributor

Sorry, I missed that section of your original post. I think that was fixed by #16373. Can you try on master?

@jreback
Contributor

jreback commented May 17, 2017

This is a different issue.
This has to do with broadcasting of a Series and a DataFrame.

In [6]: df = pd.DataFrame({'A' : [np.nan, 2.0, np.nan]})

In [7]: x = pd.Series([1, 1, 1])

In [8]: df.ge(x, axis=0)
/Users/jreback/pandas/pandas/core/ops.py:1253: RuntimeWarning: invalid value encountered in greater_equal
  result = op(x, y)
Out[8]: 
       A
0  False
1   True
2  False

@jreback
Contributor

jreback commented May 17, 2017

I recall the same issue being reported before, but can't find it with a quick search. @TomAugspurger any idea?

jreback changed the title from "clip throw inconsistent warnings against NaN" to "BUG: comparison warning with NaN when DataFrame/Series broadcasting" on May 17, 2017
jreback added the labels Bug, Difficulty Novice, Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate), and Numeric Operations (Arithmetic, Comparison, and Logical operations) on May 17, 2017
jreback added this to the Next Major Release milestone on May 17, 2017
@jreback
Contributor

jreback commented May 17, 2017

Note that the fix is the same as in #16373, i.e. wrapping the operation with np.errstate.
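As a rough illustration of that approach (a sketch only, not the actual pandas patch), a comparison can be wrapped in `np.errstate(invalid="ignore")` so that NaN operands do not produce an "invalid value" RuntimeWarning. The helper name `ge_ignoring_invalid` is hypothetical, and whether the unwrapped comparison warns at all depends on the NumPy version and platform.

```python
import warnings

import numpy as np


def ge_ignoring_invalid(left, right):
    """Element-wise left >= right with floating-point 'invalid' warnings silenced."""
    with np.errstate(invalid="ignore"):
        return np.greater_equal(left, right)


values = np.array([np.nan, 2.0, np.nan])
bounds = np.array([1, 1, 1])

with warnings.catch_warnings():
    # Escalate warnings to errors: any warning escaping the helper would raise here.
    warnings.simplefilter("error")
    mask = ge_ignoring_invalid(values, bounds)

print(mask.tolist())  # [False, True, False]
```

The result matches the df.ge(x, axis=0) output shown earlier in the thread: comparisons against NaN evaluate to False.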

@andrewarcher
Contributor

I'm going to start working on this.

4 participants