-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
fix and test index division by zero #19347
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 21 commits
969f342
e0e89b9
6acc2f7
84c74c5
06df02a
cd54349
ca3bf42
6fc61bd
ea75c3c
5d7e3ea
0277d9f
78de1a4
1ef3a6c
d648ef6
965f721
37efd51
b51c2e1
afedba9
b8cf21d
9de356a
000aefd
aa969f8
64b0c08
be1e2e1
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -176,6 +176,50 @@ Please note that the string `index` is not supported with the round trip format, | |
new_df | ||
print(new_df.index.name) | ||
|
||
.. _whatsnew_0230.enhancements.index_division_by_zero | ||
|
||
Index Division By Zero Fills Correctly | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
Division operations on ``Index`` and subclasses will now fill division of positive numbers by zero with ``np.inf``, division of negative numbers by zero with ``-np.inf`` and `0 / 0` with ``np.nan``. This matches existing ``Series`` behavior. (:issue:`19322`, :issue:`19347`) | ||
|
||
Previous Behavior: | ||
|
||
.. code-block:: ipython | ||
|
||
In [6]: index = pd.Int64Index([-1, 0, 1]) | ||
|
||
In [7]: index / 0 | ||
Out[7]: Int64Index([0, 0, 0], dtype='int64') | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. switch the order of these blocks, we always write previous, then current. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. check this after its built to make sure it looks ok. |
||
# Previous behavior yielded different results depending on the type of zero in the divisor | ||
In [8]: index / 0.0 | ||
Out[8]: Float64Index([-inf, nan, inf], dtype='float64') | ||
|
||
In [9]: index = pd.UInt64Index([0, 1]) | ||
|
||
In [10]: index / np.array([0, 0], dtype=np.uint64) | ||
Out[10]: UInt64Index([0, 0], dtype='uint64') | ||
|
||
In [11]: pd.RangeIndex(1, 5) / 0 | ||
ZeroDivisionError: integer division or modulo by zero | ||
|
||
Current Behavior: | ||
|
||
.. ipython:: python | ||
|
||
index = pd.Int64Index([-1, 0, 1]) | ||
# division by zero gives -infinity where negative, +infinity where positive, and NaN for 0 / 0 | ||
index / 0 | ||
|
||
# The result of division by zero should not depend on whether the zero is int or float | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. remove this case (or add above as well). the point of the previous/current display is to make it easy to map up what is changing. |
||
index / 0.0 | ||
|
||
index = pd.UInt64Index([0, 1]) | ||
index / np.array([0, 0], dtype=np.uint64) | ||
|
||
pd.RangeIndex(1, 5) / 0 | ||
|
||
.. _whatsnew_0230.enhancements.other: | ||
|
||
Other Enhancements | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -550,7 +550,7 @@ def __getitem__(self, key): | |
return super_getitem(key) | ||
|
||
def __floordiv__(self, other): | ||
if is_integer(other): | ||
if is_integer(other) and other != 0: | ||
if (len(self) == 0 or | ||
self._start % other == 0 and | ||
self._step % other == 0): | ||
|
@@ -592,26 +592,27 @@ def _evaluate_numeric_binop(self, other): | |
attrs = self._get_attributes_dict() | ||
attrs = self._maybe_update_attributes(attrs) | ||
|
||
left, right = self, other | ||
if reversed: | ||
self, other = other, self | ||
left, right = right, left | ||
|
||
try: | ||
# apply if we have an override | ||
if step: | ||
with np.errstate(all='ignore'): | ||
rstep = step(self._step, other) | ||
rstep = step(left._step, right) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Side-note: This is still not great; There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. use getattr There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Huh? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. and remove the AttributeError catching There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can you do this There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Out of scope. Until I look at this closely, I'm assuming that the author had a reason for doing it this way. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ok fair enough |
||
|
||
# we don't have a representable op | ||
# so return a base index | ||
if not is_integer(rstep) or not rstep: | ||
raise ValueError | ||
|
||
else: | ||
rstep = self._step | ||
rstep = left._step | ||
|
||
with np.errstate(all='ignore'): | ||
rstart = op(self._start, other) | ||
rstop = op(self._stop, other) | ||
rstart = op(left._start, right) | ||
rstop = op(left._stop, right) | ||
|
||
result = RangeIndex(rstart, | ||
rstop, | ||
|
@@ -627,18 +628,12 @@ def _evaluate_numeric_binop(self, other): | |
|
||
return result | ||
|
||
except (ValueError, TypeError, AttributeError): | ||
pass | ||
|
||
# convert to Int64Index ops | ||
if isinstance(self, RangeIndex): | ||
self = self.values | ||
if isinstance(other, RangeIndex): | ||
other = other.values | ||
|
||
with np.errstate(all='ignore'): | ||
results = op(self, other) | ||
return Index(results, **attrs) | ||
except (ValueError, TypeError, AttributeError, | ||
ZeroDivisionError): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. when does this raise ZeroDivisionError? isn't he point of the errstate to catch that? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
ATM
I think errstate is to suppress warnings, but not really sure. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. hmm, that seems wrong, but I guess can cover later |
||
# Defer to Int64Index implementation | ||
if reversed: | ||
return op(other, self._int64index) | ||
return op(self._int64index, other) | ||
|
||
return _evaluate_numeric_binop | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,7 @@ | ||
""" | ||
Routines for filling missing data | ||
""" | ||
import operator | ||
|
||
import numpy as np | ||
from distutils.version import LooseVersion | ||
|
@@ -645,6 +646,87 @@ def fill_zeros(result, x, y, name, fill): | |
return result | ||
|
||
|
||
def mask_zero_div_zero(x, y, result, copy=False): | ||
""" | ||
Set results of 0 / 0 or 0 // 0 to np.nan, regardless of the dtypes | ||
of the numerator or the denominator. | ||
|
||
Parameters | ||
---------- | ||
x : ndarray | ||
y : ndarray | ||
result : ndarray | ||
copy : bool (default False) | ||
Whether to always create a new array or try to fill in the existing | ||
array if possible. | ||
|
||
Returns | ||
------- | ||
filled_result : ndarray | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. the semantics you have here are not explained well. you are mutating result in-place, then returning it. is that intentional? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'll flesh it out. The semantics were meant to follow fill_zeros as closely as possible. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ok some more expl would be helpful. filling result is ok, you just have to make this really clear. |
||
|
||
Examples | ||
-------- | ||
>>> x = np.array([1, 0, -1], dtype=np.int64) | ||
>>> y = 0 # int 0; numpy behavior is different with float | ||
>>> result = x / y | ||
>>> result # raw numpy result does not fill division by zero | ||
array([0, 0, 0]) | ||
>>> mask_zero_div_zero(x, y, result) | ||
array([ inf, nan, -inf]) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. pls add to the list to consolidate this module (e.g. the existing fill_zeros) |
||
""" | ||
if is_scalar(y): | ||
y = np.array(y) | ||
|
||
zmask = y == 0 | ||
if zmask.any(): | ||
shape = result.shape | ||
|
||
nan_mask = (zmask & (x == 0)).ravel() | ||
neginf_mask = (zmask & (x < 0)).ravel() | ||
posinf_mask = (zmask & (x > 0)).ravel() | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. add some comments here |
||
if nan_mask.any() or neginf_mask.any() or posinf_mask.any(): | ||
# Fill negative/0 with -inf, positive/0 with +inf, 0/0 with NaN | ||
result = result.astype('float64', copy=copy).ravel() | ||
|
||
np.putmask(result, nan_mask, np.nan) | ||
np.putmask(result, posinf_mask, np.inf) | ||
np.putmask(result, neginf_mask, -np.inf) | ||
|
||
result = result.reshape(shape) | ||
|
||
return result | ||
|
||
|
||
def dispatch_missing(op, left, right, result): | ||
""" | ||
Fill nulls caused by division by zero, casting to a diffferent dtype | ||
if necessary. | ||
|
||
Parameters | ||
---------- | ||
op : function (operator.add, operator.div, ...) | ||
left : object (Index for non-reversed ops) | ||
right : object (Index fof reversed ops) | ||
result : ndarray | ||
|
||
Returns | ||
------- | ||
result : ndarray | ||
""" | ||
opstr = '__{opname}__'.format(opname=op.__name__).replace('____', '__') | ||
if op in [operator.truediv, operator.floordiv, | ||
getattr(operator, 'div', None)]: | ||
result = mask_zero_div_zero(left, right, result) | ||
elif op is operator.mod: | ||
result = fill_zeros(result, left, right, opstr, np.nan) | ||
elif op is divmod: | ||
res0 = mask_zero_div_zero(left, right, result[0]) | ||
res1 = fill_zeros(result[1], left, right, opstr, np.nan) | ||
result = (res0, res1) | ||
return result | ||
|
||
|
||
def _interp_limit(invalid, fw_limit, bw_limit): | ||
"""Get idx of values that won't be filled b/c they exceed the limits. | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -3,7 +3,7 @@ | |
import pytest | ||
|
||
from datetime import datetime | ||
from pandas.compat import range, PY3 | ||
from pandas.compat import range, PY3, long | ||
|
||
import numpy as np | ||
|
||
|
@@ -18,6 +18,16 @@ | |
from pandas.tests.indexes.common import Base | ||
|
||
|
||
# For testing division by (or of) zero for Series with length 5, this | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. these should be fixtures There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. maybe you misunderstand what I want here
then remove the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. See previous comment:
AFAICT the problem is caused by the fixture being in the namespace for test_numeric but not for test_range, which has a class that inherits from the test_numeric.Numeric (which defines the relevant tests). I guess we could get around that by putting this in a shared conftest file, but all that accomplishes is making it harder for a reader to find out where things are defined. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. so move it to the conftest, this is quite standard practice and avoids code duplication. |
||
# gives several scalar-zeros and length-5 vector-zeros | ||
zeros = [box([0] * 5, dtype=dtype) | ||
for box in [pd.Index, np.array] | ||
for dtype in [np.int64, np.uint64, np.float64]] | ||
zeros.extend([np.array(0, dtype=dtype) | ||
for dtype in [np.int64, np.uint64, np.float64]]) | ||
zeros.extend([0, 0.0, long(0)]) | ||
|
||
|
||
def full_like(array, value): | ||
"""Compatibility for numpy<1.8.0 | ||
""" | ||
|
@@ -157,6 +167,52 @@ def test_divmod_series(self): | |
for r, e in zip(result, expected): | ||
tm.assert_series_equal(r, e) | ||
|
||
@pytest.mark.parametrize('zero', zeros) | ||
def test_div_zero(self, zero): | ||
idx = self.create_index() | ||
|
||
expected = Index([np.nan, np.inf, np.inf, np.inf, np.inf], | ||
dtype=np.float64) | ||
result = idx / zero | ||
tm.assert_index_equal(result, expected) | ||
ser_compat = Series(idx).astype('i8') / np.array(zero).astype('i8') | ||
tm.assert_series_equal(ser_compat, Series(result)) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is a "check that this matches the Series behavior" check. Note that because the Series implementation does not yet get this right for all dtypes, we are casting the Series version to int64 and checking that it matches the one version that does consistently work. Once the Series methods are fixed, this casting can be removed. |
||
|
||
@pytest.mark.parametrize('zero', zeros) | ||
def test_floordiv_zero(self, zero): | ||
idx = self.create_index() | ||
expected = Index([np.nan, np.inf, np.inf, np.inf, np.inf], | ||
dtype=np.float64) | ||
|
||
result = idx // zero | ||
tm.assert_index_equal(result, expected) | ||
ser_compat = Series(idx).astype('i8') // np.array(zero).astype('i8') | ||
tm.assert_series_equal(ser_compat, Series(result)) | ||
|
||
@pytest.mark.parametrize('zero', zeros) | ||
def test_mod_zero(self, zero): | ||
idx = self.create_index() | ||
|
||
expected = Index([np.nan, np.nan, np.nan, np.nan, np.nan], | ||
dtype=np.float64) | ||
result = idx % zero | ||
tm.assert_index_equal(result, expected) | ||
ser_compat = Series(idx).astype('i8') % np.array(zero).astype('i8') | ||
tm.assert_series_equal(ser_compat, Series(result)) | ||
|
||
@pytest.mark.parametrize('zero', zeros) | ||
def test_divmod_zero(self, zero): | ||
idx = self.create_index() | ||
|
||
exleft = Index([np.nan, np.inf, np.inf, np.inf, np.inf], | ||
dtype=np.float64) | ||
exright = Index([np.nan, np.nan, np.nan, np.nan, np.nan], | ||
dtype=np.float64) | ||
|
||
result = divmod(idx, zero) | ||
tm.assert_index_equal(result[0], exleft) | ||
tm.assert_index_equal(result[1], exright) | ||
|
||
def test_explicit_conversions(self): | ||
|
||
# GH 8608 | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
current should always be ipython, so the code executes.