Skip to content

Continue de-nesting core.ops #19448

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Feb 2, 2018
Merged

Conversation

jbrockmendel
Copy link
Member

  • Move isinstance(other, ABCDataFrame) checks to consistently be the first thing checked in Series ops

  • Remove force kwarg, define it in the one place it is used.

  • Remove kludge for PeriodIndex

  • Handle categorical_dtype earlier in arith_method_SERIES, decreasing complexity of the closure.

  • Handle scalar na other earlier in _comp_method_SERIES, decreasing complexity of the closure.

  • Remove broken broadcasting case from _arith_method_FRAME (closes ops._arith_method_FRAME typo? #19421)

@jbrockmendel
Copy link
Member Author

Just had to revert the removal of an ugly case, longstanding bug #5284, #5035.

@codecov
Copy link

codecov bot commented Jan 30, 2018

Codecov Report

Merging #19448 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #19448      +/-   ##
==========================================
+ Coverage   91.62%   91.62%   +<.01%     
==========================================
  Files         150      150              
  Lines       48728    48732       +4     
==========================================
+ Hits        44645    44650       +5     
+ Misses       4083     4082       -1
Flag Coverage Δ
#multiple 89.99% <100%> (ø) ⬆️
#single 41.75% <48.64%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/sparse/series.py 95.26% <ø> (ø) ⬆️
pandas/core/ops.py 95.74% <100%> (+0.21%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ca4ae4f...399fcd5. Read the comment docs.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not clear what you moved and what you added. need tests for new paths.

for name, method in new_methods.items():
# inplace SparseArray methods do not get overriden; everything else
# does
force = not (issubclass(cls, np.ndarray) and name.startswith('__i'))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use ABCSparseArray (may need to disentagle a bit)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you fix this up / make comments more clear here

@@ -658,6 +649,10 @@ def wrapper(left, right, name=name, na_op=na_op):
index=left.index, name=res_name,
dtype=result.dtype)

elif is_categorical_dtype(left):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tests that hit this

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the request for comments in the code for what tests hit this path? Or confirmation here that such tests exist?

Copy link
Member Author

@jbrockmendel jbrockmendel Jan 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tests.categorical.test_operators.TestCategoricalOps.test_numeric_like_ops hits this path. The change here is catching this case early (and explicitly) instead of in the else: clause within the na_op above (see point where this PR changes elif isinstance(x, np.ndarray) to else: assert isinstance(x, np.ndarray)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my requests are about things that are being added. e.g. if its just refactoing then existing tests are ok, but some things look like they are catching additional cases, so should have tests for these. a coverage analysis can test you (compare before and after)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if its just refactoing then existing tests are ok

This is very nearly pure refactoring to catch things earlier, more explicitly, and with fewer levels of indentation. I'll go through the diff and annotate any exceptions to this rule. The is_categorical_dtype check on 652 replaces the else previously on 621 (i.e. not an exception to the rule).

result = _comp_method_OBJECT_ARRAY(op, x, y)

elif is_datetimelike_v_numeric(x, y):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tests that hit this

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tests.series.test_operators.TestSeriesComparisons.test_comparison_invalid. This is just moved up a few lines and down one level of indentation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok


elif is_datetimelike_v_numeric(x, y):
raise TypeError("invalid type comparison")

else:

# we want to compare like types
# we only want to convert to integer like if
# we are not NotImplemented, otherwise
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment no longer relevant?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment may be more verbose than it needs to be, but is still relevant. Once #19301 goes in we can dispatch to DTI and TDI and the needs_i8_conversion block below can be simplified quite a bit.

elif isinstance(other, (np.ndarray, pd.Index)):
# do not check length of zerodim array
# as it will broadcast
if (not is_scalar(lib.item_from_zerodim(other)) and
len(self) != len(other)):
raise ValueError('Lengths must match to compare')

if isinstance(other, ABCPeriodIndex):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test that hit this

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tests.series.test_operators.TestSeriesComparisons.test_nat_comparisons (2 parametrized cases)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think removing this check is the main piece of code actually removed instead of just rearranged. The comment mentions a temp workaround and it is no longer necessary.

"'series <op> np.asarray(other)'.")
raise TypeError(msg.format(op=op, typ=self.dtype))
elif (isinstance(other, pd.Categorical) and
not is_categorical_dtype(self)):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tests that hit this

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tests.categorical.test_operators.TestCategoricalOpsWithFactor.test_comparisons

.format(op=op, typ=self.dtype))

elif is_scalar(other) and isna(other):
# numpy does not like comparisons vs None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test that hit this

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tests.series.test_operators.TestSeriesComparisons.test_nat_comparisons_scalar, tests.series.test_operators.TestSeriesComparisons.test_more_na_comparisons, tests.series.test_arithmetic.TestTimestampSeriesComparison.test_timestamp_equality, tests.series.test_arithmetic.TestTimestampSeriesComparison.test_timestamp_compare_series

total of 10 cases between these with parametrization

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(This is moved down from na_op)

if yrav.shape != mask.shape:
yrav = np.empty(mask.shape, dtype=yrav.dtype)
yrav.fill(yrav.item())
# FIXME: GH#5284, GH#5035, GH#19448
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tests that hit this

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #19421. Only one test hits this, and only in py3:

s = Series([2, 3, 4, 5, 6, 7, 8, 9, datetime(2005, 1, 1)])
s[::2] = np.nan
d = DataFrame({'A': s})

with pytest.raises(ValueError):
    d.__and__(s, axis='columns')

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a change and not just a refactor. As per the comments, it is horribly broken, and better to raise intentionally than accidentally have the yrav.item() raise a ValueError.

else:
raise TypeError("{typ} cannot perform the operation "
"{op}".format(typ=type(x).__name__,
op=str_rep))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cases currently caught by this else: raise TypeError are after this PR caught in the is_categorical_dtype check on 652.

if name == '__ne__':
return np.ones(len(x), dtype=bool)
else:
return np.zeros(len(x), dtype=bool)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The is_datetimelike_v_numeric and is_scalar cases are moved to earlier in the checking process, not removed.

elif isinstance(other, ABCDataFrame): # pragma: no cover
return NotImplemented
res_values = na_op(self.values, other.values)
return self._constructor(res_values, index=self.index, name=name)
Copy link
Member Author

@jbrockmendel jbrockmendel Jan 31, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything in this part of the diff (788-799) is pure refactor, moving the ABCDataFrame check to the top spot for consistency


# let null fall thru
if not isna(y):
y = bool(y)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just moving the not isna(y) outside of the try/except block to be more specific about whats being tried.


res_values = na_op(self.values, other)
unfilled = self._constructor(res_values, index=self.index)
return filler(unfilled).__finalize__(self)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the edits to this function (895-918) are cleanup/refactor.

result = np.empty(x.size, dtype=x.dtype)
mask = notna(xrav)
xrav = xrav[mask]
if np.prod(xrav.shape):
if xrav.size:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using xrav.size instead of np.prod(xrav.shape) here and above is to move towards joining this case with the case above it. This masking logic is done almost identically in Series/DataFrame/Panel methods and one of the next steps will be to de-duplicate these.

with np.errstate(all='ignore'):
result[mask] = op(xrav, yrav)
elif hasattr(x, 'size'):

elif isinstance(x, np.ndarray):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isinstance(x, np.ndarray) instead of hasattr(x, 'size') to be more explicit. The other case that otherwise gets to this point in tests is Categorical, but that raises shortly after this anyway.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a comment here

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good. indicated where some added comments.

for name, method in new_methods.items():
# inplace SparseArray methods do not get overriden; everything else
# does
force = not (issubclass(cls, np.ndarray) and name.startswith('__i'))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you fix this up / make comments more clear here

@@ -795,39 +785,43 @@ def wrapper(self, other, axis=None):
if axis is not None:
self._get_axis_number(axis)

if isinstance(other, ABCSeries):
if isinstance(other, ABCDataFrame): # pragma: no cover
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a comment here about early failing

"of dtype {typ}.\nIf you want to compare values, use "
"'series <op> np.asarray(other)'.")
raise TypeError(msg.format(op=op, typ=self.dtype))
elif (isinstance(other, pd.Categorical) and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use is_categorical_dtype(other)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is catching pd.Categorical specifically (as opposed to CategoricalIndex or Series[Categorical])

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's totally not obvious, is this tested or needed? that seems oddly specific

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well the two branches preceeding this handle ABCSeries and pd.Index cases, and the (existing) error message specifically refers to a Categorical

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, pls revist at some point, this should prob be less specific and more about a non-ndarray like (e.g. an ExtensionArray check)

@@ -899,26 +892,30 @@ def wrapper(self, other):

self, other = _align_method_SERIES(self, other, align_asobject=True)

if isinstance(other, ABCSeries):
if isinstance(other, ABCDataFrame):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment here

with np.errstate(all='ignore'):
result[mask] = op(xrav, yrav)
elif hasattr(x, 'size'):

elif isinstance(x, np.ndarray):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a comment here

@jreback jreback added the Clean label Jan 31, 2018
"of dtype {typ}.\nIf you want to compare values, use "
"'series <op> np.asarray(other)'.")
raise TypeError(msg.format(op=op, typ=self.dtype))
elif (isinstance(other, pd.Categorical) and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, pls revist at some point, this should prob be less specific and more about a non-ndarray like (e.g. an ExtensionArray check)

@jreback jreback added this to the 0.23.0 milestone Feb 2, 2018
@jreback jreback merged commit 7db4bea into pandas-dev:master Feb 2, 2018
@jreback
Copy link
Contributor

jreback commented Feb 2, 2018

thanks!

@jbrockmendel jbrockmendel deleted the ops_kwargs5 branch February 4, 2018 16:43
harisbal pushed a commit to harisbal/pandas that referenced this pull request Feb 28, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ops._arith_method_FRAME typo?
2 participants