-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Continue de-nesting core.ops #19448
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Continue de-nesting core.ops #19448
Conversation
Codecov Report
@@ Coverage Diff @@
## master #19448 +/- ##
==========================================
+ Coverage 91.62% 91.62% +<.01%
==========================================
Files 150 150
Lines 48728 48732 +4
==========================================
+ Hits 44645 44650 +5
+ Misses 4083 4082 -1
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not clear what you moved and what you added. need tests for new paths.
pandas/core/ops.py
Outdated
for name, method in new_methods.items(): | ||
# inplace SparseArray methods do not get overriden; everything else | ||
# does | ||
force = not (issubclass(cls, np.ndarray) and name.startswith('__i')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use ABCSparseArray (may need to disentagle a bit)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you fix this up / make comments more clear here
@@ -658,6 +649,10 @@ def wrapper(left, right, name=name, na_op=na_op): | |||
index=left.index, name=res_name, | |||
dtype=result.dtype) | |||
|
|||
elif is_categorical_dtype(left): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tests that hit this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the request for comments in the code for what tests hit this path? Or confirmation here that such tests exist?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tests.categorical.test_operators.TestCategoricalOps.test_numeric_like_ops hits this path. The change here is catching this case early (and explicitly) instead of in the else:
clause within the na_op above (see point where this PR changes elif isinstance(x, np.ndarray)
to else: assert isinstance(x, np.ndarray)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my requests are about things that are being added. e.g. if its just refactoing then existing tests are ok, but some things look like they are catching additional cases, so should have tests for these. a coverage analysis can test you (compare before and after)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if its just refactoing then existing tests are ok
This is very nearly pure refactoring to catch things earlier, more explicitly, and with fewer levels of indentation. I'll go through the diff and annotate any exceptions to this rule. The is_categorical_dtype
check on 652 replaces the else
previously on 621 (i.e. not an exception to the rule).
result = _comp_method_OBJECT_ARRAY(op, x, y) | ||
|
||
elif is_datetimelike_v_numeric(x, y): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tests that hit this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tests.series.test_operators.TestSeriesComparisons.test_comparison_invalid. This is just moved up a few lines and down one level of indentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
|
||
elif is_datetimelike_v_numeric(x, y): | ||
raise TypeError("invalid type comparison") | ||
|
||
else: | ||
|
||
# we want to compare like types | ||
# we only want to convert to integer like if | ||
# we are not NotImplemented, otherwise |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment no longer relevant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment may be more verbose than it needs to be, but is still relevant. Once #19301 goes in we can dispatch to DTI and TDI and the needs_i8_conversion block below can be simplified quite a bit.
elif isinstance(other, (np.ndarray, pd.Index)): | ||
# do not check length of zerodim array | ||
# as it will broadcast | ||
if (not is_scalar(lib.item_from_zerodim(other)) and | ||
len(self) != len(other)): | ||
raise ValueError('Lengths must match to compare') | ||
|
||
if isinstance(other, ABCPeriodIndex): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test that hit this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tests.series.test_operators.TestSeriesComparisons.test_nat_comparisons (2 parametrized cases)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think removing this check is the main piece of code actually removed instead of just rearranged. The comment mentions a temp workaround and it is no longer necessary.
"'series <op> np.asarray(other)'.") | ||
raise TypeError(msg.format(op=op, typ=self.dtype)) | ||
elif (isinstance(other, pd.Categorical) and | ||
not is_categorical_dtype(self)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tests that hit this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tests.categorical.test_operators.TestCategoricalOpsWithFactor.test_comparisons
.format(op=op, typ=self.dtype)) | ||
|
||
elif is_scalar(other) and isna(other): | ||
# numpy does not like comparisons vs None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test that hit this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tests.series.test_operators.TestSeriesComparisons.test_nat_comparisons_scalar, tests.series.test_operators.TestSeriesComparisons.test_more_na_comparisons, tests.series.test_arithmetic.TestTimestampSeriesComparison.test_timestamp_equality, tests.series.test_arithmetic.TestTimestampSeriesComparison.test_timestamp_compare_series
total of 10 cases between these with parametrization
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(This is moved down from na_op
)
if yrav.shape != mask.shape: | ||
yrav = np.empty(mask.shape, dtype=yrav.dtype) | ||
yrav.fill(yrav.item()) | ||
# FIXME: GH#5284, GH#5035, GH#19448 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tests that hit this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #19421. Only one test hits this, and only in py3:
s = Series([2, 3, 4, 5, 6, 7, 8, 9, datetime(2005, 1, 1)])
s[::2] = np.nan
d = DataFrame({'A': s})
with pytest.raises(ValueError):
d.__and__(s, axis='columns')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a change and not just a refactor. As per the comments, it is horribly broken, and better to raise intentionally than accidentally have the yrav.item()
raise a ValueError.
else: | ||
raise TypeError("{typ} cannot perform the operation " | ||
"{op}".format(typ=type(x).__name__, | ||
op=str_rep)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cases currently caught by this else: raise TypeError
are after this PR caught in the is_categorical_dtype
check on 652.
if name == '__ne__': | ||
return np.ones(len(x), dtype=bool) | ||
else: | ||
return np.zeros(len(x), dtype=bool) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The is_datetimelike_v_numeric and is_scalar cases are moved to earlier in the checking process, not removed.
elif isinstance(other, ABCDataFrame): # pragma: no cover | ||
return NotImplemented | ||
res_values = na_op(self.values, other.values) | ||
return self._constructor(res_values, index=self.index, name=name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everything in this part of the diff (788-799) is pure refactor, moving the ABCDataFrame check to the top spot for consistency
|
||
# let null fall thru | ||
if not isna(y): | ||
y = bool(y) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just moving the not isna(y)
outside of the try/except block to be more specific about whats being tried.
|
||
res_values = na_op(self.values, other) | ||
unfilled = self._constructor(res_values, index=self.index) | ||
return filler(unfilled).__finalize__(self) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All the edits to this function (895-918) are cleanup/refactor.
result = np.empty(x.size, dtype=x.dtype) | ||
mask = notna(xrav) | ||
xrav = xrav[mask] | ||
if np.prod(xrav.shape): | ||
if xrav.size: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using xrav.size
instead of np.prod(xrav.shape)
here and above is to move towards joining this case with the case above it. This masking logic is done almost identically in Series/DataFrame/Panel methods and one of the next steps will be to de-duplicate these.
with np.errstate(all='ignore'): | ||
result[mask] = op(xrav, yrav) | ||
elif hasattr(x, 'size'): | ||
|
||
elif isinstance(x, np.ndarray): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isinstance(x, np.ndarray)
instead of hasattr(x, 'size')
to be more explicit. The other case that otherwise gets to this point in tests is Categorical, but that raises shortly after this anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a comment here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good. indicated where some added comments.
pandas/core/ops.py
Outdated
for name, method in new_methods.items(): | ||
# inplace SparseArray methods do not get overriden; everything else | ||
# does | ||
force = not (issubclass(cls, np.ndarray) and name.startswith('__i')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you fix this up / make comments more clear here
@@ -795,39 +785,43 @@ def wrapper(self, other, axis=None): | |||
if axis is not None: | |||
self._get_axis_number(axis) | |||
|
|||
if isinstance(other, ABCSeries): | |||
if isinstance(other, ABCDataFrame): # pragma: no cover |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a comment here about early failing
"of dtype {typ}.\nIf you want to compare values, use " | ||
"'series <op> np.asarray(other)'.") | ||
raise TypeError(msg.format(op=op, typ=self.dtype)) | ||
elif (isinstance(other, pd.Categorical) and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use is_categorical_dtype(other)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is catching pd.Categorical specifically (as opposed to CategoricalIndex or Series[Categorical])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's totally not obvious, is this tested or needed? that seems oddly specific
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well the two branches preceeding this handle ABCSeries and pd.Index cases, and the (existing) error message specifically refers to a Categorical
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, pls revist at some point, this should prob be less specific and more about a non-ndarray like (e.g. an ExtensionArray check)
@@ -899,26 +892,30 @@ def wrapper(self, other): | |||
|
|||
self, other = _align_method_SERIES(self, other, align_asobject=True) | |||
|
|||
if isinstance(other, ABCSeries): | |||
if isinstance(other, ABCDataFrame): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment here
with np.errstate(all='ignore'): | ||
result[mask] = op(xrav, yrav) | ||
elif hasattr(x, 'size'): | ||
|
||
elif isinstance(x, np.ndarray): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a comment here
"of dtype {typ}.\nIf you want to compare values, use " | ||
"'series <op> np.asarray(other)'.") | ||
raise TypeError(msg.format(op=op, typ=self.dtype)) | ||
elif (isinstance(other, pd.Categorical) and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, pls revist at some point, this should prob be less specific and more about a non-ndarray like (e.g. an ExtensionArray check)
thanks! |
Move
isinstance(other, ABCDataFrame)
checks to consistently be the first thing checked in Series opsRemove
force
kwarg, define it in the one place it is used.Remove kludge for
PeriodIndex
Handle categorical_dtype earlier in arith_method_SERIES, decreasing complexity of the closure.
Handle scalar na other earlier in _comp_method_SERIES, decreasing complexity of the closure.
Remove broken broadcasting case from _arith_method_FRAME (closes ops._arith_method_FRAME typo? #19421)