-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
CLN refactor rest of core #37586
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CLN refactor rest of core #37586
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one comment otherwise lgtm
pandas/core/generic.py
Outdated
else: | ||
object.__setattr__(self, name, value) | ||
self[name] = value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you check the logic that was changed here (IIRC this is for performance but the change might be good)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure about this change on L5509. pls revert and can do in a separate PR when checking perf. This causes recursion (which might be fine, though need to confirm)
pandas/core/algorithms.py
Outdated
is_object_dtype(dtype) | ||
or not is_object_dtype(dtype) | ||
and is_object_dtype(values) | ||
and dtype is None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you put parens where appropriate to make this more obvious
pandas/core/base.py
Outdated
@@ -1244,8 +1242,7 @@ def searchsorted(self, value, side="left", sorter=None) -> np.ndarray: | |||
|
|||
def drop_duplicates(self, keep="first"): | |||
duplicated = self.duplicated(keep=keep) | |||
result = self[np.logical_not(duplicated)] | |||
return result | |||
return self[np.logical_not(duplicated)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
id prefer ~duplicated
over logical_not
pandas/core/generic.py
Outdated
left = left.copy() | ||
right = right.copy() | ||
left.index = join_index | ||
right.index = join_index |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
identical(?) to chunk above, could make a helper. also im pretty sure mypy is going to complain about Index not having a tz attribute
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jbrockmendel do you know when this code is meant to be reached? Running
pytest pandas/tests/generic/
doesn't cover it. I couldn't get here with test_fast.sh
either, even though codecov shows it as covered
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like test_alignment_doesnt_change_tz
from #36503 hits the other usage of this pattern, might not have gotten both
pandas/core/nanops.py
Outdated
@@ -185,14 +182,11 @@ def _get_fill_value( | |||
else: | |||
return -np.inf | |||
else: | |||
if fill_value_typ is None: | |||
return iNaT | |||
if fill_value_typ is not None and fill_value_typ == "+inf": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think can just do the str comparison?
pandas/core/nanops.py
Outdated
if values.ndim == 1: | ||
return fill_value | ||
elif axis is None: | ||
if values.ndim == 1 or axis is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think this is a case where its useful to see we get coverage of both cases
pandas/core/sorting.py
Outdated
@@ -218,7 +218,7 @@ def decons_group_index(comp_labels, shape): | |||
x = comp_labels | |||
for i in reversed(range(len(shape))): | |||
labels = (x - y) % (factor * shape[i]) // factor | |||
np.putmask(labels, comp_labels < 0, -1) | |||
np.putmask(labels, x < 0, -1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lets just do labels[x < 0] = -1
, putmask has different semantics when the last arg is a listlike but thats not relevant here
can you merge master ping on green |
I still need to address #37586 (comment) , will ping when it's ready |
moving off 1.2 until fixed up |
Sure, no problem, though I believe I've responded to the review comments, I think it should be ready now |
@MarcoGorelli can you rebase and see if you can get passing |
ok this looks fine, but a ci /checks is failing. ping on greenish |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment + merge master. master is moving, so ping on green when ready.
pandas/core/algorithms.py
Outdated
@@ -114,11 +114,8 @@ def _ensure_data( | |||
values = extract_array(values, extract_numpy=True) | |||
|
|||
# we check some simple dtypes first | |||
if is_object_dtype(dtype): | |||
if is_object_dtype(dtype) or (is_object_dtype(values) and (dtype is None)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you remove the parens around dtype is None as confusing (and actually put around the entire expression and format)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2 changes to revert. these types of PRs are very hard to review. in the future pls split out the non-controversial changes otherwise these linger forever.
ping on green when reverted
pandas/core/generic.py
Outdated
else: | ||
object.__setattr__(self, name, value) | ||
self[name] = value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure about this change on L5509. pls revert and can do in a separate PR when checking perf. This causes recursion (which might be fine, though need to confirm)
pandas/core/generic.py
Outdated
@@ -11106,6 +11091,22 @@ def last_valid_index(self): | |||
return self._find_valid_index("last") | |||
|
|||
|
|||
def _set_join_index( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should not be here but in pandas/core/reshape/merge.py or appropriate
thanks @MarcoGorelli this is a nice cleanup. in the future pls do these types of changes in smaller batches. |
Some refactorings found by Sourcery https://sourcery.ai/
I've removed the ones of the kind