-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
CLN: annotation in reshape.merge #29490
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jbrockmendel. minor comments otherwise lgtm.
rcodes, lcodes, shape = map(list, zip(*map(fkeys, index.levels, join_keys))) | ||
mapped = [fkeys(index.levels[n], join_keys[n]) for n in range(len(index.levels))] | ||
zipped = zip(*mapped) | ||
rcodes, lcodes, shape = [list(x) for x in zipped] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
was this cleaning or silencing mypy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was necessary to silence mypy. i also dont like the pattern that was used before (and am not wild about the zip
still here, open to suggestions)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep. fine with this. just curious.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK with splitting this out but should probably keep mapped
as a generator; coercing to list may have some overhead
@@ -577,7 +581,7 @@ def __init__( | |||
self.indicator = indicator | |||
|
|||
if isinstance(self.indicator, str): | |||
self.indicator_name = self.indicator | |||
self.indicator_name = self.indicator # type: Optional[str] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use py3.6 syntax :)
my preference would be to declare outside the if else.
pandas/core/reshape/merge.py
Outdated
left, | ||
right, | ||
how="inner", | ||
left: FrameOrSeries, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haven't debugged to verify but we can mix and match objects here right? So something as follows:
>>> df = pd.DataFrame([[1]])
>>> ser = pd.Series([1], name="a")
>>> pd.merge(df, ser, left_index=True, right_index=True)
0 a
0 1 1
If that's the case then these actually should be Union[DataFrame, Series]
instead of FrameOrSeries
. Pretty tricky
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, left and right can each be either DataFrame or Series, i.e. 4 valid combinations. Does FrameOrSeries
assume you're not mix-and-matching? will update
rcodes, lcodes, shape = map(list, zip(*map(fkeys, index.levels, join_keys))) | ||
mapped = [fkeys(index.levels[n], join_keys[n]) for n in range(len(index.levels))] | ||
zipped = zip(*mapped) | ||
rcodes, lcodes, shape = [list(x) for x in zipped] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK with splitting this out but should probably keep mapped
as a generator; coercing to list may have some overhead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jbrockmendel lgtm pending @WillAyd comment #29490 (comment)
pandas/core/reshape/merge.py
Outdated
left: FrameOrSeries, | ||
right: FrameOrSeries, | ||
left: "Union[Series, DataFrame]", | ||
right: "Union[Series, DataFrame]", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needs quotes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Union[Series, DataFrame]" -> Union["Series", "DataFrame"]
I think is clearer. not sure if we have a preferred style here. @WillAyd ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that having the quotes inside the union would be nicer, but doing so gave flake8 complaints about Series being unused.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
adding # noqa: F401
to the imports inside the TYPE_CHECKING
block I think is acceptable and not uncommon in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
@WillAyd has your comment #29490 (comment) been addressed? |
pandas/core/reshape/merge.py
Outdated
""" | ||
|
||
_merge_type = "merge" | ||
|
||
indicator_name: Optional[str] | ||
left: "DataFrame" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason for adding these definitions at the class level? Would prefer not to do that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there was a request to use py36 style annotations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can these be done in the init? These are instance variables shouldn't need to expose as class variables now for annotations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i guess so, sure. is this the wrong use case for putting these a the class level? or is it a preference/policy thing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can these be done in the init? These are instance variables shouldn't need to expose as class variables now for annotations
since these are type declarations and not variable initialisations, I think that these are a noop at runtime and therefore not exposed as class variables.
>>> class foo():
... bar: str
...
>>> foo().bar
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'foo' object has no attribute 'bar'
>>> foo.bar
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: type object 'foo' has no attribute 'bar'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there consensus on the desired usage?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it can be done in the init I still think better. Should keep things localized and not push to higher namespaces unless really necessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
@simonjayhawkins Will has given the OK, back to you |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor comment
self.right = self.orig_right = right | ||
_left = _validate_operand(left) | ||
_right = _validate_operand(right) | ||
self.left = self.orig_left = _validate_operand(_left) # type: "DataFrame" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah i think can update to 36 syntax here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually if you can do in a followon is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and remove the duplicated _validate_operand
that crept in in the final commit.
This will need multiple passes, trying to keep a moderately-sized diff.