-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
TYP: Made ffill_indexer arg-type ndarray #44893
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pandas/core/reshape/merge.py
Outdated
right_join_indexer = libjoin.ffill_indexer( | ||
right_indexer # type: ignore[arg-type] | ||
) | ||
left_join_indexer = libjoin.ffill_indexer(np.asarray(left_indexer)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please avoid calling asarray. This adds unnecessary overhead and does not guarantee that we get an array back if None is passed into it.
You can use assert if it is guaranteed, that left_indexer is not None here. Adjusting the return type of _get_join_info would be even better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@phofl Hi Patrick. I see what you mean about the overhead of np.asarray()
. None seems to come from L957 and L962 in _get_join_info
which comes from return value of _left_join_on_index
L2071. Not sure what a good alternative np.ndarray alternative to None would be?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we reason about the code s.t. we know we can never get here with left_indexer being None? if so, then can do a left_indexer = cast(np.ndarray, left_indexer)
. otherwise
if left_indexer is None:
raise AppropriateException("Useful message")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jbrockmendel thank you for the suggestion. I was unsure if it was okay to use typing.cast()
Do these changes look okay?
thanks @nickleus27 |
right_join_indexer = libjoin.ffill_indexer( | ||
right_indexer # type: ignore[arg-type] | ||
if left_indexer is None: | ||
raise TypeError("left_indexer cannot be None") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this ever reached? is it because the user passed something invalid?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jbrockmendel I was wondering the same thing so I could write a better error message. Tbh I am not sure. I need to spend some more time to see how we get to this point
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. this probably shouldn't have been merged until this was sorted out. NBD bc it isn't actively harmful here, but id appreciate it if you'd follow up on this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jbrockmendel I am having a hard time finding how to get the exception to be thrown. I am working on it, but it seems it will take me a little time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems that this exception situation only happens when the fill_method=='ffill'
.
Is the fill_method argument only in pandas.merge_ordered()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the fill_method argument only in pandas.merge_ordered()?
looks like it, yah. i dont know this part of the code particularly well so im just checking docstrings
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jbrockmendel i do not think we can ever get to the TyperError thrown on L1651 because the left_indexer
is only None
when L956 has self.right_index
as True
. But, pandas.merge_ordered()
does not have a right_index
argument. I might be misunderstanding what is happening but I do not understand how you can set the fill_method='ffill'
that i think is only an argument for merge_ordered
but it has no right_index
argument?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're right. In this case, the check should go right after the _get_join_info call since i dont think it depends on anything after it (double-check me on this pls)
If im looking at this correctly, i think we can also rule out the cases where either left_join_indexer or right_join_indexer is None?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is possible for left_indexer and right_indexer to be None, but not when fill_method == "ffill".
xref #37715