-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
REGR: fix error caused by deprecation warning in pd.merge when joining two Series #47946
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you post the current warning?
@phofl There's no warning. An exception occurs while processing the warning, and I've included the message in my original post. The reason for the exception is that |
Sorry, did not understand you then. could you please add a test? |
And a whatsnew in 1.4.4, this is a regression |
@phofl No worries. I'll add one in a few hours. |
@nsarang would you have time to add a test ? |
@jorisvandenbossche Forgot about this one! |
@@ -2167,7 +2169,7 @@ def test_merge_series(on, left_on, right_on, left_index, right_index, nm): | |||
name=nm, | |||
) | |||
expected = DataFrame( | |||
{"A": [2, 4], "B": [1, 3]}, | |||
{"A": [2, 4], nm: [1, 3]}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please add a dedicated test for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thx, looks good now. Could you add a whatsnew to v1.4.4,rst?
Sure |
@phofl I can't reference the PR in the release notes since its path is not added to |
It is sufficient if you add something like to https://github.com/pandas-dev/pandas/blob/main/doc/source/whatsnew/v1.4.4.rst You don't have to do anything else. Maybe you did not merge main in a while? |
Fine by me. Just wanted to make sure it's okay that the release note is not having a reference.
I don't follow. Can you explain? |
I pushed a small patch to ensure we catch the warning in the test (ideally, our tests don't cause warnings from pandas, if a warning is expected, it should be either asserted or explicitly ignored). However, doing that also made we wonder why we are actually raising a deprecation warning here. The case from the test:
So the original |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regardless of my previous comment, this PR certainly already avoids the error, so this looks good to merge. Thanks @nsarang!
We wanted to get rid of value dependent behaviour. Your op looks "correct", but if "a" also has a midx, then the tuple name is converted into a MultiIndex itself:
And the more complicated case:
raises: If the DataFrame has more levels than our tuple name has elements, we flatten the DataFrame MultiIndex columns to a tuple too:
|
thx @nsarang |
This comment was marked as resolved.
This comment was marked as resolved.
@@ -19,6 +19,7 @@ Fixed regressions | |||
- Fixed regression in :meth:`DataFrame.loc` not updating the cache correctly after values were set (:issue:`47867`) | |||
- Fixed regression in :meth:`DataFrame.loc` not aligning index in some cases when setting a :class:`DataFrame` (:issue:`47578`) | |||
- Fixed regression in setting ``None`` or non-string value into a ``string``-dtype Series using a mask (:issue:`47628`) | |||
- Fixed regression in :func:`merge` throwing an error when passing a :class:`Series` with a multi-level name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if there is no issue then the PR number should be added as the issue instead
(imo that should never happen though as an issue should always be raised to document the issue. there are many reasons to do this, help users find it, avoid duplicate issues, allow proper triage and give others an opportunity for discussion, and keep these out of the PR discussion as appears to have happen here)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and also sometimes PRs go stale so the discussion in those cases is effectively lost.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, thanks for pointing out Simon!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with the sentiment. The only reason that I didn't create an issue in the first place was that I thought the bug happened because of a typo and that this PR should be straightforward. Aside from this, I want to point out two things:
- Now that you've decided to reference the PR instead of the issue, you may want to add the base path of Github PRs to
sphinx.ext.extlinks
. From what I'm seeing, only the base path of issues is provided in the configuration file. - FYI, I ran into this bug when I was applying a complex function to a Series after a multi-level GroupBy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Now that you've decided to reference the PR instead of the issue, you may want to add the base path of Github PRs to
sphinx.ext.extlinks
. From what I'm seeing, only the base path of issues is provided in the configuration file.
GitHub itself handles those links fine. For example the link for this PR (#47946), also works (redirects) if used with "issues": #47946
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uh I see. Thanks @jorisvandenbossche !
…g two Series (pandas-dev#47946) * Fix bug in warning * add test * add note to whatsnew * Add gh ref * catch warning in the test Co-authored-by: Patrick Hoefler <[email protected]> Co-authored-by: Joris Van den Bossche <[email protected]>
and to be clear, this is a regression from 1.0.5? |
I've added the deprecation in the 1.3 series, which causes the error |
the code sample (test) was raising an exception before that ~/miniconda3/envs/pandas-1.1.5/lib/python3.9/site-packages/pandas/core/reshape/merge.py in __init__(self, left, right, how, on, left_on, right_on, axis, left_index, right_index, sort, suffixes, copy, indicator, validate)
639 "merging between different levels can give an unintended "
640 f"result ({left.columns.nlevels} levels on the left,"
--> 641 f"{right.columns.nlevels} on the right)"
642 )
643 warnings.warn(msg, UserWarning)
~/miniconda3/envs/pandas-1.1.5/lib/python3.9/site-packages/pandas/core/generic.py in __getattr__(self, name)
5139 if self._info_axis._can_hold_identifiers_and_holds_name(name):
5140 return self[name]
-> 5141 return object.__getattribute__(self, name)
5142
5143 def __setattr__(self, name: str, value) -> None:
AttributeError: 'Series' object has no attribute 'columns' |
Ok this is weird. I remember testing this and I thought I installed 1.2.5 for testing purposes. But can not reproduce now. Should we avoid backporting then? |
the fix seems low risk so (imo) ok with the backport. however, if there is an outstanding issue about whether we should be raising a warning here #47946 (comment), then should open an issue about that to ensure it is not forgotten. Whether any further changes get backported would be part of that discussion. |
Ok, got it. I think the warning is oldish, the current behaviour is definitely buggy (#47946 (comment)), so in principal I am ok with he warning, but we could also avoid treating tuples as MultiIndexes. Disallowing this would avoid any potential value dependent bugs |
…pd.merge when joining two Series (#48154) Co-authored-by: Patrick Hoefler <[email protected]> Co-authored-by: Joris Van den Bossche <[email protected]> Co-authored-by: Nima Sarang <[email protected]>
Yes, indeed with the current behaviour in other cases, the warning seems correct. |
…g two Series (pandas-dev#47946) * Fix bug in warning * add test * add note to whatsnew * Add gh ref * catch warning in the test Co-authored-by: Patrick Hoefler <[email protected]> Co-authored-by: Joris Van den Bossche <[email protected]>
There is a small bug/typo in the warning of message of
pd.merge
when merging twoSeries
of different levels which results in an exception.Thrown error message:
AttributeError: 'Series' object has no attribute 'columns'