Skip to content

Update join docs for other param #46850

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 20 commits into from
Jul 21, 2022
Merged

Conversation

multimeric
Copy link
Contributor

@multimeric multimeric commented Apr 23, 2022

Currently the DataFrame.join docs imply that the type of other is DataFrame | Series | list[DataFrame], but this is not correct, and list[Series | DataFrame] is a more appropriate type.

The type signature and also the method description have been updated to amend this.

  • closes #xxxx (Replace xxxx with the Github issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

Update join docs regarding using multiple Series
@multimeric
Copy link
Contributor Author

No idea why this is failing 🤷

Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! Are iterables of Series tested? If not, can you add a test for them.

@rhshadrach rhshadrach added Docs Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Apr 25, 2022
@github-actions
Copy link
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label May 26, 2022
@rhshadrach
Copy link
Member

@multimeric - are you interested in continuing this?

@multimeric
Copy link
Contributor Author

Hmm, I'm a bit stuck. I can't work out how to get mypy to understand it's working with a list of Series | DataFrame.

Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you post the mypy issues you're seeing?

@multimeric
Copy link
Contributor Author

multimeric commented Jun 12, 2022

pandas/core/frame.py:9586: error: Argument 1 to "list" has incompatible type "Union[DataFrame, Series, Iterable[Union[DataFrame, Series]]]"; expected "Iterable[DataFrame]" [arg-type]

@multimeric
Copy link
Contributor Author

It's not clear to me what type signature is causing this to fail. Even if I type cast it I'm still getting the same error.

@rhshadrach
Copy link
Member

I think the line should be:

frames = [cast("DataFrame | Series", self)] + list(cast("Iterable[DataFrame | Series]", other))
  • You need quotes because otherwise the Python tries to actually use the | operator. In other words, the first argument of cast needs to be a valid Python object.
  • You need to cast [self] as well; mypy will object to combining lists of different types.

@multimeric
Copy link
Contributor Author

Thanks, that fixed it. Indeed I didn't try casting both parts of the concatenation, partly because I would have thought mypy could have narrowed the type of self at that point to other to Iterable because of the if statements. Ah well!

@datapythonista
Copy link
Member

@multimeric can you merge master into your branch, seems like the problem in the CI is unrelated.

@multimeric
Copy link
Contributor Author

I actually did that just before. It looks like the current failure is just a timeout? Happy to pull again though.

@multimeric
Copy link
Contributor Author

No, there is indeed a failure here. But it seems unrelated entirely.

Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Glad it's working - two requests on the test.

cc @Dr-Irv for the change to the type-hints. Opened #29; plan to put up a PR there once this is merged.

@@ -9582,7 +9582,7 @@ def join(

Parameters
----------
other : DataFrame, Series, or list of DataFrame
other : DataFrame, Series, or list of either of these
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should say "or list of DataFrames, Series, or both". "Either of these" implies that it has to be all one type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went with "or a list containing any combination of DataFrames and Series", if that's okay

Copy link
Contributor

@Dr-Irv Dr-Irv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small thing to comment on cast

@github-actions
Copy link
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Jul 18, 2022
@datapythonista
Copy link
Member

@multimeric do you have time to add the requested comment? And ir'd also be good if you can merge master into your branch

@rhshadrach I think your comments were addressed, can you check if this is ready from your side?

Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@@ -9805,7 +9805,8 @@ def join(

Parameters
----------
other : DataFrame, Series, or list of DataFrame
other : DataFrame, Series, or a list containing any combination of DataFrames
and Series.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code checks build is failing with

Error: /home/runner/work/pandas/pandas/pandas/core/frame.py:9789:PR08:pandas.DataFrame.join:Parameter "other" description should start with a capital letter

I think it's confused by the period here; hopefully removing it should resolve.

@rhshadrach rhshadrach removed the Stale label Jul 18, 2022
@rhshadrach rhshadrach added this to the 1.5 milestone Jul 18, 2022
Comment on lines 9808 to 9809
other : DataFrame, Series, or a list containing any combination of DataFrames
and Series
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see it now; the linter is complaining about the "and" being lowercase as if it's the start the description. I'd hazard a guess that sphinx does too. I think we should make the type more consise:

DataFrame, Series, or a list of any combination of these

@mroeschke mroeschke merged commit 0ce2921 into pandas-dev:main Jul 21, 2022
@mroeschke
Copy link
Member

Thanks @multimeric for your persistence and all for the review!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Docs Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants