
DOC: Add multi-conditional example to .loc reference page. #53546

Closed
1 task done
sweisss opened this issue Jun 6, 2023 · 6 comments · Fixed by #53572
Labels: Docs; Indexing (related to indexing on series/frames, not to indexes themselves)

Comments

@sweisss
Contributor

sweisss commented Jun 6, 2023

Pandas version checks

  • I have checked that the issue still exists on the latest versions of the docs on main here

Location of the documentation

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html

Documentation problem

The .loc API reference contains examples for single conditional lookups but does not include examples of multi-conditional lookups.

Multiple conditional statements in .loc must be wrapped in parens ( ) and separated by a single & or |. This differs from typical Python conditional statement syntax and can lead to confusion.
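
A minimal sketch of why the parentheses and the single `&` matter, assuming a small `df` modelled on the one in the `.loc` reference page examples (rebuilt here so the snippet runs by itself). The variable names `mask`, `and_error`, and `precedence_error` are illustrative only.

```python
import pandas as pd

# Frame modelled on the one used in the .loc reference page examples.
df = pd.DataFrame(
    [[1, 2], [4, 5], [7, 8]],
    index=["cobra", "viper", "sidewinder"],
    columns=["max_speed", "shield"],
)

# Parenthesised comparisons combined with & give an element-wise boolean mask.
mask = (df["max_speed"] > 1) & (df["shield"] < 8)
selected = df.loc[mask]

# Python's `and` needs a single True/False value, which a multi-element
# Series cannot provide, so pandas raises ValueError.
try:
    df.loc[(df["max_speed"] > 1) and (df["shield"] < 8)]
    and_error = None
except ValueError as exc:
    and_error = exc

# Without parentheses, & binds more tightly than > and <, so Python parses
# this as the chained comparison df["max_speed"] > (1 & df["shield"]) < 8.
# A chained comparison uses an implicit `and`, so it fails the same way.
try:
    df.loc[df["max_speed"] > 1 & df["shield"] < 8]
    precedence_error = None
except ValueError as exc:
    precedence_error = exc
```

In both failing cases the error comes from Python asking a Series for one truth value, not from `.loc` itself.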

Suggested fix for documentation

Add an example of a multi-conditional lookup after the single conditional example and before the "Callable that returns a boolean Series" example.

>>> df.loc[(df['max_speed'] > 1) & (df['shield'] < 8)]
       max_speed  shield
viper          4       5

Also, add a note that restructuring a DataFrame to use a MultiIndex for lookups may perform better than using .loc with three or more conditionals. Link prominently to the MultiIndex user guide in this note, and mention that examples of .loc on MultiIndex objects appear further down the page.
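
For illustration, here is a rough sketch of the two lookup styles side by side. The `records` frame and its column names are hypothetical, and the snippet only shows that both styles select the same value; it makes no claim about relative speed, which would need to be measured on real data.

```python
import pandas as pd

# Hypothetical data; the column names are illustrative only.
records = pd.DataFrame(
    {
        "region": ["east", "east", "west", "west"],
        "year": [2022, 2023, 2022, 2023],
        "product": ["a", "b", "a", "b"],
        "sales": [10, 20, 30, 40],
    }
)

# Boolean-mask lookup with three conditions combined by &.
by_mask = records.loc[
    (records["region"] == "west")
    & (records["year"] == 2023)
    & (records["product"] == "b"),
    "sales",
]

# The same lookup after moving the three columns into a sorted MultiIndex:
# the conditions become a single key tuple passed to .loc.
indexed = records.set_index(["region", "year", "product"]).sort_index()
by_key = indexed.loc[("west", 2023, "b"), "sales"]
```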

Rationale
.loc is a commonly used attribute and its reference page is often the first point of entry into the documentation from a Google search, Stack Overflow article, or other outside source. A relatively new user may miss the user guides because of this entry point (I personally did for months while learning pandas). Providing an example of .loc usage with multiple conditionals and a prominent link to related user guides will help reduce hours of frustration.

Note
I have already written a draft of this addition to the documentation and plan to assign this issue to myself and make this contribution on the assumption that it is approved.

@sweisss sweisss added the Docs and Needs Triage (issue that has not been reviewed by a pandas team member) labels Jun 6, 2023
@sweisss
Contributor Author

sweisss commented Jun 6, 2023

take

@topper-123
Contributor

It's a good idea to add an example with multiple conditions.

The perf. benefit of MultiIndex is probably too specialized to put in the loc doc string, but could maybe go to the advanced indexing section of the docs.

@topper-123 topper-123 added the Indexing (related to indexing on series/frames, not to indexes themselves) label and removed the Needs Triage (issue that has not been reviewed by a pandas team member) label Jun 6, 2023
@rhshadrach
Member

Multiple conditional statements in .loc must be wrapped in parens ( ) and separated by a single & or |. This differs from typical Python conditional statement syntax and can lead to confusion.

Perhaps this is a bit of a nitpick, but multi-conditional statements don't really have such requirements. You're really talking about the syntax for combining Boolean Series. However the Series is constructed (there are many ways), at the end of the day you're just passing a Series into .loc. For example:

df.loc[df['max_speed'].gt(1) & df['shield'].lt(8)]

works just fine, as does

df.loc[df.eval('max_speed > 1 and shield < 8')]
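
As a quick check of the point above, here is a sketch comparing the three spellings on a small frame modelled on the `.loc` reference page example. The variable names are illustrative only.

```python
import pandas as pd

# Frame modelled on the one used in the .loc reference page examples.
df = pd.DataFrame(
    [[1, 2], [4, 5], [7, 8]],
    index=["cobra", "viper", "sidewinder"],
    columns=["max_speed", "shield"],
)

# Three ways to build the boolean Series that is passed to .loc.
with_operators = (df["max_speed"] > 1) & (df["shield"] < 8)
with_methods = df["max_speed"].gt(1) & df["shield"].lt(8)
with_eval = df.eval("max_speed > 1 and shield < 8")

# .loc receives an ordinary boolean Series in every case.
rows_operators = df.loc[with_operators]
rows_methods = df.loc[with_methods]
rows_eval = df.loc[with_eval]
```

The method form needs no extra parentheses because `.gt(1)` and `.lt(8)` are evaluated as calls before `&` is applied.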

@sweisss
Contributor Author

sweisss commented Jun 7, 2023

Perhaps this is a bit of a nitpick, but multi-conditional statements don't really have such requirements. You're really talking about the syntax for combining Boolean Series. However the Series is constructed (there are many ways), at the end of the day you're just passing a Series into .loc.

Thank you for this distinction. I'll be sure to note this in my update. Is there somewhere in the docs that goes into more detail about combining Boolean Series that I can link to? I can't seem to find it if it exists.

@rhshadrach
Member

https://pandas.pydata.org/pandas-docs/dev/user_guide/indexing.html#boolean-indexing

@sweisss
Contributor Author

sweisss commented Jun 7, 2023

https://pandas.pydata.org/pandas-docs/dev/user_guide/indexing.html#boolean-indexing

Yes, thank you! I thought I came across it somewhere before!

3 participants