DOC: Add DataFrame.index.levels #55437

Merged on Oct 12, 2023 (28 commits)
Changes from 9 commits
Commits
50ff6c2
modified: pandas/core/indexes/multi.py
shiersansi Oct 7, 2023
f5b7e29
modified: pandas/core/indexes/multi.py
shiersansi Oct 7, 2023
2c8b861
modified: pandas/core/indexes/multi.py
shiersansi Oct 7, 2023
6241262
modified: pandas/core/indexes/multi.py
shiersansi Oct 8, 2023
aac99e1
modified: pandas/core/indexes/multi.py
shiersansi Oct 8, 2023
a3984e1
modified: pandas/core/indexes/multi.py
shiersansi Oct 8, 2023
c11f34a
modified: pandas/core/indexes/multi.py
shiersansi Oct 8, 2023
f2bf96a
modified: pandas/core/indexes/multi.py
shiersansi Oct 8, 2023
164df6f
modified: pandas/core/indexes/multi.py
shiersansi Oct 8, 2023
14be467
modified: pandas/core/indexes/multi.py
shiersansi Oct 8, 2023
f5085f8
modified: pandas/core/indexes/multi.py
shiersansi Oct 8, 2023
1bca09d
modified: pandas/core/indexes/multi.py
shiersansi Oct 9, 2023
ad66dfb
modified: pandas/core/indexes/multi.py
shiersansi Oct 9, 2023
9ea6ebb
modified: pandas/core/indexes/multi.py
shiersansi Oct 9, 2023
329e45c
modified: pandas/core/indexes/multi.py
shiersansi Oct 9, 2023
b871123
modified: ../pandas/core/indexes/multi.py
shiersansi Oct 11, 2023
ca13b9a
modified: ../pandas/core/indexes/multi.py
shiersansi Oct 11, 2023
4865b2a
modified: pandas/core/indexes/multi.py
shiersansi Oct 11, 2023
6eaa176
modified: pandas/core/indexes/multi.py
shiersansi Oct 12, 2023
ed8ec94
modified: pandas/core/indexes/multi.py
shiersansi Oct 12, 2023
39eacd2
modified: pandas/core/indexes/multi.py
shiersansi Oct 12, 2023
19298c4
modified: pandas/core/indexes/multi.py
shiersansi Oct 12, 2023
324b682
modified: pandas/core/indexes/multi.py
shiersansi Oct 12, 2023
1e1c873
modified: pandas/core/indexes/multi.py
shiersansi Oct 12, 2023
1d658e0
modified: pandas/core/indexes/multi.py
shiersansi Oct 12, 2023
efe5e0a
modified: pandas/core/indexes/multi.py
shiersansi Oct 12, 2023
596d55c
modified: pandas/core/indexes/multi.py
shiersansi Oct 12, 2023
786abee
Update pandas/core/indexes/multi.py
datapythonista Oct 12, 2023
40 changes: 40 additions & 0 deletions pandas/core/indexes/multi.py
@@ -840,6 +840,46 @@ def size(self) -> int:

@cache_readonly
def levels(self) -> FrozenList:
"""
Levels of the MultiIndex.

What are levels?
Review comment (Member):

I'd prefer to remove the questions. In isolation I think they are fine, but since we don't use them in any other API documentation page, they feel a bit strange and inconsistent. I think it's easy to understand the explanations without the questions, so no big deal.

Levels refer to the different hierarchical levels or layers in a MultiIndex.
In a MultiIndex, each level represents a distinct dimension or category of
the index.

How can levels be seen in pandas?
To access the levels, you can use the levels attribute of the MultiIndex,
which returns a FrozenList of Index objects. Each Index object represents a
level in the MultiIndex and contains the unique values found in that
specific level.

Notes
-----
If the MultiIndex is sliced, this attribute still returns the original
levels of the MultiIndex.
When using this attribute on a DataFrame index, it may show "extra".
Review comment (Member):

Is this last sentence talking about the MultiIndex returning all original levels? I personally find the "it may show extra" comment misleading. If you want to complement the previous sentence, what about something like: "For example, if a MultiIndex is created with levels A, B, C, and the DataFrame using it filters out all rows of the level C, MultiIndex.levels will still return A, B, C." Something like this should make things easier to understand than this last comment, IMHO.
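
A minimal sketch of the behavior described in this comment, assuming a small hypothetical DataFrame (the names `letter`, `number`, and `value` are illustrative and not part of the PR). `MultiIndex.remove_unused_levels()` is an existing pandas method that is not discussed in this thread; it appears only for contrast.

```python
import pandas as pd

# Hypothetical data: level names and values are illustrative only.
idx = pd.MultiIndex.from_product(
    [["A", "B", "C"], [1, 2]], names=["letter", "number"]
)
df = pd.DataFrame({"value": range(6)}, index=idx)

# Filter out every row whose first-level label is "C".
filtered = df[df.index.get_level_values("letter") != "C"]

# The rows are gone, but the first level still lists "C".
print(list(filtered.index.levels[0]))  # ['A', 'B', 'C']

# The labels actually present in the filtered index no longer include "C".
print(list(filtered.index.get_level_values("letter").unique()))  # ['A', 'B']

# For contrast: rebuild the levels from the labels that are still in use.
trimmed = filtered.index.remove_unused_levels()
print(list(trimmed.levels[0]))  # ['A', 'B']
```

The point of the sketch is only that `levels` describes how the index was built, not which labels survive a filter.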


Examples
--------
Example 1:
Review comment (Member):

I think you can remove the "Example 1:" label and just show the example directly.

>>> idx = pd.MultiIndex.from_product([(0, 1, 2), ('green', 'purple')],
... names=['number', 'color'])
>>> idx.levels
FrozenList([[0, 1, 2], ['green', 'purple']])

Example 2:
>>> idx = pd.MultiIndex.from_product([['John', 'Josh', 'Alex'],
Review comment (Member):

Can you reuse the index from above, so this is less verbose, please?

... list('abcde')], names=['Person', 'Letter'])
>>> large = pd.DataFrame(data=np.random.randn(15, 2),
Review comment (Member):

I think it's better to just call this df instead of large.

... index=idx, columns=['one', 'two'])
>>> small = large.loc[['Jo'==d[0:2] for d in
Review comment (Member):

Can you show the dataframe here first, so the user can better understand the data of the example? I think you'll want to avoid using random data in the example, as it'll make your life more difficult. And if you use data that is meaningful instead of some random things, that will also help users understand better. As an idea, you can use a MultiIndex of animals with class/animal (e.g. insect/ant, insect/spider, mammal/goat...) and use a dataframe with just one column, for example number of legs (we use this example in other docs). The example is silly, but meaningful, and any user can understand it very quickly and focus on the concepts, not on understanding the data.

Then, as said, you can show the dataframe first, show the levels of the MultiIndex from it (i.e. df.index.levels), and then filter the data, many_leg_animals = animals[animals.num_legs > 4], to finally show that even though there are now no mammals in the data, the MultiIndex still contains all the levels.
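
One way this suggestion could look, as a rough sketch only. The specific animals and leg counts below are assumptions chosen for illustration; this is not necessarily the example that ended up in the merged docstring.

```python
import pandas as pd

# Illustrative class/animal index with one "num_legs" column.
index = pd.MultiIndex.from_tuples(
    [("insect", "ant"), ("insect", "bee"), ("mammal", "goat"), ("mammal", "human")],
    names=["class", "animal"],
)
animals = pd.DataFrame({"num_legs": [6, 6, 4, 2]}, index=index)

# Show the data first, then the levels of its MultiIndex.
print(animals)
print(animals.index.levels)

# Keep only animals with more than four legs: no mammals remain.
many_leg_animals = animals[animals.num_legs > 4]
print(many_leg_animals)

# The filtered frame has no "mammal" rows, yet the level still contains it.
print(many_leg_animals.index.levels)
```

Using fixed, meaningful values keeps the doctest output deterministic, which random data would not.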

Reply (Contributor Author):

[Screenshot 2023-10-08 164751]

Thank you for your suggestion. Is the example part easier to understand now?

Reply (Contributor Author):

I have a problem that I may need your help with. When I tested the code after committing it, four tests failed, but I couldn't figure out exactly what went wrong from the error log provided.

... large.index.get_level_values('Person')]]
>>> large.index.levels
FrozenList([['Alex', 'John', 'Josh'], ['a', 'b', 'c', 'd', 'e']])
>>> small.index.levels
FrozenList([['Alex', 'John', 'Josh'], ['a', 'b', 'c', 'd', 'e']])
"""
# Use cache_readonly to ensure that self.get_locs doesn't repeatedly
# create new IndexEngine
# https://github.com/pandas-dev/pandas/issues/31648