Skip to content

DOC: add examples to get_indexer_non_unique #50152

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

natmokval
Copy link
Contributor

Updated docstring for Index.get_indexer_non_unique. Added examples to clarify Index.get_indexer_non_unique behavior.

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for working on this

(array([-1, -1, -1]), array([0, 1, 2]))

Notice that the return value is a tuple contains two items.
The first item is an array of locations in ``index`` and ``x``
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's x?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My mistake, there is no x in my examples ;) In this example, it should be q, r, and t.

Comment on lines +5567 to +5569
>>> index = pd.Index(['c', 'b', 'a', 'b', 'b'])
>>> index.get_indexer_non_unique(['b', 'b'])
(array([1, 3, 4, 1, 3, 4]), array([], dtype=int64))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we give this example first?

and then have the examples with missing values

Apart from the first example, can we also have a sentence before the missing values one, describing what the example is demonstrating?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we give this example first?

and then have the examples with missing values

Certainly. I will change the order of examples as you suggested.

Apart from the first example, can we also have a sentence before the missing values one, describing what the example is demonstrating?

Adding a sentence before the example with missing values sounds like a great idea.

To make this example more understandable I suggest the following explanation :
In the example below there are no matched values. For this reason, the returned indexer contains only integers equal to -1. It demonstrates no index at these positions that match the corresponding target values. The mask [0, 1, 2] in the return value shows that the first, second, and third elements are missing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah looks fine

Comment on lines 5577 to 5578
is marked by -1, as it is not in ``index``. The second item
is a mask for new index given the current index.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure we need to repeat the return description here - perhaps, describe the mask given the example? e.g. given that the first and third elements are missing, the mask shows [0, 2]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, there is no need to repeat return sentence. I will add an explanation of what the mask shows.

@MarcoGorelli MarcoGorelli added this to the 2.0 milestone Dec 9, 2022
@natmokval
Copy link
Contributor Author

thanks for working on this

Thank you @MarcoGorelli for reviewing my PR.

@natmokval
Copy link
Contributor Author

Some tests in CI failed, I am not sure if it’s related to my changes. Locally (on Linux) test pandas/tests/indexes/object/test_indexing.py passes. My local python version in mamba environment is 3.8.15. It’s interesting that Python 3.11 is used in CI.

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah CI failures are unrelated, and due to #50124

I've just left a couple of minor comments, but overall this looks good

Comment on lines 5569 to 5571
It demonstrates no index at these positions that match the corresponding
``target`` values. The mask [0, 1, 2] in the return value shows that the first,
second, and third elements are missing.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just suggesting a slight rewording

Suggested change
It demonstrates no index at these positions that match the corresponding
``target`` values. The mask [0, 1, 2] in the return value shows that the first,
second, and third elements are missing.
It demonstrates that there's no match between the index and the ``target``
values at these positions. The mask [0, 1, 2] in the return value shows that
the first, second, and third elements are missing.

Comment on lines 5581 to 5583
Notice that the return value is a tuple contains two items. In the example
above the first item is an array of locations in ``index``. The second
item is a mask shows that the first and third elements are missing.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this come first?

So, overall, I'd suggest:

  1. the example with >>> index.get_indexer_non_unique(['f', 'b', 's'])...
  2. this text "Notice that the return value is a tuple", then the example with >>> index.get_indexer_non_unique(['f', 'b', 's'])
  3. The text "In the example below there are no matched values."and then the example with ['q', 'r', 't']

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, MarcoGorelli, for your suggestions. I updated my PR accordingly.

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @natmokval !

@MarcoGorelli MarcoGorelli merged commit f5323ab into pandas-dev:main Dec 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DOC: add docstring example to Index.get_indexer_non_unique.html
2 participants