Skip to content

ENH: Add to_tuples method to IntervalIndex #538

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 17, 2023

Conversation

skatsuta
Copy link
Contributor

@skatsuta skatsuta commented Feb 15, 2023

  • Closes #xxxx (Replace xxxx with the Github issue number)
  • Tests added: Please use assert_type() to assert the type of any return value

Thank you for always maintaining this stub!
It contributes greatly to the robustness of Python programs using pandas.

Problem

IntervalIndex has to_tuples method, but its stub is currently missing.

Solution

Add a stub for IntervalIndex#to_tuples and a test for it.

Note that the documentation above states that the return type of that method is an ndarray,
but it is actually Index (the test to be added checks for this).
This appears to be a mistake in the pandas doc.
(Probably the doc for IntervalArray#to_tuples is also being used for IntervalIndex#to_tuples, and since IntervalArray#to_tuples does indeed return an ndarray, it is incorrectly showing up in the doc for IntervalIndex#to_tuples as well.)

@skatsuta skatsuta changed the title Add to_tuples method to IntervalIndex ENH: Add to_tuples method to IntervalIndex Feb 15, 2023
Copy link
Collaborator

@Dr-Irv Dr-Irv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. One small change in the test.

@@ -31,6 +31,11 @@ def test_from_tuples() -> None:
)


def test_to_tuples() -> None:
ind = pd.IntervalIndex.from_tuples([(0, 1), (1, 2)]).to_tuples()
check(assert_type(ind, pd.Index), pd.Index)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you change this test to

check(assert_type(ind, pd.Index), pd.Index, tuple)

That will then check we get an Index of tuples.

Copy link
Contributor Author

@skatsuta skatsuta Feb 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I wrote such a test at first, but finally removed the last tuple argument because I think it is not necessary to check the type of elements in the Index in this test at this point.

If Indexes are generic now, then this test should be like this:

check(assert_type(ind, "pd.Index[tuple]"), pd.Index, tuple)

in which case it is important to check that the inferred return type is actually an instance of Index of tuples.

However, since Indexes are not generic at the moment, it is not the responsibility of this stub to ensure that a return value is actually an Index "of tuples".
For now, this type stub layer should only guarantee the behavior of "returning an Index (of something)".

So, I think the current test is fine for now, and checking for the type of elements in an Index will only make sense after generic support for Indexes is added in the future (at that time this test should be modified to the code above).

Are you suggesting that this test should check for the type of elements in the Index even though it is currently a test for runtime behavior and not a test for this type stub?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the current test is fine for now, and checking for the type of elements in an Index will only make sense after generic support for Indexes is added in the future (at that time this test should be modified to the code above).

Are you suggesting that this test should check for the type of elements in the Index even though it is currently a test for runtime behavior and not a test for this type stub?

The assert_type() call checks the validity of the stub. The check() call validates that the stub and runtime behavior are aligned. By adding the tuple argument to check, it is just making sure that alignment is correct.

We've done that (admittedly inconsistently) in other places, and it actually has revealed some errors in pandas itself, or in the stubs. So the goal is to try to get more consistency in how we do our testing.

I agree that things will be better once we make Index generic in the stubs.

Copy link
Contributor Author

@skatsuta skatsuta Feb 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that the check() call checks that the type information in the stub and the runtime type information match, but that alignment check is accomplished even without the dtype=tuple argument (because Index is currently not generic and the type of elements in an Index on the stub side is not known).

However, I agree with you that adding the argument would improve consistency between pandas and stubs.
It would allow the stub layer to perform additional checks on pandas' behavior, for example, to detect bugs in pandas (or stubs).
Thanks for the explanation of your intention, I am convinced now that it is better to have this argument.

I have reverted to passing the last tuple argument to the check() call.

@Dr-Irv
Copy link
Collaborator

Dr-Irv commented Feb 15, 2023

Note that the documentation above states that the return type of that method is an ndarray,
but it is actually Index (the test to be added checks for this).
This appears to be a mistake in the pandas doc.
(Probably the doc for IntervalArray#to_tuples is also being used for IntervalIndex#to_tuples, and since IntervalArray#to_tuples does indeed return an ndarray, it is incorrectly showing up in the doc for IntervalIndex#to_tuples as well.)

Can you create an issue in the pandas repo about this doc issue? You're correct that the docs for IntervalIndex.to_tuples() is generated from IntervalArray.to_tuples() so that is something that should get fixed over there.

@skatsuta
Copy link
Contributor Author

Can you create an issue in the pandas repo about this doc issue? You're correct that the docs for IntervalIndex.to_tuples() is generated from IntervalArray.to_tuples() so that is something that should get fixed over there.

OK thanks, will open an issue.

@skatsuta skatsuta force-pushed the add-to-tuples-to-interval-index branch from 433439c to b6bad8d Compare February 17, 2023 12:29
Copy link
Contributor Author

@skatsuta skatsuta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Dr-Irv Thanks for your suggestion and additional comment!
I have understood your intention now and added the tuple argument back in.
Would appreciate it if you could re-review it when you have time.

Also, a test with mypy nightly is failing, but I don't think that is due to this change.

@@ -31,6 +31,11 @@ def test_from_tuples() -> None:
)


def test_to_tuples() -> None:
ind = pd.IntervalIndex.from_tuples([(0, 1), (1, 2)]).to_tuples()
check(assert_type(ind, pd.Index), pd.Index)
Copy link
Contributor Author

@skatsuta skatsuta Feb 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that the check() call checks that the type information in the stub and the runtime type information match, but that alignment check is accomplished even without the dtype=tuple argument (because Index is currently not generic and the type of elements in an Index on the stub side is not known).

However, I agree with you that adding the argument would improve consistency between pandas and stubs.
It would allow the stub layer to perform additional checks on pandas' behavior, for example, to detect bugs in pandas (or stubs).
Thanks for the explanation of your intention, I am convinced now that it is better to have this argument.

I have reverted to passing the last tuple argument to the check() call.

Copy link
Collaborator

@Dr-Irv Dr-Irv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @skatsuta

@Dr-Irv
Copy link
Collaborator

Dr-Irv commented Feb 17, 2023

Also, a test with mypy nightly is failing, but I don't think that is due to this change.

That was fixed in #541

@Dr-Irv Dr-Irv merged commit 6d28a28 into pandas-dev:main Feb 17, 2023
@skatsuta skatsuta deleted the add-to-tuples-to-interval-index branch February 18, 2023 09:00
twoertwein pushed a commit to twoertwein/pandas-stubs that referenced this pull request Apr 1, 2023
Add to_tuples method to IntervalIndex
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants