Skip to content

ENH: Synchronize large parts of IO with pandas #160

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 5 commits into from

Conversation

bashtage
Copy link
Contributor

No description provided.

storage_options: StorageOptions = ...,
) -> IOHandles[bytes]: ...
@overload
def get_handle(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically, only classes/functions/... listed on this page are considered to be public https://pandas.pydata.org/docs/reference/index.html Since there are many classes/functions that should be public but are not listed there, people seem to assume that classes/functions that do not start with _ are by common conventions public.

As far as I know, nothing in io/common is meant to be public. Adding it here might suggest that it is public. That is obviously a pandas issue but it would be great to discuss some guidelines of what should/shouldn't be in the stubs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that is the goal then it is probably a lot more doable. The stubs now look like they were generated using an earlier version of stubgen. They have a lot of wrong things in them, including entire files that have no corresponding code in pandas.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically, only classes/functions/... listed on this page are considered to be public https://pandas.pydata.org/docs/reference/index.html Since there are many classes/functions that should be public but are not listed there, people seem to assume that classes/functions that do not start with _ are by common conventions public.

As far as I know, nothing in io/common is meant to be public. Adding it here might suggest that it is public. That is obviously a pandas issue but it would be great to discuss some guidelines of what should/shouldn't be in the stubs.

We could have that discussion outside this issue. Do you want to start an issue or a discussion on this topic? I have some thoughts on it, but would rather discuss it in a more focused area on that topic.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that is the goal then it is probably a lot more doable. The stubs now look like they were generated using an earlier version of stubgen. They have a lot of wrong things in them, including entire files that have no corresponding code in pandas.

Yes, this is a historical artifact. These stubs were generated by Microsoft, with stubgen, and possibly on pandas 1.1 or 1.2. I used them heavily when they were only shipped with VS Code, and kept doing PR's there to make my team's code pass things, and then we had discussions in our monthly pandas dev meetings about how to move forward, given there was another effort for stubs that had testing (which is where the tests here came from). Net result are these stubs, which we all knew would take a lot of work to get right, but provided a good starting point. Didn't want to wait to make them "perfect".

@overload
def validate_integer(name, val: int | None, min_val=...) -> int | None: ...
@overload
def read_csv(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make sure that the version of read_csv() that is currently in parsers.pyi is copied here as is. I can't tell because of the file reorg. A lot of work went into getting that set of overloads correct (not to say that they couldn't be improved). Any improvements to individual stubs should be left to a second PR.

@@ -0,0 +1,7 @@
from pandas.io.parsers.readers import (
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know the change here is to remove io/parsers.pyi and replace with a directory parsers, but for some reason the removal of io/parsers.pyi is not showing up as a "changed file" . Could be a github bug, but can you check that the git rm io/parsers.pyi is in your commit?

@bashtage
Copy link
Contributor Author

bashtage commented Jul 22, 2022

I"m going to put this on hold until a decision is made on #161 . Going through is rather limited exercise showed that is is effectively impossible to maintain stubs for the entirety of pandas including private methods. IIRC numpy doesn't type anything private either in pyi files.

@bashtage bashtage marked this pull request as draft July 22, 2022 14:41
@Dr-Irv
Copy link
Collaborator

Dr-Irv commented Jul 22, 2022

I"m going to put this on hold until a decision is made on #161 . Going through is rather limited exercise showed that is is effectively impossible to maintain stubs for the entirety of pandas including private methods. IIRC numpy doesn't type anything private either in pyi files.

One thing to consider is taking what we know as public, e.g., read_csv(), which is now declared in parsers.pyi, and moving it to parsers/readers.pyi so that at least we are mirroring where the declarations happen in pandas.

@bashtage bashtage closed this Aug 22, 2022
@bashtage bashtage deleted the more-io branch September 1, 2022 06:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants