-
-
Notifications
You must be signed in to change notification settings - Fork 141
ENH: Synchronize large parts of IO with pandas #160
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Improve accuracy of io stub files
storage_options: StorageOptions = ..., | ||
) -> IOHandles[bytes]: ... | ||
@overload | ||
def get_handle( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically, only classes/functions/... listed on this page are considered to be public https://pandas.pydata.org/docs/reference/index.html Since there are many classes/functions that should be public but are not listed there, people seem to assume that classes/functions that do not start with _
are by common conventions public.
As far as I know, nothing in io/common is meant to be public. Adding it here might suggest that it is public. That is obviously a pandas issue but it would be great to discuss some guidelines of what should/shouldn't be in the stubs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If that is the goal then it is probably a lot more doable. The stubs now look like they were generated using an earlier version of stubgen. They have a lot of wrong things in them, including entire files that have no corresponding code in pandas.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically, only classes/functions/... listed on this page are considered to be public https://pandas.pydata.org/docs/reference/index.html Since there are many classes/functions that should be public but are not listed there, people seem to assume that classes/functions that do not start with
_
are by common conventions public.As far as I know, nothing in io/common is meant to be public. Adding it here might suggest that it is public. That is obviously a pandas issue but it would be great to discuss some guidelines of what should/shouldn't be in the stubs.
We could have that discussion outside this issue. Do you want to start an issue or a discussion on this topic? I have some thoughts on it, but would rather discuss it in a more focused area on that topic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If that is the goal then it is probably a lot more doable. The stubs now look like they were generated using an earlier version of stubgen. They have a lot of wrong things in them, including entire files that have no corresponding code in pandas.
Yes, this is a historical artifact. These stubs were generated by Microsoft, with stubgen, and possibly on pandas 1.1 or 1.2. I used them heavily when they were only shipped with VS Code, and kept doing PR's there to make my team's code pass things, and then we had discussions in our monthly pandas dev meetings about how to move forward, given there was another effort for stubs that had testing (which is where the tests here came from). Net result are these stubs, which we all knew would take a lot of work to get right, but provided a good starting point. Didn't want to wait to make them "perfect".
@overload | ||
def validate_integer(name, val: int | None, min_val=...) -> int | None: ... | ||
@overload | ||
def read_csv( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please make sure that the version of read_csv()
that is currently in parsers.pyi
is copied here as is. I can't tell because of the file reorg. A lot of work went into getting that set of overloads correct (not to say that they couldn't be improved). Any improvements to individual stubs should be left to a second PR.
@@ -0,0 +1,7 @@ | |||
from pandas.io.parsers.readers import ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know the change here is to remove io/parsers.pyi
and replace with a directory parsers
, but for some reason the removal of io/parsers.pyi
is not showing up as a "changed file" . Could be a github bug, but can you check that the git rm io/parsers.pyi
is in your commit?
I"m going to put this on hold until a decision is made on #161 . Going through is rather limited exercise showed that is is effectively impossible to maintain stubs for the entirety of pandas including private methods. IIRC numpy doesn't type anything private either in pyi files. |
One thing to consider is taking what we know as public, e.g., |
No description provided.