ENH: Synchronize large parts of IO with pandas #160

bashtage · 2022-07-21T17:34:12Z

No description provided.

Improve accuracy of io stub files

twoertwein · 2022-07-21T19:16:01Z

pandas-stubs/io/common.pyi

+    storage_options: StorageOptions = ...,
+) -> IOHandles[bytes]: ...
+@overload
+def get_handle(


Technically, only the classes, functions, etc. listed in the API reference (https://pandas.pydata.org/docs/reference/index.html) are considered public. Since many classes and functions that should be public are not listed there, people seem to assume that anything not starting with _ is public by common convention.

As far as I know, nothing in io/common is meant to be public. Adding it here might suggest that it is public. That is obviously a pandas issue, but it would be great to discuss some guidelines for what should and shouldn't be in the stubs.
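To make the "no leading underscore means public" assumption concrete, here is a minimal sketch of how a reader might guess which names in a module are public. The helper name `presumed_public_names` is hypothetical, and neither rule it applies is pandas' official definition of public API (that is the documented reference above). It is demonstrated on a stdlib module so that it runs without pandas installed.

```python
import importlib
from types import ModuleType


def presumed_public_names(module: ModuleType) -> list[str]:
    """Return the names a reader would likely treat as public.

    Uses __all__ when the module defines it; otherwise falls back to the
    leading-underscore convention. This is a guess at intent, not an
    authoritative statement of what a project considers public.
    """
    declared = getattr(module, "__all__", None)
    if declared is not None:
        return sorted(declared)
    return sorted(name for name in vars(module) if not name.startswith("_"))


if __name__ == "__main__":
    # Stdlib module used purely for illustration.
    print(presumed_public_names(importlib.import_module("json")))
```

Run against a module such as pandas.io.common, a helper like this would list every non-underscore name, which is exactly the ambiguity raised above.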

If that is the goal then it is probably a lot more doable. The stubs now look like they were generated using an earlier version of stubgen. They have a lot of wrong things in them, including entire files that have no corresponding code in pandas.
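One way to find stub files with no corresponding upstream code is to compare the two directory trees. The following is a rough sketch, not the procedure used in this PR; the function name and the `.pyx` fallback are assumptions made for illustration.

```python
from pathlib import Path


def stubs_without_source(stub_root: Path, source_root: Path) -> list[Path]:
    """List .pyi files under stub_root that have no same-named source module.

    Returned paths are relative to stub_root.
    """
    orphans = []
    for stub in sorted(stub_root.rglob("*.pyi")):
        relative = stub.relative_to(stub_root)
        # A stub is considered backed if a .py file sits at the same
        # relative path in the source tree.
        if (source_root / relative.with_suffix(".py")).exists():
            continue
        # Assumption: Cython sources (.pyx) also count as upstream code.
        if (source_root / relative.with_suffix(".pyx")).exists():
            continue
        orphans.append(relative)
    return orphans
```

For example, `stubs_without_source(Path("pandas-stubs"), Path("pandas/pandas"))` would report candidates for removal, assuming both checkouts exist at those (hypothetical) locations.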

> Technically, only the classes, functions, etc. listed in the API reference (https://pandas.pydata.org/docs/reference/index.html) are considered public. Since many classes and functions that should be public are not listed there, people seem to assume that anything not starting with _ is public by common convention.
>
> As far as I know, nothing in io/common is meant to be public. Adding it here might suggest that it is public. That is obviously a pandas issue, but it would be great to discuss some guidelines for what should and shouldn't be in the stubs.

We could have that discussion outside this issue. Do you want to start an issue or a discussion on this topic? I have some thoughts on it, but would rather discuss it in a more focused area on that topic.

> If that is the goal then it is probably a lot more doable. The stubs now look like they were generated using an earlier version of stubgen. They have a lot of wrong things in them, including entire files that have no corresponding code in pandas.

Yes, this is a historical artifact. These stubs were generated by Microsoft with stubgen, possibly against pandas 1.1 or 1.2. I used them heavily when they shipped only with VS Code, and kept submitting PRs there so that my team's code would pass its checks. We then discussed in our monthly pandas dev meetings how to move forward, given that there was another stubs effort that had testing (which is where the tests here came from). The net result is these stubs, which we all knew would take a lot of work to get right but which provided a good starting point. We didn't want to wait to make them "perfect".

Dr-Irv · 2022-07-22T12:34:05Z

pandas-stubs/io/parsers/readers.pyi

+@overload
+def validate_integer(name, val: int | None, min_val=...) -> int | None: ...
+@overload
+def read_csv(


Please make sure that the version of read_csv() that is currently in parsers.pyi is copied here as is. I can't tell because of the file reorg. A lot of work went into getting that set of overloads correct (not to say that they couldn't be improved). Any improvements to individual stubs should be left to a second PR.
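For readers unfamiliar with why a set of overloads is hard to get right, here is a small sketch of the general pattern: one function whose return type depends on a literal flag. It does not reproduce the actual read_csv() overloads; `Frame` and `read_lines` are hypothetical names invented for this illustration.

```python
from typing import Iterator, Literal, Union, overload


class Frame:
    """Hypothetical stand-in for a DataFrame-like result."""

    def __init__(self, rows: list[str]) -> None:
        self.rows = rows


# Each overload pins the flag to a literal so that a type checker can
# pick the matching return type at the call site.
@overload
def read_lines(text: str, *, iterator: Literal[True]) -> Iterator[Frame]: ...
@overload
def read_lines(text: str, *, iterator: Literal[False] = ...) -> Frame: ...
def read_lines(
    text: str, *, iterator: bool = False
) -> Union[Frame, Iterator[Frame]]:
    rows = text.splitlines()
    if iterator:
        # Yield one single-row Frame per line.
        return (Frame([row]) for row in rows)
    return Frame(rows)
```

In a .pyi file only the `@overload` signatures would appear. Reordering them or loosening a `Literal` can silently change what a type checker infers, which is one reason to copy a working set as is and defer improvements.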

Dr-Irv · 2022-07-22T13:26:49Z

pandas-stubs/io/parsers/__init__.pyi

@@ -0,0 +1,7 @@
+from pandas.io.parsers.readers import (


I know the change here is to remove io/parsers.pyi and replace it with a parsers directory, but for some reason the removal of io/parsers.pyi is not showing up as a "changed file". It could be a GitHub bug, but can you check that the git rm io/parsers.pyi is in your commit?

bashtage · 2022-07-22T14:40:58Z

I'm going to put this on hold until a decision is made on #161. Going through this rather limited exercise showed that it is effectively impossible to maintain stubs for the entirety of pandas, including private methods. IIRC, numpy doesn't type anything private in its pyi files either.

Dr-Irv · 2022-07-22T16:03:13Z

> I'm going to put this on hold until a decision is made on #161. Going through this rather limited exercise showed that it is effectively impossible to maintain stubs for the entirety of pandas, including private methods. IIRC, numpy doesn't type anything private in its pyi files either.

One thing to consider is taking what we know as public, e.g., read_csv(), which is now declared in parsers.pyi, and moving it to parsers/readers.pyi so that at least we are mirroring where the declarations happen in pandas.

bashtage added 4 commits July 21, 2022 14:06

ENH: Improve fidelity of io stubs

3db307d

Improve accuracy of io stub files

ENH: Improve pandas/io

a66ac99

ENH: More improvements for IO

118c974

CLN: Remove pyi files without an upstream py file

01d11f0

twoertwein reviewed Jul 21, 2022

ENH: Modernize io

495e206

Dr-Irv reviewed Jul 22, 2022

bashtage marked this pull request as draft July 22, 2022 14:41

bashtage closed this Aug 22, 2022

bashtage deleted the more-io branch September 1, 2022 06:32
