Typ parts of c parser #44677

phofl · 2021-11-29T22:20:10Z

Ensure all linting tests pass, see here for how to run them



twoertwein · 2021-12-01T01:52:22Z

pandas/io/parsers/c_parser_wrapper.py

@@ -34,7 +45,7 @@ class CParserWrapper(ParserBase):

    def __init__(
        self, src: FilePath | ReadCsvBuffer[bytes] | ReadCsvBuffer[str], **kwds
-    ):
+    ) -> None:


None is optional for __init__.

Personally, I still like to add it. If mypy sees __init__ without None as partially typed (I'm not sure whether that is the case), it would be good to keep None.

I prefer adding it too.

We have some (now outdated) style guidelines for typing in https://pandas.pydata.org/pandas-docs/dev/development/contributing_codebase.html#style-guidelines.

Not including the return type of __init__ was an unwritten style choice (IIRC preferred by @WillAyd).

If this is to be changed, we should be consistent, add it elsewhere too, and update the style guidelines. I had no strong preference originally, but since we have gone without the redundant None for so long, I would now prefer the status quo.

Ok, removed it
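A minimal sketch of the mypy behaviour discussed in this thread, using hypothetical classes rather than pandas code. As the mypy documentation describes it, the `-> None` on `__init__` may be omitted when at least one argument is annotated (as with `src` in `CParserWrapper.__init__`). It only becomes necessary when `__init__` has no annotated arguments, because mypy otherwise treats the method as untyped.

```python
class ImplicitNone:
    # At least one argument (`src`) is annotated, so mypy checks this
    # __init__ and treats its return type as None.
    def __init__(self, src: str, **kwds):
        self.src = src
        self.kwds = kwds


class ExplicitNone:
    # Equivalent for mypy; "-> None" is redundant here but harmless.
    def __init__(self, src: str, **kwds) -> None:
        self.src = src
        self.kwds = kwds


class NoArguments:
    # No annotated arguments: without "-> None", mypy would treat this
    # __init__ as untyped and skip checking its body (unless
    # --check-untyped-defs is enabled).
    def __init__(self) -> None:
        self.count = 0


# The annotations are visible at runtime; only ExplicitNone records "return".
print(ImplicitNone.__init__.__annotations__)  # {'src': <class 'str'>}
print(ExplicitNone.__init__.__annotations__)  # {'src': <class 'str'>, 'return': None}
```

Both spellings type-check the same way, so the choice made in this PR is purely a matter of style.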

twoertwein · 2021-12-01T01:55:21Z

pandas/io/parsers/c_parser_wrapper.py

+    ) -> tuple[
+        Index | MultiIndex | None,
+        Sequence[Hashable] | MultiIndex,
+        Mapping[Hashable, ArrayLike],


Return types should be as concrete as possible. If you know it is a list/dict, it probably shouldn't be Sequence/Mapping.

The basic issue is that if we type some function arguments as Sequence, we also have to type the return types as Sequence, because most of the time there is a code branch that just passes the inputs along (for example, _do_date_conversion). If we want to use lists, we run into a bunch of other issues. I'm not 100% sure what to do there, so I went with Sequence for now.
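The pass-through problem can be sketched with two hypothetical helpers (not the pandas functions). A function that always builds a new list can honestly declare `list[...]`, but one with a branch that returns its `Sequence` argument unchanged can promise no more than `Sequence`.

```python
from collections.abc import Hashable, Sequence


def copy_names(names: Sequence[Hashable]) -> list[Hashable]:
    # Every branch builds a new list, so the concrete return type is accurate.
    return list(names)


def maybe_prefix_names(names: Sequence[Hashable], prefix: str) -> Sequence[Hashable]:
    # Pass-through branch: the argument is returned unchanged, so the
    # return type can be no more concrete than the parameter type.
    if not prefix:
        return names
    # This branch does build a list, but annotating the function as
    # "-> list[Hashable]" would make mypy reject the branch above.
    return [f"{prefix}{name}" for name in names]


names = ("a", "b")
print(maybe_prefix_names(names, ""))    # ('a', 'b'), the original tuple
print(maybe_prefix_names(names, "x_"))  # ['x_a', 'x_b']
```

Declaring `list` for the second helper would only be possible if the pass-through branch also copied its input, which is the kind of knock-on change described above.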

FWIW I usually use list/tuple/whatever on the theory that "well, it's accurate and more specific than Sequence!", and then @simonjayhawkins tells me to use Sequence anyway.


twoertwein

Looks good to me.

@simonjayhawkins should probably have a look at the Sequence vs list cases.

phofl · 2021-12-10T09:24:03Z

gentle ping @simonjayhawkins

jreback

cc @twoertwein @simonjayhawkins if you have comments

simonjayhawkins · 2021-12-17T18:17:10Z

pandas/io/parsers/base_parser.py

-    def _evaluate_usecols(self, usecols, names):
+    @overload
+    def _evaluate_usecols(
+        self, usecols: set[int] | Callable[[Hashable], int], names: Sequence[Hashable]


can the callable return a bool?

No, the return values are indices

Sorry, I should have been clearer.

I'm curious as to why usecols is a callable with an int return type.

ParserBase.__init__ is not typed, so it is not clear to me why this is int. bool is type-compatible here, and surely any truthy value is valid. So the return type of the usecols callable is whatever the public API accepts. Is that any object that can be truthy?

Ah, got you. Yes, you are correct, this has to return a bool.

In the docs for, say, read_csv (https://pandas.pydata.org/pandas-docs/dev/reference/api/pandas.read_csv.html):

If callable, the callable function will be evaluated against the column names, returning names where the callable function evaluates to True.

So it doesn't explicitly say it should be bool, but why was int chosen?

Because I misinterpreted the return value of the callable as the return value of _evaluate_usecols; I confused the two.

Sure. We do use bool in many places in the public API where the API probably accepts any object that can be truthy. This may become an issue for users once the types are public and they start getting false positives when type checking.

So changing to bool is fine for now.

Technically this could return anything that evaluates to True/False. But since the docs say that the callable has to evaluate to True, I think we can type it like this?

Based on the doc-string (I didn't look at the code), it should probably be Callable[[Hashable], object].

Changed to object
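A standalone sketch of where this thread ended up. It is modelled on `ParserBase._evaluate_usecols`, but written as a free function named `evaluate_usecols` (hypothetical). The `set[str]` overload and the body are assumptions based on the behaviour described in the `read_csv` docs, not copied from pandas. Because the callable is typed `Callable[[Hashable], object]`, any return value is accepted and only its truthiness decides which columns are kept.

```python
from collections.abc import Hashable, Sequence
from typing import Callable, Union, overload


@overload
def evaluate_usecols(
    usecols: Union[set[int], Callable[[Hashable], object]],
    names: Sequence[Hashable],
) -> set[int]: ...


@overload
def evaluate_usecols(usecols: set[str], names: Sequence[Hashable]) -> set[str]: ...


def evaluate_usecols(usecols, names):
    # A callable is evaluated against each column name; the positions of
    # the names for which it returns a truthy value are kept.
    if callable(usecols):
        return {i for i, name in enumerate(names) if usecols(name)}
    # Anything else (e.g. a set of positions or labels) is returned as-is.
    return usecols


names = ["a", "bb", "ccc"]
# A bool-returning callable.
print(evaluate_usecols(lambda name: str(name).startswith("b"), names))  # {1}
# An int-returning callable: accepted because the return type is `object`;
# with Callable[[Hashable], bool] mypy would flag this lambda.
print(evaluate_usecols(lambda name: len(str(name)) - 1, names))  # {1, 2}
print(evaluate_usecols({"a"}, names))  # {'a'}
```

Typing the callable's return as `object` rather than `bool` avoids the false positives for users mentioned above, while the overloads still give callers a precise `set[int]` or `set[str]` result.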

jreback · 2021-12-22T02:40:16Z

thanks @phofl

phofl added 5 commits November 12, 2021 16:51

Start typing

92ed9b7

Continue typing

182217a

Merge remote-tracking branch 'upstream/master' into typ_c_parser

42b15a2

# Conflicts: # pandas/io/parsers/base_parser.py # pandas/io/parsers/c_parser_wrapper.py

Resolve conflicts

091b052

Add argument back in

a9b4e07

phofl added the IO CSV (read_csv, to_csv) and Typing (type annotations, mypy/pyright type checking) labels Nov 29, 2021

twoertwein reviewed Dec 1, 2021


phofl added 2 commits December 1, 2021 20:41

Adress review

2596f19

Improve callable

95a0de0

twoertwein approved these changes Dec 1, 2021


jreback added this to the 1.4 milestone Dec 1, 2021

phofl added 3 commits December 4, 2021 23:45

Merge remote-tracking branch 'upstream/master' into typ_c_parser

9979f41

Merge remote-tracking branch 'upstream/master' into typ_c_parser

de71573

Merge remote-tracking branch 'upstream/master' into typ_c_parser

3146aae

phofl added 3 commits December 10, 2021 16:08

Merge remote-tracking branch 'upstream/master' into typ_c_parser

30c46b2

Merge remote-tracking branch 'upstream/master' into typ_c_parser

129d5af

Merge remote-tracking branch 'upstream/master' into typ_c_parser

5f32d2f

jreback approved these changes Dec 17, 2021


Remove return from init

34795f8

simonjayhawkins reviewed Dec 17, 2021


phofl added 3 commits December 17, 2021 19:45

Change callable

fa8fc9b

Change callable

6df4cdc

Merge remote-tracking branch 'upstream/master' into typ_c_parser

d8fa395

jreback merged commit 9138b1d into pandas-dev:master Dec 22, 2021

phofl deleted the typ_c_parser branch December 22, 2021 09:09

phofl mentioned this pull request Mar 11, 2022

TYP: annotation of __init__ return type (PEP 484) (pandas/plotting) #46283

Merged


jreback mentioned this pull request Mar 12, 2022

TYP: returning None for __init__ #46337

Closed


phofl commented Nov 29, 2021

twoertwein Dec 1, 2021

phofl Dec 1, 2021

simonjayhawkins Dec 17, 2021

phofl Dec 17, 2021

twoertwein Dec 1, 2021

phofl Dec 1, 2021

jbrockmendel Dec 5, 2021

twoertwein left a comment

phofl commented Dec 10, 2021

jreback left a comment

simonjayhawkins Dec 17, 2021

phofl Dec 17, 2021

simonjayhawkins Dec 17, 2021

phofl Dec 17, 2021

simonjayhawkins Dec 17, 2021

phofl Dec 17, 2021 (edited)

simonjayhawkins Dec 17, 2021

phofl Dec 17, 2021

twoertwein Dec 17, 2021

phofl Dec 17, 2021

jreback commented Dec 22, 2021
