TYP: stubs for tslibs #40433

Merged
merged 2 commits into from
Mar 16, 2021
13 changes: 13 additions & 0 deletions pandas/_libs/tslibs/ccalendar.pyi
@@ -0,0 +1,13 @@

DAYS: list[str]
Member

how does list[str] work in Python < 3.9? (asking because I don't know)

Member Author

I decided to try this out because I think I've seen @simonjayhawkins using this pattern recently.

Member

> how does list[str] work in Python < 3.9? (asking because I don't know)

pyi files don't go through the Python interpreter, so they have the advantage of being able to use the latest typing features understood by mypy/pyre/pyright/pytype. list[str] is PEP 585 compliant.


Side note: we are in the process of attempting to inline the Microsoft pyright stubs. Inline types not only need to satisfy the supported Python interpreter versions (i.e. Python 3.7 without typing extensions), but the code in the annotated functions also needs to be consistent with the inline function annotations.

so (hopefully without sounding too negative) I imagine that not only will that take a while but may also not even be possible in some cases.

For type checking user code against the public api, I expect using stubs (eg. bundled with pyright or 3rd party such as https://github.com/predictive-analytics-lab/data-science-types (development ceased 16 Feb 2021)) will be the better option for the foreseeable future.

However, internal consistency/robustness is achieved through adding inline types and checking with mypy (which has excellent features that support the gradual typing process).


We do not yet have a good way of generating type stubs from Cython files, so in order to have types available for the compiled code, we probably need to manually curate these. So being able to use the pyright stubs here would be beneficial.

@jbrockmendel Are these stubs manually curated or forked from pyright? I believe @Dr-Irv is coordinating between pandas and pyright and may have some input on this.

Member Author

> Are these stubs manually curated or forked from pyright?

Manually curated

Member

I think doing this is OK, as adding stubs may reveal problems in the python code that may need to be fixed. So doing this before pulling in the pyright stubs could be advantageous and make the migration smoother.

Contributor

> @jbrockmendel Are these stubs manually curated or forked from pyright? I believe @Dr-Irv is coordinating between pandas and pyright and may have some input on this.

I wouldn't say I'm "coordinating", but the work that has been done by the Microsoft team provides a good starting point (and possible reference) for pandas type stubs. See https://github.com/microsoft/python-type-stubs/tree/main/pandas

MONTH_ALIASES: dict[int, str]
MONTH_NUMBERS: dict[str, int]
MONTHS: list[str]
int_to_weekday: dict[int, str]

def get_firstbday(year: int, month: int) -> int: ...
def get_lastbday(year: int, month: int) -> int: ...
def get_day_of_year(year: int, month: int, day: int) -> int: ...
def get_iso_calendar(year: int, month: int, day: int) -> tuple[int, int, int]: ...
def get_week_of_year(year: int, month: int, day: int) -> int: ...
def get_days_in_month(year: int, month: int) -> int: ...
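
The question raised on DAYS: list[str] comes down to where an annotation gets evaluated. As a minimal sketch (the helper name runtime_supports_pep585 is hypothetical and not part of pandas), the snippet below evaluates list[str] at runtime, which is exactly what a .pyi stub never asks the interpreter to do:

```python
import sys


def runtime_supports_pep585() -> bool:
    """Return True if builtin generics such as list[str] can be evaluated at runtime."""
    try:
        list[str]  # evaluated by the interpreter here, unlike in a .pyi stub
    except TypeError:
        # Python < 3.9 raises: 'type' object is not subscriptable
        return False
    return True


# The result depends on the interpreter running this file, not on the type checker.
print(sys.version_info[:2], runtime_supports_pep585())
```

A type checker reading the stub only parses the annotation, so the same syntax is accepted there regardless of which interpreter version pandas itself supports.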
11 changes: 11 additions & 0 deletions pandas/_libs/tslibs/strptime.pyi
@@ -0,0 +1,11 @@
from typing import Optional

import numpy as np

def array_strptime(
    values: np.ndarray, # np.ndarray[object]
    fmt: Optional[str],
Member

str | None and then don't need the import.

    exact: bool = True,
Member

don't add the defaults. just use ... (less to keep in sync)

    errors: str = "raise"
Member

same

) -> tuple[np.ndarray, np.ndarray]: ...
# first ndarray is M8[ns], second is object ndarray of Optional[tzinfo]
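
The two review suggestions on this stub (use str | None, and write ... in place of concrete defaults) can be sketched as follows. This is an illustration under stated assumptions, not the stub that was merged: array_strptime_like is a hypothetical stand-in, plain list and tuple replace np.ndarray so the sketch needs only the standard library, and the str | None annotation is quoted purely so the file also runs on interpreters older than 3.10.

```python
import inspect


# Hypothetical stand-in for a stub entry; only the shape of the signature matters.
def array_strptime_like(
    values: list,  # stands in for np.ndarray[object]
    fmt: "str | None",  # quoted only so this sketch also runs on Python < 3.10
    exact: bool = ...,  # "has a default", without repeating the value
    errors: str = ...,
) -> tuple: ...


signature = inspect.signature(array_strptime_like)
# Parameters with a default stay optional for callers and type checkers,
# while the concrete values live only in the implementation.
optional_params = [
    name
    for name, param in signature.parameters.items()
    if param.default is not inspect.Parameter.empty
]
print(optional_params)  # ['exact', 'errors']
```

Because the stub only records that a default exists, changing the implementation's default later does not require touching the stub.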
32 changes: 32 additions & 0 deletions pandas/_libs/tslibs/timezones.pyi
@@ -0,0 +1,32 @@
from datetime import (
    datetime,
    tzinfo,
)
from typing import (
    Callable,
    Optional,
    Union,
Comment on lines +7 to +8
Member

could lose these imports using the newer syntax

)

import numpy as np

# imported from dateutil.tz
dateutil_gettz: Callable[[str], tzinfo]
Member

maybe we could import from dateutil.tz directly in the stub.



def tz_standardize(tz: tzinfo) -> tzinfo: ...

def tz_compare(start: Optional[tzinfo], end: Optional[tzinfo]) -> bool: ...

def infer_tzinfo(
    start: Optional[datetime], end: Optional[datetime],
) -> Optional[tzinfo]: ...

# ndarrays returned are both int64_t
def get_dst_info(tz: tzinfo) -> tuple[np.ndarray, np.ndarray, str]: ...

def maybe_get_tz(tz: Optional[Union[str, int, np.int64, tzinfo]]) -> Optional[tzinfo]: ...

def get_timezone(tz: tzinfo) -> Union[tzinfo, str]: ...
Member

again from https://github.com/python/typeshed/blob/master/CONTRIBUTING.md#stub-file-coding-style

avoid Union return types: python/mypy#1693

although not sure this is possible without a refactor.

Member Author

yah id like to change this behavior too, but should be separate from typing


def is_utc(tz: Optional[tzinfo]) -> bool: ...
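
The concern about Union return types raised on get_timezone can be illustrated with a small sketch. get_timezone_like below is a hypothetical stand-in that only mimics the "tzinfo for UTC, otherwise a string" shape; it is not the pandas implementation.

```python
from datetime import timezone, tzinfo
from typing import Union


def get_timezone_like(tz: tzinfo) -> Union[tzinfo, str]:
    """Hypothetical stand-in: return the tz itself for UTC, otherwise a name."""
    if tz is timezone.utc:
        return tz
    return str(tz)


result = get_timezone_like(timezone.utc)

# A type checker would reject `result.upper()` at this point, because `result`
# may be a tzinfo. Every caller has to narrow the Union before using it.
if isinstance(result, str):
    label = result.upper()
else:
    label = result.tzname(None)

print(label)  # UTC
```

This narrowing burden on every call site is the reason the typeshed style guide discourages Union return types, and why removing it here would need a behavior change rather than a typing change.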
5 changes: 5 additions & 0 deletions pandas/_libs/tslibs/timezones.pyx
@@ -67,6 +67,7 @@ cdef inline bint treat_tz_as_dateutil(tzinfo tz):
     return hasattr(tz, '_trans_list') and hasattr(tz, '_trans_idx')


+# Returns str or tzinfo object
 cpdef inline object get_timezone(tzinfo tz):
     """
     We need to do several things here:
@@ -80,6 +81,8 @@ cpdef inline object get_timezone(tzinfo tz):
     the tz name. It needs to be a string so that we can serialize it with
     UJSON/pytables. maybe_get_tz (below) is the inverse of this process.
     """
+    if tz is None:
Member

you've defined def get_timezone(tz: tzinfo) -> Union[tzinfo, str]: ... so tz shouldn't be None? (from python code anyway)

Member Author

with cython semantics tzinfo tz doesn't exclude None, so we have to do it ourselves

+        raise TypeError("tz argument cannot be None")
     if is_utc(tz):
         return tz
     else:
@@ -364,6 +367,8 @@ cpdef bint tz_compare(tzinfo start, tzinfo end):
     elif is_utc(end):
         # Ensure we don't treat tzlocal as equal to UTC when running in UTC
         return False
+    elif start is None or end is None:
Member

probably same answer as above. can start or end be None?

+        return start is None and end is None
     return get_timezone(start) == get_timezone(end)
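
The explicit None handling added in these two hunks follows from the point made in the thread: a Cython argument typed as tzinfo still accepts None, so the function has to guard against it itself. A plain-Python sketch of the same two guards follows; require_tz and both_none_or_equal are hypothetical names used only for illustration, and the UTC special-casing of the real tz_compare is deliberately left out.

```python
from datetime import timezone, tzinfo
from typing import Optional


def require_tz(tz: Optional[tzinfo]) -> tzinfo:
    """Hypothetical helper: reject None explicitly, like the check added to get_timezone."""
    if tz is None:
        raise TypeError("tz argument cannot be None")
    return tz


def both_none_or_equal(start: Optional[tzinfo], end: Optional[tzinfo]) -> bool:
    """Hypothetical helper mirroring the None branch added to tz_compare."""
    if start is None or end is None:
        # Two missing timezones match; a missing one never matches a real one.
        return start is None and end is None
    return require_tz(start) == require_tz(end)


print(both_none_or_equal(None, None))          # True
print(both_none_or_equal(None, timezone.utc))  # False
```

Handling None before the comparison keeps the stricter helper from ever seeing it, which is what lets the stub declare get_timezone(tz: tzinfo) without Optional.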


25 changes: 25 additions & 0 deletions pandas/_libs/tslibs/tzconversion.pyi
@@ -0,0 +1,25 @@
from datetime import (
    timedelta,
    tzinfo,
)
from typing import (
    Iterable,
Member

from https://github.com/python/typeshed/blob/master/CONTRIBUTING.md#stub-file-coding-style

in Python 3 stubs, import collections (Mapping, Iterable, etc.) from collections.abc instead of typing;

    Optional,
    Union,
Comment on lines +7 to +8
Member

same comment as before

Member Author

mind if i apply these comments in the next pass? already have a branch on deck

)

import numpy as np

def tz_convert_from_utc(
    vals: np.ndarray, # const int64_t[:]
    tz: tzinfo,
) -> np.ndarray: ... # np.ndarray[np.int64]

def tz_convert_from_utc_single(val: np.int64, tz: tzinfo) -> np.int64: ...

def tz_localize_to_utc(
    vals: np.ndarray, # np.ndarray[np.int64]
    tz: Optional[tzinfo],
    ambiguous: Optional[Union[str, bool, Iterable[bool]]] = None,
Member

... for default

    nonexistent: Optional[Union[str, timedelta, np.timedelta64]] = None,
Member

same

) -> np.ndarray: ... # np.ndarray[np.int64]
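
The typeshed guideline quoted in this file's review (import Iterable from collections.abc rather than typing) rests on the typing name being an alias for the abstract base class. A short sketch, assuming Python 3.8+ for typing.get_origin; is_bool_iterable is a hypothetical helper that only illustrates the runtime shape accepted by an ambiguous-style argument:

```python
import collections.abc
import typing

# typing.Iterable is an alias whose origin is the collections.abc class.
alias_origin = typing.get_origin(typing.Iterable[bool])
print(alias_origin is collections.abc.Iterable)  # True


def is_bool_iterable(value: object) -> bool:
    """Hypothetical check: an iterable of bools, excluding plain strings."""
    return (
        isinstance(value, collections.abc.Iterable)
        and not isinstance(value, str)
        and all(isinstance(item, bool) for item in value)
    )


print(is_bool_iterable([True, False]))  # True
print(is_bool_iterable("infer"))        # False
```

Since both spellings refer to the same class, switching the import in a stub changes nothing for the type checker and only drops the dependency on the typing alias.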
47 changes: 47 additions & 0 deletions pandas/_libs/tslibs/vectorized.pyi
@@ -0,0 +1,47 @@
"""
For cython types that cannot be represented precisely, closest-available
python equivalents are used, and the precise types kept as adjacent comments.
"""
from datetime import tzinfo
from typing import (
    Optional,
    Union,
)

import numpy as np

from pandas._libs.tslibs.dtypes import Resolution
from pandas._libs.tslibs.offsets import BaseOffset

def dt64arr_to_periodarr(
    stamps: np.ndarray, # const int64_t[:]
    freq: int,
    tz: Optional[tzinfo],
) -> np.ndarray: ... # np.ndarray[np.int64, ndim=1]


def is_date_array_normalized(
    stamps: np.ndarray, # const int64_t[:]
    tz: Optional[tzinfo] = None,
) -> bool: ...


def normalize_i8_timestamps(
    stamps: np.ndarray, # const int64_t[:]
    tz: Optional[tzinfo],
) -> np.ndarray: ... # np.ndarray[np.int64]


def get_resolution(
    stamps: np.ndarray, # const int64_t[:]
    tz: Optional[tzinfo] = None,
) -> Resolution: ...


def ints_to_pydatetime(
    arr: np.ndarray, # const int64_t[:]
    tz: Optional[tzinfo] = None,
    freq: Optional[Union[str, BaseOffset]] = None,
    fold: bool = False,
    box: str = "datetime",
) -> np.ndarray: ... # np.ndarray[object]
6 changes: 3 additions & 3 deletions pandas/core/tools/datetimes.py
@@ -285,7 +285,7 @@ def _convert_listlike_datetimes(
     name: Hashable = None,
     tz: Optional[Timezone] = None,
     unit: Optional[str] = None,
-    errors: Optional[str] = None,
+    errors: str = "raise",
     infer_datetime_format: bool = False,
     dayfirst: Optional[bool] = None,
     yearfirst: Optional[bool] = None,
@@ -428,7 +428,7 @@ def _array_strptime_with_fallback(
     tz,
     fmt: str,
     exact: bool,
-    errors: Optional[str],
+    errors: str,
     infer_datetime_format: bool,
 ) -> Optional[Index]:
     """
@@ -476,7 +476,7 @@ def _to_datetime_with_format(
     tz,
     fmt: str,
     exact: bool,
-    errors: Optional[str],
+    errors: str,
     infer_datetime_format: bool,
 ) -> Optional[Index]:
     """