ENH: Improve Pandas scalars #383

Closed
wants to merge 30 commits into from
Changes from 20 commits
Commits
30 commits
5c0dc1c
ENH: Improve DatetimeTZDtype
bashtage Oct 11, 2022
9f5058f
ENH: Improve PeriodDtype
bashtage Oct 11, 2022
7ad615a
ENH: Improve IntervalDtype
bashtage Oct 11, 2022
fb6d782
ENH: Improve CategoricalDtype
bashtage Oct 11, 2022
fecd9f9
ENH: Improve StringDtype and StringArray
bashtage Oct 11, 2022
5adba7b
ENH: Improve BooleanDtype and BooleanArray
bashtage Oct 11, 2022
c4dca27
ENH: Improve Timestamp
bashtage Oct 11, 2022
d86adfd
ENH: Improve Timedelta
bashtage Oct 11, 2022
79f56c2
ENH: Further improvements to Timestamp
bashtage Oct 11, 2022
cc390f5
CLN: Additional cleanups to pass tests
bashtage Oct 11, 2022
c044a8c
ENH: Improve Period
bashtage Oct 11, 2022
f58c47c
TST Add tests for dtypes
Oct 11, 2022
fbfa4c1
Merge remote-tracking branch 'upstream/main' into pandas-scalars
Oct 11, 2022
ccbde88
TST: Correct types in tests
Oct 12, 2022
25aa8c9
Merge branch 'pandas-scalars' of github.com:bashtage/pandas-stubs int…
bashtage Oct 12, 2022
290f951
BUG: Correct errors in period and interval
bashtage Oct 12, 2022
e62cc10
TST: Add tests for Period
bashtage Oct 12, 2022
9c6b963
TST: Add tests for timedelta
bashtage Oct 12, 2022
e93ca54
TST: Add tests for arrays
bashtage Oct 13, 2022
9070b0d
TST: Add more scalar tests
Oct 13, 2022
28e127e
ENH: Complete Timedelta
bashtage Oct 13, 2022
f59c689
Merge remote-tracking branch 'upstream/main' into pandas-scalars
bashtage Oct 13, 2022
2f262d4
ENH: Improve Timestamp and enable tests
bashtage Oct 13, 2022
7dd8c56
TST: Add reverse ops
bashtage Oct 14, 2022
9c2d500
ENH/TST: Improve Period and its tests
bashtage Oct 14, 2022
0c2f5ca
ENH/TST: Improve Timestamp, Timedelta and their tests
bashtage Oct 14, 2022
e224537
REF: Move tests to test_scalar
bashtage Oct 14, 2022
08156cc
CLN: Final fixes to passing
bashtage Oct 14, 2022
1672751
BUG: Correct Interval
bashtage Oct 14, 2022
4c3ea13
ENH: Improve array typing
bashtage Oct 15, 2022
4 changes: 0 additions & 4 deletions pandas-stubs/_libs/interval.pyi
@@ -142,10 +142,6 @@ class Interval(IntervalMixin, Generic[_OrderableT]):
def __floordiv__(self: Interval[float], y: float) -> Interval[float]: ...
def overlaps(self: Interval[_OrderableT], other: Interval[_OrderableT]) -> bool: ...

def intervals_to_interval_bounds(
intervals: np.ndarray, validate_closed: bool = ...
) -> tuple[np.ndarray, np.ndarray, str]: ...

class IntervalTree(IntervalMixin):
def __init__(
self,
142 changes: 110 additions & 32 deletions pandas-stubs/_libs/tslibs/period.pyi
@@ -1,34 +1,108 @@
from typing import Any
import datetime
from typing import (
Literal,
Union,
overload,
)

import numpy as np
from pandas import (
DatetimeIndex,
Index,
PeriodIndex,
Timedelta,
)
from typing_extensions import TypeAlias

from pandas._typing import npt

from .timestamps import Timestamp

class IncompatibleFrequency(ValueError): ...

class Period:
from pandas._libs.tslibs.offsets import BaseOffset

_PeriodAddSub: TypeAlias = Union[
Timedelta, datetime.timedelta, np.timedelta64, np.int64, int
]

_PeriodEqualityComparison: TypeAlias = Union[
Period, datetime.datetime, datetime.date, Timestamp, np.datetime64, int, np.int64
]

_PeriodFreqHow: TypeAlias = Literal[
"S",
"E",
"Start",
"Finish",
"Begin",
"End",
"s",
"e",
"start",
"finish",
"begin",
"end",
]

class PeriodMixin:
@property
def end_time(self) -> Timestamp: ...
@property
def start_time(self) -> Timestamp: ...

class Period(PeriodMixin):
def __init__(
self,
value: Any = ...,
freqstr: Any = ...,
ordinal: Any = ...,
year: Any = ...,
month: int = ...,
quarter: Any = ...,
day: int = ...,
hour: int = ...,
minute: int = ...,
second: int = ...,
value: Period | str | None = ...,
freq: str | BaseOffset | None = ...,
ordinal: int | None = ...,
year: int | None = ...,
month: int | None = ...,
quarter: int | None = ...,
day: int | None = ...,
hour: int | None = ...,
minute: int | None = ...,
second: int | None = ...,
) -> None: ...
def __add__(self, other) -> Period: ...
def __eq__(self, other) -> bool: ...
def __ge__(self, other) -> bool: ...
def __gt__(self, other) -> bool: ...
@overload
def __sub__(self, other: _PeriodAddSub) -> Period: ...
@overload
def __sub__(self, other: Period) -> BaseOffset: ...
@overload
def __sub__(self, other: PeriodIndex) -> Index: ...
@overload
def __add__(self, other: _PeriodAddSub) -> Period: ...
@overload
def __add__(self, other: Index) -> Period: ...
@overload # type: ignore[override]
def __eq__(self, other: _PeriodEqualityComparison) -> bool: ...
@overload
def __eq__(self, other: PeriodIndex | DatetimeIndex) -> npt.NDArray[np.bool_]: ...
Collaborator

I think the first overload should accept object, since you can compare a Period to any object. But object can't be the first overload, because it would match before the more specific one. So swap the order: the first overload takes PeriodIndex | DatetimeIndex and returns npt.NDArray[np.bool_], and the second takes object and returns bool.

Contributor Author

This is tricky. For now I've only added types for which the comparison could actually be either True or False. For types where the result is always False, the comparison returns an array-like of bool when the other operand is array-like, or a bool when it is a scalar.
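
The ordering point in this thread can be illustrated with a minimal, self-contained sketch. ToyPeriod and ToyPeriodIndex are hypothetical stand-ins, not the pandas classes or the stub contents, and a plain list[bool] stands in for npt.NDArray[np.bool_]. The specific overload is listed first and the object catch-all last; a type checker may still report the two overloads as overlapping, so the ignore comment here is an assumption about what would be needed.

```python
from __future__ import annotations

from typing import overload


class ToyPeriodIndex:
    """Hypothetical stand-in for PeriodIndex: a sequence of ordinals."""

    def __init__(self, ordinals: list[int]) -> None:
        self.ordinals = ordinals


class ToyPeriod:
    """Hypothetical stand-in for Period, identified by a single ordinal."""

    def __init__(self, ordinal: int) -> None:
        self.ordinal = ordinal

    # Specific overload first: an index-like operand gives element-wise results.
    @overload  # type: ignore[override]
    def __eq__(self, other: ToyPeriodIndex) -> list[bool]: ...
    # Catch-all last: any other object gives a single bool.
    @overload
    def __eq__(self, other: object) -> bool: ...
    def __eq__(self, other: object) -> bool | list[bool]:
        if isinstance(other, ToyPeriodIndex):
            return [self.ordinal == o for o in other.ordinals]
        if isinstance(other, ToyPeriod):
            return self.ordinal == other.ordinal
        # Unrelated types never compare equal in this toy model.
        return False

    def __hash__(self) -> int:
        return hash(self.ordinal)
```

If the object overload came first, every call would match it and the element-wise overload would never be selected.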

@overload
def __ge__(self, other: Period) -> bool: ...
@overload
def __ge__(self, other: PeriodIndex) -> npt.NDArray[np.bool_]: ...
@overload
def __gt__(self, other: Period) -> bool: ...
@overload
def __gt__(self, other: PeriodIndex) -> npt.NDArray[np.bool_]: ...
def __hash__(self) -> int: ...
def __le__(self, other) -> bool: ...
def __lt__(self, other) -> bool: ...
def __new__(cls, *args, **kwargs) -> Period: ...
def __ne__(self, other) -> bool: ...
def __radd__(self, other) -> Period: ...
def __reduce__(self, *args, **kwargs) -> Any: ... # what should this be?
def __rsub__(self, other) -> Period: ...
def __setstate__(self, *args, **kwargs) -> Any: ... # what should this be?
@overload
def __le__(self, other: Period) -> bool: ...
@overload
def __le__(self, other: PeriodIndex) -> npt.NDArray[np.bool_]: ...
@overload
def __lt__(self, other: Period) -> bool: ...
@overload
def __lt__(self, other: PeriodIndex) -> npt.NDArray[np.bool_]: ...
@overload # type: ignore[override]
def __ne__(self, other: _PeriodEqualityComparison) -> bool: ...
@overload
def __ne__(self, other: PeriodIndex | DatetimeIndex) -> npt.NDArray[np.bool_]: ...
# Ignored due to indecipherable error from mypy:
# Forward operator "__add__" is not callable [misc]
def __radd__(self, other: _PeriodAddSub) -> Period: ... # type: ignore[misc]
Collaborator

I think the error might come from a conflict with one of the __add__() declarations that allows Period. Try declaring __radd__() with each of the types in _PeriodAddSub in turn to narrow down the possible cause.

Contributor Author

I think it is because Period + Index -> PeriodIndex here but Index + Period -> Index there.
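
The asymmetry described here can be sketched with toy classes. This only shows how Python dispatches the forward and reflected operators at runtime; it does not reproduce the mypy error, and ToyIndex, ToyPeriodIndex and ToyPeriod are hypothetical names, not the pandas types.

```python
from __future__ import annotations


class ToyIndex:
    """Hypothetical stand-in for Index."""

    def __init__(self, values: list[int]) -> None:
        self.values = values

    def __add__(self, other: object) -> ToyIndex:
        # Left operand wins: Index + Period is handled here and gives an Index.
        if isinstance(other, ToyPeriod):
            return ToyIndex([v + other.ordinal for v in self.values])
        return NotImplemented


class ToyPeriodIndex(ToyIndex):
    """Hypothetical stand-in for PeriodIndex."""


class ToyPeriod:
    """Hypothetical stand-in for Period."""

    def __init__(self, ordinal: int) -> None:
        self.ordinal = ordinal

    def __add__(self, other: object) -> ToyPeriod | ToyPeriodIndex:
        if isinstance(other, int):
            return ToyPeriod(self.ordinal + other)
        # Period + Index is handled here and gives a PeriodIndex.
        if isinstance(other, ToyIndex):
            return ToyPeriodIndex([self.ordinal + v for v in other.values])
        return NotImplemented

    def __radd__(self, other: object) -> ToyPeriod:
        # Reached only when the left operand's __add__ returns NotImplemented,
        # for example int.__add__ given a ToyPeriod.
        if isinstance(other, int):
            return ToyPeriod(other + self.ordinal)
        return NotImplemented
```

With these definitions, `ToyPeriod(1) + ToyIndex([1, 2])` produces a ToyPeriodIndex while `ToyIndex([1, 2]) + ToyPeriod(1)` produces a plain ToyIndex, so the two operand orders disagree on the result type.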

@property
def day(self) -> int: ...
@property
@@ -42,7 +116,7 @@ class Period:
@property
def end_time(self) -> Timestamp: ...
@property
def freq(self) -> Any: ...
def freq(self) -> BaseOffset: ...
@property
def freqstr(self) -> str: ...
@property
@@ -71,12 +145,16 @@ class Period:
def weekofyear(self) -> int: ...
@property
def year(self) -> int: ...
# Static methods
@property
def day_of_year(self) -> int: ...
@property
def day_of_week(self) -> int: ...
def asfreq(self, freq: str | BaseOffset, how: _PeriodFreqHow = ...) -> Period: ...
@classmethod
def now(cls) -> Period: ...
# Methods
def asfreq(self, freq: str, how: str = ...) -> Period: ...
def now(cls, freq: str | BaseOffset = ...) -> Period: ...
def strftime(self, fmt: str) -> str: ...
def to_timestamp(self, freq: str, how: str = ...) -> Timestamp: ...

from .timestamps import Timestamp
def to_timestamp(
self,
freq: str | BaseOffset | None = ...,
how: _PeriodFreqHow = ...,
Collaborator

I think these arguments are correct.

Contributor Author

Are you missing a "do not" before think? Not sure how to interpret this comment since it seems to be saying I have them correct. They are all tested FWIW.

) -> Timestamp: ...
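
As a rough illustration of what the _PeriodFreqHow alias buys, the sketch below copies the literal values from the diff above and checks a string against them at runtime. The helper is_valid_how is hypothetical and exists only for this example; it is not part of pandas or pandas-stubs, and it says nothing about which strings pandas itself accepts at runtime.

```python
from typing import Literal, get_args

# Literal values copied from the _PeriodFreqHow alias in the diff above.
PeriodFreqHow = Literal[
    "S", "E", "Start", "Finish", "Begin", "End",
    "s", "e", "start", "finish", "begin", "end",
]


def is_valid_how(how: str) -> bool:
    """Hypothetical helper: True if `how` is one of the literal spellings."""
    return how in get_args(PeriodFreqHow)
```

A type checker does the equivalent check statically: passing a string outside the Literal as `how` would be flagged, where the previous `how: str` annotation accepted anything.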
49 changes: 30 additions & 19 deletions pandas-stubs/_libs/tslibs/timedeltas.pyi
@@ -2,6 +2,7 @@ from datetime import timedelta
from typing import (
ClassVar,
Literal,
NamedTuple,
TypeVar,
Union,
overload,
@@ -11,11 +12,21 @@ import numpy as np
from typing_extensions import TypeAlias

from pandas._libs.tslibs import (
BaseOffset,
NaTType,
Tick,
)
from pandas._typing import npt

class Components(NamedTuple):
days: int
hours: int
minutes: int
seconds: int
milliseconds: int
microseconds: int
nanoseconds: int

# This should be kept consistent with the keys in the dict timedelta_abbrevs
# in pandas/_libs/tslibs/timedeltas.pyx
TimeDeltaUnitChoices: TypeAlias = Literal[
@@ -70,35 +81,32 @@ UnitChoices: TypeAlias = Union[

_S = TypeVar("_S", bound=timedelta)

def ints_to_pytimedelta(
arr: npt.NDArray[np.int64], # const int64_t[:]
box: bool = ...,
) -> npt.NDArray[np.object_]: ...
def array_to_timedelta64(
values: npt.NDArray[np.object_],
unit: str | None = ...,
errors: str = ...,
) -> np.ndarray: ... # np.ndarray[m8ns]
def parse_timedelta_unit(unit: str | None) -> UnitChoices: ...
def delta_to_nanoseconds(delta: np.timedelta64 | timedelta | Tick) -> int: ...

class Timedelta(timedelta):
min: ClassVar[Timedelta]
max: ClassVar[Timedelta]
resolution: ClassVar[Timedelta]
value: int # np.int64
value: int
def __new__(
cls: type[_S],
value=...,
unit: str = ...,
**kwargs: float | np.integer | np.floating,
value: str | int | Timedelta | timedelta | np.timedelta64 = ...,
unit: TimeDeltaUnitChoices = ...,
*,
days: float | np.integer | np.floating = ...,
seconds: float | np.integer | np.floating = ...,
microseconds: float | np.integer | np.floating = ...,
milliseconds: float | np.integer | np.floating = ...,
minutes: float | np.integer | np.floating = ...,
hours: float | np.integer | np.floating = ...,
weeks: float | np.integer | np.floating = ...,
) -> _S: ...
# GH 46171
# While Timedelta can return pd.NaT, having the constructor return
# a Union with NaTType makes things awkward for users of pandas
@property
def days(self) -> int: ...
@property
def nanoseconds(self) -> int: ...
@property
def seconds(self) -> int: ...
@property
def microseconds(self) -> int: ...
@@ -108,9 +116,9 @@ class Timedelta(timedelta):
@property
def asm8(self) -> np.timedelta64: ...
# TODO: round/floor/ceil could return NaT?
def round(self: _S, freq: str) -> _S: ...
def floor(self: _S, freq: str) -> _S: ...
def ceil(self: _S, freq: str) -> _S: ...
def round(self: _S, freq: str | BaseOffset) -> _S: ...
def floor(self: _S, freq: str | BaseOffset) -> _S: ...
def ceil(self: _S, freq: str | BaseOffset) -> _S: ...
@property
def resolution_string(self) -> str: ...
def __add__(self, other: timedelta) -> Timedelta: ...
@@ -154,3 +162,6 @@ class Timedelta(timedelta):
def __hash__(self) -> int: ...
def isoformat(self) -> str: ...
def to_numpy(self) -> np.timedelta64: ...
@property
def components(self) -> Components: ...
def view(self, dtype: npt.DTypeLike = ...) -> object: ...
Collaborator

The docs don't list the dtype argument, but the implementation has it, so the docs should be updated.
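
For context on the Components named tuple that this file adds as the return type of Timedelta.components, here is a toy sketch of how such a tuple could be filled from a nanosecond count. The function components_from_ns is hypothetical and is not pandas' implementation; it assumes floor division, so negative durations would get a negative days field and non-negative remaining fields.

```python
from typing import NamedTuple


class Components(NamedTuple):
    # Field names and order copied from the stub in the diff above.
    days: int
    hours: int
    minutes: int
    seconds: int
    milliseconds: int
    microseconds: int
    nanoseconds: int


def components_from_ns(total_ns: int) -> Components:
    """Hypothetical helper: split a nanosecond count into its components."""
    days, rem = divmod(total_ns, 86_400_000_000_000)
    hours, rem = divmod(rem, 3_600_000_000_000)
    minutes, rem = divmod(rem, 60_000_000_000)
    seconds, rem = divmod(rem, 1_000_000_000)
    milliseconds, rem = divmod(rem, 1_000_000)
    microseconds, nanoseconds = divmod(rem, 1_000)
    return Components(
        days, hours, minutes, seconds, milliseconds, microseconds, nanoseconds
    )
```

Typing the property as a NamedTuple rather than a bare tuple lets a checker resolve attribute access such as `.days` to int.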

29 changes: 14 additions & 15 deletions pandas-stubs/_libs/tslibs/timestamps.pyi
@@ -8,10 +8,12 @@ from datetime import (
from time import struct_time
from typing import (
ClassVar,
Literal,
TypeVar,
overload,
)

from dateutil.tz import tzfile
import numpy as np
from pandas import Index
from pandas.core.series import (
@@ -29,19 +31,17 @@ from pandas._typing import np_ndarray_bool

_DatetimeT = TypeVar("_DatetimeT", bound=datetime)

def integer_op_not_supported(obj: object) -> TypeError: ...

class Timestamp(datetime):
min: ClassVar[Timestamp]
max: ClassVar[Timestamp]

resolution: ClassVar[Timedelta]
value: int # np.int64
value: int
def __new__(
cls: type[_DatetimeT],
ts_input: np.integer | float | str | _date | datetime | np.datetime64 = ...,
freq: int | str | BaseOffset | None = ...,
tz: str | _tzinfo | int | None = ...,
*,
tz: str | _tzinfo | tzfile | int | None = ...,
unit: str | int | None = ...,
year: int | None = ...,
month: int | None = ...,
@@ -52,8 +52,7 @@ class Timestamp(datetime):
microsecond: int | None = ...,
nanosecond: int | None = ...,
tzinfo: _tzinfo | None = ...,
*,
fold: int | None = ...,
fold: Literal[0, 1] | None = ...,
) -> _DatetimeT: ...
# GH 46171
# While Timestamp can return pd.NaT, having the constructor return
@@ -113,15 +112,15 @@ class Timestamp(datetime):
def timetz(self) -> _time: ...
def replace(
self,
year: int = ...,
month: int = ...,
day: int = ...,
hour: int = ...,
minute: int = ...,
second: int = ...,
microsecond: int = ...,
year: int | None = ...,
month: int | None = ...,
day: int | None = ...,
hour: int | None = ...,
minute: int | None = ...,
second: int | None = ...,
microsecond: int | None = ...,
tzinfo: _tzinfo | None = ...,
fold: int = ...,
fold: int | None = ...,
) -> Timestamp: ...
def astimezone(self: _DatetimeT, tz: _tzinfo | None = ...) -> _DatetimeT: ...
def ctime(self) -> str: ...
17 changes: 7 additions & 10 deletions pandas-stubs/core/arrays/boolean.pyi
@@ -1,4 +1,5 @@
import numpy as np
from pandas.core.arrays import ExtensionArray

from pandas._typing import (
Scalar,
@@ -10,27 +11,23 @@ from pandas.core.dtypes.base import ExtensionDtype as ExtensionDtype
from .masked import BaseMaskedArray as BaseMaskedArray

class BooleanDtype(ExtensionDtype):
name: str = ...
@property
def na_value(self) -> Scalar: ...
@property
def type(self) -> type_t: ...
@property
def kind(self) -> str: ...
@classmethod
def construct_array_type(cls) -> type_t[BooleanArray]: ...
def __from_arrow__(self, array): ...

def coerce_to_array(values, mask=..., copy: bool = ...): ...

class BooleanArray(BaseMaskedArray):
def __init__(
self, values: np.ndarray, mask: np.ndarray, copy: bool = ...
) -> None: ...
def __setitem__(self, key: int | np.ndarray | slice, value: object) -> None: ...
@property
def dtype(self): ...
def __array_ufunc__(self, ufunc, method, *inputs, **kwargs): ...
def __setitem__(self, key, value) -> None: ...
def astype(self, dtype, copy: bool = ...): ...
def any(self, skipna: bool = ..., **kwargs): ...
def all(self, skipna: bool = ..., **kwargs): ...
def astype(
self, dtype: str | np.dtype, copy: bool = ...
) -> np.ndarray | ExtensionArray: ...
def any(self, skipna: bool = ..., **kwargs) -> bool: ...
def all(self, skipna: bool = ..., **kwargs) -> bool: ...