
Commit f3e5a49

add docs/philosphy.md (#41)
* update readme, rename docs that have development instructions
* add link to dev docs
* remove duplicate link to docs
* add philsophy docs and example to help those docs. Bump version
1 parent cb455b9 commit f3e5a49

5 files changed: +183 -8 lines changed


README.md

Lines changed: 49 additions & 0 deletions
@@ -22,6 +22,55 @@ The stubs are likely incomplete in terms of covering the published API of pandas
 The stubs are tested with [mypy](http://mypy-lang.org/) and [pyright](https://github.com/microsoft/pyright#readme) and are currently shipped with the Visual Studio Code extension
 [pylance](https://github.com/microsoft/pylance-release#readme).

+## Usage
+
+Let’s take this example piece of code in file `round.py`
+
+```python
+import pandas as pd
+
+decimals = pd.DataFrame({'TSLA': 2, 'AMZN': 1})
+prices = pd.DataFrame(data={'date': ['2021-08-13', '2021-08-07', '2021-08-21'],
+                            'TSLA': [720.13, 716.22, 731.22], 'AMZN': [3316.50, 3200.50, 3100.23]})
+rounded_prices = prices.round(decimals=decimals)
+```
+
+Mypy won't see any issues with that, but after installing pandas-stubs and running it again:
+
+```sh
+mypy round.py
+```
+
+we get the following error message:
+
+```text
+round.py:6: error: Argument "decimals" to "round" of "DataFrame" has incompatible type "DataFrame"; expected "Union[int, Dict[Any, Any], Series[Any]]" [arg-type]
+Found 1 error in 1 file (checked 1 source file)
+```
+
+And, if you use pyright:
+
+```sh
+pyright round.py
+```
+
+you get the following error message:
+
+```text
+round.py:6:40 - error: Argument of type "DataFrame" cannot be assigned to parameter "decimals" of type "int | Dict[Unknown, Unknown] | Series[Unknown]" in function "round"
+  Type "DataFrame" cannot be assigned to type "int | Dict[Unknown, Unknown] | Series[Unknown]"
+    "DataFrame" is incompatible with "int"
+    "DataFrame" is incompatible with "Dict[Unknown, Unknown]"
+    "DataFrame" is incompatible with "Series[Unknown]" (reportGeneralTypeIssues)
+```
+
+And after confirming with the [docs](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.round.html)
+we can fix the code:
+
+```python
+decimals = pd.Series({'TSLA': 2, 'AMZN': 1})
+```
+
 ## Version Numbering Convention

 The version number x.y.z.yymmdd corresponds to a test done with pandas version x.y.z, with the stubs released on the date yy/mm/dd.
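
As a rough illustration of this convention, the sketch below splits a version string into its pandas version and stub release date. It assumes the trailing component is always six digits read as `yymmdd`; `parse_stub_version` is a hypothetical helper, not part of pandas-stubs.

```python
from datetime import date, datetime


def parse_stub_version(version: str) -> tuple[str, date]:
    """Split 'x.y.z.yymmdd' into the pandas version and the stub release date."""
    # Hypothetical helper for illustration only; not part of pandas-stubs.
    pandas_version, _, stamp = version.rpartition(".")
    release_date = datetime.strptime(stamp, "%y%m%d").date()
    return pandas_version, release_date


print(parse_stub_version("1.4.2.220626"))
```

Under that assumption, the version bumped in this commit, `1.4.2.220626`, would read as stubs tested against pandas 1.4.2 and released on 26 June 2022.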

docs/README.md

Lines changed: 3 additions & 0 deletions
@@ -3,6 +3,9 @@
 For any update to the stubs, we require an associated test case that should fail without
 a proposed change, and works with a proposed change. See <https://github.com/pandas-dev/pandas-stubs/tree/main/tests/> for examples.

+The stubs are developed with a certain [philosophy](philosophy.md) that should be
+understood by developers proposing changes to the stubs.
+
 Instructions for working with the code are found here:

 - [How to set up the environment](setup.md)

docs/philosophy.md

Lines changed: 113 additions & 0 deletions
@@ -0,0 +1,113 @@
+# pandas-stubs Type Checking Philosophy
+
+The goal of the pandas-stubs project is to provide type stubs for the public API
+that represent the recommended ways of using pandas. This is opposed to the
+philosophy within the pandas source, as described [here](https://pandas.pydata.org/docs/development/contributing_codebase.html?highlight=typing#type-hints), which
+is to assist with the development of the pandas source code to ensure type safety within
+that source.
+
+Due to the methodology used by Microsoft to develop the original stubs, there are internal
+classes, methods and functions that are annotated within the pandas-stubs project
+that are incorrect with respect to the pandas source, but that have no effect on type
+checking user code that calls the public API.
+
+## Use of Generic Types
+
+There are other differences that are extensions of the pandas API to assist in type
+checking. Two key examples are that `Series` and `Interval` are typed as generic types.
+
+### Series are Generic
+
+`Series` is declared as `Series[S1]` where `S1` is a `TypeVar` consisting of types normally
+used within series, if that type can be inferred. Consider the following example
+that compares the values in a `Series` to an integer.
+
+```python
+s = pd.Series([1, 2, 3])
+lt = s < 3
+```
+
+In the pandas source, `lt` is a `Series` with a `dtype` of `bool`. In the pandas-stubs,
+the type of `lt` is `Series[bool]`. This allows further type checking to occur in other
+pandas methods. Note that in the above example, `s` is typed as `Series[Any]` because
+its type cannot be statically inferred.
+
+This also allows type checking for operations on series that contain date/time data. Consider
+the following example that creates two series of datetimes with corresponding arithmetic.
+
+```python
+s1 = pd.Series(pd.to_datetime(["2022-05-01", "2022-06-01"]))
+reveal_type(s1)
+s2 = pd.Series(pd.to_datetime(["2022-05-15", "2022-06-15"]))
+reveal_type(s2)
+td = s1 - s2
+reveal_type(td)
+ssum = s1 + s2
+reveal_type(ssum)
+```
+
+The above code (without the `reveal_type()` statements) will raise an `Exception` on the computation of `ssum` because it is
+inappropriate to add two series containing `Timestamp` values. The types will be
+revealed as follows:
+
+```text
+ttest.py:4: note: Revealed type is "pandas.core.series.TimestampSeries"
+ttest.py:6: note: Revealed type is "pandas.core.series.TimestampSeries"
+ttest.py:8: note: Revealed type is "pandas.core.series.TimedeltaSeries"
+ttest.py:10: note: Revealed type is "builtins.Exception"
+```
+
+The type `TimestampSeries` is the result of creating a series from `pd.to_datetime()`, while
+the type `TimedeltaSeries` is the result of subtracting two `TimestampSeries` as well as
+the result of `pd.to_timedelta()`.
+
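
The idea of a generic series whose comparison yields a `bool`-typed result can be sketched in plain Python. This is only an illustration with hypothetical classes (`ToySeries`, `ToyIntSeries`); it is not how the pandas-stubs declarations are written.

```python
from typing import Generic, List, TypeVar

T = TypeVar("T")


class ToySeries(Generic[T]):
    """Hypothetical stand-in for a generic series; not the pandas-stubs definition."""

    def __init__(self, values: List[T]) -> None:
        self.values = values


class ToyIntSeries(ToySeries[int]):
    def __lt__(self, other: int) -> "ToySeries[bool]":
        # The annotated return type tells a type checker the element type is bool.
        return ToySeries([v < other for v in self.values])


s = ToyIntSeries([1, 2, 3])
lt = s < 3  # a type checker can infer ToySeries[bool] here
print(lt.values)
```

Because the element type travels with the object, later calls on `lt` can be checked against `bool` rather than an unknown type.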
+### Interval is Generic
+
+A pandas `Interval` can be an interval of integers or an interval of time, the latter
+represented as having endpoints of the `Timestamp` class. pandas-stubs tracks
+the type of an `Interval`, based on the arguments passed to the `Interval` constructor.
+This allows detecting inappropriate operations, such as adding an integer to an
+interval of `Timestamp`s.
+
+## Testing the Type Stubs
+
+A set of (most likely incomplete) tests for testing the type stubs is in the pandas-stubs
+repository in the `tests` directory. The tests are used with `mypy` and `pyright` to
+validate correct typing, and also with `pytest` to validate that the provided code
+actually executes. The recent decision for Python 3.11 to include `assert_type()`,
+which is supported by `typing_extensions` version 4.2 and beyond, makes it easier
+to validate the return types of functions and methods. Future work
+is intended to expand the use of `assert_type()` in the test code.
+
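
A minimal sketch of how `assert_type()` can pin down a return type follows. The function `count_positive` is hypothetical and stands in for any annotated function. At runtime `assert_type()` returns its first argument unchanged, so the same file can be run by `pytest` and checked by a type checker.

```python
import sys

if sys.version_info >= (3, 11):
    from typing import assert_type
else:
    from typing_extensions import assert_type  # needs typing_extensions >= 4.2


def count_positive(values: list[int]) -> int:
    """Hypothetical function under test."""
    return sum(1 for v in values if v > 0)


# A type checker reports an error here if the inferred type is not exactly int.
result = assert_type(count_positive([1, -2, 3]), int)
print(result)
```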
+## Narrow vs. Wide Arguments
+
+A consideration in creating stubs is to make the set of type annotations for
+function arguments "just right", i.e.,
+not too narrow and not too wide. A type annotation to an argument to a function or
+method is too narrow if it disallows valid arguments. A type annotation to
+an argument to a function or method is too wide if
+it allows invalid arguments. Testing for type annotations that are too narrow is rather
+straightforward. It is easy to create an example for which the type checker indicates
+the argument is incorrect, and add it to the set of tests in the pandas-stubs
+repository after fixing the appropriate stub. However, testing for when type annotations
+are too wide is a bit more complicated.
+In this case, the test will fail when using `pytest`, but it is also desirable to
+have type checkers report errors for code that is expected to fail type checking.
+
+Here is an example that illustrates this concept, from `tests/test_interval.py`:
+
+```python
+i1 = pd.Interval(
+    pd.Timestamp("2000-01-01"), pd.Timestamp("2000-01-03"), closed="both"
+)
+if TYPE_CHECKING:
+    i1 + pd.Timestamp("2000-03-03") # type: ignore
+
+```
+
+In this particular example, the stubs consider that `i1` will have the type
+`pd.Interval[pd.Timestamp]`. It is incorrect code to add a `Timestamp` to a
+time-based interval. Without the `if TYPE_CHECKING` construct, the code would fail.
+However, it is also desirable to have the type checker pick up this failure, and by
+placing the `# type: ignore` on the line, an indication is made to the type checker
+that we expect this line to not pass the type checker.
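
The same guard can be tried without pandas. The sketch below uses a hypothetical `Endpoint` class that defines no `+` operator: the guarded line is analyzed by a type checker but never executed, so the function still runs under `pytest`.

```python
from typing import TYPE_CHECKING


class Endpoint:
    """Hypothetical type with no '+' operator defined."""


def check_add_is_rejected() -> str:
    left = Endpoint()
    if TYPE_CHECKING:
        # Never executed at runtime, but a type checker still analyzes it.
        # The ignore comment marks the reported error as expected.
        left + Endpoint()  # type: ignore
    return "ran without executing the guarded line"


print(check_add_is_rejected())
```

If a project also enables mypy's `--warn-unused-ignores` option (an assumption about configuration, not something this commit states), an ignore comment that no longer suppresses anything is reported, which would signal that an annotation has become too wide.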

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "pandas-stubs"
-version = "1.4.2.220622"
+version = "1.4.2.220626"
 description = "Type annotations for pandas"
 authors = ["The Pandas Development Team <[email protected]>"]
 license = "BSD-3-Clause"

tests/test_timefuncs.py

Lines changed: 17 additions & 7 deletions
@@ -47,16 +47,17 @@ def test_types_comparison() -> None:


 def test_types_timestamp_series_comparisons() -> None:
-    #GH 27
-    df = pd.DataFrame(['2020-01-01','2019-01-01'])
-    tss = pd.to_datetime(df[0], format = '%Y-%m-%d')
-    ts = pd.to_datetime('2019-02-01', format = '%Y-%m-%d')
+    # GH 27
+    df = pd.DataFrame(["2020-01-01", "2019-01-01"])
+    tss = pd.to_datetime(df[0], format="%Y-%m-%d")
+    ts = pd.to_datetime("2019-02-01", format="%Y-%m-%d")
     tssr = tss <= ts
     tssr2 = tss >= ts
     tssr3 = tss == ts
-    assert_type(tssr,'pd.Series[bool]')
-    assert_type(tssr2,'pd.Series[bool]')
-    assert_type(tssr3,'pd.Series[bool]')
+    assert_type(tssr, "pd.Series[bool]")
+    assert_type(tssr2, "pd.Series[bool]")
+    assert_type(tssr3, "pd.Series[bool]")
+

 def test_types_pydatetime() -> None:
     ts: pd.Timestamp = pd.Timestamp("2021-03-01T12")
@@ -166,3 +167,12 @@ def test_iso_calendar() -> None:
     # GH 31
     dates = pd.date_range(start="2012-01-01", end="2019-12-31", freq="W-MON")
     dates.isocalendar()
+
+
+def fail_on_adding_two_timestamps() -> None:
+    s1 = pd.Series(pd.to_datetime(["2022-05-01", "2022-06-01"]))
+    s2 = pd.Series(pd.to_datetime(["2022-05-15", "2022-06-15"]))
+    if TYPE_CHECKING:
+        ssum: pd.Series = s1 + s2 # type: ignore
+        ts = pd.Timestamp("2022-06-30")
+        tsum: pd.Series = s1 + ts # type: ignore
