Commit 59c0043

Merge branch 'master' into dev23122
2 parents 6566435 + d228a78 commit 59c0043

266 files changed: +6377 -4280 lines changed

.github/ISSUE_TEMPLATE/bug_report.yaml

+1
Original file line number | Diff line number | Diff line change
@@ -7,6 +7,7 @@ body:
77
- type: checkboxes
88
id: checks
99
attributes:
10+
label: Pandas version checks
1011
options:
1112
- label: >
1213
I have checked that this issue has not already been reported.

.github/ISSUE_TEMPLATE/documentation_improvement.yaml

+1
@@ -6,6 +6,7 @@ labels: [Docs, Needs Triage]
66
body:
77
- type: checkboxes
88
attributes:
9+
label: Pandas version checks
910
options:
1011
- label: >
1112
I have checked that the issue still exists on the latest versions of the docs

.github/ISSUE_TEMPLATE/installation_issue.yaml

+1
@@ -7,6 +7,7 @@ body:
77
- type: checkboxes
88
id: checks
99
attributes:
10+
label: Installation check
1011
options:
1112
- label: >
1213
I have read the [installation guide](https://pandas.pydata.org/pandas-docs/stable/getting_started/install.html#installing-pandas).

.github/ISSUE_TEMPLATE/performance_issue.yaml

+1
@@ -7,6 +7,7 @@ body:
77
- type: checkboxes
88
id: checks
99
attributes:
10+
label: Pandas version checks
1011
options:
1112
- label: >
1213
I have checked that this issue has not already been reported.

.github/ISSUE_TEMPLATE/submit_question.yml

+1
@@ -11,6 +11,7 @@ body:
1111
usage questions, we ask that all usage questions are first asked on StackOverflow.
1212
- type: checkboxes
1313
attributes:
14+
label: Research
1415
options:
1516
- label: >
1617
I have searched the [[pandas] tag](https://stackoverflow.com/questions/tagged/pandas)

.github/workflows/ci.yml

+34
@@ -78,6 +78,40 @@ jobs:
7878
run: pytest scripts
7979
if: always()
8080

81+
benchmarks:
82+
name: Benchmarks
83+
runs-on: ubuntu-latest
84+
defaults:
85+
run:
86+
shell: bash -l {0}
87+
88+
concurrency:
89+
# https://github.community/t/concurrecy-not-work-for-push/183068/7
90+
group: ${{ github.event_name == 'push' && github.run_number || github.ref }}-benchmarks
91+
cancel-in-progress: true
92+
93+
steps:
94+
- name: Checkout
95+
uses: actions/checkout@v2
96+
with:
97+
fetch-depth: 0
98+
99+
- name: Cache conda
100+
uses: actions/cache@v2
101+
with:
102+
path: ~/conda_pkgs_dir
103+
key: ${{ runner.os }}-conda-${{ hashFiles('${{ env.ENV_FILE }}') }}
104+
105+
- uses: conda-incubator/setup-miniconda@v2
106+
with:
107+
activate-environment: pandas-dev
108+
channel-priority: strict
109+
environment-file: ${{ env.ENV_FILE }}
110+
use-only-tar-bz2: true
111+
112+
- name: Build Pandas
113+
uses: ./.github/actions/build_pandas
114+
81115
- name: Running benchmarks
82116
run: |
83117
cd asv_bench
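
Reviewer note: the `group` expression in the new `concurrency` block relies on the short-circuit behaviour of `&&` and `||` in GitHub Actions expressions. The Python sketch below mimics how that expression resolves. The function name and the sample values are made up for illustration and are not part of the workflow.

```python
def concurrency_group(event_name: str, run_number: int, ref: str, suffix: str) -> str:
    """Mimic `${{ github.event_name == 'push' && github.run_number || github.ref }}-<suffix>`."""
    # `a && b || c` behaves like a ternary as long as `b` is truthy.
    # Run numbers start at 1, so that condition holds here.
    key = (event_name == "push" and run_number) or ref
    return f"{key}-{suffix}"


# Two pushes to the same branch land in different groups, so neither cancels the other.
print(concurrency_group("push", 101, "refs/heads/master", "benchmarks"))
print(concurrency_group("push", 102, "refs/heads/master", "benchmarks"))
# Runs for the same pull request share a group, so a newer run cancels the older one.
print(concurrency_group("pull_request", 103, "refs/pull/1/merge", "benchmarks"))
```

Under this reading, `cancel-in-progress: true` only takes effect for non-push events, where the group key is the ref.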

.github/workflows/posix.yml

+21-1
@@ -31,12 +31,12 @@ jobs:
3131
[actions-38-slow.yaml, "slow", "", "", "", "", ""],
3232
[actions-38-locale.yaml, "not slow and not network", "language-pack-zh-hans xsel", "zh_CN.utf8", "zh_CN.utf8", "", ""],
3333
[actions-39-slow.yaml, "slow", "", "", "", "", ""],
34+
[actions-pypy-38.yaml, "not slow and not clipboard", "", "", "", "", "--max-worker-restart 0"],
3435
[actions-39-numpydev.yaml, "not slow and not network", "xsel", "", "", "deprecate", "-W error"],
3536
[actions-39.yaml, "not slow and not clipboard", "", "", "", "", ""]
3637
]
3738
fail-fast: false
3839
env:
39-
COVERAGE: true
4040
ENV_FILE: ci/deps/${{ matrix.settings[0] }}
4141
PATTERN: ${{ matrix.settings[1] }}
4242
EXTRA_APT: ${{ matrix.settings[2] }}
@@ -45,6 +45,9 @@ jobs:
4545
PANDAS_TESTING_MODE: ${{ matrix.settings[5] }}
4646
TEST_ARGS: ${{ matrix.settings[6] }}
4747
PYTEST_TARGET: pandas
48+
IS_PYPY: ${{ contains(matrix.settings[0], 'pypy') }}
49+
# TODO: re-enable coverage on pypy, its slow
50+
COVERAGE: ${{ !contains(matrix.settings[0], 'pypy') }}
4851
concurrency:
4952
# https://github.community/t/concurrecy-not-work-for-push/183068/7
5053
group: ${{ github.event_name == 'push' && github.run_number || github.ref }}-${{ matrix.settings[0] }}
@@ -82,12 +85,29 @@ jobs:
8285
channel-priority: flexible
8386
environment-file: ${{ env.ENV_FILE }}
8487
use-only-tar-bz2: true
88+
if: ${{ env.IS_PYPY == 'false' }} # No pypy3.8 support
89+
90+
- name: Setup PyPy
91+
uses: actions/[email protected]
92+
with:
93+
python-version: "pypy-3.8"
94+
if: ${{ env.IS_PYPY == 'true' }}
95+
96+
- name: Setup PyPy dependencies
97+
shell: bash
98+
run: |
99+
# TODO: re-enable cov, it's slowing the tests down though
100+
# TODO: Unpin Cython, the new Cython 0.29.26 is causing compilation errors
101+
pip install Cython==0.29.25 numpy python-dateutil pytz pytest>=6.0 pytest-xdist>=1.31.0 hypothesis>=5.5.3
102+
if: ${{ env.IS_PYPY == 'true' }}
85103

86104
- name: Build Pandas
87105
uses: ./.github/actions/build_pandas
88106

89107
- name: Test
90108
run: ci/run_tests.sh
109+
# TODO: Don't continue on error for PyPy
110+
continue-on-error: ${{ env.IS_PYPY == 'true' }}
91111
if: always()
92112

93113
- name: Build Version
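
Reviewer note: `IS_PYPY` and `COVERAGE` are both derived from the environment file name in the matrix, and the later `if:` conditions compare them against the strings `'true'` and `'false'`. The sketch below models that derivation in Python. It assumes that `contains()` is case-insensitive for strings and that boolean results are rendered as lowercase strings in `env`; the function name is hypothetical.

```python
def derive_env(env_file: str) -> dict:
    """Model `contains(matrix.settings[0], 'pypy')` and its negation as env strings."""
    # Assumption: the expression-level `contains` ignores case for string search.
    is_pypy = "pypy" in env_file.lower()
    return {
        # Env values are strings, which is why the workflow compares against 'true'/'false'.
        "IS_PYPY": str(is_pypy).lower(),
        "COVERAGE": str(not is_pypy).lower(),
    }


for name in ("actions-pypy-38.yaml", "actions-39.yaml"):
    print(name, derive_env(name))
```

With these values, the conda setup step runs only for non-PyPy entries, while the PyPy setup and `pip install` steps run only for the PyPy entry.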

.gitignore

+2
@@ -50,6 +50,8 @@ dist
5050
*.egg-info
5151
.eggs
5252
.pypirc
53+
# type checkers
54+
pandas/py.typed
5355

5456
# tox testing tool
5557
.tox

asv_bench/benchmarks/arithmetic.py

+1-1
@@ -144,7 +144,7 @@ def setup(self, op, shape):
144144
# should already be the case, but just to be sure
145145
df._consolidate_inplace()
146146

147-
# TODO: GH#33198 the setting here shoudlnt need two steps
147+
# TODO: GH#33198 the setting here shouldn't need two steps
148148
arr1 = np.random.randn(n_rows, max(n_cols // 4, 3)).astype("f8")
149149
arr2 = np.random.randn(n_rows, n_cols // 2).astype("i8")
150150
arr3 = np.random.randn(n_rows, n_cols // 4).astype("f8")

asv_bench/benchmarks/io/csv.py

+35
@@ -55,6 +55,26 @@ def time_frame(self, kind):
5555
self.df.to_csv(self.fname)
5656

5757

58+
class ToCSVMultiIndexUnusedLevels(BaseIO):
59+
60+
fname = "__test__.csv"
61+
62+
def setup(self):
63+
df = DataFrame({"a": np.random.randn(100_000), "b": 1, "c": 1})
64+
self.df = df.set_index(["a", "b"])
65+
self.df_unused_levels = self.df.iloc[:10_000]
66+
self.df_single_index = df.set_index(["a"]).iloc[:10_000]
67+
68+
def time_full_frame(self):
69+
self.df.to_csv(self.fname)
70+
71+
def time_sliced_frame(self):
72+
self.df_unused_levels.to_csv(self.fname)
73+
74+
def time_single_index_frame(self):
75+
self.df_single_index.to_csv(self.fname)
76+
77+
5878
class ToCSVDatetime(BaseIO):
5979

6080
fname = "__test__.csv"
@@ -67,6 +87,21 @@ def time_frame_date_formatting(self):
6787
self.data.to_csv(self.fname, date_format="%Y%m%d")
6888

6989

90+
class ToCSVDatetimeIndex(BaseIO):
91+
92+
fname = "__test__.csv"
93+
94+
def setup(self):
95+
rng = date_range("2000", periods=100_000, freq="S")
96+
self.data = DataFrame({"a": 1}, index=rng)
97+
98+
def time_frame_date_formatting_index(self):
99+
self.data.to_csv(self.fname, date_format="%Y-%m-%d %H:%M:%S")
100+
101+
def time_frame_date_no_format_index(self):
102+
self.data.to_csv(self.fname)
103+
104+
70105
class ToCSVDatetimeBig(BaseIO):
71106

72107
fname = "__test__.csv"

ci/deps/actions-38-db.yaml

+1-1
@@ -12,7 +12,7 @@ dependencies:
1212
- pytest-cov>=2.10.1 # this is only needed in the coverage build, ref: GH 35737
1313

1414
# pandas dependencies
15-
- aiobotocore<2.0.0
15+
- aiobotocore<2.0.0 # GH#44311 pinned to fix docbuild
1616
- beautifulsoup4
1717
- boto3
1818
- botocore>=1.11

ci/deps/actions-pypy-38.yaml

+20
@@ -0,0 +1,20 @@
1+
name: pandas-dev
2+
channels:
3+
- conda-forge
4+
dependencies:
5+
# TODO: Add the rest of the dependencies in here
6+
# once the other plentiful failures/segfaults
7+
# with base pandas have been dealt with
8+
- python=3.8[build=*_pypy] # TODO: use this once pypy3.8 is available
9+
10+
# tools
11+
- cython>=0.29.24
12+
- pytest>=6.0
13+
- pytest-cov
14+
- pytest-xdist>=1.31
15+
- hypothesis>=5.5.3
16+
17+
# required
18+
- numpy
19+
- python-dateutil
20+
- pytz

ci/run_tests.sh

+6-1
@@ -5,12 +5,17 @@
55
# https://github.com/pytest-dev/pytest/issues/1075
66
export PYTHONHASHSEED=$(python -c 'import random; print(random.randint(1, 4294967295))')
77

8+
# May help reproduce flaky CI builds if set in subsequent runs
9+
echo PYTHONHASHSEED=$PYTHONHASHSEED
10+
811
if [[ "not network" == *"$PATTERN"* ]]; then
912
export http_proxy=http://1.2.3.4 https_proxy=http://1.2.3.4;
1013
fi
1114

12-
if [ "$COVERAGE" ]; then
15+
if [[ "$COVERAGE" == "true" ]]; then
1316
COVERAGE="-s --cov=pandas --cov-report=xml --cov-append"
17+
else
18+
COVERAGE="" # We need to reset this for COVERAGE="false" case
1419
fi
1520

1621
# If no X server is found, we use xvfb to emulate it
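
Reviewer note: the change from `[ "$COVERAGE" ]` to `[[ "$COVERAGE" == "true" ]]` matters because the old test only checks that the string is non-empty, so `COVERAGE=false` would still have enabled coverage. The Python sketch below models both checks; the function names are illustrative and do not appear in the script.

```python
COVERAGE_FLAGS = "-s --cov=pandas --cov-report=xml --cov-append"


def old_check(coverage: str) -> bool:
    # Models `[ "$COVERAGE" ]`: true for any non-empty string, including "false".
    return len(coverage) > 0


def new_check(coverage: str) -> bool:
    # Models `[[ "$COVERAGE" == "true" ]]`: true only for the exact string "true".
    return coverage == "true"


def coverage_args(coverage: str) -> str:
    # Mirrors the new if/else: the variable is reset to "" unless it is exactly "true".
    return COVERAGE_FLAGS if new_check(coverage) else ""


for value in ("true", "false", ""):
    print(repr(value), old_check(value), new_check(value), repr(coverage_args(value)))
```

The explicit `else` branch is needed because the same variable is reused to hold the pytest flags; without the reset, the literal string `false` would remain in it.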

doc/source/development/contributing_codebase.rst

+36-6
@@ -303,7 +303,7 @@ pandas strongly encourages the use of :pep:`484` style type hints. New developme
303303
Style guidelines
304304
~~~~~~~~~~~~~~~~
305305

306-
Types imports should follow the ``from typing import ...`` convention. So rather than
306+
Type imports should follow the ``from typing import ...`` convention. Some types do not need to be imported: since :pep:`585`, some builtin constructs, such as ``list`` and ``tuple``, can be used directly for type annotations. So rather than
307307

308308
.. code-block:: python
309309
@@ -315,21 +315,31 @@ You should write
315315

316316
.. code-block:: python
317317
318-
from typing import List, Optional, Union
318+
primes: list[int] = []
319319
320-
primes: List[int] = []
320+
``Optional`` should be avoided in favor of the shorter ``| None``, so instead of
321321

322-
``Optional`` should be used where applicable, so instead of
322+
.. code-block:: python
323+
324+
from typing import Union
325+
326+
maybe_primes: list[Union[int, None]] = []
327+
328+
or
323329

324330
.. code-block:: python
325331
326-
maybe_primes: List[Union[int, None]] = []
332+
from typing import Optional
333+
334+
maybe_primes: list[Optional[int]] = []
327335
328336
You should write
329337

330338
.. code-block:: python
331339
332-
maybe_primes: List[Optional[int]] = []
340+
from __future__ import annotations # noqa: F404
341+
342+
maybe_primes: list[int | None] = []
333343
334344
In some cases in the code base, classes may define class variables that shadow builtins. This causes an issue as described in `Mypy 1775 <https://github.com/python/mypy/issues/1775#issuecomment-310969854>`_. The defensive solution here is to create an unambiguous alias of the builtin and use that within your annotation. For example, if you come across a definition like
335345

@@ -410,6 +420,26 @@ A recent version of ``numpy`` (>=1.21.0) is required for type validation.
410420

411421
.. _contributing.ci:
412422

423+
Testing type hints in code using pandas
424+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
425+
426+
.. warning::
427+
428+
* Pandas is not yet a py.typed library (:pep:`561`)!
429+
The primary purpose of locally declaring pandas as a py.typed library is to test and
430+
improve the pandas-builtin type annotations.
431+
432+
Until pandas becomes a py.typed library, it is possible to easily experiment with the type
433+
annotations shipped with pandas by creating an empty file named "py.typed" in the pandas
434+
installation folder:
435+
436+
.. code-block:: none
437+
438+
python -c "import pandas; import pathlib; (pathlib.Path(pandas.__path__[0]) / 'py.typed').touch()"
439+
440+
The existence of the py.typed file signals to type checkers that pandas is already a py.typed
441+
library. This makes type checkers aware of the type annotations shipped with pandas.
442+
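
Reviewer note: the one-liner above can be unpacked into a small helper. The sketch below is hedged: the helper name is hypothetical, and it is exercised on a temporary stand-in directory rather than on a real pandas installation, so it runs without pandas installed.

```python
import pathlib
import tempfile


def mark_as_typed(package_dir: pathlib.Path) -> pathlib.Path:
    """Create an empty `py.typed` marker (PEP 561) inside a package directory."""
    marker = package_dir / "py.typed"
    marker.touch()  # an empty file is enough; only its existence matters
    return marker


# Demonstration on a throwaway directory standing in for the package folder.
with tempfile.TemporaryDirectory() as tmp:
    fake_pkg = pathlib.Path(tmp) / "fakepkg"
    fake_pkg.mkdir()
    created = mark_as_typed(fake_pkg)
    print(created.name, created.exists(), created.stat().st_size)
```

For pandas itself, the directory would be ``pathlib.Path(pandas.__path__[0])``, as in the command shown in the documentation. The matching ``.gitignore`` entry in this commit keeps the locally created marker out of version control.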
413443
Testing with continuous integration
414444
-----------------------------------
415445

doc/source/development/developer.rst

+1-1
@@ -180,7 +180,7 @@ As an example of fully-formed metadata:
180180
'numpy_type': 'int64',
181181
'metadata': None}
182182
],
183-
'pandas_version': '0.20.0',
183+
'pandas_version': '1.4.0',
184184
'creator': {
185185
'library': 'pyarrow',
186186
'version': '0.13.0'

doc/source/reference/groupby.rst

+1
@@ -122,6 +122,7 @@ application to columns of a specific data type.
122122
DataFrameGroupBy.skew
123123
DataFrameGroupBy.take
124124
DataFrameGroupBy.tshift
125+
DataFrameGroupBy.value_counts
125126

126127
The following methods are available only for ``SeriesGroupBy`` objects.
127128

doc/source/user_guide/io.rst

+11-1
@@ -1903,6 +1903,7 @@ with optional parameters:
19031903
``index``; dict like {index -> {column -> value}}
19041904
``columns``; dict like {column -> {index -> value}}
19051905
``values``; just the values array
1906+
``table``; adhering to the JSON `Table Schema`_
19061907

19071908
* ``date_format`` : string, type of date conversion, 'epoch' for timestamp, 'iso' for ISO8601.
19081909
* ``double_precision`` : The number of decimal places to use when encoding floating point values, default 10.
@@ -2477,7 +2478,6 @@ A few notes on the generated table schema:
24772478
* For ``MultiIndex``, ``mi.names`` is used. If any level has no name,
24782479
then ``level_<i>`` is used.
24792480

2480-
24812481
``read_json`` also accepts ``orient='table'`` as an argument. This allows for
24822482
the preservation of metadata such as dtypes and index names in a
24832483
round-trippable manner.
@@ -2519,8 +2519,18 @@ indicate missing values and the subsequent read cannot distinguish the intent.
25192519
25202520
os.remove("test.json")
25212521
2522+
When using ``orient='table'`` along with a user-defined ``ExtensionArray``,
2523+
the generated schema will contain an additional ``extDtype`` key in the respective
2524+
``fields`` element. This extra key is not standard but does enable JSON roundtrips
2525+
for extension types (e.g. ``read_json(df.to_json(orient="table"), orient="table")``).
2526+
2527+
The ``extDtype`` key carries the name of the extension. If you have properly registered
2528+
the ``ExtensionDtype``, pandas will use said name to perform a lookup into the registry
2529+
and re-convert the serialized data into your custom dtype.
2530+
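
Reviewer note: the name-based lookup described above can be pictured with a standard-library sketch. This is a conceptual model only, not pandas' implementation: the ``REGISTRY`` dict, both function names, the ``"any"`` field type, and the use of ``fractions.Fraction`` as a stand-in for a custom dtype are all assumptions made for illustration.

```python
import json
from fractions import Fraction

# Hypothetical registry: extension name -> callable rebuilding a value from its JSON form.
REGISTRY = {"fraction": Fraction}


def to_table_json(name, values, ext_name):
    """Serialize one column with a schema field that carries an extra `extDtype` key."""
    schema = {"fields": [{"name": name, "type": "any", "extDtype": ext_name}]}
    data = [{name: str(v)} for v in values]
    return json.dumps({"schema": schema, "data": data})


def from_table_json(payload):
    """Read the column back, using `extDtype` to look up the converter by name."""
    doc = json.loads(payload)
    field = doc["schema"]["fields"][0]
    convert = REGISTRY[field["extDtype"]]  # fails loudly if the name was never registered
    return [convert(row[field["name"]]) for row in doc["data"]]


payload = to_table_json("x", [Fraction(1, 3), Fraction(1, 2)], "fraction")
print(from_table_json(payload))
```

The point of the sketch is that the roundtrip only works when the reader has a converter registered under the same name the writer stored in the schema.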
25222531
.. _Table Schema: https://specs.frictionlessdata.io/table-schema/
25232532

2533+
25242534
HTML
25252535
----
25262536

doc/source/user_guide/timeseries.rst

+1-1
@@ -2424,7 +2424,7 @@ you can use the ``tz_convert`` method.
24242424

24252425
For ``pytz`` time zones, it is incorrect to pass a time zone object directly into
24262426
the ``datetime.datetime`` constructor
2427-
(e.g., ``datetime.datetime(2011, 1, 1, tz=pytz.timezone('US/Eastern'))``.
2427+
(e.g., ``datetime.datetime(2011, 1, 1, tzinfo=pytz.timezone('US/Eastern'))``).
24282428
Instead, the datetime needs to be localized using the ``localize`` method
24292429
on the ``pytz`` time zone object.
24302430
