Commit abced04

Merge remote-tracking branch 'upstream/master' into bisect
2 parents 74d8a65 + 2ab1d1f commit abced04

92 files changed: +3301 -1927 lines changed

.github/workflows/ci.yml

+34
@@ -78,6 +78,40 @@ jobs:
       run: pytest scripts
       if: always()
 
+  benchmarks:
+    name: Benchmarks
+    runs-on: ubuntu-latest
+    defaults:
+      run:
+        shell: bash -l {0}
+
+    concurrency:
+      # https://github.community/t/concurrecy-not-work-for-push/183068/7
+      group: ${{ github.event_name == 'push' && github.run_number || github.ref }}-benchmarks
+      cancel-in-progress: true
+
+    steps:
+    - name: Checkout
+      uses: actions/checkout@v2
+      with:
+        fetch-depth: 0
+
+    - name: Cache conda
+      uses: actions/cache@v2
+      with:
+        path: ~/conda_pkgs_dir
+        key: ${{ runner.os }}-conda-${{ hashFiles('${{ env.ENV_FILE }}') }}
+
+    - uses: conda-incubator/setup-miniconda@v2
+      with:
+        activate-environment: pandas-dev
+        channel-priority: strict
+        environment-file: ${{ env.ENV_FILE }}
+        use-only-tar-bz2: true
+
+    - name: Build Pandas
+      uses: ./.github/actions/build_pandas
+
     - name: Running benchmarks
       run: |
         cd asv_bench

.github/workflows/posix.yml

+21 -2
@@ -31,13 +31,12 @@ jobs:
           [actions-38-slow.yaml, "slow", "", "", "", "", ""],
           [actions-38-locale.yaml, "not slow and not network", "language-pack-zh-hans xsel", "zh_CN.utf8", "zh_CN.utf8", "", ""],
           [actions-39-slow.yaml, "slow", "", "", "", "", ""],
-          [actions-pypy-38.yaml, "not slow and not clipboard", "", "", "", "", ""],
+          [actions-pypy-38.yaml, "not slow and not clipboard", "", "", "", "", "--max-worker-restart 0"],
           [actions-39-numpydev.yaml, "not slow and not network", "xsel", "", "", "deprecate", "-W error"],
           [actions-39.yaml, "not slow and not clipboard", "", "", "", "", ""]
         ]
       fail-fast: false
     env:
-      COVERAGE: true
       ENV_FILE: ci/deps/${{ matrix.settings[0] }}
       PATTERN: ${{ matrix.settings[1] }}
       EXTRA_APT: ${{ matrix.settings[2] }}
@@ -46,6 +45,9 @@ jobs:
       PANDAS_TESTING_MODE: ${{ matrix.settings[5] }}
       TEST_ARGS: ${{ matrix.settings[6] }}
       PYTEST_TARGET: pandas
+      IS_PYPY: ${{ contains(matrix.settings[0], 'pypy') }}
+      # TODO: re-enable coverage on pypy, its slow
+      COVERAGE: ${{ !contains(matrix.settings[0], 'pypy') }}
     concurrency:
       # https://github.community/t/concurrecy-not-work-for-push/183068/7
       group: ${{ github.event_name == 'push' && github.run_number || github.ref }}-${{ matrix.settings[0] }}
@@ -83,12 +85,29 @@ jobs:
         channel-priority: flexible
         environment-file: ${{ env.ENV_FILE }}
         use-only-tar-bz2: true
+      if: ${{ env.IS_PYPY == 'false' }} # No pypy3.8 support
+
+    - name: Setup PyPy
+      uses: actions/[email protected]
+      with:
+        python-version: "pypy-3.8"
+      if: ${{ env.IS_PYPY == 'true' }}
+
+    - name: Setup PyPy dependencies
+      shell: bash
+      run: |
+        # TODO: re-enable cov, its slowing the tests down though
+        # TODO: Unpin Cython, the new Cython 0.29.26 is causing compilation errors
+        pip install Cython==0.29.25 numpy python-dateutil pytz pytest>=6.0 pytest-xdist>=1.31.0 hypothesis>=5.5.3
+      if: ${{ env.IS_PYPY == 'true' }}

     - name: Build Pandas
       uses: ./.github/actions/build_pandas

     - name: Test
       run: ci/run_tests.sh
+      # TODO: Don't continue on error for PyPy
+      continue-on-error: ${{ env.IS_PYPY == 'true' }}
       if: always()

     - name: Build Version

.gitignore

+2
@@ -50,6 +50,8 @@ dist
 *.egg-info
 .eggs
 .pypirc
+# type checkers
+pandas/py.typed

 # tox testing tool
 .tox

asv_bench/benchmarks/io/csv.py

+23
@@ -55,6 +55,26 @@ def time_frame(self, kind):
         self.df.to_csv(self.fname)


+class ToCSVMultiIndexUnusedLevels(BaseIO):
+
+    fname = "__test__.csv"
+
+    def setup(self):
+        df = DataFrame({"a": np.random.randn(100_000), "b": 1, "c": 1})
+        self.df = df.set_index(["a", "b"])
+        self.df_unused_levels = self.df.iloc[:10_000]
+        self.df_single_index = df.set_index(["a"]).iloc[:10_000]
+
+    def time_full_frame(self):
+        self.df.to_csv(self.fname)
+
+    def time_sliced_frame(self):
+        self.df_unused_levels.to_csv(self.fname)
+
+    def time_single_index_frame(self):
+        self.df_single_index.to_csv(self.fname)
+
+
 class ToCSVDatetime(BaseIO):

     fname = "__test__.csv"
@@ -78,6 +98,9 @@ def setup(self):
     def time_frame_date_formatting_index(self):
         self.data.to_csv(self.fname, date_format="%Y-%m-%d %H:%M:%S")

+    def time_frame_date_no_format_index(self):
+        self.data.to_csv(self.fname)
+

 class ToCSVDatetimeBig(BaseIO):

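The `ToCSVMultiIndexUnusedLevels` benchmark above times `to_csv` on a sliced `MultiIndex` frame. The sketch below is not part of the commit; it only illustrates, on a scaled-down frame with illustrative variable names, what "unused levels" means: slicing keeps the full set of level values on the index until `remove_unused_levels()` is called.

```python
import numpy as np
import pandas as pd

# Scaled-down stand-in for the benchmark's setup(); sizes are illustrative.
df = pd.DataFrame({"a": np.arange(1_000, dtype="float64"), "b": 1, "c": 1})
full = df.set_index(["a", "b"])
sliced = full.iloc[:100]

# The slice has 100 rows, but level "a" still stores all 1000 original values.
n_levels_before = len(sliced.index.levels[0])

# remove_unused_levels() rebuilds the index with only the values in use.
n_levels_after = len(sliced.index.remove_unused_levels().levels[0])

print(n_levels_before, n_levels_after)
```

The gap between the two counts is presumably the extra work the sliced-frame benchmark is meant to expose, with the single-index frame as a baseline.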
ci/deps/actions-pypy-38.yaml

+1 -1
@@ -5,7 +5,7 @@ dependencies:
   # TODO: Add the rest of the dependencies in here
   # once the other plentiful failures/segfaults
   # with base pandas has been dealt with
-  - python=3.8
+  - python=3.8[build=*_pypy] # TODO: use this once pypy3.8 is available

   # tools
   - cython>=0.29.24

ci/run_tests.sh

+3 -1
@@ -12,8 +12,10 @@ if [[ "not network" == *"$PATTERN"* ]]; then
     export http_proxy=http://1.2.3.4 https_proxy=http://1.2.3.4;
 fi

-if [ "$COVERAGE" ]; then
+if [[ "$COVERAGE" == "true" ]]; then
     COVERAGE="-s --cov=pandas --cov-report=xml --cov-append"
+else
+    COVERAGE="" # We need to reset this for COVERAGE="false" case
 fi

 # If no X server is found, we use xvfb to emulate it

doc/source/development/contributing_codebase.rst

+36 -6
@@ -303,7 +303,7 @@ pandas strongly encourages the use of :pep:`484` style type hints. New developme
 Style guidelines
 ~~~~~~~~~~~~~~~~

-Types imports should follow the ``from typing import ...`` convention. So rather than
+Type imports should follow the ``from typing import ...`` convention. Some types do not need to be imported since :pep:`585` some builtin constructs, such as ``list`` and ``tuple``, can directly be used for type annotations. So rather than

 .. code-block:: python

@@ -315,21 +315,31 @@ You should write

 .. code-block:: python

-   from typing import List, Optional, Union
+   primes: list[int] = []

-   primes: List[int] = []
+``Optional`` should be avoided in favor of the shorter ``| None``, so instead of

-``Optional`` should be used where applicable, so instead of
+.. code-block:: python
+
+   from typing import Union
+
+   maybe_primes: list[Union[int, None]] = []
+
+or

 .. code-block:: python

-   maybe_primes: List[Union[int, None]] = []
+   from typing import Optional
+
+   maybe_primes: list[Optional[int]] = []

 You should write

 .. code-block:: python

-   maybe_primes: List[Optional[int]] = []
+   from __future__ import annotations  # noqa: F404
+
+   maybe_primes: list[int | None] = []

 In some cases in the code base classes may define class variables that shadow builtins. This causes an issue as described in `Mypy 1775 <https://github.com/python/mypy/issues/1775#issuecomment-310969854>`_. The defensive solution here is to create an unambiguous alias of the builtin and use that without your annotation. For example, if you come across a definition like

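A minimal sketch of the annotation style the revised guideline recommends, applied to a function signature. The function `first_present` is hypothetical and not part of pandas. Because of the `from __future__ import annotations` import the annotations are never evaluated at runtime, so the `list[int | None]` spelling is accepted even on interpreters that predate :pep:`585` and :pep:`604`.

```python
from __future__ import annotations


def first_present(values: list[int | None]) -> int | None:
    """Return the first element that is not None, or None if there is none."""
    for value in values:
        if value is not None:
            return value
    return None


maybe_primes: list[int | None] = [None, 2, 3]
print(first_present(maybe_primes))
```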
@@ -410,6 +420,26 @@ A recent version of ``numpy`` (>=1.21.0) is required for type validation.

 .. _contributing.ci:

+Testing type hints in code using pandas
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. warning::
+
+    * Pandas is not yet a py.typed library (:pep:`561`)!
+      The primary purpose of locally declaring pandas as a py.typed library is to test and
+      improve the pandas-builtin type annotations.
+
+Until pandas becomes a py.typed library, it is possible to easily experiment with the type
+annotations shipped with pandas by creating an empty file named "py.typed" in the pandas
+installation folder:
+
+.. code-block:: none
+
+   python -c "import pandas; import pathlib; (pathlib.Path(pandas.__path__[0]) / 'py.typed').touch()"
+
+The existence of the py.typed file signals to type checkers that pandas is already a py.typed
+library. This makes type checkers aware of the type annotations shipped with pandas.
+
 Testing with continuous integration
 -----------------------------------

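The one-liner in the added documentation can be unpacked as a small script. The sketch below is an illustration rather than anything shipped in the commit: `mark_as_typed` is a hypothetical helper, and a temporary directory stands in for the pandas installation folder so that running it does not modify a real install.

```python
import pathlib
import tempfile


def mark_as_typed(package_dir: pathlib.Path) -> pathlib.Path:
    """Create an empty PEP 561 marker file in package_dir and return its path."""
    marker = package_dir / "py.typed"
    marker.touch()
    return marker


# Stand-in for pathlib.Path(pandas.__path__[0]); swap in the real path to
# reproduce the documented command.
with tempfile.TemporaryDirectory() as tmp:
    marker = mark_as_typed(pathlib.Path(tmp))
    marker_exists = marker.exists()
    marker_size = marker.stat().st_size

print(marker_exists, marker_size)
```

The new `.gitignore` entry for `pandas/py.typed` in this commit keeps such a locally created marker out of version control.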
doc/source/development/developer.rst

+1 -1
@@ -180,7 +180,7 @@ As an example of fully-formed metadata:
          'numpy_type': 'int64',
          'metadata': None}
     ],
-    'pandas_version': '0.20.0',
+    'pandas_version': '1.4.0',
     'creator': {
       'library': 'pyarrow',
       'version': '0.13.0'

doc/source/reference/groupby.rst

+1
@@ -122,6 +122,7 @@ application to columns of a specific data type.
    DataFrameGroupBy.skew
    DataFrameGroupBy.take
    DataFrameGroupBy.tshift
+   DataFrameGroupBy.value_counts

 The following methods are available only for ``SeriesGroupBy`` objects.

doc/source/user_guide/io.rst

+11 -1
@@ -1903,6 +1903,7 @@ with optional parameters:
      ``index``; dict like {index -> {column -> value}}
      ``columns``; dict like {column -> {index -> value}}
      ``values``; just the values array
+     ``table``; adhering to the JSON `Table Schema`_

 * ``date_format`` : string, type of date conversion, 'epoch' for timestamp, 'iso' for ISO8601.
 * ``double_precision`` : The number of decimal places to use when encoding floating point values, default 10.
@@ -2477,7 +2478,6 @@ A few notes on the generated table schema:
 * For ``MultiIndex``, ``mi.names`` is used. If any level has no name,
   then ``level_<i>`` is used.

-
 ``read_json`` also accepts ``orient='table'`` as an argument. This allows for
 the preservation of metadata such as dtypes and index names in a
 round-trippable manner.
@@ -2519,8 +2519,18 @@ indicate missing values and the subsequent read cannot distinguish the intent.

    os.remove("test.json")

+When using ``orient='table'`` along with user-defined ``ExtensionArray``,
+the generated schema will contain an additional ``extDtype`` key in the respective
+``fields`` element. This extra key is not standard but does enable JSON roundtrips
+for extension types (e.g. ``read_json(df.to_json(orient="table"), orient="table")``).
+
+The ``extDtype`` key carries the name of the extension, if you have properly registered
+the ``ExtensionDtype``, pandas will use said name to perform a lookup into the registry
+and re-convert the serialized data into your custom dtype.
+
 .. _Table Schema: https://specs.frictionlessdata.io/table-schema/

+
 HTML
 ----

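The roundtrip mentioned in the added paragraph can be sketched with plain NumPy-backed columns. This example is not from the commit and does not exercise a custom `ExtensionDtype` or the `extDtype` key; it only shows the `orient="table"` write and read pair that the extension-type roundtrip builds on. The column and index names are illustrative.

```python
import io
import json

import pandas as pd

df = pd.DataFrame(
    {"x": [1, 2, 3], "y": [0.5, 1.5, 2.5]},
    index=pd.Index([10, 20, 30], name="idx"),
)

# orient="table" writes a "schema" section alongside the "data" records.
payload = df.to_json(orient="table")
top_level_keys = sorted(json.loads(payload))

# Reading with the same orient uses the schema to restore dtypes and the index name.
restored = pd.read_json(io.StringIO(payload), orient="table")

print(top_level_keys)
print(restored.index.name, restored.dtypes.to_dict())
```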
doc/source/user_guide/timeseries.rst

+1 -1
@@ -2424,7 +2424,7 @@ you can use the ``tz_convert`` method.

 For ``pytz`` time zones, it is incorrect to pass a time zone object directly into
 the ``datetime.datetime`` constructor
-(e.g., ``datetime.datetime(2011, 1, 1, tz=pytz.timezone('US/Eastern'))``.
+(e.g., ``datetime.datetime(2011, 1, 1, tzinfo=pytz.timezone('US/Eastern'))``.
 Instead, the datetime needs to be localized using the ``localize`` method
 on the ``pytz`` time zone object.

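The corrected sentence above warns against passing a ``pytz`` zone through ``tzinfo=``. A short sketch of the difference, assuming ``pytz`` is installed; the variable names are illustrative:

```python
import datetime

import pytz

eastern = pytz.timezone("US/Eastern")

# Passing the zone via tzinfo= attaches the zone's default (local mean time)
# offset rather than the offset in force on the given date.
attached = datetime.datetime(2011, 1, 1, tzinfo=eastern)

# localize() resolves the offset that actually applies on that date (EST).
localized = eastern.localize(datetime.datetime(2011, 1, 1))

print(attached.utcoffset(), localized.utcoffset())
```

The two offsets differ by a few minutes, which is why the documentation recommends ``localize``.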