Commit 5350830

Merge remote-tracking branch 'upstream/master' into bisect
2 parents a01b363 + 9a21c3c commit 5350830

205 files changed, +5400 -3063 lines changed

.circleci/config.yml

+21
@@ -0,0 +1,21 @@
+version: 2.1
+
+jobs:
+  test-arm:
+    machine:
+      image: ubuntu-2004:202101-01
+    resource_class: arm.medium
+    environment:
+      ENV_FILE: ci/deps/circle-38-arm64.yaml
+      PYTEST_WORKERS: auto
+      PATTERN: "not slow and not network and not clipboard and not arm_slow"
+      PYTEST_TARGET: "pandas"
+    steps:
+      - checkout
+      - run: ci/setup_env.sh
+      - run: PATH=$HOME/miniconda3/envs/pandas-dev/bin:$HOME/miniconda3/condabin:$PATH ci/run_tests.sh
+
+workflows:
+  test:
+    jobs:
+      - test-arm

.github/PULL_REQUEST_TEMPLATE.md

+1 -1
@@ -1,4 +1,4 @@
 - [ ] closes #xxxx
 - [ ] tests added / passed
-- [ ] Ensure all linting tests pass, see [here](https://pandas.pydata.org/pandas-docs/dev/development/contributing.html#code-standards) for how to run them
+- [ ] Ensure all linting tests pass, see [here](https://pandas.pydata.org/pandas-docs/dev/development/contributing_codebase.html#pre-commit) for how to run them
 - [ ] whatsnew entry

.github/workflows/ci.yml

+1
@@ -168,6 +168,7 @@ jobs:
         PANDAS_DATA_MANAGER: array
         PATTERN: ${{ matrix.pattern }}
         PYTEST_WORKERS: "auto"
+        PYTEST_TARGET: pandas
       run: |
         source activate pandas-dev
         ci/run_tests.sh

.github/workflows/posix.yml

+1
@@ -44,6 +44,7 @@ jobs:
       LC_ALL: ${{ matrix.settings[4] }}
       PANDAS_TESTING_MODE: ${{ matrix.settings[5] }}
       TEST_ARGS: ${{ matrix.settings[6] }}
+      PYTEST_TARGET: pandas
     concurrency:
       group: ${{ github.ref }}-${{ matrix.settings[0] }}
       cancel-in-progress: ${{github.event_name == 'pull_request'}}

.github/workflows/python-dev.yml

+1
@@ -17,6 +17,7 @@ env:
   PANDAS_CI: 1
   PATTERN: "not slow and not network and not clipboard"
   COVERAGE: true
+  PYTEST_TARGET: pandas
 
 jobs:
   build:

README.md

+1 -1
@@ -12,7 +12,7 @@
 [![License](https://img.shields.io/pypi/l/pandas.svg)](https://github.com/pandas-dev/pandas/blob/master/LICENSE)
 [![Azure Build Status](https://dev.azure.com/pandas-dev/pandas/_apis/build/status/pandas-dev.pandas?branch=master)](https://dev.azure.com/pandas-dev/pandas/_build/latest?definitionId=1&branch=master)
 [![Coverage](https://codecov.io/github/pandas-dev/pandas/coverage.svg?branch=master)](https://codecov.io/gh/pandas-dev/pandas)
-[![Downloads](https://anaconda.org/conda-forge/pandas/badges/downloads.svg)](https://pandas.pydata.org)
+[![Downloads](https://static.pepy.tech/personalized-badge/pandas?period=month&units=international_system&left_color=black&right_color=orange&left_text=PyPI%20downloads%20per%20month)](https://pepy.tech/project/pandas)
 [![Gitter](https://badges.gitter.im/Join%20Chat.svg)](https://gitter.im/pydata/pandas)
 [![Powered by NumFOCUS](https://img.shields.io/badge/powered%20by-NumFOCUS-orange.svg?style=flat&colorA=E1523D&colorB=007D8A)](https://numfocus.org)
 [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

asv_bench/benchmarks/algorithms.py

+6 -6
@@ -44,9 +44,9 @@ def setup(self, unique, sort, dtype):
             raise NotImplementedError
 
         data = {
-            "int": pd.Int64Index(np.arange(N)),
-            "uint": pd.UInt64Index(np.arange(N)),
-            "float": pd.Float64Index(np.random.randn(N)),
+            "int": pd.Index(np.arange(N), dtype="int64"),
+            "uint": pd.Index(np.arange(N), dtype="uint64"),
+            "float": pd.Index(np.random.randn(N), dtype="float64"),
             "object": string_index,
             "datetime64[ns]": pd.date_range("2011-01-01", freq="H", periods=N),
             "datetime64[ns, tz]": pd.date_range(
@@ -76,9 +76,9 @@ class Duplicated:
     def setup(self, unique, keep, dtype):
         N = 10 ** 5
         data = {
-            "int": pd.Int64Index(np.arange(N)),
-            "uint": pd.UInt64Index(np.arange(N)),
-            "float": pd.Float64Index(np.random.randn(N)),
+            "int": pd.Index(np.arange(N), dtype="int64"),
+            "uint": pd.Index(np.arange(N), dtype="uint64"),
+            "float": pd.Index(np.random.randn(N), dtype="float64"),
             "string": tm.makeStringIndex(N),
             "datetime64[ns]": pd.date_range("2011-01-01", freq="H", periods=N),
             "datetime64[ns, tz]": pd.date_range(

asv_bench/benchmarks/groupby.py

+11 -1
@@ -454,6 +454,16 @@ def setup(self, dtype, method, application, ncols):
             # DataFrameGroupBy doesn't have these methods
             raise NotImplementedError
 
+        if application == "transformation" and method in [
+            "head",
+            "tail",
+            "unique",
+            "value_counts",
+            "size",
+        ]:
+            # DataFrameGroupBy doesn't have these methods
+            raise NotImplementedError
+
         ngroups = 1000
         size = ngroups * 2
         rng = np.arange(ngroups).reshape(-1, 1)
@@ -480,7 +490,7 @@ def setup(self, dtype, method, application, ncols):
         if len(cols) == 1:
             cols = cols[0]
 
-        if application == "transform":
+        if application == "transformation":
             if method == "describe":
                 raise NotImplementedError

asv_bench/benchmarks/indexing_engines.py

+2 -2
@@ -48,7 +48,7 @@ def setup(self, engine_and_dtype, index_type):
             "non_monotonic": np.array([1, 2, 3] * N, dtype=dtype),
         }[index_type]
 
-        self.data = engine(lambda: arr, len(arr))
+        self.data = engine(arr)
         # code belows avoids populating the mapping etc. while timing.
         self.data.get_loc(2)
 
@@ -70,7 +70,7 @@ def setup(self, index_type):
             "non_monotonic": np.array(list("abc") * N, dtype=object),
         }[index_type]
 
-        self.data = libindex.ObjectEngine(lambda: arr, len(arr))
+        self.data = libindex.ObjectEngine(arr)
         # code belows avoids populating the mapping etc. while timing.
         self.data.get_loc("b")

asv_bench/benchmarks/inference.py

+5
@@ -173,6 +173,7 @@ def setup(self):
         self.strings_tz_space = [
             x.strftime("%Y-%m-%d %H:%M:%S") + " -0800" for x in rng
         ]
+        self.strings_zero_tz = [x.strftime("%Y-%m-%d %H:%M:%S") + "Z" for x in rng]
 
     def time_iso8601(self):
         to_datetime(self.strings)
@@ -189,6 +190,10 @@ def time_iso8601_format_no_sep(self):
     def time_iso8601_tz_spaceformat(self):
         to_datetime(self.strings_tz_space)
 
+    def time_iso8601_infer_zero_tz_fromat(self):
+        # GH 41047
+        to_datetime(self.strings_zero_tz, infer_datetime_format=True)
+
 
 class ToDatetimeNONISO8601:
     def setup(self):

asv_bench/benchmarks/rolling.py

+27
@@ -180,6 +180,33 @@ def time_quantile(self, constructor, window, dtype, percentile, interpolation):
         self.roll.quantile(percentile, interpolation=interpolation)
 
 
+class Rank:
+    params = (
+        ["DataFrame", "Series"],
+        [10, 1000],
+        ["int", "float"],
+        [True, False],
+        [True, False],
+        ["min", "max", "average"],
+    )
+    param_names = [
+        "constructor",
+        "window",
+        "dtype",
+        "percentile",
+        "ascending",
+        "method",
+    ]
+
+    def setup(self, constructor, window, dtype, percentile, ascending, method):
+        N = 10 ** 5
+        arr = np.random.random(N).astype(dtype)
+        self.roll = getattr(pd, constructor)(arr).rolling(window)
+
+    def time_rank(self, constructor, window, dtype, percentile, ascending, method):
+        self.roll.rank(pct=percentile, ascending=ascending, method=method)
+
+
 class PeakMemFixedWindowMinMax:
 
     params = ["min", "max"]
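
Note on the Rank benchmark above: assuming asv runs a parameterized benchmark once per element of the Cartesian product of the lists in params (which is how its documentation describes multi-parameter benchmarks), the size of the resulting grid can be checked with the standard library alone. The sketch below copies params and param_names from the hunk; the variable name combinations is illustrative.

```python
from itertools import product

# Copied from the Rank benchmark in the hunk above.
params = (
    ["DataFrame", "Series"],
    [10, 1000],
    ["int", "float"],
    [True, False],
    [True, False],
    ["min", "max", "average"],
)
param_names = ["constructor", "window", "dtype", "percentile", "ascending", "method"]

# One dict per combination, keyed by parameter name ("combinations" is an
# illustrative name, not something asv exposes).
combinations = [dict(zip(param_names, combo)) for combo in product(*params)]

print(len(combinations))  # 96, i.e. 2 * 2 * 2 * 2 * 2 * 3
print(combinations[0])
```

Each of those 96 combinations gets its own setup() call, so the cost of building the 10 ** 5 element array is paid per combination and is not included in the timed section.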

asv_bench/benchmarks/series_methods.py

+13
@@ -27,6 +27,19 @@ def time_constructor(self, data):
         Series(data=self.data, index=self.idx)
 
 
+class ToFrame:
+    params = [["int64", "datetime64[ns]", "category", "Int64"], [None, "foo"]]
+    param_names = ["dtype", "name"]
+
+    def setup(self, dtype, name):
+        arr = np.arange(10 ** 5)
+        ser = Series(arr, dtype=dtype)
+        self.ser = ser
+
+    def time_to_frame(self, dtype, name):
+        self.ser.to_frame(name)
+
+
 class NSort:
 
     params = ["first", "last", "all"]

asv_bench/benchmarks/sparse.py

+43
@@ -91,6 +91,20 @@ def time_sparse_series_to_coo_single_level(self, sort_labels):
         self.ss_two_lvl.sparse.to_coo(sort_labels=sort_labels)
 
 
+class ToCooFrame:
+    def setup(self):
+        N = 10000
+        k = 10
+        arr = np.full((N, k), np.nan)
+        arr[0, 0] = 3.0
+        arr[12, 7] = -1.0
+        arr[0, 9] = 11.2
+        self.df = pd.DataFrame(arr, dtype=pd.SparseDtype("float"))
+
+    def time_to_coo(self):
+        self.df.sparse.to_coo()
+
+
 class Arithmetic:
 
     params = ([0.1, 0.01], [0, np.nan])
@@ -152,4 +166,33 @@ def time_division(self, fill_value):
         self.arr1 / self.arr2
 
 
+class MinMax:
+
+    params = (["min", "max"], [0.0, np.nan])
+    param_names = ["func", "fill_value"]
+
+    def setup(self, func, fill_value):
+        N = 1_000_000
+        arr = make_array(N, 1e-5, fill_value, np.float64)
+        self.sp_arr = SparseArray(arr, fill_value=fill_value)
+
+    def time_min_max(self, func, fill_value):
+        getattr(self.sp_arr, func)()
+
+
+class Take:
+
+    params = ([np.array([0]), np.arange(100_000), np.full(100_000, -1)], [True, False])
+    param_names = ["indices", "allow_fill"]
+
+    def setup(self, indices, allow_fill):
+        N = 1_000_000
+        fill_value = 0.0
+        arr = make_array(N, 1e-5, fill_value, np.float64)
+        self.sp_arr = SparseArray(arr, fill_value=fill_value)
+
+    def time_take(self, indices, allow_fill):
+        self.sp_arr.take(indices, allow_fill=allow_fill)
+
+
 from .pandas_vb_common import setup # noqa: F401 isort:skip

azure-pipelines.yml

+1
@@ -17,6 +17,7 @@ pr:
 
 variables:
   PYTEST_WORKERS: auto
+  PYTEST_TARGET: pandas
 
 jobs:
 # Mac and Linux use the same template

ci/azure/posix.yml

+7 -1
@@ -9,10 +9,16 @@ jobs:
   strategy:
     matrix:
       ${{ if eq(parameters.name, 'macOS') }}:
-        py38_macos:
+        py38_macos_1:
           ENV_FILE: ci/deps/azure-macos-38.yaml
           CONDA_PY: "38"
           PATTERN: "not slow and not network"
+          PYTEST_TARGET: "pandas/tests/[a-h]*"
+        py38_macos_2:
+          ENV_FILE: ci/deps/azure-macos-38.yaml
+          CONDA_PY: "38"
+          PATTERN: "not slow and not network"
+          PYTEST_TARGET: "pandas/tests/[i-z]*"
 
   steps:
   - script: echo '##vso[task.prependpath]$(HOME)/miniconda3/bin'
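
Note on the matrix split above: the two PYTEST_TARGET globs, "pandas/tests/[a-h]*" and "pandas/tests/[i-z]*", divide the entries under pandas/tests by their first letter so that each job runs roughly half of the suite. The sketch below illustrates the partition with Python's fnmatch module, whose character-class matching is close to (but not guaranteed identical to) the shell's. The list of names is a small illustrative sample, not a listing of the real directory, and bucket is a hypothetical helper.

```python
from fnmatch import fnmatchcase

# Basename parts of the two PYTEST_TARGET globs from the hunk above.
FIRST_HALF = "[a-h]*"
SECOND_HALF = "[i-z]*"

# Illustrative sample of entry names; not a real listing of pandas/tests.
names = [
    "arithmetic",
    "frame",
    "groupby",
    "indexes",
    "io",
    "series",
    "test_algos.py",
    "__init__.py",
]


def bucket(name: str) -> str:
    """Return which job's glob would pick up this entry (hypothetical helper)."""
    if fnmatchcase(name, FIRST_HALF):
        return "job_1"
    if fnmatchcase(name, SECOND_HALF):
        return "job_2"
    return "unmatched"


for name in names:
    print(f"{name}: {bucket(name)}")
```

The character classes are disjoint, so no entry lands in both jobs. Names that start with something other than a lowercase letter, such as "__init__.py", match neither glob.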

ci/azure/windows.yml

+19 -2
@@ -8,17 +8,33 @@ jobs:
     vmImage: ${{ parameters.vmImage }}
   strategy:
     matrix:
-      py38_np18:
+      py38_np18_1:
         ENV_FILE: ci/deps/azure-windows-38.yaml
         CONDA_PY: "38"
         PATTERN: "not slow and not network"
         PYTEST_WORKERS: 2 # GH-42236
+        PYTEST_TARGET: "pandas/tests/[a-h]*"
 
-      py39:
+      py38_np18_2:
+        ENV_FILE: ci/deps/azure-windows-38.yaml
+        CONDA_PY: "38"
+        PATTERN: "not slow and not network"
+        PYTEST_WORKERS: 2 # GH-42236
+        PYTEST_TARGET: "pandas/tests/[i-z]*"
+
+      py39_1:
+        ENV_FILE: ci/deps/azure-windows-39.yaml
+        CONDA_PY: "39"
+        PATTERN: "not slow and not network and not high_memory"
+        PYTEST_WORKERS: 2 # GH-42236
+        PYTEST_TARGET: "pandas/tests/[a-h]*"
+
+      py39_2:
         ENV_FILE: ci/deps/azure-windows-39.yaml
         CONDA_PY: "39"
         PATTERN: "not slow and not network and not high_memory"
         PYTEST_WORKERS: 2 # GH-42236
+        PYTEST_TARGET: "pandas/tests/[i-z]*"
 
   steps:
   - powershell: |
@@ -39,6 +55,7 @@ jobs:
     displayName: 'Build'
   - bash: |
       source activate pandas-dev
+      wmic.exe cpu get caption, deviceid, name, numberofcores, maxclockspeed
       ci/run_tests.sh
     displayName: 'Test'
   - task: PublishTestResults@2

ci/run_tests.sh

+1 -1
@@ -19,7 +19,7 @@ if [[ $(uname) == "Linux" && -z $DISPLAY ]]; then
     XVFB="xvfb-run "
 fi
 
-PYTEST_CMD="${XVFB}pytest -m \"$PATTERN\" -n $PYTEST_WORKERS --dist=loadfile $TEST_ARGS $COVERAGE pandas"
+PYTEST_CMD="${XVFB}pytest -m \"$PATTERN\" -n $PYTEST_WORKERS --dist=loadfile $TEST_ARGS $COVERAGE $PYTEST_TARGET"
 
 if [[ $(uname) != "Linux" && $(uname) != "Darwin" ]]; then
     # GH#37455 windows py38 build appears to be running out of memory
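
Note on the change above: the hard-coded trailing "pandas" argument becomes $PYTEST_TARGET, which is why every CI configuration in this commit now sets that variable. The sketch below mirrors the string assembly in Python to show what the resulting command looks like for one of the split jobs. build_pytest_cmd is a hypothetical helper, not part of the repository, and it drops empty variables instead of leaving the extra spaces the shell interpolation would produce.

```python
def build_pytest_cmd(env: dict) -> str:
    """Assemble a pytest command line the way PYTEST_CMD does (hypothetical helper).

    Variables that are unset or empty are dropped here; the shell version
    would leave empty gaps in the string instead.
    """
    parts = [
        env.get("XVFB", "") + "pytest",  # XVFB already ends with a space when set
        "-m",
        '"{}"'.format(env["PATTERN"]),
        "-n",
        str(env["PYTEST_WORKERS"]),
        "--dist=loadfile",
        env.get("TEST_ARGS", ""),
        env.get("COVERAGE", ""),
        env["PYTEST_TARGET"],
    ]
    return " ".join(part for part in parts if part)


if __name__ == "__main__":
    cmd = build_pytest_cmd(
        {
            "PATTERN": "not slow and not network",
            "PYTEST_WORKERS": 2,
            "PYTEST_TARGET": "pandas/tests/[a-h]*",
        }
    )
    print(cmd)
    # pytest -m "not slow and not network" -n 2 --dist=loadfile pandas/tests/[a-h]*
```

Because the variable has no default in the script, a job that forgets to set PYTEST_TARGET would end up with no path argument in the command, which is consistent with the commit adding PYTEST_TARGET: pandas to the configurations that still run the whole suite.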

doc/source/development/contributing.rst

+6 -1
@@ -331,7 +331,12 @@ can comment::
 
     @github-actions pre-commit
 
-on that pull request. This will trigger a workflow which will autofix formatting errors.
+on that pull request. This will trigger a workflow which will autofix formatting
+errors.
+
+To automatically fix formatting errors on each commit you make, you can
+set up pre-commit yourself. First, create a Python :ref:`environment
+<contributing_environment>` and then set up :ref:`pre-commit <contributing.pre-commit>`.
 
 Delete your merged branch (optional)
 ------------------------------------

doc/source/development/contributing_environment.rst

-1
@@ -133,7 +133,6 @@ compiler installation instructions.
 
 Let us know if you have any difficulties by opening an issue or reaching out on `Gitter <https://gitter.im/pydata/pandas/>`_.
 
-
 Creating a Python environment
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

doc/source/ecosystem.rst

+14
@@ -575,3 +575,17 @@ Library Accessor Classes Description
 .. _composeml: https://github.com/alteryx/compose
 .. _datatest: https://datatest.readthedocs.io/
 .. _woodwork: https://github.com/alteryx/woodwork
+
+Development tools
+----------------------------
+
+`pandas-stubs <https://github.com/VirtusLab/pandas-stubs>`__
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+While pandas repository is partially typed, the package itself doesn't expose this information for external use.
+Install pandas-stubs to enable basic type coverage of pandas API.
+
+Learn more by reading through these issues `14468 <https://github.com/pandas-dev/pandas/issues/14468>`_,
+`26766 <https://github.com/pandas-dev/pandas/issues/26766>`_, `28142 <https://github.com/pandas-dev/pandas/issues/28142>`_.
+
+See installation and usage instructions on the `github page <https://github.com/VirtusLab/pandas-stubs>`__.

doc/source/reference/style.rst

+1
@@ -39,6 +39,7 @@ Style application
    Styler.apply_index
    Styler.applymap_index
    Styler.format
+   Styler.format_index
    Styler.hide_index
    Styler.hide_columns
    Styler.set_td_classes
