Commit dfabdfc

Merge remote-tracking branch 'upstream/master' into bisect
2 parents bc12c51 + 0b68d87 commit dfabdfc

298 files changed: +4371 -3615 lines changed

.github/workflows/ci.yml

+1
@@ -7,6 +7,7 @@ on:
     branches:
       - master
       - 1.2.x
+      - 1.3.x
 
 env:
   ENV_FILE: environment.yml

.github/workflows/database.yml

+3
@@ -7,6 +7,9 @@ on:
     branches:
       - master
       - 1.2.x
+      - 1.3.x
+    paths-ignore:
+      - "doc/**"
 
 env:
   PYTEST_WORKERS: "auto"

.github/workflows/posix.yml

+3
@@ -7,6 +7,9 @@ on:
     branches:
       - master
       - 1.2.x
+      - 1.3.x
+    paths-ignore:
+      - "doc/**"
 
 env:
   PYTEST_WORKERS: "auto"

.github/workflows/python-dev.yml

+2
@@ -7,6 +7,8 @@ on:
   pull_request:
     branches:
       - master
+    paths-ignore:
+      - "doc/**"
 
 jobs:
   build:
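
The `paths-ignore` filter added to these workflows skips a run only when every changed file matches one of the ignored patterns; a single non-matching file is enough to trigger the workflow. The sketch below illustrates that rule. The function name `workflow_should_run` and the example file lists are hypothetical, and `fnmatchcase` only approximates GitHub's glob semantics.

```python
from fnmatch import fnmatchcase


def workflow_should_run(changed_files, ignore_patterns):
    """Return True unless every changed file matches an ignore pattern."""
    return any(
        not any(fnmatchcase(path, pattern) for pattern in ignore_patterns)
        for path in changed_files
    )


ignore = ["doc/**"]  # the pattern added in this commit

# Hypothetical change sets, for illustration only
docs_only = ["doc/source/whatsnew/v1.3.0.rst", "doc/make.py"]
mixed = ["doc/source/whatsnew/v1.3.0.rst", "pandas/core/frame.py"]

print(workflow_should_run(docs_only, ignore))  # False: only docs changed
print(workflow_should_run(mixed, ignore))      # True: code changed as well
```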

.pre-commit-config.yaml

+1 -1
@@ -9,7 +9,7 @@ repos:
     - id: absolufy-imports
       files: ^pandas/
 - repo: https://github.com/python/black
-  rev: 20.8b1
+  rev: 21.5b2
   hooks:
     - id: black
 - repo: https://github.com/codespell-project/codespell

MANIFEST.in

+13 -2
@@ -17,18 +17,19 @@ global-exclude *.h5
 global-exclude *.html
 global-exclude *.json
 global-exclude *.jsonl
+global-exclude *.msgpack
 global-exclude *.pdf
 global-exclude *.pickle
 global-exclude *.png
 global-exclude *.pptx
-global-exclude *.pyc
-global-exclude *.pyd
 global-exclude *.ods
 global-exclude *.odt
+global-exclude *.orc
 global-exclude *.sas7bdat
 global-exclude *.sav
 global-exclude *.so
 global-exclude *.xls
+global-exclude *.xlsb
 global-exclude *.xlsm
 global-exclude *.xlsx
 global-exclude *.xpt
@@ -39,6 +40,13 @@ global-exclude .DS_Store
 global-exclude .git*
 global-exclude \#*
 
+global-exclude *.c
+global-exclude *.cpp
+global-exclude *.h
+
+global-exclude *.py[ocd]
+global-exclude *.pxi
+
 # GH 39321
 # csv_dir_path fixture checks the existence of the directory
 # exclude the whole directory to avoid running related tests in sdist
@@ -47,3 +55,6 @@ prune pandas/tests/io/parser/data
 include versioneer.py
 include pandas/_version.py
 include pandas/io/formats/templates/*.tpl
+
+graft pandas/_libs/src
+graft pandas/_libs/tslibs/src
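
The new `global-exclude *.py[ocd]` line replaces the separate `*.pyc` and `*.pyd` entries with one character-class glob, which also covers `*.pyo`. A minimal sketch of what the pattern matches, using Python's `fnmatchcase` as a stand-in for the glob matching applied to `MANIFEST.in` (an approximation; the file names are made up for illustration):

```python
from fnmatch import fnmatchcase

pattern = "*.py[ocd]"

# Hypothetical file names, for illustration only
names = ["algos.pyc", "algos.pyo", "algos.pyd", "algos.py", "algos.pyx", "algos.pxi"]

matched = [name for name in names if fnmatchcase(name, pattern)]
print(matched)  # ['algos.pyc', 'algos.pyo', 'algos.pyd']
```

Plain `.py` sources and `.pyx` files are not matched; `.pxi` files are excluded by their own `global-exclude *.pxi` line.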

README.md

+4 -4
@@ -10,13 +10,13 @@
 [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.3509134.svg)](https://doi.org/10.5281/zenodo.3509134)
 [![Package Status](https://img.shields.io/pypi/status/pandas.svg)](https://pypi.org/project/pandas/)
 [![License](https://img.shields.io/pypi/l/pandas.svg)](https://github.com/pandas-dev/pandas/blob/master/LICENSE)
-[![Travis Build Status](https://travis-ci.org/pandas-dev/pandas.svg?branch=master)](https://travis-ci.org/pandas-dev/pandas)
 [![Azure Build Status](https://dev.azure.com/pandas-dev/pandas/_apis/build/status/pandas-dev.pandas?branch=master)](https://dev.azure.com/pandas-dev/pandas/_build/latest?definitionId=1&branch=master)
 [![Coverage](https://codecov.io/github/pandas-dev/pandas/coverage.svg?branch=master)](https://codecov.io/gh/pandas-dev/pandas)
 [![Downloads](https://anaconda.org/conda-forge/pandas/badges/downloads.svg)](https://pandas.pydata.org)
 [![Gitter](https://badges.gitter.im/Join%20Chat.svg)](https://gitter.im/pydata/pandas)
 [![Powered by NumFOCUS](https://img.shields.io/badge/powered%20by-NumFOCUS-orange.svg?style=flat&colorA=E1523D&colorB=007D8A)](https://numfocus.org)
 [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
+[![Imports: isort](https://img.shields.io/badge/%20imports-isort-%231674b1?style=flat&labelColor=ef8336)](https://pycqa.github.io/isort/)
 
 ## What is it?
 
@@ -101,8 +101,8 @@ pip install pandas
 
 ## Dependencies
 - [NumPy - Adds support for large, multi-dimensional arrays, matrices and high-level mathematical functions to operate on these arrays](https://www.numpy.org)
-- [python-dateutil - Provides powerful extensions to the standard datetime module](https://labix.org/python-dateutil)
-- [pytz - Brings the Olson tz database into Python which allows accurate and cross platform timezone calculations](https://pythonhosted.org/pytz)
+- [python-dateutil - Provides powerful extensions to the standard datetime module](https://dateutil.readthedocs.io/en/stable/index.html)
+- [pytz - Brings the Olson tz database into Python which allows accurate and cross platform timezone calculations](https://github.com/stub42/pytz)
 
 See the [full installation instructions](https://pandas.pydata.org/pandas-docs/stable/install.html#dependencies) for minimum supported versions of required, recommended and optional dependencies.
 
@@ -121,7 +121,7 @@ cloning the git repo), execute:
 python setup.py install
 ```
 
-or for installing in [development mode](https://pip.pypa.io/en/latest/reference/pip_install.html#editable-installs):
+or for installing in [development mode](https://pip.pypa.io/en/latest/cli/pip_install/#install-editable):
 
 
 ```sh

asv_bench/benchmarks/algorithms.py

+10 -13
@@ -23,41 +23,38 @@ class Factorize:
             "int",
             "uint",
             "float",
-            "string",
+            "object",
             "datetime64[ns]",
             "datetime64[ns, tz]",
             "Int64",
             "boolean",
-            "string_arrow",
+            "string[pyarrow]",
         ],
     ]
     param_names = ["unique", "sort", "dtype"]
 
     def setup(self, unique, sort, dtype):
         N = 10 ** 5
         string_index = tm.makeStringIndex(N)
-        try:
-            from pandas.core.arrays.string_arrow import ArrowStringDtype
-
-            string_arrow = pd.array(string_index, dtype=ArrowStringDtype())
-        except ImportError:
-            string_arrow = None
-
-        if dtype == "string_arrow" and not string_arrow:
-            raise NotImplementedError
+        string_arrow = None
+        if dtype == "string[pyarrow]":
+            try:
+                string_arrow = pd.array(string_index, dtype="string[pyarrow]")
+            except ImportError:
+                raise NotImplementedError
 
         data = {
             "int": pd.Int64Index(np.arange(N)),
             "uint": pd.UInt64Index(np.arange(N)),
             "float": pd.Float64Index(np.random.randn(N)),
-            "string": string_index,
+            "object": string_index,
             "datetime64[ns]": pd.date_range("2011-01-01", freq="H", periods=N),
             "datetime64[ns, tz]": pd.date_range(
                 "2011-01-01", freq="H", periods=N, tz="Asia/Tokyo"
             ),
             "Int64": pd.array(np.arange(N), dtype="Int64"),
             "boolean": pd.array(np.random.randint(0, 2, N), dtype="boolean"),
-            "string_arrow": string_arrow,
+            "string[pyarrow]": string_arrow,
         }[dtype]
         if not unique:
             data = data.repeat(5)
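
The rewritten `setup` builds the Arrow-backed array only for the `"string[pyarrow]"` parameter, and turns an `ImportError` into `NotImplementedError`, which asv treats as "skip this parameter combination". A minimal, standard-library-only sketch of that pattern follows; `load_optional_backend`, `ExampleBenchmark` and the module name `hypothetical_missing_backend` are illustrative assumptions, not part of the pandas benchmark suite.

```python
import importlib


def load_optional_backend(module_name):
    """Import an optional dependency, or signal that the case should be skipped."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # asv marks a benchmark as skipped when setup() raises NotImplementedError
        raise NotImplementedError(f"{module_name} is not installed") from exc


class ExampleBenchmark:
    # "json" always exists; the second name is deliberately not importable
    params = ["json", "hypothetical_missing_backend"]
    param_names = ["backend"]

    def setup(self, backend):
        # Only the requested backend is imported, so a missing optional
        # dependency affects just the parameter that needs it.
        self.module = load_optional_backend(backend)

    def time_dumps(self, backend):
        self.module.dumps({"a": 1})
```

Resolving the optional dependency inside the parameter-specific branch keeps the other parameters free of the import cost and of any failure caused by the missing package.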

asv_bench/benchmarks/algos/isin.py

+3 -13
@@ -25,8 +25,8 @@ class IsIn:
         "category[object]",
         "category[int]",
         "str",
-        "string",
-        "arrow_string",
+        "string[python]",
+        "string[pyarrow]",
     ]
     param_names = ["dtype"]
 
@@ -50,8 +50,6 @@ def setup(self, dtype):
 
         elif dtype in ["category[object]", "category[int]"]:
             # Note: sizes are different in this case than others
-            np.random.seed(1234)
-
             n = 5 * 10 ** 5
             sample_size = 100
 
@@ -62,9 +60,7 @@ def setup(self, dtype):
             self.values = np.random.choice(arr, sample_size)
             self.series = Series(arr).astype("category")
 
-        elif dtype in ["str", "string", "arrow_string"]:
-            from pandas.core.arrays.string_arrow import ArrowStringDtype # noqa: F401
-
+        elif dtype in ["str", "string[python]", "string[pyarrow]"]:
             try:
                 self.series = Series(tm.makeStringIndex(N), dtype=dtype)
             except ImportError:
@@ -101,7 +97,6 @@ class IsinAlmostFullWithRandomInt:
     def setup(self, dtype, exponent, title):
         M = 3 * 2 ** (exponent - 2)
         # 0.77-the maximal share of occupied buckets
-        np.random.seed(42)
         self.series = Series(np.random.randint(0, M, M)).astype(dtype)
 
         values = np.random.randint(0, M, M).astype(dtype)
@@ -134,7 +129,6 @@ class IsinWithRandomFloat:
     param_names = ["dtype", "size", "title"]
 
     def setup(self, dtype, size, title):
-        np.random.seed(42)
         self.values = np.random.rand(size)
         self.series = Series(self.values).astype(dtype)
         np.random.shuffle(self.values)
@@ -181,7 +175,6 @@ class IsinWithArange:
 
     def setup(self, dtype, M, offset_factor):
         offset = int(M * offset_factor)
-        np.random.seed(42)
         tmp = Series(np.random.randint(offset, M + offset, 10 ** 6))
         self.series = tmp.astype(dtype)
         self.values = np.arange(M).astype(dtype)
@@ -292,10 +285,8 @@ def setup(self, dtype, MaxNumber, series_type):
             raise NotImplementedError
 
         if series_type == "random_hits":
-            np.random.seed(42)
             array = np.random.randint(0, MaxNumber, N)
         if series_type == "random_misses":
-            np.random.seed(42)
             array = np.random.randint(0, MaxNumber, N) + MaxNumber
         if series_type == "monotone_hits":
             array = np.repeat(np.arange(MaxNumber), N // MaxNumber)
@@ -324,7 +315,6 @@ def setup(self, dtype, series_type):
             raise NotImplementedError
 
         if series_type == "random":
-            np.random.seed(42)
             vals = np.random.randint(0, 10 * N, N)
         if series_type == "monotone":
             vals = np.arange(N)

asv_bench/benchmarks/frame_ctor.py

-1
@@ -67,7 +67,6 @@ class FromDictwithTimestamp:
 
     def setup(self, offset):
         N = 10 ** 3
-        np.random.seed(1234)
         idx = date_range(Timestamp("1/1/1900"), freq=offset, periods=N)
         df = DataFrame(np.random.randn(N, 10), index=idx)
         self.d = df.to_dict()

asv_bench/benchmarks/groupby.py

+5 -3
@@ -393,7 +393,7 @@ class GroupByMethods:
 
     param_names = ["dtype", "method", "application"]
     params = [
-        ["int", "float", "object", "datetime"],
+        ["int", "float", "object", "datetime", "uint"],
         [
             "all",
             "any",
@@ -442,6 +442,8 @@ def setup(self, dtype, method, application):
         values = rng.take(np.random.randint(0, ngroups, size=size))
         if dtype == "int":
             key = np.random.randint(0, size, size=size)
+        elif dtype == "uint":
+            key = np.random.randint(0, size, size=size, dtype=dtype)
         elif dtype == "float":
             key = np.concatenate(
                 [np.random.random(ngroups) * 0.1, np.random.random(ngroups) * 10.0]
@@ -505,11 +507,11 @@ def time_frame_agg(self, dtype, method):
         self.df.groupby("key").agg(method)
 
 
-class CumminMax:
+class Cumulative:
     param_names = ["dtype", "method"]
     params = [
         ["float64", "int64", "Float64", "Int64"],
-        ["cummin", "cummax"],
+        ["cummin", "cummax", "cumsum"],
     ]
 
     def setup(self, dtype, method):

asv_bench/benchmarks/hash_functions.py

-1
@@ -67,7 +67,6 @@ class NumericSeriesIndexingShuffled:
 
     def setup(self, index, N):
         vals = np.array(list(range(55)) + [54] + list(range(55, N - 1)))
-        np.random.seed(42)
         np.random.shuffle(vals)
         indices = index(vals)
         self.data = pd.Series(np.arange(N), index=indices)

asv_bench/benchmarks/indexing.py

-3
@@ -368,17 +368,14 @@ def setup(self):
         self.df = DataFrame(index=range(self.N))
 
     def time_insert(self):
-        np.random.seed(1234)
         for i in range(100):
             self.df.insert(0, i, np.random.randn(self.N), allow_duplicates=True)
 
     def time_assign_with_setitem(self):
-        np.random.seed(1234)
         for i in range(100):
             self.df[i] = np.random.randn(self.N)
 
     def time_assign_list_like_with_setitem(self):
-        np.random.seed(1234)
         self.df[list(range(100))] = np.random.randn(self.N, 100)
 
     def time_assign_list_of_columns_concat(self):

asv_bench/benchmarks/series_methods.py

-1
@@ -145,7 +145,6 @@ class Mode:
     param_names = ["N", "dtype"]
 
     def setup(self, N, dtype):
-        np.random.seed(42)
         self.s = Series(np.random.randint(0, N, size=10 * N)).astype(dtype)
 
     def time_mode(self, N, dtype):

asv_bench/benchmarks/strings.py

+1 -3
@@ -12,12 +12,10 @@
 
 
 class Dtypes:
-    params = ["str", "string", "arrow_string"]
+    params = ["str", "string[python]", "string[pyarrow]"]
     param_names = ["dtype"]
 
     def setup(self, dtype):
-        from pandas.core.arrays.string_arrow import ArrowStringDtype # noqa: F401
-
         try:
             self.s = Series(tm.makeStringIndex(10 ** 5), dtype=dtype)
         except ImportError:

azure-pipelines.yml

+9 -2
@@ -1,11 +1,18 @@
 # Adapted from https://github.com/numba/numba/blob/master/azure-pipelines.yml
 trigger:
-- master
-- 1.2.x
+  branches:
+    include:
+    - master
+    - 1.2.x
+    - 1.3.x
+  paths:
+    exclude:
+    - 'doc/*'
 
 pr:
 - master
 - 1.2.x
+- 1.3.x
 
 variables:
   PYTEST_WORKERS: auto

ci/code_checks.sh

+4
@@ -77,6 +77,10 @@ if [[ -z "$CHECK" || "$CHECK" == "patterns" ]]; then
     invgrep -R --include="*.rst" -E "[a-zA-Z0-9]\`\`?[a-zA-Z0-9]" doc/source/
     RET=$(($RET + $?)) ; echo $MSG "DONE"
 
+    MSG='Check for unnecessary random seeds in asv benchmarks' ; echo $MSG
+    invgrep -R --exclude pandas_vb_common.py -E 'np.random.seed' asv_bench/benchmarks/
+    RET=$(($RET + $?)) ; echo $MSG "DONE"
+
 fi
 
 ### CODE ###
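
The new check fails the build when any benchmark file other than `pandas_vb_common.py` calls `np.random.seed`, which is why the per-benchmark seed calls are removed elsewhere in this commit. A rough Python equivalent of that search is sketched below as an illustration, not as the project's implementation: `find_seed_calls` is a hypothetical helper, it scans only `*.py` files, and it escapes the dots that the grep pattern leaves unescaped.

```python
import re
import tempfile
from pathlib import Path

SEED_CALL = re.compile(r"np\.random\.seed")


def find_seed_calls(root, exclude_name="pandas_vb_common.py"):
    """Return (file name, line number) pairs for each seed call under root."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        if path.name == exclude_name:
            continue  # the shared helper module is allowed to set the seed
        for number, line in enumerate(path.read_text().splitlines(), start=1):
            if SEED_CALL.search(line):
                hits.append((path.name, number))
    return hits


# Build a throwaway directory that mimics a benchmarks folder
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "pandas_vb_common.py").write_text("np.random.seed(1234)\n")
    Path(tmp, "isin.py").write_text("import numpy as np\nnp.random.seed(42)\n")
    Path(tmp, "strings.py").write_text("import numpy as np\n")
    result = find_seed_calls(tmp)

print(result)  # [('isin.py', 2)]
```

A non-empty result corresponds to the case where the shell check would add a failure to `RET`.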

ci/deps/azure-macos-37.yaml

+1 -1
@@ -22,7 +22,7 @@ dependencies:
   - numexpr
   - numpy=1.17.3
   - openpyxl
-  - pyarrow=0.17.0
+  - pyarrow=0.17
   - pytables
   - python-dateutil==2.7.3
   - pytz

doc/source/_static/ci.png

508 KB