Commit 86b26d2

Merge branch 'master' of https://github.com/pandas-dev/pandas into boilerplate-4
2 parents e56ddf9 + 3970153 commit 86b26d2

376 files changed: +17136 -11445 lines

.travis.yml

+21 -3
@@ -14,6 +14,8 @@ cache:

 env:
   global:
+    # Variable for test workers
+    - PYTEST_WORKERS="auto"
     # create a github personal access token
     # cd pandas-dev/pandas
     # travis encrypt 'PANDAS_GH_TOKEN=personal_access_token' -r pandas-dev/pandas
@@ -27,12 +29,21 @@ matrix:
   fast_finish: true

   include:
+    # In allowed failures
+    - dist: bionic
+      python: 3.9-dev
+      env:
+        - JOB="3.9-dev" PATTERN="(not slow and not network and not clipboard)"
     - env:
         - JOB="3.8" ENV_FILE="ci/deps/travis-38.yaml" PATTERN="(not slow and not network and not clipboard)"

     - env:
         - JOB="3.7" ENV_FILE="ci/deps/travis-37.yaml" PATTERN="(not slow and not network and not clipboard)"

+    - arch: arm64
+      env:
+        - JOB="3.7, arm64" PYTEST_WORKERS=8 ENV_FILE="ci/deps/travis-37-arm64.yaml" PATTERN="(not slow and not network and not clipboard)"
+
     - env:
         - JOB="3.6, locale" ENV_FILE="ci/deps/travis-36-locale.yaml" PATTERN="((not slow and not network and not clipboard) or (single and db))" LOCALE_OVERRIDE="zh_CN.UTF-8" SQL="1"
       services:
@@ -53,11 +64,18 @@ matrix:
       services:
         - mysql
         - postgresql
+  allow_failures:
+    - arch: arm64
+      env:
+        - JOB="3.7, arm64" PYTEST_WORKERS=8 ENV_FILE="ci/deps/travis-37-arm64.yaml" PATTERN="(not slow and not network and not clipboard)"
+    - dist: bionic
+      python: 3.9-dev
+      env:
+        - JOB="3.9-dev" PATTERN="(not slow and not network)"

 before_install:
   - echo "before_install"
-  # set non-blocking IO on travis
-  # https://github.com/travis-ci/travis-ci/issues/8920#issuecomment-352661024
+  # Use blocking IO on travis. Ref: https://github.com/travis-ci/travis-ci/issues/8920#issuecomment-352661024
   - python -c 'import os,sys,fcntl; flags = fcntl.fcntl(sys.stdout, fcntl.F_GETFL); fcntl.fcntl(sys.stdout, fcntl.F_SETFL, flags&~os.O_NONBLOCK);'
   - source ci/travis_process_gbq_encryption.sh
   - export PATH="$HOME/miniconda3/bin:$PATH"
@@ -83,7 +101,7 @@ install:
 script:
   - echo "script start"
   - echo "$JOB"
-  - source activate pandas-dev
+  - if [ "$JOB" != "3.9-dev" ]; then source activate pandas-dev; fi
   - ci/run_tests.sh

 after_script:
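
The `before_install` one-liner in this file clears the `O_NONBLOCK` flag on stdout, which is what the corrected comment ("Use blocking IO") describes. A minimal sketch of the same flag manipulation written out as a function may make it easier to read. The helper name `make_blocking` is hypothetical and not part of the commit, and the snippet assumes a POSIX platform where `fcntl` is available:

```python
import fcntl
import os


def make_blocking(fd: int) -> bool:
    """Clear O_NONBLOCK on ``fd`` and report whether it is now blocking."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    # Mask out only the non-blocking bit; all other status flags are kept.
    fcntl.fcntl(fd, fcntl.F_SETFL, flags & ~os.O_NONBLOCK)
    return not (fcntl.fcntl(fd, fcntl.F_GETFL) & os.O_NONBLOCK)


if __name__ == "__main__":
    # Demonstrate on a pipe instead of the real stdout.
    read_end, write_end = os.pipe()
    os.set_blocking(write_end, False)  # simulate the non-blocking state
    print(make_blocking(write_end))
    os.close(read_end)
    os.close(write_end)
```

The CI command applies the same bit mask directly to `sys.stdout`; the pipe here only keeps the demonstration side-effect free.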

README.md

+1
@@ -16,6 +16,7 @@
 [![Downloads](https://anaconda.org/conda-forge/pandas/badges/downloads.svg)](https://pandas.pydata.org)
 [![Gitter](https://badges.gitter.im/Join%20Chat.svg)](https://gitter.im/pydata/pandas)
 [![Powered by NumFOCUS](https://img.shields.io/badge/powered%20by-NumFOCUS-orange.svg?style=flat&colorA=E1523D&colorB=007D8A)](https://numfocus.org)
+[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

 ## What is it?

asv_bench/benchmarks/algorithms.py

+14 -3
@@ -34,7 +34,16 @@ class Factorize:
     params = [
         [True, False],
         [True, False],
-        ["int", "uint", "float", "string", "datetime64[ns]", "datetime64[ns, tz]"],
+        [
+            "int",
+            "uint",
+            "float",
+            "string",
+            "datetime64[ns]",
+            "datetime64[ns, tz]",
+            "Int64",
+            "boolean",
+        ],
     ]
     param_names = ["unique", "sort", "dtype"]

@@ -49,13 +58,15 @@ def setup(self, unique, sort, dtype):
             "datetime64[ns, tz]": pd.date_range(
                 "2011-01-01", freq="H", periods=N, tz="Asia/Tokyo"
             ),
+            "Int64": pd.array(np.arange(N), dtype="Int64"),
+            "boolean": pd.array(np.random.randint(0, 2, N), dtype="boolean"),
         }[dtype]
         if not unique:
             data = data.repeat(5)
-        self.idx = data
+        self.data = data

     def time_factorize(self, unique, sort, dtype):
-        self.idx.factorize(sort=sort)
+        pd.factorize(self.data, sort=sort)


 class Duplicated:
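
The benchmark now calls the top-level `pd.factorize` and adds two nullable extension dtypes, `"Int64"` and `"boolean"`. As a rough illustration of what is being timed, the sketch below factorizes a small nullable integer array. The sample values are made up for illustration and do not come from the commit:

```python
import pandas as pd

# Nullable integer array with one missing value (illustrative data).
data = pd.array([3, 1, 3, None, 1], dtype="Int64")

# The top-level function takes the extension array directly.
codes, uniques = pd.factorize(data)

print(list(codes))    # integer codes; the missing value is coded as -1
print(list(uniques))  # distinct non-missing values in order of first appearance
```

With the default arguments the missing entry receives the sentinel code `-1` and does not appear among the uniques.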

asv_bench/benchmarks/arithmetic.py

+78
@@ -101,6 +101,59 @@ def time_frame_op_with_series_axis1(self, opname):
         getattr(operator, opname)(self.df, self.ser)


+class FrameWithFrameWide:
+    # Many-columns, mixed dtypes
+
+    params = [
+        [
+            # GH#32779 has discussion of which operators are included here
+            operator.add,
+            operator.floordiv,
+            operator.gt,
+        ]
+    ]
+    param_names = ["op"]
+
+    def setup(self, op):
+        # we choose dtypes so as to make the blocks
+        #  a) not perfectly match between right and left
+        #  b) appreciably bigger than single columns
+        n_cols = 2000
+        n_rows = 500
+
+        # construct dataframe with 2 blocks
+        arr1 = np.random.randn(n_rows, int(n_cols / 2)).astype("f8")
+        arr2 = np.random.randn(n_rows, int(n_cols / 2)).astype("f4")
+        df = pd.concat(
+            [pd.DataFrame(arr1), pd.DataFrame(arr2)], axis=1, ignore_index=True,
+        )
+        # should already be the case, but just to be sure
+        df._consolidate_inplace()
+
+        # TODO: GH#33198 the setting here shouldn't need two steps
+        arr1 = np.random.randn(n_rows, int(n_cols / 4)).astype("f8")
+        arr2 = np.random.randn(n_rows, int(n_cols / 2)).astype("i8")
+        arr3 = np.random.randn(n_rows, int(n_cols / 4)).astype("f8")
+        df2 = pd.concat(
+            [pd.DataFrame(arr1), pd.DataFrame(arr2), pd.DataFrame(arr3)],
+            axis=1,
+            ignore_index=True,
+        )
+        # should already be the case, but just to be sure
+        df2._consolidate_inplace()
+
+        self.left = df
+        self.right = df2
+
+    def time_op_different_blocks(self, op):
+        # blocks (and dtypes) are not aligned
+        op(self.left, self.right)
+
+    def time_op_same_blocks(self, op):
+        # blocks (and dtypes) are aligned
+        op(self.left, self.left)
+
+
 class Ops:

     params = [[True, False], ["default", 1]]
@@ -416,4 +469,29 @@ def time_apply_index(self, offset):
         offset.apply_index(self.rng)


+class BinaryOpsMultiIndex:
+    params = ["sub", "add", "mul", "div"]
+    param_names = ["func"]
+
+    def setup(self, func):
+        date_range = pd.date_range("20200101 00:00", "20200102 0:00", freq="S")
+        level_0_names = [str(i) for i in range(30)]
+
+        index = pd.MultiIndex.from_product([level_0_names, date_range])
+        column_names = ["col_1", "col_2"]
+
+        self.df = pd.DataFrame(
+            np.random.rand(len(index), 2), index=index, columns=column_names
+        )
+
+        self.arg_df = pd.DataFrame(
+            np.random.randint(1, 10, (len(level_0_names), 2)),
+            index=level_0_names,
+            columns=column_names,
+        )
+
+    def time_binary_op_multiindex(self, func):
+        getattr(self.df, func)(self.arg_df, level=0)
+
+
 from .pandas_vb_common import setup  # noqa: F401 isort:skip
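
`BinaryOpsMultiIndex` times flex arithmetic methods called with `level=0`, where the right-hand frame is indexed only by the outer level of the left-hand frame's `MultiIndex`. A scaled-down sketch of that pattern follows. The two-group, three-row data is invented for illustration; only the call shape mirrors the benchmark:

```python
import numpy as np
import pandas as pd

columns = ["col_1", "col_2"]

# Left operand: MultiIndex with an outer label level and an inner counter.
index = pd.MultiIndex.from_product([["a", "b"], [0, 1, 2]])
df = pd.DataFrame(np.ones((len(index), 2)), index=index, columns=columns)

# Right operand: one row per outer label.
arg_df = pd.DataFrame([[1, 2], [3, 4]], index=["a", "b"], columns=columns)

# level=0 matches arg_df's index against the outer level of df's index,
# so each row of arg_df is applied to every inner row of its group.
result = df.add(arg_df, level=0)
print(result)
```

The benchmark does the same with 30 outer labels and one inner entry per second across a full day, which makes the alignment step the dominant cost.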

asv_bench/benchmarks/indexing.py

+2 -2
@@ -158,9 +158,9 @@ def time_boolean_rows_boolean(self):
 class DataFrameNumericIndexing:
     def setup(self):
         self.idx_dupe = np.array(range(30)) * 99
-        self.df = DataFrame(np.random.randn(10000, 5))
+        self.df = DataFrame(np.random.randn(100000, 5))
         self.df_dup = concat([self.df, 2 * self.df, 3 * self.df])
-        self.bool_indexer = [True] * 5000 + [False] * 5000
+        self.bool_indexer = [True] * 50000 + [False] * 50000

     def time_iloc_dups(self):
         self.df_dup.iloc[self.idx_dupe]

asv_bench/benchmarks/rolling.py

+23
@@ -186,4 +186,27 @@ def peakmem_rolling(self, constructor, window_size, dtype, method):
         getattr(self.roll, method)()


+class Groupby:
+
+    params = ["sum", "median", "mean", "max", "min", "kurt", "sum"]
+
+    def setup(self, method):
+        N = 1000
+        df = pd.DataFrame(
+            {
+                "A": [str(i) for i in range(N)] * 10,
+                "B": list(range(N)) * 10,
+                "C": pd.date_range(start="1900-01-01", freq="1min", periods=N * 10),
+            }
+        )
+        self.groupby_roll_int = df.groupby("A").rolling(window=2)
+        self.groupby_roll_offset = df.groupby("A").rolling(window="30s", on="C")
+
+    def time_rolling_int(self, method):
+        getattr(self.groupby_roll_int, method)()
+
+    def time_rolling_offset(self, method):
+        getattr(self.groupby_roll_offset, method)()
+
+
 from .pandas_vb_common import setup  # noqa: F401 isort:skip
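
The new `Groupby` benchmark exercises two kinds of grouped rolling windows: a fixed row count (`window=2`) and a time offset measured on a datetime column (`window="30s", on="C"`). The sketch below shows both on a four-row frame. The data and the 90-second window are chosen for illustration only, and a single column `"B"` is selected so the result is easy to inspect:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": ["x", "x", "y", "y"],
        "B": [1, 2, 3, 4],
        "C": pd.date_range(start="1900-01-01", freq="1min", periods=4),
    }
)

# Fixed window of two rows per group: the first row of each group has
# too few observations, so its sum is NaN.
by_count = df.groupby("A")["B"].rolling(window=2).sum()

# Time-based window on column "C": every row within the preceding
# 90 seconds (same group only) contributes to the sum.
by_offset = df.groupby("A").rolling(window="90s", on="C")["B"].sum()

print(by_count)
print(by_offset)
```

In the benchmark itself the rows of each group are ten hours apart, so a 30-second offset window only ever covers the current row.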

azure-pipelines.yml

+3
@@ -5,6 +5,9 @@ trigger:
 pr:
 - master

+variables:
+  PYTEST_WORKERS: auto
+
 jobs:
 # Mac and Linux use the same template
 - template: ci/azure/posix.yml

ci/build39.sh

+21
@@ -0,0 +1,21 @@
+#!/bin/bash -e
+# Special build for python3.9 until numpy puts its own wheels up
+
+sudo apt-get install build-essential gcc xvfb
+pip install --no-deps -U pip wheel setuptools
+pip install python-dateutil pytz pytest pytest-xdist hypothesis
+pip install cython --pre  # https://github.com/cython/cython/issues/3395
+
+git clone https://github.com/numpy/numpy
+cd numpy
+python setup.py build_ext --inplace
+python setup.py install
+cd ..
+rm -rf numpy
+
+python setup.py build_ext --inplace
+python -m pip install --no-build-isolation -e .
+
+python -c "import sys; print(sys.version_info)"
+python -c "import pandas as pd"
+python -c "import hypothesis"

ci/code_checks.sh

+2 -2
@@ -353,8 +353,8 @@ fi
 ### DOCSTRINGS ###
 if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then

-    MSG='Validate docstrings (GL03, GL04, GL05, GL06, GL07, GL09, GL10, SS04, SS05, PR03, PR04, PR05, PR10, EX04, RT01, RT04, RT05, SA02, SA03, SA05)' ; echo $MSG
-    $BASE_DIR/scripts/validate_docstrings.py --format=actions --errors=GL03,GL04,GL05,GL06,GL07,GL09,GL10,SS04,SS05,PR03,PR04,PR05,PR10,EX04,RT01,RT04,RT05,SA02,SA03,SA05
+    MSG='Validate docstrings (GL03, GL04, GL05, GL06, GL07, GL09, GL10, SS04, SS05, PR03, PR04, PR05, PR10, EX04, RT01, RT04, RT05, SA02, SA03)' ; echo $MSG
+    $BASE_DIR/scripts/validate_docstrings.py --format=actions --errors=GL03,GL04,GL05,GL06,GL07,GL09,GL10,SS04,SS05,PR03,PR04,PR05,PR10,EX04,RT01,RT04,RT05,SA02,SA03
     RET=$(($RET + $?)) ; echo $MSG "DONE"

     MSG='Validate correct capitalization among titles in documentation' ; echo $MSG

ci/deps/azure-37-numpydev.yaml

+2 -2
@@ -14,9 +14,9 @@ dependencies:
   - pytz
   - pip
   - pip:
-    - cython>=0.29.16
+    - cython==0.29.16  # GH#34014
     - "git+git://github.com/dateutil/dateutil.git"
-    - "-f https://7933911d6844c6c53a7d-47bd50c35cd79bd838daf386af554a83.ssl.cf2.rackcdn.com"
+    - "--extra-index-url https://pypi.anaconda.org/scipy-wheels-nightly/simple"
     - "--pre"
     - "numpy"
     - "scipy"

ci/deps/travis-37-arm64.yaml

+21
@@ -0,0 +1,21 @@
+name: pandas-dev
+channels:
+  - defaults
+  - conda-forge
+dependencies:
+  - python=3.7.*
+
+  # tools
+  - cython>=0.29.13
+  - pytest>=5.0.1
+  - pytest-xdist>=1.21
+  - hypothesis>=3.58.0
+
+  # pandas dependencies
+  - botocore>=1.11
+  - numpy
+  - python-dateutil
+  - pytz
+  - pip
+  - pip:
+    - moto

ci/run_tests.sh

+1 -1
@@ -20,7 +20,7 @@ if [[ $(uname) == "Linux" && -z $DISPLAY ]]; then
     XVFB="xvfb-run "
 fi

-PYTEST_CMD="${XVFB}pytest -m \"$PATTERN\" -n auto --dist=loadfile -s --strict --durations=10 --junitxml=test-data.xml $TEST_ARGS $COVERAGE pandas"
+PYTEST_CMD="${XVFB}pytest -m \"$PATTERN\" -n $PYTEST_WORKERS --dist=loadfile -s --strict --durations=30 --junitxml=test-data.xml $TEST_ARGS $COVERAGE pandas"

 echo $PYTEST_CMD
 sh -c "$PYTEST_CMD"
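
The change above replaces the hard-coded `-n auto` with `$PYTEST_WORKERS`, which the CI configs set to `auto` globally and to `8` for the arm64 job. The sketch below assembles a comparable argument list in Python to show how the environment variables map onto pytest flags. `build_pytest_cmd` is a hypothetical helper, the `xvfb-run` prefix is left out, and the fallback to `"auto"` is an assumption of this sketch (the shell script expands the variable without a default):

```python
import shlex


def build_pytest_cmd(env):
    """Assemble a pytest invocation similar to the one in ci/run_tests.sh."""
    # Assumption: fall back to "auto" when PYTEST_WORKERS is unset.
    workers = env.get("PYTEST_WORKERS", "auto")
    cmd = [
        "pytest",
        "-m", env.get("PATTERN", ""),
        "-n", str(workers),
        "--dist=loadfile",
        "-s",
        "--strict",
        "--durations=30",
        "--junitxml=test-data.xml",
    ]
    # TEST_ARGS and COVERAGE may each hold several whitespace-separated flags.
    for name in ("TEST_ARGS", "COVERAGE"):
        cmd.extend(shlex.split(env.get(name, "")))
    cmd.append("pandas")
    return cmd


if __name__ == "__main__":
    arm64_env = {"PATTERN": "not slow and not network", "PYTEST_WORKERS": "8"}
    print(shlex.join(build_pytest_cmd(arm64_env)))
```

Passing an argument list avoids the nested quoting that the shell version needs around `$PATTERN`.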

ci/setup_env.sh

+15 -2
@@ -1,5 +1,10 @@
 #!/bin/bash -e

+if [ "$JOB" == "3.9-dev" ]; then
+    /bin/bash ci/build39.sh
+    exit 0
+fi
+
 # edit the locale file if needed
 if [[ "$(uname)" == "Linux" && -n "$LC_ALL" ]]; then
     echo "Adding locale to the first line of pandas/__init__.py"
@@ -36,9 +41,17 @@ else
   exit 1
 fi

-wget -q "https://repo.continuum.io/miniconda/Miniconda3-latest-$CONDA_OS.sh" -O miniconda.sh
+if [ "${TRAVIS_CPU_ARCH}" == "arm64" ]; then
+  sudo apt-get -y install xvfb
+  CONDA_URL="https://github.com/conda-forge/miniforge/releases/download/4.8.2-1/Miniforge3-4.8.2-1-Linux-aarch64.sh"
+else
+  CONDA_URL="https://repo.continuum.io/miniconda/Miniconda3-latest-$CONDA_OS.sh"
+fi
+wget -q $CONDA_URL -O miniconda.sh
 chmod +x miniconda.sh
-./miniconda.sh -b
+
+# Installation path is required for ARM64 platform as miniforge script installs in path $HOME/miniforge3.
+./miniconda.sh -b -p $MINICONDA_DIR

 export PATH=$MINICONDA_DIR/bin:$PATH

doc/source/development/contributing.rst

+4 -4
@@ -110,7 +110,7 @@ version control to allow many people to work together on the project.
 Some great resources for learning Git:

 * the `GitHub help pages <https://help.github.com/>`_.
-* the `NumPy's documentation <https://docs.scipy.org/doc/numpy/dev/index.html>`_.
+* the `NumPy's documentation <https://numpy.org/doc/stable/dev/index.html>`_.
 * Matthew Brett's `Pydagogue <https://matthew-brett.github.com/pydagogue/>`_.

 Getting started with Git
@@ -974,7 +974,7 @@ it is worth getting in the habit of writing tests ahead of time so this is never
 Like many packages, pandas uses `pytest
 <https://docs.pytest.org/en/latest/>`_ and the convenient
 extensions in `numpy.testing
-<https://docs.scipy.org/doc/numpy/reference/routines.testing.html>`_.
+<https://numpy.org/doc/stable/reference/routines.testing.html>`_.

 .. note::

@@ -1275,8 +1275,8 @@ Performance matters and it is worth considering whether your code has introduced
 performance regressions. pandas is in the process of migrating to
 `asv benchmarks <https://github.com/spacetelescope/asv>`__
 to enable easy monitoring of the performance of critical pandas operations.
-These benchmarks are all found in the ``pandas/asv_bench`` directory. asv
-supports both python2 and python3.
+These benchmarks are all found in the ``pandas/asv_bench`` directory, and the
+test results can be found `here <https://pandas.pydata.org/speed/pandas/#/>`__.

 To use all features of asv, you will need either ``conda`` or
 ``virtualenv``. For more details please check the `asv installation

doc/source/development/extending.rst

+1 -1
@@ -219,7 +219,7 @@ and re-boxes it if necessary.

 If applicable, we highly recommend that you implement ``__array_ufunc__`` in your
 extension array to avoid coercion to an ndarray. See
-`the numpy documentation <https://docs.scipy.org/doc/numpy/reference/generated/numpy.lib.mixins.NDArrayOperatorsMixin.html>`__
+`the numpy documentation <https://numpy.org/doc/stable/reference/generated/numpy.lib.mixins.NDArrayOperatorsMixin.html>`__
 for an example.

 As part of your implementation, we require that you defer to pandas when a pandas
