Commit ddc8bd8

Mike Phung committed
Merge branch 'master' into fillna-other-missing-values-not-modified
2 parents c1fc307 + 6a683a2 commit ddc8bd8

359 files changed: +7764 -4024 lines changed

.github/ISSUE_TEMPLATE/documentation_improvement.md

Lines changed: 0 additions & 22 deletions
This file was deleted.
Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
+name: Documentation Improvement
+description: Report wrong or missing documentation
+title: "DOC: "
+labels: [Docs, Needs Triage]
+
+body:
+  - type: checkboxes
+    attributes:
+      options:
+        - label: >
+            I have checked that the issue still exists on the latest versions of the docs
+            on `master` [here](https://pandas.pydata.org/docs/dev/)
+          required: true
+  - type: textarea
+    id: location
+    attributes:
+      label: Location of the documentation
+      description: >
+        Please provide the location of the documentation, e.g. "pandas.read_csv" or the
+        URL of the documentation, e.g.
+        "https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html"
+      placeholder: https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html
+    validations:
+      required: true
+  - type: textarea
+    id: problem
+    attributes:
+      label: Documentation problem
+      description: >
+        Please provide a description of what documentation you believe needs to be fixed/improved
+    validations:
+      required: true
+  - type: textarea
+    id: suggested-fix
+    attributes:
+      label: Suggested fix for documentation
+      description: >
+        Please explain the suggested fix and **why** it's better than the existing documentation
+    validations:
+      required: true

.github/ISSUE_TEMPLATE/submit_question.md

Lines changed: 0 additions & 24 deletions
This file was deleted.
Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
+name: Submit Question
+description: Ask a general question about pandas
+title: "QST: "
+labels: [Usage Question, Needs Triage]
+
+body:
+  - type: markdown
+    attributes:
+      value: >
+        Since [StackOverflow](https://stackoverflow.com) is better suited towards answering
+        usage questions, we ask that all usage questions are first asked on StackOverflow.
+  - type: checkboxes
+    attributes:
+      options:
+        - label: >
+            I have searched the [[pandas] tag](https://stackoverflow.com/questions/tagged/pandas)
+            on StackOverflow for similar questions.
+          required: true
+        - label: >
+            I have asked my usage related question on [StackOverflow](https://stackoverflow.com).
+          required: true
+  - type: input
+    id: question-link
+    attributes:
+      label: Link to question on StackOverflow
+    validations:
+      required: true
+  - type: markdown
+    attributes:
+      value: ---
+  - type: textarea
+    id: question
+    attributes:
+      label: Question about pandas
+      description: >
+        **Note**: If you'd still like to submit a question, please read [this guide](
+        https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports) detailing
+        how to provide the necessary information for us to reproduce your question.
+      placeholder: |
+        ```python
+        # Your code here, if applicable
+
+        ```

.github/workflows/asv-bot.yml

Lines changed: 81 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,81 @@
1+
name: "ASV Bot"
2+
3+
on:
4+
issue_comment: # Pull requests are issues
5+
types:
6+
- created
7+
8+
env:
9+
ENV_FILE: environment.yml
10+
COMMENT: ${{github.event.comment.body}}
11+
12+
jobs:
13+
autotune:
14+
name: "Run benchmarks"
15+
# TODO: Support more benchmarking options later, against different branches, against self, etc
16+
if: startsWith(github.event.comment.body, '@github-actions benchmark')
17+
runs-on: ubuntu-latest
18+
defaults:
19+
run:
20+
shell: bash -l {0}
21+
22+
concurrency:
23+
# Set concurrency to prevent abuse(full runs are ~5.5 hours !!!)
24+
# each user can only run one concurrent benchmark bot at a time
25+
# We don't cancel in progress jobs, but if you want to benchmark multiple PRs, you're gonna have
26+
# to wait
27+
group: ${{ github.actor }}-asv
28+
cancel-in-progress: false
29+
30+
steps:
31+
- name: Checkout
32+
uses: actions/checkout@v2
33+
with:
34+
fetch-depth: 0
35+
36+
- name: Cache conda
37+
uses: actions/cache@v2
38+
with:
39+
path: ~/conda_pkgs_dir
40+
key: ${{ runner.os }}-conda-${{ hashFiles('${{ env.ENV_FILE }}') }}
41+
42+
# Although asv sets up its own env, deps are still needed
43+
# during discovery process
44+
- uses: conda-incubator/setup-miniconda@v2
45+
with:
46+
activate-environment: pandas-dev
47+
channel-priority: strict
48+
environment-file: ${{ env.ENV_FILE }}
49+
use-only-tar-bz2: true
50+
51+
- name: Run benchmarks
52+
id: bench
53+
continue-on-error: true # This is a fake failure, asv will exit code 1 for regressions
54+
run: |
55+
# extracting the regex, see https://stackoverflow.com/a/36798723
56+
REGEX=$(echo "$COMMENT" | sed -n "s/^.*-b\s*\(\S*\).*$/\1/p")
57+
cd asv_bench
58+
asv check -E existing
59+
git remote add upstream https://github.com/pandas-dev/pandas.git
60+
git fetch upstream
61+
asv machine --yes
62+
asv continuous -f 1.1 -b $REGEX upstream/master HEAD
63+
echo 'BENCH_OUTPUT<<EOF' >> $GITHUB_ENV
64+
asv compare -f 1.1 upstream/master HEAD >> $GITHUB_ENV
65+
echo 'EOF' >> $GITHUB_ENV
66+
echo "REGEX=$REGEX" >> $GITHUB_ENV
67+
68+
- uses: actions/github-script@v4
69+
env:
70+
BENCH_OUTPUT: ${{env.BENCH_OUTPUT}}
71+
REGEX: ${{env.REGEX}}
72+
with:
73+
script: |
74+
const ENV_VARS = process.env
75+
const run_url = `https://github.com/${context.repo.owner}/${context.repo.repo}/actions/runs/${context.runId}`
76+
github.issues.createComment({
77+
issue_number: context.issue.number,
78+
owner: context.repo.owner,
79+
repo: context.repo.repo,
80+
body: '\nBenchmarks completed. View runner logs here.' + run_url + '\nRegex used: '+ 'regex ' + ENV_VARS["REGEX"] + '\n' + ENV_VARS["BENCH_OUTPUT"]
81+
})
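
Note on the `Run benchmarks` step: the benchmark filter is pulled out of the triggering comment with a `sed` substitution. As a rough illustration only, the same pattern can be approximated in Python with the standard `re` module. The function name `extract_benchmark_regex` is hypothetical and not part of the workflow, and the sketch assumes Python's `re` treats simple one-line comments the same way the `sed` expression does.

```python
import re

# Python approximation of the sed pattern used in the workflow:
#   s/^.*-b\s*\(\S*\).*$/\1/p
_PATTERN = re.compile(r"^.*-b\s*(\S*).*$")


def extract_benchmark_regex(comment: str) -> str:
    """Return the token following the last "-b" on each matching line.

    Lines without "-b" contribute nothing, similar to `sed -n ... p`.
    """
    matches = []
    for line in comment.splitlines():
        found = _PATTERN.match(line)
        if found:
            matches.append(found.group(1))
    return "\n".join(matches)


if __name__ == "__main__":
    print(extract_benchmark_regex("@github-actions benchmark -b ^groupby"))
    print(repr(extract_benchmark_regex("@github-actions benchmark")))
```

In this sketch a comment without a `-b` option yields an empty string.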

.github/workflows/autoupdate-pre-commit-config.yml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -2,7 +2,7 @@ name: "Update pre-commit config"
22

33
on:
44
schedule:
5-
- cron: "0 7 * * 1" # At 07:00 on each Monday.
5+
- cron: "0 7 1 * *" # At 07:00 on 1st of every month.
66
workflow_dispatch:
77

88
jobs:
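
Note on the schedule change: the job moves from weekly to monthly. A small sketch using only the standard library shows how often each of the two cron expressions would fire in a year. The helper name `count_runs` and the choice of the year 2021 are illustrative assumptions, and the function only models the day-of-month and day-of-week fields used here.

```python
from datetime import date, timedelta


def count_runs(year, day_of_month=None, cron_weekday=None):
    """Count days in `year` on which a once-a-day cron entry would fire.

    `day_of_month` models the third cron field; `cron_weekday` models the
    fifth field using cron numbering (0 = Sunday, 1 = Monday, ...).
    Only one of the two is expected to be set in this sketch.
    """
    runs = 0
    day = date(year, 1, 1)
    while day.year == year:
        # isoweekday(): Monday = 1 ... Sunday = 7, so "% 7" maps Sunday to 0.
        weekday_ok = cron_weekday is None or day.isoweekday() % 7 == cron_weekday
        dom_ok = day_of_month is None or day.day == day_of_month
        if weekday_ok and dom_ok:
            runs += 1
        day += timedelta(days=1)
    return runs


if __name__ == "__main__":
    print(count_runs(2021, cron_weekday=1))  # old schedule: "0 7 * * 1"
    print(count_runs(2021, day_of_month=1))  # new schedule: "0 7 1 * *"
```

For 2021 this gives 52 runs under the old weekly schedule and 12 under the new monthly one.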

.github/workflows/python-dev.yml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -41,7 +41,7 @@ jobs:
4141
- name: Install dependencies
4242
run: |
4343
python -m pip install --upgrade pip setuptools wheel
44-
pip install git+https://github.com/numpy/numpy.git
44+
pip install -i https://pypi.anaconda.org/scipy-wheels-nightly/simple numpy
4545
pip install git+https://github.com/pytest-dev/pytest.git
4646
pip install git+https://github.com/nedbat/coveragepy.git
4747
pip install cython python-dateutil pytz hypothesis pytest-xdist pytest-cov

.pre-commit-config.yaml

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -164,3 +164,8 @@ repos:
164164
entry: python scripts/no_bool_in_generic.py
165165
language: python
166166
files: ^pandas/core/generic\.py$
167+
- id: pandas-errors-documented
168+
name: Ensure pandas errors are documented in doc/source/reference/general_utility_functions.rst
169+
entry: python scripts/pandas_errors_documented.py
170+
language: python
171+
files: ^pandas/errors/__init__.py$
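
Note on the new hook: it runs `scripts/pandas_errors_documented.py`, whose contents are not part of this excerpt. Purely as a hypothetical sketch of what such a check could look like (this is not the actual script), one might collect the class names defined in a module with `ast` and report any that do not appear in the documentation text. The function name `undocumented_classes` and the sample inputs are assumptions made for illustration.

```python
import ast


def undocumented_classes(module_source, docs_text):
    """Return top-level class names from `module_source` missing in `docs_text`."""
    tree = ast.parse(module_source)
    class_names = [
        node.name for node in tree.body if isinstance(node, ast.ClassDef)
    ]
    # Plain substring check; a real check would likely parse the .rst more carefully.
    return [name for name in class_names if name not in docs_text]


if __name__ == "__main__":
    # Hypothetical module source and documentation text.
    source = (
        "class FooError(Exception):\n    pass\n\n"
        "class BarWarning(Warning):\n    pass\n"
    )
    docs = "errors.FooError\n"
    print(undocumented_classes(source, docs))
```

A hook built this way would typically exit with a non-zero status when the returned list is not empty.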

asv_bench/asv.conf.json

Lines changed: 1 addition & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -46,17 +46,14 @@
4646
"numba": [],
4747
"numexpr": [],
4848
"pytables": [null, ""], // platform dependent, see excludes below
49+
"pyarrow": [],
4950
"tables": [null, ""],
5051
"openpyxl": [],
5152
"xlsxwriter": [],
5253
"xlrd": [],
5354
"xlwt": [],
5455
"odfpy": [],
55-
"pytest": [],
5656
"jinja2": [],
57-
// If using Windows with python 2.7 and want to build using the
58-
// mingw toolchain (rather than MSVC), uncomment the following line.
59-
// "libpython": [],
6057
},
6158
"conda_channels": ["defaults", "conda-forge"],
6259
// Combinations of libraries/python versions can be excluded/included

asv_bench/benchmarks/dtypes.py

Lines changed: 20 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -50,15 +50,26 @@ def time_pandas_dtype_invalid(self, dtype):
5050

5151
class SelectDtypes:
5252

53-
params = [
54-
tm.ALL_INT_DTYPES
55-
+ tm.ALL_EA_INT_DTYPES
56-
+ tm.FLOAT_DTYPES
57-
+ tm.COMPLEX_DTYPES
58-
+ tm.DATETIME64_DTYPES
59-
+ tm.TIMEDELTA64_DTYPES
60-
+ tm.BOOL_DTYPES
61-
]
53+
try:
54+
params = [
55+
tm.ALL_INT_NUMPY_DTYPES
56+
+ tm.ALL_INT_EA_DTYPES
57+
+ tm.FLOAT_NUMPY_DTYPES
58+
+ tm.COMPLEX_DTYPES
59+
+ tm.DATETIME64_DTYPES
60+
+ tm.TIMEDELTA64_DTYPES
61+
+ tm.BOOL_DTYPES
62+
]
63+
except AttributeError:
64+
params = [
65+
tm.ALL_INT_DTYPES
66+
+ tm.ALL_EA_INT_DTYPES
67+
+ tm.FLOAT_DTYPES
68+
+ tm.COMPLEX_DTYPES
69+
+ tm.DATETIME64_DTYPES
70+
+ tm.TIMEDELTA64_DTYPES
71+
+ tm.BOOL_DTYPES
72+
]
6273
param_names = ["dtype"]
6374

6475
def setup(self, dtype):
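
Note on the `try`/`except AttributeError` block: it falls back to the older constant names when the renamed ones are missing from `tm`, presumably so the benchmark can still run against versions that predate the rename. The pattern can be sketched without pandas by using stand-in namespaces. `new_tm`, `old_tm`, `select_params` and the dtype lists below are hypothetical placeholders, not the real constants.

```python
from types import SimpleNamespace


def select_params(tm):
    """Prefer the renamed constants, falling back to the older names."""
    try:
        return [tm.ALL_INT_NUMPY_DTYPES + tm.FLOAT_NUMPY_DTYPES]
    except AttributeError:
        # Older versions only provide the previous attribute names.
        return [tm.ALL_INT_DTYPES + tm.FLOAT_DTYPES]


# Stand-ins for two versions of the `tm` module (placeholder values).
new_tm = SimpleNamespace(ALL_INT_NUMPY_DTYPES=["int64"], FLOAT_NUMPY_DTYPES=["float64"])
old_tm = SimpleNamespace(ALL_INT_DTYPES=["int64"], FLOAT_DTYPES=["float64"])

if __name__ == "__main__":
    print(select_params(new_tm))
    print(select_params(old_tm))
```

Both calls return the same parameter list, which is the point of the fallback: the benchmark body does not need to know which set of names was available.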

asv_bench/benchmarks/frame_ctor.py

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2,6 +2,7 @@
22

33
import pandas as pd
44
from pandas import (
5+
Categorical,
56
DataFrame,
67
MultiIndex,
78
Series,
@@ -31,6 +32,9 @@ def setup(self):
3132
self.dict_list = frame.to_dict(orient="records")
3233
self.data2 = {i: {j: float(j) for j in range(100)} for i in range(2000)}
3334

35+
# arrays which we wont consolidate
36+
self.dict_of_categoricals = {i: Categorical(np.arange(N)) for i in range(K)}
37+
3438
def time_list_of_dict(self):
3539
DataFrame(self.dict_list)
3640

@@ -50,6 +54,10 @@ def time_nested_dict_int64(self):
5054
# nested dict, integer indexes, regression described in #621
5155
DataFrame(self.data2)
5256

57+
def time_dict_of_categoricals(self):
58+
# dict of arrays that we wont consolidate
59+
DataFrame(self.dict_of_categoricals)
60+
5361

5462
class FromSeries:
5563
def setup(self):

asv_bench/benchmarks/frame_methods.py

Lines changed: 6 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -542,8 +542,12 @@ class Interpolate:
542542
def setup(self, downcast):
543543
N = 10000
544544
# this is the worst case, where every column has NaNs.
545-
self.df = DataFrame(np.random.randn(N, 100))
546-
self.df.values[::2] = np.nan
545+
arr = np.random.randn(N, 100)
546+
# NB: we need to set values in array, not in df.values, otherwise
547+
# the benchmark will be misleading for ArrayManager
548+
arr[::2] = np.nan
549+
550+
self.df = DataFrame(arr)
547551

548552
self.df2 = DataFrame(
549553
{
