
Commit bc3a00c

Merge remote-tracking branch 'upstream/master' into issue-pandas-dev#38781
2 parents: 86f0e86 + f51547c

File tree: 253 files changed, +9774 / -6581 lines

Note: large commits have some content hidden by default, so only a subset of the changed files is shown below.

.github/workflows/ci.yml (+53 -28)

@@ -2,74 +2,81 @@ name: CI

 on:
   push:
-    branches: master
+    branches: [master]
   pull_request:
     branches:
       - master
       - 1.2.x

 env:
   ENV_FILE: environment.yml
+  PANDAS_CI: 1

 jobs:
   checks:
     name: Checks
     runs-on: ubuntu-latest
-    steps:
-
-    - name: Setting conda path
-      run: echo "${HOME}/miniconda3/bin" >> $GITHUB_PATH
+    defaults:
+      run:
+        shell: bash -l {0}

+    steps:
     - name: Checkout
       uses: actions/checkout@v1

     - name: Looking for unwanted patterns
       run: ci/code_checks.sh patterns
       if: always()

-    - name: Setup environment and build pandas
-      run: ci/setup_env.sh
-      if: always()
+    - name: Cache conda
+      uses: actions/cache@v2
+      with:
+        path: ~/conda_pkgs_dir
+        key: ${{ runner.os }}-conda-${{ hashFiles('${{ env.ENV_FILE }}') }}

-    - name: Linting
+    - uses: conda-incubator/setup-miniconda@v2
+      with:
+        activate-environment: pandas-dev
+        channel-priority: strict
+        environment-file: ${{ env.ENV_FILE }}
+        use-only-tar-bz2: true
+
+    - name: Environment Detail
       run: |
-        source activate pandas-dev
-        ci/code_checks.sh lint
+        conda info
+        conda list
+
+    - name: Build Pandas
+      run: |
+        python setup.py build_ext -j 2
+        python -m pip install -e . --no-build-isolation --no-use-pep517
+
+    - name: Linting
+      run: ci/code_checks.sh lint
       if: always()

     - name: Checks on imported code
-      run: |
-        source activate pandas-dev
-        ci/code_checks.sh code
+      run: ci/code_checks.sh code
       if: always()

     - name: Running doctests
-      run: |
-        source activate pandas-dev
-        ci/code_checks.sh doctests
+      run: ci/code_checks.sh doctests
       if: always()

     - name: Docstring validation
-      run: |
-        source activate pandas-dev
-        ci/code_checks.sh docstrings
+      run: ci/code_checks.sh docstrings
       if: always()

     - name: Typing validation
-      run: |
-        source activate pandas-dev
-        ci/code_checks.sh typing
+      run: ci/code_checks.sh typing
       if: always()

     - name: Testing docstring validation script
-      run: |
-        source activate pandas-dev
-        pytest --capture=no --strict-markers scripts
+      run: pytest --capture=no --strict-markers scripts
       if: always()

     - name: Running benchmarks
       run: |
-        source activate pandas-dev
         cd asv_bench
         asv check -E existing
         git remote add upstream https://github.com/pandas-dev/pandas.git
@@ -106,7 +113,6 @@ jobs:
       run: |
         source activate pandas-dev
         python web/pandas_web.py web/pandas --target-path=web/build
-
     - name: Build documentation
       run: |
         source activate pandas-dev
@@ -132,3 +138,22 @@ jobs:
     - name: Upload dev docs
       run: rsync -az --delete doc/build/html/ docs@${{ secrets.server_ip }}:/usr/share/nginx/pandas/pandas-docs/dev
       if: github.event_name == 'push'
+
+  data_manager:
+    name: Test experimental data manager
+    runs-on: ubuntu-latest
+    steps:
+
+    - name: Setting conda path
+      run: echo "${HOME}/miniconda3/bin" >> $GITHUB_PATH
+
+    - name: Checkout
+      uses: actions/checkout@v1
+
+    - name: Setup environment and build pandas
+      run: ci/setup_env.sh
+
+    - name: Run tests
+      run: |
+        source activate pandas-dev
+        pytest pandas/tests/frame/methods --array-manager

.pre-commit-config.yaml (+3 -2)

@@ -145,11 +145,12 @@ repos:
         language: pygrep
         types_or: [python, cython]
     -   id: unwanted-typing
-        name: Check for use of comment-based annotation syntax and missing error codes
+        name: Check for outdated annotation syntax and missing error codes
        entry: |
            (?x)
            \#\ type:\ (?!ignore)|
-            \#\ type:\s?ignore(?!\[)
+            \#\ type:\s?ignore(?!\[)|
+            \)\ ->\ \"
        language: pygrep
        types: [python]
    -   id: np-bool
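Since pygrep hooks are plain Python regular expressions, the updated `unwanted-typing` pattern can be exercised directly with the `re` module. A quick sketch — the sample lines are illustrative, not taken from the pandas codebase:

```python
import re

# The hook's verbose-mode pattern, including the newly added `\)\ ->\ \"`
# alternative that flags quoted return annotations like `) -> "DataFrame":`.
pattern = re.compile(
    r"""(?x)
    \#\ type:\ (?!ignore)|
    \#\ type:\s?ignore(?!\[)|
    \)\ ->\ \"
    """
)

flagged = [
    'x = []  # type: List[int]',            # comment-based annotation
    'y = compute()  # type: ignore',        # ignore without an error code
    'def head(self) -> "DataFrame": ...',   # outdated quoted return type
]
allowed = ['y = compute()  # type: ignore[arg-type]']  # error code given: OK

matches = [bool(pattern.search(line)) for line in flagged + allowed]
```

The third alternative is what this commit adds: now that pandas runs on Python versions with PEP 563 support, string-quoted return types are redundant and the hook rejects them.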

asv_bench/benchmarks/io/excel.py (+8 -1)

@@ -43,6 +43,7 @@ class ReadExcel:
     params = ["xlrd", "openpyxl", "odf"]
     param_names = ["engine"]
     fname_excel = "spreadsheet.xlsx"
+    fname_excel_xls = "spreadsheet.xls"
     fname_odf = "spreadsheet.ods"

     def _create_odf(self):
@@ -63,10 +64,16 @@ def setup_cache(self):
         self.df = _generate_dataframe()

         self.df.to_excel(self.fname_excel, sheet_name="Sheet1")
+        self.df.to_excel(self.fname_excel_xls, sheet_name="Sheet1")
         self._create_odf()

     def time_read_excel(self, engine):
-        fname = self.fname_odf if engine == "odf" else self.fname_excel
+        if engine == "xlrd":
+            fname = self.fname_excel_xls
+        elif engine == "odf":
+            fname = self.fname_odf
+        else:
+            fname = self.fname_excel
         read_excel(fname, engine=engine)
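The new if/elif chain pairs each benchmarked engine with a fixture it can actually read (xlrd 2.0 dropped `.xlsx` support, hence the legacy `.xls` file). The same dispatch can be written as a dict lookup; a sketch reusing the benchmark's filenames:

```python
# Dict form of the engine-to-fixture dispatch above; the fallback mirrors
# the `else` branch (openpyxl, or any future engine, reads the .xlsx file).
ENGINE_FILES = {
    "xlrd": "spreadsheet.xls",  # xlrd 2.0 can no longer read .xlsx
    "odf": "spreadsheet.ods",
}

def fname_for(engine: str) -> str:
    return ENGINE_FILES.get(engine, "spreadsheet.xlsx")
```

The explicit if/elif in the committed code reads equally well at this size; the mapping form mainly pays off if more engines are added.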

doc/make.py (+9 -6)

@@ -46,6 +46,7 @@ def __init__(
         warnings_are_errors=False,
     ):
         self.num_jobs = num_jobs
+        self.include_api = include_api
         self.verbosity = verbosity
         self.warnings_are_errors = warnings_are_errors

@@ -188,7 +189,14 @@ def _add_redirects(self):
                 if not row or row[0].strip().startswith("#"):
                     continue

-                path = os.path.join(BUILD_PATH, "html", *row[0].split("/")) + ".html"
+                html_path = os.path.join(BUILD_PATH, "html")
+                path = os.path.join(html_path, *row[0].split("/")) + ".html"
+
+                if not self.include_api and (
+                    os.path.join(html_path, "reference") in path
+                    or os.path.join(html_path, "generated") in path
+                ):
+                    continue

                 try:
                     title = self._get_page_title(row[1])
@@ -198,11 +206,6 @@ def _add_redirects(self):
                     # sphinx specific stuff
                     title = "this page"

-                if os.path.exists(path):
-                    raise RuntimeError(
-                        f"Redirection would overwrite an existing file: {path}"
-                    )
-
                 with open(path, "w") as moved_page_fd:
                     html = f"""\
 <html>
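The new guard in `_add_redirects` skips any redirect that would land under the API pages (`reference/` or `generated/`) when the docs are built without the API. The condition can be isolated and checked on its own; `BUILD_PATH` below is a stand-in for the constant defined in `doc/make.py`:

```python
import os

BUILD_PATH = "build"  # stand-in for doc/make.py's BUILD_PATH constant
html_path = os.path.join(BUILD_PATH, "html")

def skip_redirect(source: str, include_api: bool) -> bool:
    # Mirrors the inserted condition: build the target .html path, then
    # skip it when API pages are excluded and the path falls under them.
    path = os.path.join(html_path, *source.split("/")) + ".html"
    return not include_api and (
        os.path.join(html_path, "reference") in path
        or os.path.join(html_path, "generated") in path
    )
```

Writing such a redirect in a `-api` build would point at a page that was never generated, which is why the guard `continue`s instead.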

doc/source/conf.py (+11 -8)

@@ -77,29 +77,32 @@
 try:
     import nbconvert
 except ImportError:
-    logger.warn("nbconvert not installed. Skipping notebooks.")
+    logger.warning("nbconvert not installed. Skipping notebooks.")
     exclude_patterns.append("**/*.ipynb")
 else:
     try:
         nbconvert.utils.pandoc.get_pandoc_version()
     except nbconvert.utils.pandoc.PandocMissing:
-        logger.warn("Pandoc not installed. Skipping notebooks.")
+        logger.warning("Pandoc not installed. Skipping notebooks.")
         exclude_patterns.append("**/*.ipynb")

 # sphinx_pattern can be '-api' to exclude the API pages,
 # the path to a file, or a Python object
 # (e.g. '10min.rst' or 'pandas.DataFrame.head')
 source_path = os.path.dirname(os.path.abspath(__file__))
 pattern = os.environ.get("SPHINX_PATTERN")
+single_doc = pattern is not None and pattern != "-api"
+include_api = pattern != "-api"
 if pattern:
     for dirname, dirs, fnames in os.walk(source_path):
+        reldir = os.path.relpath(dirname, source_path)
         for fname in fnames:
             if os.path.splitext(fname)[-1] in (".rst", ".ipynb"):
                 fname = os.path.relpath(os.path.join(dirname, fname), source_path)

                 if fname == "index.rst" and os.path.abspath(dirname) == source_path:
                     continue
-                elif pattern == "-api" and dirname == "reference":
+                elif pattern == "-api" and reldir.startswith("reference"):
                     exclude_patterns.append(fname)
                 elif pattern != "-api" and fname != pattern:
                     exclude_patterns.append(fname)
@@ -109,11 +112,11 @@
     with open(os.path.join(source_path, "index.rst"), "w") as f:
         f.write(
             t.render(
-                include_api=pattern is None,
-                single_doc=(pattern if pattern is not None and pattern != "-api" else None),
+                include_api=include_api,
+                single_doc=(pattern if single_doc else None),
             )
         )
-autosummary_generate = True if pattern is None else ["index"]
+autosummary_generate = True if include_api else ["index"]
 autodoc_typehints = "none"

 # numpydoc
@@ -315,7 +318,7 @@
         # ... and each of its public methods
         moved_api_pages.append((f"{old}.{method}", f"{new}.{method}"))

-if pattern is None:
+if include_api:
     html_additional_pages = {
         "generated/" + page[0]: "api_redirect.html" for page in moved_api_pages
     }
@@ -411,7 +414,7 @@
 # latex_use_modindex = True


-if pattern is None:
+if include_api:
     intersphinx_mapping = {
         "dateutil": ("https://dateutil.readthedocs.io/en/latest/", None),
         "matplotlib": ("https://matplotlib.org/", None),
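The `-api` exclusion fix works because `os.walk` yields joined directory paths, so the old comparison `dirname == "reference"` could never match; comparing the path relative to the source directory does. A self-contained sketch of the difference:

```python
import os
import tempfile

# os.walk yields joined paths like "<root>/reference", so the old check
# `dirname == "reference"` was always False; the relative path matches.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "reference", "api"))

old_hits, new_hits = [], []
for dirname, _dirs, _fnames in os.walk(root):
    reldir = os.path.relpath(dirname, root)
    if dirname == "reference":          # old conf.py condition
        old_hits.append(dirname)
    if reldir.startswith("reference"):  # fixed condition
        new_hits.append(reldir)
```

Using `startswith` also excludes subdirectories such as `reference/api`, which the old equality check would have missed even with a correct path.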

doc/source/ecosystem.rst (+8)

@@ -85,6 +85,14 @@ Featuretools is a Python library for automated feature engineering built on top

 Compose is a machine learning tool for labeling data and prediction engineering. It allows you to structure the labeling process by parameterizing prediction problems and transforming time-driven relational data into target values with cutoff times that can be used for supervised learning.

+`STUMPY <https://github.com/TDAmeritrade/stumpy>`__
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+STUMPY is a powerful and scalable Python library for modern time series analysis.
+At its core, STUMPY efficiently computes something called a
+`matrix profile <https://stumpy.readthedocs.io/en/latest/Tutorial_The_Matrix_Profile.html>`__,
+which can be used for a wide variety of time series data mining tasks.
+
 .. _ecosystem.visualization:

 Visualization
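For context on the new ecosystem entry: a matrix profile records, for every window of a series, the distance to its nearest matching window elsewhere in the series. A brute-force numpy sketch of the idea — STUMPY itself uses far faster algorithms, and none of this is its API:

```python
import numpy as np

def naive_matrix_profile(ts: np.ndarray, m: int) -> np.ndarray:
    """Brute-force matrix profile: for each length-m window, the
    z-normalized Euclidean distance to its nearest non-trivial match."""
    n = len(ts) - m + 1
    windows = np.lib.stride_tricks.sliding_window_view(ts, m).astype(float)
    # z-normalize each window so matches are shape-based, not level-based
    z = (windows - windows.mean(axis=1, keepdims=True)) / windows.std(
        axis=1, keepdims=True
    )
    profile = np.empty(n)
    excl = m // 2  # exclusion zone: skip trivially similar neighboring windows
    for i in range(n):
        d = np.sqrt(((z - z[i]) ** 2).sum(axis=1))
        d[max(0, i - excl):i + excl + 1] = np.inf
        profile[i] = d.min()
    return profile
```

Low values in the profile mark repeated motifs; high values mark anomalies (discords). This O(n²·m) version is only for illustration.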

doc/source/getting_started/comparison/comparison_with_spreadsheets.rst (+1 -1)

@@ -35,7 +35,7 @@ General terminology translation
 ``DataFrame``
 ~~~~~~~~~~~~~

-A ``DataFrame`` in pandas is analogous to an Excel worksheet. While an Excel worksheet can contain
+A ``DataFrame`` in pandas is analogous to an Excel worksheet. While an Excel workbook can contain
 multiple worksheets, pandas ``DataFrame``\s exist independently.

 ``Series``

doc/source/getting_started/intro_tutorials/04_plotting.rst (+1 -1)

@@ -151,7 +151,7 @@ I want each of the columns in a separate subplot.

 Separate subplots for each of the data columns are supported by the ``subplots`` argument
 of the ``plot`` functions. The builtin options available in each of the pandas plot
-functions that are worthwhile to have a look.
+functions are worth reviewing.

 .. raw:: html
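For reference, the ``subplots`` argument discussed in the corrected sentence draws each column in its own axes. A minimal sketch — it assumes matplotlib is installed, and uses the Agg backend so no display is needed:

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for scripted use
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(10, 3), columns=["a", "b", "c"])
axes = df.plot(subplots=True)  # returns one Axes object per column
```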
doc/source/user_guide/enhancingperf.rst

+1-1
Original file line numberDiff line numberDiff line change
@@ -247,7 +247,7 @@ We've gotten another big improvement. Let's check again where the time is spent:
247247

248248
.. ipython:: python
249249
250-
%%prun -l 4 apply_integrate_f(df["a"].to_numpy(), df["b"].to_numpy(), df["N"].to_numpy())
250+
%prun -l 4 apply_integrate_f(df["a"].to_numpy(), df["b"].to_numpy(), df["N"].to_numpy())
251251
252252
As one might expect, the majority of the time is now spent in ``apply_integrate_f``,
253253
so if we wanted to make anymore efficiencies we must continue to concentrate our
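The fix swaps the cell magic ``%%prun`` for the line magic ``%prun``, the correct form when profiling a single statement. Outside IPython, ``cProfile`` gives the equivalent; ``apply_integrate_f_stub`` below is a stand-in for the document's function:

```python
import cProfile
import io
import pstats

def apply_integrate_f_stub(n: int) -> int:
    # stand-in for the apply_integrate_f call profiled in the docs
    return sum(i * i for i in range(n))

profiler = cProfile.Profile()
profiler.enable()
apply_integrate_f_stub(10_000)
profiler.disable()

buf = io.StringIO()
# `-l 4` in %prun corresponds to printing only the top rows here
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(4)
report = buf.getvalue()
```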

doc/source/whatsnew/v0.8.0.rst (+1 -1)

@@ -176,7 +176,7 @@ New plotting methods
 Vytautas Jancauskas, the 2012 GSOC participant, has added many new plot
 types. For example, ``'kde'`` is a new option:

-.. ipython:: python
+.. code-block:: python

     s = pd.Series(
         np.concatenate((np.random.randn(1000), np.random.randn(1000) * 0.5 + 3))

doc/source/whatsnew/v1.2.1.rst (+17 -11)

@@ -14,18 +14,24 @@ including other versions of pandas.

 Fixed regressions
 ~~~~~~~~~~~~~~~~~
-- The deprecated attributes ``_AXIS_NAMES`` and ``_AXIS_NUMBERS`` of :class:`DataFrame` and :class:`Series` will no longer show up in ``dir`` or ``inspect.getmembers`` calls (:issue:`38740`)
 - Fixed regression in :meth:`to_csv` that created corrupted zip files when there were more rows than ``chunksize`` (:issue:`38714`)
-- Fixed a regression in ``groupby().rolling()`` where :class:`MultiIndex` levels were dropped (:issue:`38523`)
 - Fixed regression in repr of float-like strings of an ``object`` dtype having trailing 0's truncated after the decimal (:issue:`38708`)
 - Fixed regression in :meth:`DataFrame.groupby` with :class:`Categorical` grouping column not showing unused categories for ``grouped.indices`` (:issue:`38642`)
 - Fixed regression in :meth:`DataFrame.any` and :meth:`DataFrame.all` not returning a result for tz-aware ``datetime64`` columns (:issue:`38723`)
 - Fixed regression in :meth:`DataFrame.__setitem__` raising ``ValueError`` when expanding :class:`DataFrame` and new column is from type ``"0 - name"`` (:issue:`39010`)
 - Fixed regression in :meth:`.GroupBy.sem` where the presence of non-numeric columns would cause an error instead of being dropped (:issue:`38774`)
+- Fixed regression in :meth:`DataFrame.loc.__setitem__` raising ``ValueError`` when :class:`DataFrame` has unsorted :class:`MultiIndex` columns and indexer is a scalar (:issue:`38601`)
 - Fixed regression in :func:`read_excel` with non-rawbyte file handles (:issue:`38788`)
-- Bug in :meth:`read_csv` with ``float_precision="high"`` caused segfault or wrong parsing of long exponent strings. This resulted in a regression in some cases as the default for ``float_precision`` was changed in pandas 1.2.0 (:issue:`38753`)
 - Fixed regression in :meth:`Rolling.skew` and :meth:`Rolling.kurt` modifying the object inplace (:issue:`38908`)
 - Fixed regression in :meth:`read_csv` and other read functions were the encoding error policy (``errors``) did not default to ``"replace"`` when no encoding was specified (:issue:`38989`)
+- Fixed regression in :meth:`DataFrame.apply` with ``axis=1`` using str accessor in apply function (:issue:`38979`)
+- Fixed regression in :meth:`DataFrame.replace` raising ``ValueError`` when :class:`DataFrame` has dtype ``bytes`` (:issue:`38900`)
+- Fixed regression in :meth:`DataFrameGroupBy.diff` raising for ``int8`` and ``int16`` columns (:issue:`39050`)
+- Fixed regression in :meth:`Series.fillna` that raised ``RecursionError`` with ``datetime64[ns, UTC]`` dtype (:issue:`38851`)
+- Fixed regression that raised ``AttributeError`` with PyArrow versions [0.16.0, 1.0.0) (:issue:`38801`)
+- Fixed regression in :meth:`DataFrame.groupby` when aggregating an :class:`ExtensionDType` that could fail for non-numeric values (:issue:`38980`)
+- Fixed regression in :meth:`DataFrame.loc.__setitem__` raising ``KeyError`` with :class:`MultiIndex` and list-like columns indexer enlarging :class:`DataFrame` (:issue:`39147`)
+- Fixed regression in comparisons between ``NaT`` and ``datetime.date`` objects incorrectly returning ``True`` (:issue:`39151`)

 .. ---------------------------------------------------------------------------

@@ -34,14 +40,8 @@ Fixed regressions
 Bug fixes
 ~~~~~~~~~

-I/O
-^^^
-
-- Bumped minimum fastparquet version to 0.4.0 to avoid ``AttributeError`` from numba (:issue:`38344`)
-- Bumped minimum pymysql version to 0.8.1 to avoid test failures (:issue:`38344`)
-- Fixed ``AttributeError`` with PyArrow versions [0.16.0, 1.0.0) (:issue:`38801`)
-
--
+- Bug in :meth:`read_csv` with ``float_precision="high"`` caused segfault or wrong parsing of long exponent strings. This resulted in a regression in some cases as the default for ``float_precision`` was changed in pandas 1.2.0 (:issue:`38753`)
+- Bug in :func:`read_csv` not closing an opened file handle when a ``csv.Error`` or ``UnicodeDecodeError`` occurred while initializing (:issue:`39024`)
 -

 .. ---------------------------------------------------------------------------

@@ -50,8 +50,14 @@ I/O

 Other
 ~~~~~
+
+- The deprecated attributes ``_AXIS_NAMES`` and ``_AXIS_NUMBERS`` of :class:`DataFrame` and :class:`Series` will no longer show up in ``dir`` or ``inspect.getmembers`` calls (:issue:`38740`)
+- Bumped minimum fastparquet version to 0.4.0 to avoid ``AttributeError`` from numba (:issue:`38344`)
+- Bumped minimum pymysql version to 0.8.1 to avoid test failures (:issue:`38344`)
 - Fixed build failure on MacOS 11 in Python 3.9.1 (:issue:`38766`)
 - Added reference to backwards incompatible ``check_freq`` arg of :func:`testing.assert_frame_equal` and :func:`testing.assert_series_equal` in :ref:`pandas 1.1.0 whats new <whatsnew_110.api_breaking.testing.check_freq>` (:issue:`34050`)
+-
+-

 .. ---------------------------------------------------------------------------
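One of the listed regressions (:issue:`39151`) is easy to demonstrate: after the fix, ``NaT`` no longer compares equal to a ``datetime.date``. A quick check, assuming a pandas version with the fix (>= 1.2.1) is installed:

```python
import datetime

import pandas as pd

# Before the fix, ``pd.NaT == datetime.date(...)`` could incorrectly
# return True; a missing timestamp should never equal a concrete date.
d = datetime.date(2021, 1, 1)
eq = pd.NaT == d
ne = pd.NaT != d
```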
