Commit 7bcb004

Merge branch 'upstream-master'
2 parents 0f48b5b + c688a0f commit 7bcb004

244 files changed (+6675 -2243 lines changed)

.pre-commit-config.yaml

-12
@@ -30,15 +30,3 @@ repos:
     - id: isort
       language: python_venv
       exclude: ^pandas/__init__\.py$|^pandas/core/api\.py$
-- repo: https://github.com/pre-commit/mirrors-mypy
-  rev: v0.730
-  hooks:
-    - id: mypy
-      args:
-        # As long as a some files are excluded from check-untyped-defs
-        # we have to exclude it from the pre-commit hook as the configuration
-        # is based on modules but the hook runs on files.
-        - --no-check-untyped-defs
-        - --follow-imports
-        - skip
-      files: pandas/

Makefile

+7
@@ -25,3 +25,10 @@ doc:
 	cd doc; \
 	python make.py clean; \
 	python make.py html
+
+check:
+	python3 scripts/validate_unwanted_patterns.py \
+		--validation-type="private_function_across_module" \
+		--included-file-extensions="py" \
+		--excluded-file-paths=pandas/tests,asv_bench/,pandas/_vendored \
+		pandas/
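
The new `check` target runs the `private_function_across_module` validation. The diff does not show how that validation works, so the following is only a simplified sketch of the idea, not the code in `scripts/validate_unwanted_patterns.py`. It assumes the check means "a call to an underscore-prefixed function reached through an imported name", and the helper name `find_private_calls` is hypothetical.

```python
import ast


def find_private_calls(source: str):
    """Return sorted (line_number, "module._name") pairs for private calls.

    Hypothetical helper: a call counts as "private across modules" when it
    has the form ``imported_name._something(...)``.
    """
    tree = ast.parse(source)

    # Collect every name that an import statement binds in this module.
    imported = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                imported.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                imported.add(alias.asname or alias.name)

    # Flag calls whose target is an underscore-prefixed attribute of an import.
    hits = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if (
            isinstance(func, ast.Attribute)
            and isinstance(func.value, ast.Name)
            and func.value.id in imported
            and func.attr.startswith("_")
            and not func.attr.startswith("__")
        ):
            hits.append((node.lineno, f"{func.value.id}.{func.attr}"))
    return sorted(hits)


SAMPLE = '''
import helpers
from pkg import utils

helpers.public()
helpers._private()
utils._also_private(1)
'''

print(find_private_calls(SAMPLE))
```

Working on the syntax tree rather than on raw text means that comments and string literals mentioning `module._name` are not reported.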

ci/code_checks.sh

+23 -4
@@ -116,6 +116,14 @@ if [[ -z "$CHECK" || "$CHECK" == "lint" ]]; then
     fi
     RET=$(($RET + $?)) ; echo $MSG "DONE"
 
+    MSG='Check for use of private module attribute access' ; echo $MSG
+    if [[ "$GITHUB_ACTIONS" == "true" ]]; then
+        $BASE_DIR/scripts/validate_unwanted_patterns.py --validation-type="private_function_across_module" --included-file-extensions="py" --excluded-file-paths=pandas/tests,asv_bench/,pandas/_vendored --format="##[error]{source_path}:{line_number}:{msg}" pandas/
+    else
+        $BASE_DIR/scripts/validate_unwanted_patterns.py --validation-type="private_function_across_module" --included-file-extensions="py" --excluded-file-paths=pandas/tests,asv_bench/,pandas/_vendored pandas/
+    fi
+    RET=$(($RET + $?)) ; echo $MSG "DONE"
+
     echo "isort --version-number"
     isort --version-number

@@ -179,6 +187,10 @@ if [[ -z "$CHECK" || "$CHECK" == "patterns" ]]; then
     invgrep -R --include="*.py" -E "super\(\w*, (self|cls)\)" pandas
     RET=$(($RET + $?)) ; echo $MSG "DONE"
 
+    MSG='Check for use of builtin filter function' ; echo $MSG
+    invgrep -R --include="*.py" -P '(?<!def)[\(\s]filter\(' pandas
+    RET=$(($RET + $?)) ; echo $MSG "DONE"
+
     # Check for the following code in testing: `np.testing` and `np.array_equal`
     MSG='Check for invalid testing' ; echo $MSG
     invgrep -r -E --include '*.py' --exclude testing.py '(numpy|np)(\.testing|\.array_equal)' pandas/tests/
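
To see which lines the new `filter` pattern is meant to catch, the same regular expression can be tried with Python's `re` module. This is an illustrative sketch only: CI runs the pattern through `grep -P` (PCRE), and the helper name `flags_builtin_filter` and the sample lines are made up for the demonstration.

```python
import re

# Same pattern as the invgrep call: "filter(" preceded by "(" or whitespace,
# unless that preceding character itself follows the keyword "def".
FILTER_CALL = re.compile(r"(?<!def)[\(\s]filter\(")


def flags_builtin_filter(line: str) -> bool:
    """Return True if the pattern would report this line."""
    return FILTER_CALL.search(line) is not None


samples = [
    "result = filter(None, items)",         # builtin call after whitespace
    "values = list(filter(None, items))",   # builtin call after "("
    "def filter(self, items=None):",        # method definition
    "df.filter(items=['a'])",               # method call via attribute access
]
for line in samples:
    print(flags_builtin_filter(line), line)
```

The negative lookbehind is what lets a method definition named `filter` pass, and requiring `(` or whitespace before the name is what lets `obj.filter(...)` pass.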
@@ -226,15 +238,22 @@ if [[ -z "$CHECK" || "$CHECK" == "patterns" ]]; then
     invgrep -R --include=*.{py,pyx} '!r}' pandas
     RET=$(($RET + $?)) ; echo $MSG "DONE"
 
+    # -------------------------------------------------------------------------
+    # Type annotations
+
     MSG='Check for use of comment-based annotation syntax' ; echo $MSG
     invgrep -R --include="*.py" -P '# type: (?!ignore)' pandas
     RET=$(($RET + $?)) ; echo $MSG "DONE"
 
-    # https://github.com/python/mypy/issues/7384
-    # MSG='Check for missing error codes with # type: ignore' ; echo $MSG
-    # invgrep -R --include="*.py" -P '# type: ignore(?!\[)' pandas
-    # RET=$(($RET + $?)) ; echo $MSG "DONE"
+    MSG='Check for missing error codes with # type: ignore' ; echo $MSG
+    invgrep -R --include="*.py" -P '# type:\s?ignore(?!\[)' pandas
+    RET=$(($RET + $?)) ; echo $MSG "DONE"
+
+    MSG='Check for use of Union[Series, DataFrame] instead of FrameOrSeriesUnion alias' ; echo $MSG
+    invgrep -R --include="*.py" --exclude=_typing.py -E 'Union\[.*(Series.*DataFrame|DataFrame.*Series).*\]' pandas
+    RET=$(($RET + $?)) ; echo $MSG "DONE"
 
+    # -------------------------------------------------------------------------
     MSG='Check for use of foo.__class__ instead of type(foo)' ; echo $MSG
     invgrep -R --include=*.{py,pyx} '\.__class__' pandas
     RET=$(($RET + $?)) ; echo $MSG "DONE"
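
The two type-annotation checks added in this hunk are plain regular expressions, so their effect can be sketched with Python's `re` module. The patterns are copied from the diff; the constant names and sample lines are assumptions made for illustration, and CI itself runs them through `grep -P` and `grep -E`.

```python
import re

# "# type: ignore" (with or without the space) not followed by an error code.
MISSING_ERROR_CODE = re.compile(r"# type:\s?ignore(?!\[)")

# A Union that names both Series and DataFrame, in either order.
UNION_OF_FRAME_AND_SERIES = re.compile(
    r"Union\[.*(Series.*DataFrame|DataFrame.*Series).*\]"
)


def is_flagged(pattern: re.Pattern, line: str) -> bool:
    """Return True if the given check would report this line."""
    return pattern.search(line) is not None


print(is_flagged(MISSING_ERROR_CODE, "x = f()  # type: ignore"))
print(is_flagged(MISSING_ERROR_CODE, "x = f()  # type: ignore[arg-type]"))
print(is_flagged(UNION_OF_FRAME_AND_SERIES, "obj: Union[Series, DataFrame]"))
print(is_flagged(UNION_OF_FRAME_AND_SERIES, "obj: FrameOrSeriesUnion"))
```

Compared with the commented-out version it replaces, the `\s?` in the first pattern also catches the spelling `# type:ignore`.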

ci/deps/azure-37-locale_slow.yaml

+1 -1
@@ -18,7 +18,7 @@ dependencies:
   - lxml
   - matplotlib=3.0.0
   - numpy=1.16.*
-  - openpyxl=2.5.7
+  - openpyxl=2.6.0
   - python-dateutil
   - python-blosc
   - pytz=2017.3

ci/deps/azure-37-minimum_versions.yaml

+1 -1
@@ -19,7 +19,7 @@ dependencies:
   - numba=0.46.0
   - numexpr=2.6.8
   - numpy=1.16.5
-  - openpyxl=2.5.7
+  - openpyxl=2.6.0
   - pytables=3.4.4
   - python-dateutil=2.7.3
   - pytz=2017.3

doc/source/development/contributing.rst

+1
@@ -204,6 +204,7 @@ You will need `Build Tools for Visual Studio 2017
 You DO NOT need to install Visual Studio 2019.
 You only need "Build Tools for Visual Studio 2019" found by
 scrolling down to "All downloads" -> "Tools for Visual Studio 2019".
+In the installer, select the "C++ build tools" workload.
 
 **Mac OS**

doc/source/development/contributing_docstring.rst

+5 -5
@@ -32,18 +32,18 @@ The next example gives an idea of what a docstring looks like:
         Parameters
         ----------
         num1 : int
-            First number to add
+            First number to add.
         num2 : int
-            Second number to add
+            Second number to add.
 
         Returns
         -------
         int
-            The sum of `num1` and `num2`
+            The sum of `num1` and `num2`.
 
         See Also
         --------
-        subtract : Subtract one integer from another
+        subtract : Subtract one integer from another.
 
         Examples
         --------
@@ -998,4 +998,4 @@ mapping function names to docstrings. Wherever possible, we prefer using
 
 See ``pandas.core.generic.NDFrame.fillna`` for an example template, and
 ``pandas.core.series.Series.fillna`` and ``pandas.core.generic.frame.fillna``
-for the filled versions.
+for the filled versions.

doc/source/getting_started/install.rst

+1 -1
@@ -274,7 +274,7 @@ html5lib 1.0.1 HTML parser for read_html (see :ref
 lxml                      4.3.0              HTML parser for read_html (see :ref:`note <optional_html>`)
 matplotlib                2.2.3              Visualization
 numba                     0.46.0             Alternative execution engine for rolling operations
-openpyxl                  2.5.7              Reading / writing for xlsx files
+openpyxl                  2.6.0              Reading / writing for xlsx files
 pandas-gbq                0.12.0             Google Big Query access
 psycopg2                  2.7                PostgreSQL engine for sqlalchemy
 pyarrow                   0.15.0             Parquet, ORC, and feather reading / writing

doc/source/getting_started/overview.rst

+8 -8
@@ -9,9 +9,9 @@ Package overview
 **pandas** is a `Python <https://www.python.org>`__ package providing fast,
 flexible, and expressive data structures designed to make working with
 "relational" or "labeled" data both easy and intuitive. It aims to be the
-fundamental high-level building block for doing practical, **real world** data
+fundamental high-level building block for doing practical, **real-world** data
 analysis in Python. Additionally, it has the broader goal of becoming **the
-most powerful and flexible open source data analysis / manipulation tool
+most powerful and flexible open source data analysis/manipulation tool
 available in any language**. It is already well on its way toward this goal.
 
 pandas is well suited for many different kinds of data:
@@ -21,7 +21,7 @@ pandas is well suited for many different kinds of data:
   - Ordered and unordered (not necessarily fixed-frequency) time series data.
   - Arbitrary matrix data (homogeneously typed or heterogeneous) with row and
     column labels
-  - Any other form of observational / statistical data sets. The data actually
+  - Any other form of observational / statistical data sets. The data
     need not be labeled at all to be placed into a pandas data structure
 
 The two primary data structures of pandas, :class:`Series` (1-dimensional)
@@ -57,7 +57,7 @@ Here are just a few of the things that pandas does well:
     Excel files, databases, and saving / loading data from the ultrafast **HDF5
     format**
   - **Time series**-specific functionality: date range generation and frequency
-    conversion, moving window statistics, date shifting and lagging.
+    conversion, moving window statistics, date shifting, and lagging.
 
 Many of these principles are here to address the shortcomings frequently
 experienced using other languages / scientific research environments. For data
@@ -101,12 +101,12 @@ fashion.
 
 Also, we would like sensible default behaviors for the common API functions
 which take into account the typical orientation of time series and
-cross-sectional data sets. When using ndarrays to store 2- and 3-dimensional
+cross-sectional data sets. When using the N-dimensional array (ndarrays) to store 2- and 3-dimensional
 data, a burden is placed on the user to consider the orientation of the data
 set when writing functions; axes are considered more or less equivalent (except
 when C- or Fortran-contiguousness matters for performance). In pandas, the axes
 are intended to lend more semantic meaning to the data; i.e., for a particular
-data set there is likely to be a "right" way to orient the data. The goal,
+data set, there is likely to be a "right" way to orient the data. The goal,
 then, is to reduce the amount of mental effort required to code up data
 transformations in downstream functions.
 
@@ -148,8 +148,8 @@ pandas possible. Thanks to `all of our contributors <https://github.com/pandas-d
 If you're interested in contributing, please visit the :ref:`contributing guide <contributing>`.
 
 pandas is a `NumFOCUS <https://www.numfocus.org/open-source-projects/>`__ sponsored project.
-This will help ensure the success of development of pandas as a world-class open-source
-project, and makes it possible to `donate <https://pandas.pydata.org/donate.html>`__ to the project.
+This will help ensure the success of the development of pandas as a world-class open-source
+project and makes it possible to `donate <https://pandas.pydata.org/donate.html>`__ to the project.
 
 Project governance
 ------------------

doc/source/reference/frame.rst

+16
@@ -37,6 +37,7 @@ Attributes and underlying data
    DataFrame.shape
    DataFrame.memory_usage
    DataFrame.empty
+   DataFrame.set_flags
 
 Conversion
 ~~~~~~~~~~
@@ -276,6 +277,21 @@ Time Series-related
    DataFrame.tz_convert
    DataFrame.tz_localize
 
+.. _api.frame.flags:
+
+Flags
+~~~~~
+
+Flags refer to attributes of the pandas object. Properties of the dataset (like
+the date is was recorded, the URL it was accessed from, etc.) should be stored
+in :attr:`DataFrame.attrs`.
+
+.. autosummary::
+   :toctree: api/
+
+   Flags
+
+
 .. _api.frame.metadata:
 
 Metadata
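
The new reference section separates flags (attributes of the pandas object) from `attrs` (properties of the dataset). The sketch below illustrates that split. It assumes a pandas release that ships this feature (1.2 or later) and uses the `allows_duplicate_labels` flag, which does not appear in this diff excerpt; the URL is a placeholder.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2]}, index=["x", "y"])

# Flags describe the pandas object itself. set_flags returns a new object,
# so the original DataFrame keeps its default flag value.
strict = df.set_flags(allows_duplicate_labels=False)

# Properties of the dataset belong in attrs, not in flags.
strict.attrs["source_url"] = "https://example.com/data.csv"  # placeholder

# Requesting the flag on an object that already has duplicate labels raises
# the DuplicateLabelError added to the errors reference in this commit.
dup = pd.DataFrame({"a": [1, 2]}, index=["x", "x"])
try:
    dup.set_flags(allows_duplicate_labels=False)
    raised = False
except pd.errors.DuplicateLabelError:
    raised = True

print(strict.flags.allows_duplicate_labels, raised)
```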

doc/source/reference/general_utility_functions.rst

+1
@@ -37,6 +37,7 @@ Exceptions and warnings
 
    errors.AccessorRegistrationWarning
    errors.DtypeWarning
+   errors.DuplicateLabelError
    errors.EmptyDataError
    errors.InvalidIndexError
    errors.MergeError

doc/source/reference/series.rst

+15
@@ -39,6 +39,8 @@ Attributes
    Series.empty
    Series.dtypes
    Series.name
+   Series.flags
+   Series.set_flags
 
 Conversion
 ----------
@@ -527,6 +529,19 @@ Sparse-dtype specific methods and attributes are provided under the
    Series.sparse.from_coo
    Series.sparse.to_coo
 
+.. _api.series.flags:
+
+Flags
+~~~~~
+
+Flags refer to attributes of the pandas object. Properties of the dataset (like
+the date is was recorded, the URL it was accessed from, etc.) should be stored
+in :attr:`Series.attrs`.
+
+.. autosummary::
+   :toctree: api/
+
+   Flags
 
 .. _api.series.metadata:

doc/source/user_guide/computation.rst

+3
@@ -361,6 +361,9 @@ compute the mean absolute deviation on a rolling basis:
    @savefig rolling_apply_ex.png
    s.rolling(window=60).apply(mad, raw=True).plot(style='k')
 
+Using the Numba engine
+~~~~~~~~~~~~~~~~~~~~~~
+
 .. versionadded:: 1.0
 
 Additionally, :meth:`~Rolling.apply` can leverage `Numba <https://numba.pydata.org/>`__
