Commit e736b0a

Merge upstream/master
2 parents: 325170d + e81faa1

File tree: 249 files changed, +7064 -6415 lines

.travis.yml

+19 -16

@@ -30,31 +30,34 @@ matrix:
     - python: 3.5

   include:
-    - dist: trusty
-      env:
+    - env:
         - JOB="3.8" ENV_FILE="ci/deps/travis-38.yaml" PATTERN="(not slow and not network)"

-    - dist: trusty
-      env:
+    - env:
         - JOB="3.7" ENV_FILE="ci/deps/travis-37.yaml" PATTERN="(not slow and not network)"

-    - dist: trusty
-      env:
-        - JOB="3.6, locale" ENV_FILE="ci/deps/travis-36-locale.yaml" PATTERN="((not slow and not network) or (single and db))" LOCALE_OVERRIDE="zh_CN.UTF-8"
+    - env:
+        - JOB="3.6, locale" ENV_FILE="ci/deps/travis-36-locale.yaml" PATTERN="((not slow and not network) or (single and db))" LOCALE_OVERRIDE="zh_CN.UTF-8" SQL="1"
+      services:
+        - mysql
+        - postgresql

-    - dist: trusty
-      env:
-        - JOB="3.6, coverage" ENV_FILE="ci/deps/travis-36-cov.yaml" PATTERN="((not slow and not network) or (single and db))" PANDAS_TESTING_MODE="deprecate" COVERAGE=true
+    - env:
+        - JOB="3.6, coverage" ENV_FILE="ci/deps/travis-36-cov.yaml" PATTERN="((not slow and not network) or (single and db))" PANDAS_TESTING_MODE="deprecate" COVERAGE=true SQL="1"
+      services:
+        - mysql
+        - postgresql

     # In allow_failures
-    - dist: trusty
-      env:
-        - JOB="3.6, slow" ENV_FILE="ci/deps/travis-36-slow.yaml" PATTERN="slow"
+    - env:
+        - JOB="3.6, slow" ENV_FILE="ci/deps/travis-36-slow.yaml" PATTERN="slow" SQL="1"
+      services:
+        - mysql
+        - postgresql

   allow_failures:
-    - dist: trusty
-      env:
-        - JOB="3.6, slow" ENV_FILE="ci/deps/travis-36-slow.yaml" PATTERN="slow"
+    - env:
+        - JOB="3.6, slow" ENV_FILE="ci/deps/travis-36-slow.yaml" PATTERN="slow" SQL="1"

   before_install:
     - echo "before_install"

README.md

+1 -1

@@ -124,7 +124,7 @@ Here are just a few of the things that pandas does well:
     and saving/loading data from the ultrafast [**HDF5 format**][hdfstore]
   - [**Time series**][timeseries]-specific functionality: date range
     generation and frequency conversion, moving window statistics,
-    moving window linear regressions, date shifting and lagging, etc.
+    date shifting and lagging.


 [missing-data]: https://pandas.pydata.org/pandas-docs/stable/missing_data.html#working-with-missing-data

asv_bench/benchmarks/dtypes.py

+22

@@ -5,6 +5,7 @@
 from .pandas_vb_common import (
     datetime_dtypes,
     extension_dtypes,
+    lib,
     numeric_dtypes,
     string_dtypes,
 )
@@ -40,4 +41,25 @@ def time_pandas_dtype_invalid(self, dtype):
         pass


+class InferDtypes:
+    param_names = ["dtype"]
+    data_dict = {
+        "np-object": np.array([1] * 100000, dtype="O"),
+        "py-object": [1] * 100000,
+        "np-null": np.array([1] * 50000 + [np.nan] * 50000),
+        "py-null": [1] * 50000 + [None] * 50000,
+        "np-int": np.array([1] * 100000, dtype=int),
+        "np-floating": np.array([1.0] * 100000, dtype=float),
+        "empty": [],
+        "bytes": [b"a"] * 100000,
+    }
+    params = list(data_dict.keys())
+
+    def time_infer_skipna(self, dtype):
+        lib.infer_dtype(self.data_dict[dtype], skipna=True)
+
+    def time_infer(self, dtype):
+        lib.infer_dtype(self.data_dict[dtype], skipna=False)
+
+
 from .pandas_vb_common import setup  # noqa: F401 isort:skip
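
Note (not part of the commit): a minimal sketch of the dtype-inference routine the new InferDtypes benchmark times. It assumes the public pandas.api.types.infer_dtype wrapper, which exposes the same lib.infer_dtype function imported above.

    # Illustration only: the inference routine exercised by the InferDtypes benchmark.
    import numpy as np
    from pandas.api.types import infer_dtype

    data = [1] * 5 + [np.nan] * 5            # mirrors the "np-null" / "py-null" benchmark cases
    print(infer_dtype(data, skipna=True))    # missing values ignored; inferred as integer-like
    print(infer_dtype(data, skipna=False))   # missing values considered, so a different label comes back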

ci/azure/posix.yml

+2 -9

@@ -69,20 +69,13 @@ jobs:
     displayName: 'Build versions'

   - task: PublishTestResults@2
+    condition: succeededOrFailed()
     inputs:
+      failTaskOnFailedTests: true
       testResultsFiles: 'test-data.xml'
       testRunTitle: ${{ format('{0}-$(CONDA_PY)', parameters.name) }}
     displayName: 'Publish test results'

-  - powershell: |
-      $(Get-Content "test-data.xml" | Out-String) -match 'failures="(.*?)"'
-      if ($matches[1] -eq 0) {
-        Write-Host "No test failures in test-data"
-      } else {
-        Write-Error "$($matches[1]) tests failed"  # will produce $LASTEXITCODE=1
-      }
-    displayName: 'Check for test failures'
-
   - script: |
       source activate pandas-dev
       python ci/print_skipped.py

ci/azure/windows.yml

+12 -10

@@ -23,32 +23,34 @@ jobs:
       Write-Host "##vso[task.prependpath]$env:CONDA\Scripts"
       Write-Host "##vso[task.prependpath]$HOME/miniconda3/bin"
     displayName: 'Add conda to PATH'
+
   - script: conda update -q -n base conda
     displayName: 'Update conda'
+
   - bash: |
       conda env create -q --file ci\\deps\\azure-windows-$(CONDA_PY).yaml
     displayName: 'Create anaconda environment'
+
   - bash: |
       source activate pandas-dev
       conda list
-      ci\\incremental\\build.cmd
+      python setup.py build_ext -q -i
+      python -m pip install --no-build-isolation -e .
     displayName: 'Build'
+
   - bash: |
       source activate pandas-dev
       ci/run_tests.sh
     displayName: 'Test'
+
   - task: PublishTestResults@2
+    condition: succeededOrFailed()
     inputs:
+      failTaskOnFailedTests: true
       testResultsFiles: 'test-data.xml'
-      testRunTitle: 'Windows-$(CONDA_PY)'
-  - powershell: |
-      $(Get-Content "test-data.xml" | Out-String) -match 'failures="(.*?)"'
-      if ($matches[1] -eq 0) {
-        Write-Host "No test failures in test-data"
-      } else {
-        Write-Error "$($matches[1]) tests failed"  # will produce $LASTEXITCODE=1
-      }
-    displayName: 'Check for test failures'
+      testRunTitle: ${{ format('{0}-$(CONDA_PY)', parameters.name) }}
+    displayName: 'Publish test results'
+
   - bash: |
       source activate pandas-dev
       python ci/print_skipped.py

ci/code_checks.sh

+7 -3

@@ -39,7 +39,7 @@ function invgrep {
 }

 if [[ "$GITHUB_ACTIONS" == "true" ]]; then
-    FLAKE8_FORMAT="##[error]%(path)s:%(row)s:%(col)s:%(code):%(text)s"
+    FLAKE8_FORMAT="##[error]%(path)s:%(row)s:%(col)s:%(code)s:%(text)s"
     INVGREP_PREPEND="##[error]"
 else
     FLAKE8_FORMAT="default"
@@ -52,7 +52,7 @@ if [[ -z "$CHECK" || "$CHECK" == "lint" ]]; then
     black --version

     MSG='Checking black formatting' ; echo $MSG
-    black . --check
+    black . --check
     RET=$(($RET + $?)) ; echo $MSG "DONE"

     # `setup.cfg` contains the list of error codes that are being ignored in flake8
@@ -104,7 +104,7 @@ if [[ -z "$CHECK" || "$CHECK" == "lint" ]]; then
     isort --version-number

     # Imports - Check formatting using isort see setup.cfg for settings
-    MSG='Check import format using isort ' ; echo $MSG
+    MSG='Check import format using isort' ; echo $MSG
     ISORT_CMD="isort --recursive --check-only pandas asv_bench"
     if [[ "$GITHUB_ACTIONS" == "true" ]]; then
         eval $ISORT_CMD | awk '{print "##[error]" $0}'; RET=$(($RET + ${PIPESTATUS[0]}))
@@ -203,6 +203,10 @@ if [[ -z "$CHECK" || "$CHECK" == "patterns" ]]; then
     invgrep -R --include=*.{py,pyx} '\.__class__' pandas
     RET=$(($RET + $?)) ; echo $MSG "DONE"

+    MSG='Check for use of xrange instead of range' ; echo $MSG
+    invgrep -R --include=*.{py,pyx} 'xrange' pandas
+    RET=$(($RET + $?)) ; echo $MSG "DONE"
+
     MSG='Check that no file in the repo contains trailing whitespaces' ; echo $MSG
     INVGREP_APPEND=" <- trailing whitespaces found"
     invgrep -RI --exclude=\*.{svg,c,cpp,html,js} --exclude-dir=env "\s$" *

ci/incremental/build.cmd

-9
This file was deleted.

ci/run_tests.sh

+2 -2

@@ -38,6 +38,6 @@ sh -c "$PYTEST_CMD"

 if [[ "$COVERAGE" && $? == 0 && "$TRAVIS_BRANCH" == "master" ]]; then
     echo "uploading coverage"
-    echo "bash <(curl -s https://codecov.io/bash) -Z -c -F $TYPE -f $COVERAGE_FNAME"
-    bash <(curl -s https://codecov.io/bash) -Z -c -F $TYPE -f $COVERAGE_FNAME
+    echo "bash <(curl -s https://codecov.io/bash) -Z -c -f $COVERAGE_FNAME"
+    bash <(curl -s https://codecov.io/bash) -Z -c -f $COVERAGE_FNAME
 fi

ci/setup_env.sh

+2 -1

@@ -140,7 +140,8 @@ echo "conda list"
 conda list

 # Install DB for Linux
-if [ "${TRAVIS_OS_NAME}" == "linux" ]; then
+
+if [[ -n ${SQL:0} ]]; then
   echo "installing dbs"
   mysql -e 'create database pandas_nosetest;'
   psql -c 'create database pandas_nosetest;' -U postgres

doc/redirects.csv

+1 -1

@@ -777,7 +777,7 @@ generated/pandas.io.formats.style.Styler.to_excel,../reference/api/pandas.io.for
 generated/pandas.io.formats.style.Styler.use,../reference/api/pandas.io.formats.style.Styler.use
 generated/pandas.io.formats.style.Styler.where,../reference/api/pandas.io.formats.style.Styler.where
 generated/pandas.io.json.build_table_schema,../reference/api/pandas.io.json.build_table_schema
-generated/pandas.io.json.json_normalize,../reference/api/pandas.io.json.json_normalize
+generated/pandas.io.json.json_normalize,../reference/api/pandas.json_normalize
 generated/pandas.io.stata.StataReader.data_label,../reference/api/pandas.io.stata.StataReader.data_label
 generated/pandas.io.stata.StataReader.value_labels,../reference/api/pandas.io.stata.StataReader.value_labels
 generated/pandas.io.stata.StataReader.variable_labels,../reference/api/pandas.io.stata.StataReader.variable_labels

doc/source/_static/favicon.ico

-3.81 KB
Binary file not shown.

doc/source/conf.py

+6 -2

@@ -204,7 +204,11 @@
 # Theme options are theme-specific and customize the look and feel of a theme
 # further. For a list of options available for each theme, see the
 # documentation.
-# html_theme_options = {}
+html_theme_options = {
+    "external_links": [],
+    "github_url": "https://github.com/pandas-dev/pandas",
+    "twitter_url": "https://twitter.com/pandas_dev",
+}

 # Add any paths that contain custom themes here, relative to this directory.
 # html_theme_path = ["themes"]
@@ -228,7 +232,7 @@
 # The name of an image file (within the static path) to use as favicon of the
 # docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32
 # pixels large.
-html_favicon = os.path.join(html_static_path[0], "favicon.ico")
+html_favicon = "../../web/pandas/static/img/favicon.ico"

 # If not '', a 'Last updated on:' timestamp is inserted at every page bottom,
 # using the given strftime format.

doc/source/getting_started/overview.rst

+1 -2

@@ -57,8 +57,7 @@ Here are just a few of the things that pandas does well:
   Excel files, databases, and saving / loading data from the ultrafast **HDF5
   format**
 - **Time series**-specific functionality: date range generation and frequency
-  conversion, moving window statistics, moving window linear regressions,
-  date shifting and lagging, etc.
+  conversion, moving window statistics, date shifting and lagging.

 Many of these principles are here to address the shortcomings frequently
 experienced using other languages / scientific research environments. For data

doc/source/reference/io.rst

+1 -1

@@ -50,13 +50,13 @@ JSON
    :toctree: api/

    read_json
+   json_normalize

 .. currentmodule:: pandas.io.json

 .. autosummary::
    :toctree: api/

-   json_normalize
    build_table_schema

 .. currentmodule:: pandas

doc/source/user_guide/io.rst

+11 -13

@@ -35,7 +35,7 @@ The pandas I/O API is a set of top level ``reader`` functions accessed like
     binary;`SPSS <https://en.wikipedia.org/wiki/SPSS>`__;:ref:`read_spss<io.spss_reader>`;
     binary;`Python Pickle Format <https://docs.python.org/3/library/pickle.html>`__;:ref:`read_pickle<io.pickle>`;:ref:`to_pickle<io.pickle>`
     SQL;`SQL <https://en.wikipedia.org/wiki/SQL>`__;:ref:`read_sql<io.sql>`;:ref:`to_sql<io.sql>`
-    SQL;`Google Big Query <https://en.wikipedia.org/wiki/BigQuery>`__;:ref:`read_gbq<io.bigquery>`;:ref:`to_gbq<io.bigquery>`
+    SQL;`Google BigQuery <https://en.wikipedia.org/wiki/BigQuery>`__;:ref:`read_gbq<io.bigquery>`;:ref:`to_gbq<io.bigquery>`

 :ref:`Here <io.perf>` is an informal performance comparison for some of these IO methods.

@@ -2136,27 +2136,26 @@ into a flat table.

 .. ipython:: python

-   from pandas.io.json import json_normalize
    data = [{'id': 1, 'name': {'first': 'Coleen', 'last': 'Volk'}},
            {'name': {'given': 'Mose', 'family': 'Regner'}},
            {'id': 2, 'name': 'Faye Raker'}]
-   json_normalize(data)
+   pd.json_normalize(data)

 .. ipython:: python

    data = [{'state': 'Florida',
             'shortname': 'FL',
             'info': {'governor': 'Rick Scott'},
-            'counties': [{'name': 'Dade', 'population': 12345},
-                         {'name': 'Broward', 'population': 40000},
-                         {'name': 'Palm Beach', 'population': 60000}]},
+            'county': [{'name': 'Dade', 'population': 12345},
+                       {'name': 'Broward', 'population': 40000},
+                       {'name': 'Palm Beach', 'population': 60000}]},
            {'state': 'Ohio',
             'shortname': 'OH',
             'info': {'governor': 'John Kasich'},
-            'counties': [{'name': 'Summit', 'population': 1234},
-                         {'name': 'Cuyahoga', 'population': 1337}]}]
+            'county': [{'name': 'Summit', 'population': 1234},
+                       {'name': 'Cuyahoga', 'population': 1337}]}]

-   json_normalize(data, 'counties', ['state', 'shortname', ['info', 'governor']])
+   pd.json_normalize(data, 'county', ['state', 'shortname', ['info', 'governor']])

 The max_level parameter provides more control over which level to end normalization.
 With max_level=1 the following snippet normalizes until 1st nesting level of the provided dict.

@@ -2169,7 +2168,7 @@ With max_level=1 the following snippet normalizes until 1st nesting level of the
                  'Name': 'Name001'}},
     'Image': {'a': 'b'}
    }]
-   json_normalize(data, max_level=1)
+   pd.json_normalize(data, max_level=1)

 .. _io.jsonl:

@@ -4764,10 +4763,10 @@ Parquet supports partitioning of data based on the values of one or more columns
 .. ipython:: python

    df = pd.DataFrame({'a': [0, 0, 1, 1], 'b': [0, 1, 0, 1]})
-   df.to_parquet(fname='test', engine='pyarrow',
+   df.to_parquet(path='test', engine='pyarrow',
                  partition_cols=['a'], compression=None)

-The `fname` specifies the parent directory to which data will be saved.
+The `path` specifies the parent directory to which data will be saved.
 The `partition_cols` are the column names by which the dataset will be partitioned.
 Columns are partitioned in the order they are given. The partition splits are
 determined by the unique values in the partition columns.

@@ -4829,7 +4828,6 @@ See also some :ref:`cookbook examples <cookbook.sql>` for some advanced strategi
 The key functions are:

 .. autosummary::
-   :toctree: ../reference/api/

    read_sql_table
    read_sql_query
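
Note (not part of the diff): a short sketch of the renamed entry points these documentation changes describe — the top-level pd.json_normalize and the path keyword of DataFrame.to_parquet. The partitioned write assumes pyarrow is installed.

    # Illustration only: the renamed entry points referenced by the io.rst changes.
    import pandas as pd

    records = [{'id': 1, 'name': {'first': 'Coleen', 'last': 'Volk'}},
               {'id': 2, 'name': 'Faye Raker'}]
    pd.json_normalize(records)                     # top-level function replacing pandas.io.json.json_normalize

    df = pd.DataFrame({'a': [0, 0, 1, 1], 'b': [0, 1, 0, 1]})
    df.to_parquet(path='test', engine='pyarrow',   # 'fname' keyword renamed to 'path'
                  partition_cols=['a'], compression=None)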

doc/source/user_guide/text.rst

+13 -2

@@ -74,6 +74,7 @@ These are places where the behavior of ``StringDtype`` objects differ from
 l. For ``StringDtype``, :ref:`string accessor methods<api.series.str>`
    that return **numeric** output will always return a nullable integer dtype,
    rather than either int or float dtype, depending on the presence of NA values.
+   Methods returning **boolean** output will return a nullable boolean dtype.

 .. ipython:: python

@@ -89,12 +90,22 @@ l. For ``StringDtype``, :ref:`string accessor methods<api.series.str>`
    s.astype(object).str.count("a")
    s.astype(object).dropna().str.count("a")

-   When NA values are present, the output dtype is float64.
+   When NA values are present, the output dtype is float64. Similarly for
+   methods returning boolean values.
+
+   .. ipython:: python
+
+      s.str.isdigit()
+      s.str.match("a")

 2. Some string methods, like :meth:`Series.str.decode` are not available
    on ``StringArray`` because ``StringArray`` only holds strings, not
    bytes.
-
+3. In comparision operations, :class:`arrays.StringArray` and ``Series`` backed
+   by a ``StringArray`` will return an object with :class:`BooleanDtype`,
+   rather than a ``bool`` dtype object. Missing values in a ``StringArray``
+   will propagate in comparision operations, rather than always comparing
+   unequal like :attr:`numpy.nan`.

 Everything else that follows in the rest of this document applies equally to
 ``string`` and ``object`` dtype.
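
Note (not part of the commit): a small sketch of the behaviour the text.rst additions describe — boolean-returning string accessors and comparisons on StringDtype data produce the nullable BooleanDtype, and missing values propagate rather than comparing unequal.

    # Illustration only: nullable-boolean results described by the text.rst change.
    import pandas as pd

    s = pd.Series(['a', '2', None], dtype='string')
    s.str.isdigit()   # boolean accessor -> BooleanDtype, with <NA> where the input is missing
    s == 'a'          # comparison also returns BooleanDtype; <NA> propagates instead of False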
