
Commit 4b7f880

Merge branch 'main' of https://github.com/pandas-dev/pandas into arrow-to-csv
2 parents d0e7d86 + 3c01ce2 commit 4b7f880

163 files changed, +2490 -1244 lines changed

.github/CODEOWNERS

+1
@@ -9,6 +9,7 @@ web/ @datapythonista
 
 # docs
 doc/cheatsheet @Dr-Irv
+doc/source/development @noatamir
 
 # pandas
 pandas/_libs/ @WillAyd

.github/actions/build_pandas/action.yml

+1
@@ -12,6 +12,7 @@ runs:
       run: |
         micromamba info
         micromamba list
+        pip list --pre
       shell: bash -el {0}
 
     - name: Uninstall existing Pandas installation

.github/workflows/wheels.yml

+1-1
@@ -110,7 +110,7 @@ jobs:
           path: ./dist
 
       - name: Build wheels
-        uses: pypa/cibuildwheel@v2.13.1
+        uses: pypa/cibuildwheel@v2.14.1
         # TODO: Build wheels from sdist again
         # There's some sort of weird race condition?
         # within Github that makes the sdist be missing files

README.md

+7-11
@@ -5,17 +5,13 @@
 -----------------
 
 # pandas: powerful Python data analysis toolkit
-[![PyPI Latest Release](https://img.shields.io/pypi/v/pandas.svg)](https://pypi.org/project/pandas/)
-[![Conda Latest Release](https://anaconda.org/conda-forge/pandas/badges/version.svg)](https://anaconda.org/anaconda/pandas/)
-[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.3509134.svg)](https://doi.org/10.5281/zenodo.3509134)
-[![Package Status](https://img.shields.io/pypi/status/pandas.svg)](https://pypi.org/project/pandas/)
-[![License](https://img.shields.io/pypi/l/pandas.svg)](https://github.com/pandas-dev/pandas/blob/main/LICENSE)
-[![Coverage](https://codecov.io/github/pandas-dev/pandas/coverage.svg?branch=main)](https://codecov.io/gh/pandas-dev/pandas)
-[![Downloads](https://static.pepy.tech/personalized-badge/pandas?period=month&units=international_system&left_color=black&right_color=orange&left_text=PyPI%20downloads%20per%20month)](https://pepy.tech/project/pandas)
-[![Slack](https://img.shields.io/badge/join_Slack-information-brightgreen.svg?logo=slack)](https://pandas.pydata.org/docs/dev/development/community.html?highlight=slack#community-slack)
-[![Powered by NumFOCUS](https://img.shields.io/badge/powered%20by-NumFOCUS-orange.svg?style=flat&colorA=E1523D&colorB=007D8A)](https://numfocus.org)
-[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
-[![Imports: isort](https://img.shields.io/badge/%20imports-isort-%231674b1?style=flat&labelColor=ef8336)](https://pycqa.github.io/isort/)
+
+| | |
+| --- | --- |
+| Testing | [![CI - Test](https://github.com/pandas-dev/pandas/actions/workflows/unit-tests.yml/badge.svg)](https://github.com/pandas-dev/pandas/actions/workflows/unit-tests.yml) [![Coverage](https://codecov.io/github/pandas-dev/pandas/coverage.svg?branch=main)](https://codecov.io/gh/pandas-dev/pandas) |
+| Package | [![PyPI Latest Release](https://img.shields.io/pypi/v/pandas.svg)](https://pypi.org/project/pandas/) [![PyPI Downloads](https://img.shields.io/pypi/dm/pandas.svg?label=PyPI%20downloads)](https://pypi.org/project/pandas/) [![Conda Latest Release](https://anaconda.org/conda-forge/pandas/badges/version.svg)](https://anaconda.org/conda-forge/pandas) [![Conda Downloads](https://img.shields.io/conda/dn/conda-forge/pandas.svg?label=Conda%20downloads)](https://anaconda.org/conda-forge/pandas) |
+| Meta | [![Powered by NumFOCUS](https://img.shields.io/badge/powered%20by-NumFOCUS-orange.svg?style=flat&colorA=E1523D&colorB=007D8A)](https://numfocus.org) [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.3509134.svg)](https://doi.org/10.5281/zenodo.3509134) [![License - BSD 3-Clause](https://img.shields.io/pypi/l/pandas.svg)](https://github.com/pandas-dev/pandas/blob/main/LICENSE) [![Slack](https://img.shields.io/badge/join_Slack-information-brightgreen.svg?logo=slack)](https://pandas.pydata.org/docs/dev/development/community.html?highlight=slack#community-slack) |
+
 
 ## What is it?
 

asv_bench/benchmarks/frame_methods.py

+11-5
@@ -388,7 +388,6 @@ def time_isnull_obj(self):
 class Fillna:
     params = (
         [True, False],
-        ["pad", "bfill"],
         [
             "float64",
             "float32",
@@ -400,9 +399,9 @@ class Fillna:
             "timedelta64[ns]",
         ],
     )
-    param_names = ["inplace", "method", "dtype"]
+    param_names = ["inplace", "dtype"]
 
-    def setup(self, inplace, method, dtype):
+    def setup(self, inplace, dtype):
         N, M = 10000, 100
         if dtype in ("datetime64[ns]", "datetime64[ns, tz]", "timedelta64[ns]"):
             data = {
@@ -420,9 +419,16 @@ def setup(self, inplace, method, dtype):
             if dtype == "Int64":
                 values = values.round()
             self.df = DataFrame(values, dtype=dtype)
+        self.fill_values = self.df.iloc[self.df.first_valid_index()].to_dict()
+
+    def time_fillna(self, inplace, dtype):
+        self.df.fillna(value=self.fill_values, inplace=inplace)
+
+    def time_ffill(self, inplace, dtype):
+        self.df.ffill(inplace=inplace)
 
-    def time_frame_fillna(self, inplace, method, dtype):
-        self.df.fillna(inplace=inplace, method=method)
+    def time_bfill(self, inplace, dtype):
+        self.df.bfill(inplace=inplace)
 
 
 class Dropna:
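
Note on the hunk above: the single `fillna(method=...)` benchmark is split into three, one per operation. A minimal sketch of the three calls on a toy frame follows; the column names and fill values are hypothetical and are not taken from the benchmark.

```python
import numpy as np
import pandas as pd

# Toy frame with gaps; the benchmark itself uses a 10000 x 100 frame.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 5.0, np.nan]})

# Per-column fill values passed as a dict, as time_fillna does.
filled = df.fillna(value={"a": 0.0, "b": -1.0})

# Propagate the last valid value forward (replaces method="pad").
forward = df.ffill()

# Propagate the next valid value backward (replaces method="bfill").
backward = df.bfill()
```

Leading gaps survive `ffill` and trailing gaps survive `bfill`, because there is no neighbouring value to propagate from.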

asv_bench/benchmarks/groupby.py

+5-5
@@ -423,24 +423,24 @@ def time_fill_value(self):
         self.df.groupby("g").shift(fill_value=99)
 
 
-class FillNA:
+class Fillna:
     def setup(self):
         N = 100
         self.df = DataFrame(
             {"group": [1] * N + [2] * N, "value": [np.nan, 1.0] * N}
         ).set_index("group")
 
     def time_df_ffill(self):
-        self.df.groupby("group").fillna(method="ffill")
+        self.df.groupby("group").ffill()
 
     def time_df_bfill(self):
-        self.df.groupby("group").fillna(method="bfill")
+        self.df.groupby("group").bfill()
 
     def time_srs_ffill(self):
-        self.df.groupby("group")["value"].fillna(method="ffill")
+        self.df.groupby("group")["value"].ffill()
 
     def time_srs_bfill(self):
-        self.df.groupby("group")["value"].fillna(method="bfill")
+        self.df.groupby("group")["value"].bfill()
 
 
 class GroupByMethods:
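
Note on the hunk above: the group-wise fills now call `ffill()` and `bfill()` on the groupby object directly. A small sketch, using a four-row frame of my own rather than the benchmark data, shows that values are not propagated across group boundaries.

```python
import numpy as np
import pandas as pd

# Two groups of two rows each, indexed by group as in the benchmark setup.
df = pd.DataFrame(
    {"group": [1, 1, 2, 2], "value": [1.0, np.nan, np.nan, 4.0]}
).set_index("group")

# Forward fill within each group: the gap opening group 2 stays NaN.
ffilled = df.groupby("group")["value"].ffill().tolist()

# Backward fill within each group: the gap closing group 1 stays NaN.
bfilled = df.groupby("group")["value"].bfill().tolist()
```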

asv_bench/benchmarks/io/csv.py

+1-1
@@ -341,7 +341,7 @@ def setup(self, sep, thousands, engine):
         if thousands is not None:
             fmt = f":{thousands}"
             fmt = "{" + fmt + "}"
-            df = df.applymap(lambda x: fmt.format(x))
+            df = df.map(lambda x: fmt.format(x))
         df.to_csv(self.fname, sep=sep)
 
     def time_thousands(self, sep, thousands, engine):
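
Note on the hunk above: `DataFrame.applymap` is replaced by `DataFrame.map`, the elementwise method that pandas 2.1 introduced. The sketch below builds the same format string as the benchmark on hypothetical data. The `getattr` fallback is my own addition so the snippet also runs on older pandas versions; it is not part of the commit.

```python
import pandas as pd

# Hypothetical integer data; the benchmark generates its own values.
df = pd.DataFrame({"x": [1234567, 1000], "y": [42, 9999999]})

thousands = ","
fmt = "{" + f":{thousands}" + "}"  # builds the format string "{:,}"

# Prefer DataFrame.map when available, otherwise fall back to applymap.
elementwise = getattr(df, "map", None) or df.applymap
formatted = elementwise(lambda x: fmt.format(x))
```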

asv_bench/benchmarks/io/excel.py

+3-3
@@ -57,9 +57,9 @@ def time_write_excel_style(self, engine):
         bio.seek(0)
         with ExcelWriter(bio, engine=engine) as writer:
             df_style = self.df.style
-            df_style.applymap(lambda x: "border: red 1px solid;")
-            df_style.applymap(lambda x: "color: blue")
-            df_style.applymap(lambda x: "border-color: green black", subset=["float1"])
+            df_style.map(lambda x: "border: red 1px solid;")
+            df_style.map(lambda x: "color: blue")
+            df_style.map(lambda x: "border-color: green black", subset=["float1"])
             df_style.to_excel(writer, sheet_name="Sheet1")
 
 

asv_bench/benchmarks/io/style.py

+2-2
@@ -66,7 +66,7 @@ def _apply_func(s):
         self.st = self.df.style.apply(_apply_func, axis=1)
 
     def _style_classes(self):
-        classes = self.df.applymap(lambda v: ("cls-1" if v > 0 else ""))
+        classes = self.df.map(lambda v: ("cls-1" if v > 0 else ""))
         classes.index, classes.columns = self.df.index, self.df.columns
         self.st = self.df.style.set_td_classes(classes)
 
@@ -80,7 +80,7 @@ def _style_format(self):
         )
 
     def _style_apply_format_hide(self):
-        self.st = self.df.style.applymap(lambda v: "color: red;")
+        self.st = self.df.style.map(lambda v: "color: red;")
         self.st.format("{:.3f}")
         self.st.hide(self.st.index[1:], axis=0)
         self.st.hide(self.st.columns[1:], axis=1)

asv_bench/benchmarks/join_merge.py

+3-3
@@ -171,12 +171,12 @@ def time_join_dataframes_cross(self, sort):
 
 class JoinIndex:
     def setup(self):
-        N = 50000
+        N = 5000
         self.left = DataFrame(
-            np.random.randint(1, N / 500, (N, 2)), columns=["jim", "joe"]
+            np.random.randint(1, N / 50, (N, 2)), columns=["jim", "joe"]
         )
         self.right = DataFrame(
-            np.random.randint(1, N / 500, (N, 2)), columns=["jolie", "jolia"]
+            np.random.randint(1, N / 50, (N, 2)), columns=["jolie", "jolia"]
         ).set_index("jolie")
 
     def time_left_outer_join_index(self):
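
Note on the hunk above: the row count drops tenfold, but the divisor changes with it, so the exclusive upper bound passed to `np.random.randint` should be unchanged. A quick arithmetic check, with variable names of my own choosing:

```python
# Old and new benchmark sizes, copied from the hunk.
old_n, new_n = 50000, 5000

# Upper bound handed to randint before and after the change.
old_high = old_n / 500
new_high = new_n / 50

# Each frame has N rows and 2 columns.
old_cells = old_n * 2
new_cells = new_n * 2

print(old_high, new_high, old_cells // new_cells)
```

If the bound is indeed the same, the key range is unchanged and only the number of rows drawn from it shrinks.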

asv_bench/benchmarks/reindex.py

-18
@@ -66,24 +66,6 @@ def time_reindex_method(self, method, constructor):
         self.ts.reindex(self.idx, method=method)
 
 
-class Fillna:
-    params = ["pad", "backfill"]
-    param_names = ["method"]
-
-    def setup(self, method):
-        N = 100000
-        self.idx = date_range("1/1/2000", periods=N, freq="1min")
-        ts = Series(np.random.randn(N), index=self.idx)[::2]
-        self.ts_reindexed = ts.reindex(self.idx)
-        self.ts_float32 = self.ts_reindexed.astype("float32")
-
-    def time_reindexed(self, method):
-        self.ts_reindexed.fillna(method=method)
-
-    def time_float_32(self, method):
-        self.ts_float32.fillna(method=method)
-
-
 class LevelAlign:
     def setup(self):
         self.index = MultiIndex(

asv_bench/benchmarks/reshape.py

+19
@@ -93,6 +93,25 @@ def time_transpose(self, dtype):
         self.df.T
 
 
+class ReshapeMaskedArrayDtype(ReshapeExtensionDtype):
+    params = ["Int64", "Float64"]
+    param_names = ["dtype"]
+
+    def setup(self, dtype):
+        lev = pd.Index(list("ABCDEFGHIJ"))
+        ri = pd.Index(range(1000))
+        mi = MultiIndex.from_product([lev, ri], names=["foo", "bar"])
+
+        values = np.random.randn(10_000).astype(int)
+
+        ser = pd.Series(values, dtype=dtype, index=mi)
+        df = ser.unstack("bar")
+        # roundtrips -> df.stack().equals(ser)
+
+        self.ser = ser
+        self.df = df
+
+
 class Unstack:
     params = ["int", "category"]
 
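
Note on the hunk above: the new benchmark class builds a masked-dtype Series on a two-level index and unstacks the inner level. A scaled-down sketch of that setup follows; the 2 x 3 shape and the `np.arange` values are my own simplification of the 10 x 1000 random data.

```python
import numpy as np
import pandas as pd

# Scaled-down index: 2 outer labels x 3 inner labels.
lev = pd.Index(list("AB"))
ri = pd.Index(range(3))
mi = pd.MultiIndex.from_product([lev, ri], names=["foo", "bar"])

# Deterministic integer values stored with the masked "Int64" dtype.
ser = pd.Series(np.arange(6), dtype="Int64", index=mi)

# Move the inner level "bar" into the columns.
df = ser.unstack("bar")
```

The benchmark's own comment states that `df.stack().equals(ser)` holds; the sketch does not check that claim.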

asv_bench/benchmarks/series_methods.py

+11-6
@@ -81,18 +81,18 @@ class Fillna:
     params = [
         [
             "datetime64[ns]",
+            "float32",
             "float64",
             "Float64",
             "Int64",
             "int64[pyarrow]",
             "string",
             "string[pyarrow]",
         ],
-        [None, "pad", "backfill"],
     ]
-    param_names = ["dtype", "method"]
+    param_names = ["dtype"]
 
-    def setup(self, dtype, method):
+    def setup(self, dtype):
         N = 10**6
         if dtype == "datetime64[ns]":
             data = date_range("2000-01-01", freq="S", periods=N)
@@ -114,9 +114,14 @@ def setup(self, dtype, method):
         self.ser = ser
         self.fill_value = fill_value
 
-    def time_fillna(self, dtype, method):
-        value = self.fill_value if method is None else None
-        self.ser.fillna(value=value, method=method)
+    def time_fillna(self, dtype):
+        self.ser.fillna(value=self.fill_value)
+
+    def time_ffill(self, dtype):
+        self.ser.ffill()
+
+    def time_bfill(self, dtype):
+        self.ser.bfill()
 
 
 class SearchSorted:

ci/code_checks.sh

+1-51
@@ -63,40 +63,17 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
 
     MSG='Partially validate docstrings (EX01)' ; echo $MSG
     $BASE_DIR/scripts/validate_docstrings.py --format=actions --errors=EX01 --ignore_functions \
-        pandas.Series.backfill \
-        pandas.Series.pad \
-        pandas.Series.hist \
-        pandas.errors.AccessorRegistrationWarning \
-        pandas.errors.AttributeConflictWarning \
-        pandas.errors.DataError \
         pandas.errors.IncompatibilityWarning \
         pandas.errors.InvalidComparison \
-        pandas.errors.IntCastingNaNError \
         pandas.errors.LossySetitemError \
-        pandas.errors.MergeError \
         pandas.errors.NoBufferPresent \
-        pandas.errors.NullFrequencyError \
-        pandas.errors.NumbaUtilError \
         pandas.errors.OptionError \
-        pandas.errors.OutOfBoundsDatetime \
-        pandas.errors.OutOfBoundsTimedelta \
-        pandas.errors.ParserError \
         pandas.errors.PerformanceWarning \
         pandas.errors.PyperclipException \
         pandas.errors.PyperclipWindowsException \
         pandas.errors.UnsortedIndexError \
         pandas.errors.UnsupportedFunctionCall \
-        pandas.test \
         pandas.NaT \
-        pandas.io.formats.style.Styler.to_html \
-        pandas.read_feather \
-        pandas.DataFrame.to_feather \
-        pandas.read_parquet \
-        pandas.read_orc \
-        pandas.read_sas \
-        pandas.read_spss \
-        pandas.read_sql_query \
-        pandas.read_gbq \
         pandas.io.stata.StataReader.data_label \
         pandas.io.stata.StataReader.value_labels \
         pandas.io.stata.StataReader.variable_labels \
@@ -112,19 +89,9 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
         pandas.DatetimeIndex.snap \
         pandas.api.indexers.BaseIndexer \
         pandas.api.indexers.VariableOffsetWindowIndexer \
-        pandas.io.formats.style.Styler.set_caption \
-        pandas.io.formats.style.Styler.set_sticky \
-        pandas.io.formats.style.Styler.set_uuid \
-        pandas.io.formats.style.Styler.clear \
-        pandas.io.formats.style.Styler.highlight_null \
-        pandas.io.formats.style.Styler.highlight_max \
-        pandas.io.formats.style.Styler.highlight_min \
-        pandas.io.formats.style.Styler.bar \
-        pandas.io.formats.style.Styler.to_string \
         pandas.api.extensions.ExtensionDtype \
         pandas.api.extensions.ExtensionArray \
-        pandas.arrays.PandasArray \
-        pandas.api.extensions.ExtensionArray._accumulate \
+        pandas.arrays.NumpyExtensionArray \
         pandas.api.extensions.ExtensionArray._concat_same_type \
         pandas.api.extensions.ExtensionArray._formatter \
         pandas.api.extensions.ExtensionArray._from_factorized \
@@ -133,25 +100,8 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
         pandas.api.extensions.ExtensionArray._hash_pandas_object \
         pandas.api.extensions.ExtensionArray._reduce \
         pandas.api.extensions.ExtensionArray._values_for_factorize \
-        pandas.api.extensions.ExtensionArray.dropna \
-        pandas.api.extensions.ExtensionArray.equals \
-        pandas.api.extensions.ExtensionArray.factorize \
-        pandas.api.extensions.ExtensionArray.fillna \
-        pandas.api.extensions.ExtensionArray.insert \
         pandas.api.extensions.ExtensionArray.interpolate \
-        pandas.api.extensions.ExtensionArray.isin \
-        pandas.api.extensions.ExtensionArray.isna \
         pandas.api.extensions.ExtensionArray.ravel \
-        pandas.api.extensions.ExtensionArray.searchsorted \
-        pandas.api.extensions.ExtensionArray.shift \
-        pandas.api.extensions.ExtensionArray.unique \
-        pandas.api.extensions.ExtensionArray.ndim \
-        pandas.api.extensions.ExtensionArray.shape \
-        pandas.api.extensions.ExtensionArray.tolist \
-        pandas.DataFrame.pad \
-        pandas.DataFrame.swapaxes \
-        pandas.DataFrame.plot \
-        pandas.DataFrame.to_gbq \
         pandas.DataFrame.__dataframe__
     RET=$(($RET + $?)) ; echo $MSG "DONE"
 

doc/source/development/contributing_codebase.rst

+1-1
@@ -475,7 +475,7 @@ be located.
 
 8) Is your test for one of the pandas-provided ExtensionArrays (``Categorical``,
    ``DatetimeArray``, ``TimedeltaArray``, ``PeriodArray``, ``IntervalArray``,
-   ``PandasArray``, ``FloatArray``, ``BoolArray``, ``StringArray``)?
+   ``NumpyExtensionArray``, ``FloatArray``, ``BoolArray``, ``StringArray``)?
    This test likely belongs in one of:
 
    - tests.arrays

doc/source/development/contributing_documentation.rst

+1-3
@@ -189,9 +189,7 @@ to speed up the documentation build. You can override this::
     python make.py html --num-jobs 4
 
 Open the following file in a web browser to see the full documentation you
-just built::
-
-    doc/build/html/index.html
+just built ``doc/build/html/index.html``.
 
 And you'll have the satisfaction of seeing your new and improved documentation!
 
