Commit df3991e

Merge remote-tracking branch 'upstream/main' into bisect
2 parents ecd922b + 4fe2f31 commit df3991e

118 files changed

+2853
-862
lines changed


.github/actions/build_pandas/action.yml

+2-2
@@ -6,8 +6,8 @@ runs:
     - name: Environment Detail
       run: |
-        conda info
-        conda list
+        micromamba info
+        micromamba list
       shell: bash -el {0}

     - name: Build Pandas

.github/actions/setup-conda/action.yml

+8-9
@@ -6,8 +6,8 @@ inputs:
   environment-name:
     description: Name to use for the Conda environment
     default: test
-  python-version:
-    description: Python version to install
+  extra-specs:
+    description: Extra packages to install
     required: false
   pyarrow-version:
     description: If set, overrides the PyArrow version in the Conda environment to the given string.
@@ -24,14 +24,13 @@ runs:
       if: ${{ inputs.pyarrow-version }}

     - name: Install ${{ inputs.environment-file }}
-      uses: conda-incubator/[email protected]
+      uses: mamba-org/provision-with-micromamba@v12
       with:
         environment-file: ${{ inputs.environment-file }}
-        activate-environment: ${{ inputs.environment-name }}
-        python-version: ${{ inputs.python-version }}
-        channel-priority: ${{ runner.os == 'macOS' && 'flexible' || 'strict' }}
+        environment-name: ${{ inputs.environment-name }}
+        extra-specs: ${{ inputs.extra-specs }}
         channels: conda-forge
-        mamba-version: "0.24"
-        use-mamba: true
-        use-only-tar-bz2: true
+        channel-priority: ${{ runner.os == 'macOS' && 'flexible' || 'strict' }}
         condarc-file: ci/condarc.yml
+        cache-env: true
+        cache-downloads: true

.github/workflows/asv-bot.yml

-6
@@ -33,12 +33,6 @@ jobs:
         with:
           fetch-depth: 0

-      - name: Cache conda
-        uses: actions/cache@v3
-        with:
-          path: ~/conda_pkgs_dir
-          key: ${{ runner.os }}-conda-${{ hashFiles('${{ env.ENV_FILE }}') }}
-
       # Although asv sets up its own env, deps are still needed
       # during discovery process
       - name: Set up Conda

.github/workflows/code-checks.yml

-12
@@ -52,12 +52,6 @@ jobs:
         with:
           fetch-depth: 0

-      - name: Cache conda
-        uses: actions/cache@v3
-        with:
-          path: ~/conda_pkgs_dir
-          key: ${{ runner.os }}-conda-${{ hashFiles('${{ env.ENV_FILE }}') }}
-
       - name: Set up Conda
         uses: ./.github/actions/setup-conda

@@ -117,12 +111,6 @@ jobs:
         with:
           fetch-depth: 0

-      - name: Cache conda
-        uses: actions/cache@v3
-        with:
-          path: ~/conda_pkgs_dir
-          key: ${{ runner.os }}-conda-${{ hashFiles('${{ env.ENV_FILE }}') }}
-
       - name: Set up Conda
         uses: ./.github/actions/setup-conda

.github/workflows/sdist.yml

+3-2
@@ -62,9 +62,10 @@ jobs:
       - name: Set up Conda
         uses: ./.github/actions/setup-conda
         with:
-          environment-file: ""
+          environment-file: false
           environment-name: pandas-sdist
-          python-version: ${{ matrix.python-version }}
+          extra-specs: |
+            python =${{ matrix.python-version }}

       - name: Install pandas from sdist
         run: |

.github/workflows/ubuntu.yml

-9
@@ -134,15 +134,6 @@ jobs:
         with:
           fetch-depth: 0

-      - name: Cache conda
-        uses: actions/cache@v3
-        env:
-          CACHE_NUMBER: 0
-        with:
-          path: ~/conda_pkgs_dir
-          key: ${{ runner.os }}-conda-${{ env.CACHE_NUMBER }}-${{
-            hashFiles('${{ env.ENV_FILE }}') }}
-
       - name: Extra installs
         # xsel for clipboard tests
         run: sudo apt-get update && sudo apt-get install -y xsel ${{ env.EXTRA_APT }}

asv_bench/benchmarks/indexing.py

+22-8
@@ -157,25 +157,39 @@ def time_boolean_rows_boolean(self):
 class DataFrameNumericIndexing:
-    def setup(self):
+
+    params = [
+        (Int64Index, UInt64Index, Float64Index),
+        ("unique_monotonic_inc", "nonunique_monotonic_inc"),
+    ]
+    param_names = ["index_dtype", "index_structure"]
+
+    def setup(self, index, index_structure):
+        N = 10**5
+        indices = {
+            "unique_monotonic_inc": index(range(N)),
+            "nonunique_monotonic_inc": index(
+                list(range(55)) + [54] + list(range(55, N - 1))
+            ),
+        }
         self.idx_dupe = np.array(range(30)) * 99
-        self.df = DataFrame(np.random.randn(100000, 5))
+        self.df = DataFrame(np.random.randn(N, 5), index=indices[index_structure])
         self.df_dup = concat([self.df, 2 * self.df, 3 * self.df])
-        self.bool_indexer = [True] * 50000 + [False] * 50000
+        self.bool_indexer = [True] * (N // 2) + [False] * (N - N // 2)

-    def time_iloc_dups(self):
+    def time_iloc_dups(self, index, index_structure):
         self.df_dup.iloc[self.idx_dupe]

-    def time_loc_dups(self):
+    def time_loc_dups(self, index, index_structure):
         self.df_dup.loc[self.idx_dupe]

-    def time_iloc(self):
+    def time_iloc(self, index, index_structure):
         self.df.iloc[:100, 0]

-    def time_loc(self):
+    def time_loc(self, index, index_structure):
         self.df.loc[:100, 0]

-    def time_bool_indexer(self):
+    def time_bool_indexer(self, index, index_structure):
         self.df[self.bool_indexer]
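The two index layouts introduced in `setup` can be checked without pandas. The following is only a sketch: the helper names `build_index_values` and `is_monotonic_increasing` are hypothetical and not part of the benchmark; only the list expressions are taken from the diff. It illustrates that the non-unique layout keeps length `N`, stays monotonic increasing, and contains exactly one duplicate (54), and that the boolean indexer still has `N` entries.

```python
def build_index_values(n):
    """Return the two integer layouts used for the benchmark index (hypothetical helper)."""
    unique = list(range(n))
    # Repeating 54 once and stopping at n - 2 keeps the total length at n.
    nonunique = list(range(55)) + [54] + list(range(55, n - 1))
    return {
        "unique_monotonic_inc": unique,
        "nonunique_monotonic_inc": nonunique,
    }


def is_monotonic_increasing(values):
    """True when every value is >= its predecessor (duplicates allowed)."""
    return all(a <= b for a, b in zip(values, values[1:]))


N = 10**5
layouts = build_index_values(N)

# Same expression as in the benchmark: the two halves always sum to N,
# even when N is odd.
bool_indexer = [True] * (N // 2) + [False] * (N - N // 2)
```

Writing the sizes in terms of `N` rather than hard-coding 100000 and 50000 means the frame, the index, and the boolean mask cannot drift out of sync if `N` is changed later.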

asv_bench/benchmarks/io/excel.py

+19
@@ -47,6 +47,25 @@ def time_write_excel(self, engine):
         writer.save()


+class WriteExcelStyled:
+    params = ["openpyxl", "xlsxwriter"]
+    param_names = ["engine"]
+
+    def setup(self, engine):
+        self.df = _generate_dataframe()
+
+    def time_write_excel_style(self, engine):
+        bio = BytesIO()
+        bio.seek(0)
+        writer = ExcelWriter(bio, engine=engine)
+        df_style = self.df.style
+        df_style.applymap(lambda x: "border: red 1px solid;")
+        df_style.applymap(lambda x: "color: blue")
+        df_style.applymap(lambda x: "border-color: green black", subset=["float1"])
+        df_style.to_excel(writer, sheet_name="Sheet1")
+        writer.save()
+
+
 class ReadExcel:

     params = ["xlrd", "openpyxl", "odf"]

asv_bench/benchmarks/io/sql.py

+28-2
@@ -39,6 +39,8 @@ def setup(self, connection):
             index=tm.makeStringIndex(N),
         )
         self.df.loc[1000:3000, "float_with_nan"] = np.nan
+        self.df["date"] = self.df["datetime"].dt.date
+        self.df["time"] = self.df["datetime"].dt.time
         self.df["datetime_string"] = self.df["datetime"].astype(str)
         self.df.to_sql(self.table_name, self.con, if_exists="replace")

@@ -53,7 +55,16 @@ class WriteSQLDtypes:
     params = (
         ["sqlalchemy", "sqlite"],
-        ["float", "float_with_nan", "string", "bool", "int", "datetime"],
+        [
+            "float",
+            "float_with_nan",
+            "string",
+            "bool",
+            "int",
+            "date",
+            "time",
+            "datetime",
+        ],
     )
     param_names = ["connection", "dtype"]

@@ -78,6 +89,8 @@ def setup(self, connection, dtype):
             index=tm.makeStringIndex(N),
         )
         self.df.loc[1000:3000, "float_with_nan"] = np.nan
+        self.df["date"] = self.df["datetime"].dt.date
+        self.df["time"] = self.df["datetime"].dt.time
         self.df["datetime_string"] = self.df["datetime"].astype(str)
         self.df.to_sql(self.table_name, self.con, if_exists="replace")

@@ -105,6 +118,8 @@ def setup(self):
             index=tm.makeStringIndex(N),
         )
         self.df.loc[1000:3000, "float_with_nan"] = np.nan
+        self.df["date"] = self.df["datetime"].dt.date
+        self.df["time"] = self.df["datetime"].dt.time
         self.df["datetime_string"] = self.df["datetime"].astype(str)
         self.df.to_sql(self.table_name, self.con, if_exists="replace")

@@ -122,7 +137,16 @@ def time_read_sql_table_parse_dates(self):
 class ReadSQLTableDtypes:

-    params = ["float", "float_with_nan", "string", "bool", "int", "datetime"]
+    params = [
+        "float",
+        "float_with_nan",
+        "string",
+        "bool",
+        "int",
+        "date",
+        "time",
+        "datetime",
+    ]
     param_names = ["dtype"]

     def setup(self, dtype):
@@ -141,6 +165,8 @@ def setup(self, dtype):
             index=tm.makeStringIndex(N),
         )
         self.df.loc[1000:3000, "float_with_nan"] = np.nan
+        self.df["date"] = self.df["datetime"].dt.date
+        self.df["time"] = self.df["datetime"].dt.time
         self.df["datetime_string"] = self.df["datetime"].astype(str)
         self.df.to_sql(self.table_name, self.con, if_exists="replace")
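The new `date` and `time` columns are derived from the existing `datetime` column through the `.dt` accessor. As a rough illustration (the two-row frame below is made up for this sketch and is much smaller than the benchmark data), the resulting columns hold plain `datetime.date` and `datetime.time` objects rather than `datetime64` values, which is presumably why they are benchmarked as separate dtypes:

```python
import datetime

import pandas as pd

# Tiny stand-in for the benchmark frame; only the "datetime" column matters here.
df = pd.DataFrame(
    {"datetime": pd.to_datetime(["2000-01-01 00:00:01", "2000-01-01 00:00:02"])}
)

# Same derivations as in the diff.
df["date"] = df["datetime"].dt.date
df["time"] = df["datetime"].dt.time

first_date = df["date"].iloc[0]  # a datetime.date object
second_time = df["time"].iloc[1]  # a datetime.time object
```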

+64
@@ -0,0 +1,64 @@
+import numpy as np
+
+import pandas as pd
+from pandas import offsets
+
+
+class DatetimeStrftime:
+    timeout = 1500
+    params = [1000, 10000]
+    param_names = ["obs"]
+
+    def setup(self, obs):
+        d = "2018-11-29"
+        dt = "2018-11-26 11:18:27.0"
+        self.data = pd.DataFrame(
+            {
+                "dt": [np.datetime64(dt)] * obs,
+                "d": [np.datetime64(d)] * obs,
+                "r": [np.random.uniform()] * obs,
+            }
+        )
+
+    def time_frame_date_to_str(self, obs):
+        self.data["d"].astype(str)
+
+    def time_frame_date_formatting_default(self, obs):
+        self.data["d"].dt.strftime(date_format="%Y-%m-%d")
+
+    def time_frame_date_formatting_custom(self, obs):
+        self.data["d"].dt.strftime(date_format="%Y---%m---%d")
+
+    def time_frame_datetime_to_str(self, obs):
+        self.data["dt"].astype(str)
+
+    def time_frame_datetime_formatting_default_date_only(self, obs):
+        self.data["dt"].dt.strftime(date_format="%Y-%m-%d")
+
+    def time_frame_datetime_formatting_default(self, obs):
+        self.data["dt"].dt.strftime(date_format="%Y-%m-%d %H:%M:%S")
+
+    def time_frame_datetime_formatting_default_with_float(self, obs):
+        self.data["dt"].dt.strftime(date_format="%Y-%m-%d %H:%M:%S.%f")
+
+    def time_frame_datetime_formatting_custom(self, obs):
+        self.data["dt"].dt.strftime(date_format="%Y-%m-%d --- %H:%M:%S")
+
+
+class BusinessHourStrftime:
+    timeout = 1500
+    params = [1000, 10000]
+    param_names = ["obs"]
+
+    def setup(self, obs):
+        self.data = pd.DataFrame(
+            {
+                "off": [offsets.BusinessHour()] * obs,
+            }
+        )
+
+    def time_frame_offset_str(self, obs):
+        self.data["off"].apply(str)
+
+    def time_frame_offset_repr(self, obs):
+        self.data["off"].apply(repr)
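The benchmark pairs `astype(str)` with `dt.strftime` using the default-looking format so that the two code paths can be compared on equal output. A small sketch, assuming a three-row frame instead of the benchmark's 1000 or 10000 observations, suggests that for date-only values both calls produce the same strings, while the custom format takes the general `strftime` route:

```python
import numpy as np
import pandas as pd

obs = 3  # illustrative size only; the benchmark uses 1000 and 10000
data = pd.DataFrame({"d": [np.datetime64("2018-11-29")] * obs})

# String conversion and explicit formatting with the ISO date pattern.
as_str = data["d"].astype(str)
formatted = data["d"].dt.strftime(date_format="%Y-%m-%d")

# A pattern that does not match the default layout.
custom = data["d"].dt.strftime(date_format="%Y---%m---%d")
```

Because the outputs agree for the default pattern, any timing gap between `time_frame_date_to_str` and `time_frame_date_formatting_default` reflects the formatting implementation rather than a difference in the result.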

doc/source/getting_started/tutorials.rst

+1
@@ -118,3 +118,4 @@ Various tutorials
 * `Pandas and Python: Top 10, by Manish Amde <https://manishamde.github.io/blog/2013/03/07/pandas-and-python-top-10/>`_
 * `Pandas DataFrames Tutorial, by Karlijn Willems <https://www.datacamp.com/community/tutorials/pandas-tutorial-dataframe-python>`_
 * `A concise tutorial with real life examples <https://tutswiki.com/pandas-cookbook/chapter1/>`_
+* `430+ Searchable Pandas recipes by Isshin Inada <https://skytowner.com/explore/pandas_recipes_reference>`_

doc/source/reference/general_functions.rst

+1
@@ -23,6 +23,7 @@ Data manipulations
    merge_asof
    concat
    get_dummies
+   from_dummies
    factorize
    unique
    wide_to_long
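`from_dummies` is listed next to `get_dummies` because it performs the inverse operation. A minimal round-trip sketch, assuming a pandas version that ships `pd.from_dummies` (1.5 or later) and using a made-up single-column frame:

```python
import pandas as pd

original = pd.DataFrame({"col1": ["a", "b", "a"]})

# Encode into indicator columns named "col1_a" and "col1_b".
dummies = pd.get_dummies(original, prefix_sep="_")

# Decode again; `sep` tells from_dummies where the column prefix ends.
restored = pd.from_dummies(dummies, sep="_")
```

The `sep` argument has to match the `prefix_sep` used when encoding; otherwise the prefix cannot be split off the indicator column names.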

doc/source/user_guide/io.rst

+19-6
@@ -2559,16 +2559,29 @@ Let's look at a few examples.
 Read a URL with no options:

-.. ipython:: python
+.. code-block:: ipython

-   url = "https://www.fdic.gov/resources/resolutions/bank-failures/failed-bank-list"
-   dfs = pd.read_html(url)
-   dfs
+   In [320]: url = "https://www.fdic.gov/resources/resolutions/bank-failures/failed-bank-list"
+   In [321]: pd.read_html(url)
+   Out[321]:
+   [ Bank NameBank CityCity StateSt ... Acquiring InstitutionAI Closing DateClosing FundFund
+   0 Almena State Bank Almena KS ... Equity Bank October 23, 2020 10538
+   1 First City Bank of Florida Fort Walton Beach FL ... United Fidelity Bank, fsb October 16, 2020 10537
+   2 The First State Bank Barboursville WV ... MVB Bank, Inc. April 3, 2020 10536
+   3 Ericson State Bank Ericson NE ... Farmers and Merchants Bank February 14, 2020 10535
+   4 City National Bank of New Jersey Newark NJ ... Industrial Bank November 1, 2019 10534
+   .. ... ... ... ... ... ... ...
+   558 Superior Bank, FSB Hinsdale IL ... Superior Federal, FSB July 27, 2001 6004
+   559 Malta National Bank Malta OH ... North Valley Bank May 3, 2001 4648
+   560 First Alliance Bank & Trust Co. Manchester NH ... Southern New Hampshire Bank & Trust February 2, 2001 4647
+   561 National State Bank of Metropolis Metropolis IL ... Banterra Bank of Marion December 14, 2000 4646
+   562 Bank of Honolulu Honolulu HI ... Bank of the Orient October 13, 2000 4645
+
+   [563 rows x 7 columns]]

 .. note::

-   The data from the above URL changes every Monday so the resulting data above
-   and the data below may be slightly different.
+   The data from the above URL changes every Monday so the resulting data above may be slightly different.

 Read in the content of the file from the above URL and pass it to ``read_html``
 as a string:
