ENH: Include df.attrs metadata in to_csv output #53740

Closed
wants to merge 123 commits into from
c679b9f
base fixture and test for added function
canthonyscott Jun 8, 2023
1e5747e
Framework for to_csv to function with comment writing
canthonyscott Jun 8, 2023
9d8727b
Feature added, comment lines can be written out to csv files
canthonyscott Jun 8, 2023
64d10a8
Feature added, comment lines can be written out to csv files
canthonyscott Jun 8, 2023
1baa312
Update 02_read_write.rst (#53559)
okjustin Jun 8, 2023
d066909
DOC: Fixing EX01 - Added examples (#53561)
DeaMariaLeon Jun 8, 2023
d372f5b
CI/DEPS: Add xfail(strict=False) to related unstable sorting changes …
mroeschke Jun 8, 2023
7275637
prevent errors when comment is supplied, but comment_lines is None
canthonyscott Jun 8, 2023
d3a2d90
removed file write call that was used for testing
canthonyscott Jun 9, 2023
8f5d12a
Change for comments to be sources from df.attrs -- Not sure how to ha…
canthonyscott Jun 15, 2023
3d3e9fc
renamed func
canthonyscott Jun 16, 2023
00059f5
when saving as csv with comma delim, remove commas from the attr outp…
canthonyscott Jun 16, 2023
14abcee
moving tests to new location and not using static data files
canthonyscott Jun 16, 2023
b6eea23
fix precommit
canthonyscott Jun 16, 2023
0d2b004
Added fixtures for testing comments
canthonyscott Jun 16, 2023
efc9269
removed spacing
canthonyscott Jun 16, 2023
c1c1266
refactored tests to test writing outputs wit df.attrs as comment lines
canthonyscott Jun 16, 2023
11fcdc4
removed todo line
canthonyscott Jun 20, 2023
b6bb7a9
updated docstring
canthonyscott Jun 20, 2023
d42347a
updated docstring
canthonyscott Jun 20, 2023
0a8efaf
fixed failing cicd checks
canthonyscott Jun 20, 2023
c77f0df
DOC: Fixing EX01 - Added examples (#53564)
DeaMariaLeon Jun 8, 2023
5c64661
Upload nightlies to new location (#53341)
jarrodmillman Jun 8, 2023
8eefe00
CLN: Cleanup after CoW setitem PRs (#53142)
phofl Jun 9, 2023
47ce2da
DOC: Fixing EX01 - Added examples (#53573)
DeaMariaLeon Jun 9, 2023
73d0fda
TST: Use more pytest fixtures (#53567)
mroeschke Jun 9, 2023
477ce16
TST: Use fixtures instead of TestPlotBase (#53550)
mroeschke Jun 9, 2023
fe2c2ea
DOC: create a table for period aliases (#53530)
natmokval Jun 9, 2023
3a5eec7
DOC: Fixing EX01 - Added examples (#53575)
DeaMariaLeon Jun 9, 2023
d98825f
DEPR `fill_method` and `limit` keywords in `pct_change` (#53520)
Charlie-XIAO Jun 9, 2023
d6d9580
TST: Use monkeypatch instead of custom func for env variables (#53563)
mroeschke Jun 9, 2023
c5ab7c5
CI: Ensure nightly wheels are uploaded instead of just sdist (#53578)
mroeschke Jun 9, 2023
6c5bfaf
TYP: Fix typing issues, mainly Something => IndexLabel (#53469)
behrenhoff Jun 10, 2023
d1a2178
Corrected minor typos in the paragraph below 'Formatting the Display'…
kyunghei Jun 11, 2023
a01d4dd
DOC: Fixing EX01 - Added examples (#53619)
DeaMariaLeon Jun 12, 2023
5f25ad1
CI: Fix the deprecation bot (#53593)
lithomas1 Jun 12, 2023
8d48cd1
Bump pypa/cibuildwheel from 2.13.0 to 2.13.1 (#53614)
dependabot[bot] Jun 12, 2023
ab35d79
TST: Mark numba & threading tests as single cpu (#53608)
mroeschke Jun 12, 2023
4ecf52c
DOC: pd.Period and pd.period_range should document that they accept d…
ABizzinotto Jun 12, 2023
de0ebc2
DOC: Fixed inconsistencies in pandas.DataFrame.pivot_table docstring …
tpaxman Jun 12, 2023
4d91b05
ENH: Implement PandasArray, DTA, TDA interpolate (#53594)
jbrockmendel Jun 12, 2023
3cd40a1
PERF: Series.str.split(expand=True) for pyarrow-backed strings (#53585)
lukemanley Jun 12, 2023
25df9d6
DOC: Multi-conditional examples added to .loc docstring (#53572)
sweisss Jun 12, 2023
a4615ed
DEPR: pd.value_counts (#53493)
jbrockmendel Jun 12, 2023
da61d30
REF: move interpolate validation to core.missing (#53580)
jbrockmendel Jun 12, 2023
29dd953
ENH: Groupby.transform support string input with engine=numba (#53579)
lithomas1 Jun 12, 2023
400afad
DEPR: NDFrame.interpolate with ffill/bfill methods (#53607)
jbrockmendel Jun 12, 2023
ca70207
BUG: clean_fill_method failing to raise (#53620)
jbrockmendel Jun 12, 2023
37cf956
PDEP-9: Allow third-party projects to register pandas connectors with…
datapythonista Jun 13, 2023
ab1f011
TST: Add test for apply cast types GH#9506 (#53591)
liang3zy22 Jun 13, 2023
7d2da2d
BUG: DataFrame.stack with sort=True and unsorted MultiIndex levels (#…
rhshadrach Jun 13, 2023
bbc05c6
ENH: Series.str.join for ArrowDtype(pa.string()) (#53646)
lukemanley Jun 13, 2023
5f68c6c
DEPR: interpolate with object dtype (#53638)
jbrockmendel Jun 13, 2023
d1f3c02
BUG: interpolate with complex dtype (#53635)
jbrockmendel Jun 13, 2023
72e8b60
DOC: Added links to GitHub tutorials/resources for Forking Workflow i…
ssharp0 Jun 13, 2023
84f850c
TST: Refactor slow test (#53610)
mroeschke Jun 13, 2023
4c3b663
BUG: RangeIndex.union(sort=True) with another RangeIndex (#53495)
mroeschke Jun 13, 2023
ebb8653
ENH: Series.explode to support pyarrow-backed list types (#53602)
lukemanley Jun 13, 2023
b13d943
CI: Build pandas even if doctests fail (#53657)
mroeschke Jun 14, 2023
59566c4
BUG: groupby sum turning `inf+inf` and `(-inf)+(-inf)` into `nan` (#5…
Charlie-XIAO Jun 14, 2023
1007fb6
DEPR: method, limit in NDFrame.replace (#53492)
jbrockmendel Jun 14, 2023
2866c6b
PERF: Series.str.get_dummies for ArrowDtype(pa.string()) (#53655)
lukemanley Jun 14, 2023
6bc71ed
TYP: core.missing (#53625)
jbrockmendel Jun 14, 2023
b848ad4
CI: Attempt to fix wheel builds (#53670)
lithomas1 Jun 14, 2023
85678ee
DOC: Fixing EX01 - Added examples (#53647)
DeaMariaLeon Jun 14, 2023
b05c02c
CI/TST: Mark test_to_read_gcs as single_cpu (#53677)
mroeschke Jun 15, 2023
5fcc77c
BUG/CoW: is_range_indexer can't handle very large arrays (#53672)
lithomas1 Jun 15, 2023
8ad75e7
ENH: Allow numba aggregations to return non-float64 results (#53444)
lithomas1 Jun 15, 2023
839405d
DOC: Move Whatsnew for CoW fix (#53690)
lithomas1 Jun 15, 2023
9ce58a5
CI: Adjust tests for release of numpy 1.25 (#53715)
lithomas1 Jun 18, 2023
aea5aa9
DEPR: deprecate obj argument in GroupBy.get_group (#53571)
natmokval Jun 19, 2023
3c2c36d
Use _values_for_factorize by default for hashing ExtensionArrays (#53…
jorisvandenbossche Jun 19, 2023
3f3192a
CI: ignore experimental warnings from numba (#53726)
MarcoGorelli Jun 19, 2023
f10297a
CI: fix `pytest scripts` (#53727)
MarcoGorelli Jun 19, 2023
77dd5fd
TST: add test for reindexing rows with matching index uses shallow co…
jorisvandenbossche Jun 19, 2023
64521af
BUG: IntervalIndex.get_indexer raising for read only array (#53703)
phofl Jun 19, 2023
cc8aea2
ERR: Shorten traceback in groupby _cython_agg_general (#52992)
rhshadrach Jun 19, 2023
ff81745
CoW: Add lazy copy mechanism to DataFrame constructor for dict of Ind…
phofl Jun 20, 2023
7d4a3d9
CI: fix numba typing (#53730)
MarcoGorelli Jun 20, 2023
7f27fb3
DEPR: deprecate unit parameters ’T’, 't', 'L', and 'l' (#53557)
natmokval Jun 20, 2023
4671a1f
CoW: Return read-only array in Index.values (#53704)
phofl Jun 20, 2023
fac98dd
DOC: Fixing EX01 - Added examples (#53689)
DeaMariaLeon Jun 20, 2023
525ece3
TST: Use more pytest fixtures (#53679)
mroeschke Jun 20, 2023
8b12efd
BUG: resampling empty series loses time zone from dtype (#53736)
Charlie-XIAO Jun 20, 2023
09afa76
BUG: concat coercing arrow to object with null type (#53702)
phofl Jun 20, 2023
a4ff516
PERF: concatenation of MultiIndexed objects (MultiIndex.append) (#53697)
lukemanley Jun 20, 2023
956f6e1
TST: Add test for duplcated columns and usecols GH#11823 (#53683)
liang3zy22 Jun 20, 2023
97a95ff
BUG: series with complex nan (#53682)
Charlie-XIAO Jun 20, 2023
bdce851
BUG: astype('category') on dataframe backed by non-writeable arrays r…
lukemanley Jun 20, 2023
8d74e20
TST: xfail `test_rolling_var_numerical_issues` on Mac (#53661)
Jython1415 Jun 20, 2023
404aadc
BUG: Adds missing raises for numpy.timedelta64[M/Y] in pandas.Timedel…
mcgeestocks Jun 20, 2023
2a02838
DEPR: Deprecate literal json string input to read_json (#53409)
rmhowe425 Jun 20, 2023
14077f2
DOC: Add to_sql example of conflict with `method` parameter (#53264)
mroeschke Jun 20, 2023
a514e86
BUG: reindex with expansion and non-nanosecond dtype (#53505)
mroeschke Jun 20, 2023
f416bd2
BUG: convert_dtype(dtype_backend=nullable_numpy) with ArrowDtype (#53…
mroeschke Jun 20, 2023
2fef2ed
BUG: Indexing a timestamp ArrowDtype Index (#53652)
mroeschke Jun 20, 2023
6ede1d8
TYP: reshape.merge (#53752)
jbrockmendel Jun 21, 2023
5710d1a
TST: Use more pytest fixtures (#53750)
mroeschke Jun 21, 2023
cf058cb
BUG: DataFrame construction with dictionary ArrowDtype columns (#53654)
mroeschke Jun 21, 2023
acafaf1
DOC: point out that only period aliases are valid for the method asfr…
natmokval Jun 21, 2023
d7fe739
DOC: Fixing EX01 - Added examples (#53741)
DeaMariaLeon Jun 21, 2023
a59b1cf
DOC: operator wrapper names do not match math operators (#53765)
rsm-23 Jun 21, 2023
ff612aa
DOC: Fixing EX01 - Added examples (#53725)
DeaMariaLeon Jun 21, 2023
834010a
DEPR: Deprecate DataFrame.last and Series.last (#53710)
rmhowe425 Jun 21, 2023
d700bcd
BUG / CoW: Series.transform not respecting CoW (#53747)
phofl Jun 21, 2023
826f205
Revert "BUG: DataFrame.stack with sort=True and unsorted MultiIndex l…
rhshadrach Jun 21, 2023
d8260bf
TST: refactor data path for xml tests (#53766)
fangchenli Jun 21, 2023
9a1e1a0
TST: Reduce memory pressure of plotting tests (#53660)
mroeschke Jun 21, 2023
8ff4879
DOC note pytest bump (#53768)
MarcoGorelli Jun 21, 2023
6890cf2
COMPAT: Remove np.compat (#53774)
mroeschke Jun 21, 2023
59881f3
Added suggested new line to fix doc code example (#53775)
rahulsiloniya Jun 21, 2023
bf76e30
TST: Make test_complibs deterministic (#53754)
mroeschke Jun 22, 2023
7c2bdd2
TST: Refactor some slow tests (#53784)
mroeschke Jun 22, 2023
40aa4b6
TYP: reshape.merge (#53780)
jbrockmendel Jun 22, 2023
70f0558
TYP: annotate testing decorators with pytest.MarkDecorator (#53794)
fangchenli Jun 22, 2023
bb923c2
TST/CLN: use fixture for data path in all xml tests (#53790)
fangchenli Jun 22, 2023
e4ba598
REF: remove unused merge args (#53789)
jbrockmendel Jun 22, 2023
8c68943
BUG: combine_first ignoring others columns if other is empty (#53792)
phofl Jun 22, 2023
4c254b5
PERF: concat in no-reindexing case (#53772)
jbrockmendel Jun 22, 2023
b0baa2e
BUG: bad display for complex series with nan (#53764)
Charlie-XIAO Jun 22, 2023
931dc4b
CLN: assorted (#53742)
jbrockmendel Jun 22, 2023
9615631
DOC: Fixing EX01 - Added examples (#53796)
DeaMariaLeon Jun 23, 2023
b865253
Update governance.md (#53814)
computerscienceiscool Jun 23, 2023
1 change: 1 addition & 0 deletions .github/workflows/code-checks.yml
@@ -77,6 +77,7 @@ jobs:

- name: Install pandas in editable mode
id: build-editable
if: ${{ steps.build.outcome == 'success' && always() }}
uses: ./.github/actions/build_pandas
with:
editable: true
60 changes: 47 additions & 13 deletions .github/workflows/deprecation-tracking-bot.yml
@@ -1,11 +1,13 @@
# This bot updates the issue with number DEPRECATION_TRACKER_ISSUE
# with the PR number that issued the deprecation.

# It runs on commits to main, and will trigger if the PR linked to a merged commit has the "Deprecate" label
name: Deprecations Bot

on:
pull_request:
push:
branches:
- main
types:
[closed]


permissions:
@@ -15,17 +17,49 @@ jobs:
deprecation_update:
permissions:
issues: write
if: >-
contains(github.event.pull_request.labels.*.name, 'Deprecate') && github.event.pull_request.merged == true
runs-on: ubuntu-22.04
env:
DEPRECATION_TRACKER_ISSUE: 50578
steps:
- name: Checkout
run: |
echo "Adding deprecation PR number to deprecation tracking issue"
export PR=${{ github.event.pull_request.number }}
BODY=$(curl -H "Authorization: token ${{ secrets.GITHUB_TOKEN }}" https://api.github.com/repos/${{ github.repository }}/issues/${DEPRECATION_TRACKER_ISSUE} |
python3 -c "import sys, json, os; x = {'body': json.load(sys.stdin)['body']}; pr = os.environ['PR']; x['body'] += f'\n- [ ] #{pr}'; print(json.dumps(x))")
echo ${BODY}
curl -H "Authorization: token ${{ secrets.GITHUB_TOKEN }}" -X PATCH -d "${BODY}" https://api.github.com/repos/${{ github.repository }}/issues/${DEPRECATION_TRACKER_ISSUE}
- uses: actions/github-script@v6
id: update-deprecation-issue
with:
script: |
body = await github.rest.issues.get({
owner: context.repo.owner,
repo: context.repo.repo,
issue_number: ${{ env.DEPRECATION_TRACKER_ISSUE }},
})
body = body["data"]["body"];
linkedPRs = await github.rest.repos.listPullRequestsAssociatedWithCommit({
owner: context.repo.owner,
repo: context.repo.repo,
commit_sha: '${{ github.sha }}'
})
linkedPRs = linkedPRs["data"];
console.log(linkedPRs);
if (linkedPRs.length > 0) {
console.log("Found linked PR");
linkedPR = linkedPRs[0]
isDeprecation = false
for (label of linkedPR["labels"]) {
if (label["name"] == "Deprecate") {
isDeprecation = true;
break;
}
}

PR_NUMBER = linkedPR["number"];

body += ("\n- [ ] #" + PR_NUMBER);
if (isDeprecation) {
console.log("PR is a deprecation PR. Printing new body of issue");
console.log(body);
github.rest.issues.update({
owner: context.repo.owner,
repo: context.repo.repo,
issue_number: ${{ env.DEPRECATION_TRACKER_ISSUE }},
body: body
})
}
}
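
The rewritten bot step reads as follows: fetch the tracker issue body, look up the pull requests associated with the pushed commit, take the first one, and update the issue only when that PR carries the `Deprecate` label. A minimal Python sketch of that decision logic, assuming plain dicts shaped like the REST responses used in the script (the function name `updated_tracker_body` is hypothetical, and no network calls are made):

```python
from typing import Optional


def updated_tracker_body(body: str, linked_prs: list) -> Optional[str]:
    """Return the new issue body, or None when no update should be sent."""
    if not linked_prs:
        # The commit is not associated with any pull request.
        return None
    linked_pr = linked_prs[0]
    # Same check as the label loop in the workflow script.
    is_deprecation = any(
        label["name"] == "Deprecate" for label in linked_pr["labels"]
    )
    if not is_deprecation:
        return None
    # Append a checklist entry referencing the PR number.
    return body + "\n- [ ] #" + str(linked_pr["number"])
```

Unlike the workflow script, the sketch builds the new body only when the label is present; the update that would be sent is the same in both cases.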
8 changes: 4 additions & 4 deletions .github/workflows/unit-tests.yml
@@ -230,7 +230,7 @@ jobs:
/opt/python/cp39-cp39/bin/python -m venv ~/virtualenvs/pandas-dev
. ~/virtualenvs/pandas-dev/bin/activate
python -m pip install -U pip wheel setuptools meson[ninja]==1.0.1 meson-python==0.13.1
python -m pip install --no-cache-dir versioneer[toml] cython numpy python-dateutil pytz pytest>=7.0.0 pytest-xdist>=2.2.0 pytest-asyncio>=0.17 hypothesis>=6.46.1
python -m pip install --no-cache-dir versioneer[toml] cython numpy python-dateutil pytz pytest>=7.3.2 pytest-xdist>=2.2.0 pytest-asyncio>=0.17 hypothesis>=6.46.1
python -m pip install --no-cache-dir --no-build-isolation -e .
python -m pip list --no-cache-dir
export PANDAS_CI=1
@@ -268,7 +268,7 @@ jobs:
/opt/python/cp39-cp39/bin/python -m venv ~/virtualenvs/pandas-dev
. ~/virtualenvs/pandas-dev/bin/activate
python -m pip install -U pip wheel setuptools meson-python==0.13.1 meson[ninja]==1.0.1
python -m pip install --no-cache-dir versioneer[toml] cython numpy python-dateutil pytz pytest>=7.0.0 pytest-xdist>=2.2.0 pytest-asyncio>=0.17 hypothesis>=6.46.1
python -m pip install --no-cache-dir versioneer[toml] cython numpy python-dateutil pytz pytest>=7.3.2 pytest-xdist>=2.2.0 pytest-asyncio>=0.17 hypothesis>=6.46.1
python -m pip install --no-cache-dir --no-build-isolation -e .
python -m pip list --no-cache-dir

@@ -337,10 +337,10 @@ jobs:
run: |
python --version
python -m pip install --upgrade pip setuptools wheel
python -m pip install --pre --extra-index-url https://pypi.anaconda.org/scipy-wheels-nightly/simple numpy
python -m pip install --pre --extra-index-url https://pypi.anaconda.org/scientific-python-nightly-wheels/simple numpy
python -m pip install git+https://github.com/nedbat/coveragepy.git
python -m pip install versioneer[toml]
python -m pip install python-dateutil pytz cython hypothesis>=6.46.1 pytest>=7.0.0 pytest-xdist>=2.2.0 pytest-cov pytest-asyncio>=0.17
python -m pip install python-dateutil pytz cython hypothesis>=6.46.1 pytest>=7.3.2 pytest-xdist>=2.2.0 pytest-cov pytest-asyncio>=0.17
python -m pip list

- name: Build Pandas
13 changes: 8 additions & 5 deletions .github/workflows/wheels.yml
@@ -110,9 +110,12 @@ jobs:
path: ./dist

- name: Build wheels
uses: pypa/cibuildwheel@v2.13.0
with:
package-dir: ./dist/${{ needs.build_sdist.outputs.sdist_file }}
uses: pypa/cibuildwheel@v2.13.1
# TODO: Build wheels from sdist again
# There's some sort of weird race condition?
# within Github that makes the sdist be missing files
#with:
# package-dir: ./dist/${{ needs.build_sdist.outputs.sdist_file }}
env:
CIBW_BUILD: ${{ matrix.python[0] }}-${{ matrix.buildplat[1] }}

@@ -137,7 +140,7 @@ jobs:
shell: pwsh
run: |
$TST_CMD = @"
python -m pip install pytz six numpy python-dateutil tzdata>=2022.1 hypothesis>=6.46.1 pytest>=7.0.0 pytest-xdist>=2.2.0 pytest-asyncio>=0.17;
python -m pip install pytz six numpy python-dateutil tzdata>=2022.1 hypothesis>=6.46.1 pytest>=7.3.2 pytest-xdist>=2.2.0 pytest-asyncio>=0.17;
python -m pip install --find-links=pandas\wheelhouse --no-index pandas;
python -c `'import pandas as pd; pd.test()`';
"@
@@ -156,7 +159,7 @@ jobs:
PANDAS_STAGING_UPLOAD_TOKEN: ${{ secrets.PANDAS_STAGING_UPLOAD_TOKEN }}
PANDAS_NIGHTLY_UPLOAD_TOKEN: ${{ secrets.PANDAS_NIGHTLY_UPLOAD_TOKEN }}
# trigger an upload to
# https://anaconda.org/scipy-wheels-nightly/pandas
# https://anaconda.org/scientific-python-nightly-wheels/pandas
# for cron jobs or "Run workflow" (restricted to main branch).
# Tags will upload to
# https://anaconda.org/multibuild-wheels-staging/pandas
97 changes: 88 additions & 9 deletions asv_bench/benchmarks/groupby.py
@@ -57,6 +57,38 @@
},
}

# These aggregations don't have a kernel implemented for them yet
_numba_unsupported_methods = [
    "all",
    "any",
    "bfill",
    "count",
    "cumcount",
    "cummax",
    "cummin",
    "cumprod",
    "cumsum",
    "describe",
    "diff",
    "ffill",
    "first",
    "head",
    "last",
    "median",
    "nunique",
    "pct_change",
    "prod",
    "quantile",
    "rank",
    "sem",
    "shift",
    "size",
    "skew",
    "tail",
    "unique",
    "value_counts",
]


class ApplyDictReturn:
def setup(self):
@@ -453,9 +485,10 @@ class GroupByMethods:
],
["direct", "transformation"],
[1, 5],
["cython", "numba"],
]

def setup(self, dtype, method, application, ncols):
def setup(self, dtype, method, application, ncols, engine):
if method in method_blocklist.get(dtype, {}):
raise NotImplementedError # skip benchmark

@@ -474,6 +507,19 @@ def setup(self, dtype, method, application, ncols):
# DataFrameGroupBy doesn't have these methods
raise NotImplementedError

# Numba currently doesn't support
# multiple transform functions or strs for transform,
# grouping on multiple columns
# and we lack kernels for a bunch of methods
if (
engine == "numba"
and method in _numba_unsupported_methods
or ncols > 1
or application == "transformation"
or dtype == "datetime"
):
raise NotImplementedError

if method == "describe":
ngroups = 20
elif method == "skew":
@@ -505,17 +551,30 @@ def setup(self, dtype, method, application, ncols):
if len(cols) == 1:
cols = cols[0]

# Not everything supports the engine keyword yet
kwargs = {}
if engine == "numba":
kwargs["engine"] = engine

if application == "transformation":
self.as_group_method = lambda: df.groupby("key")[cols].transform(method)
self.as_field_method = lambda: df.groupby(cols)["key"].transform(method)
self.as_group_method = lambda: df.groupby("key")[cols].transform(
method, **kwargs
)
self.as_field_method = lambda: df.groupby(cols)["key"].transform(
method, **kwargs
)
else:
self.as_group_method = getattr(df.groupby("key")[cols], method)
self.as_field_method = getattr(df.groupby(cols)["key"], method)
self.as_group_method = partial(
getattr(df.groupby("key")[cols], method), **kwargs
)
self.as_field_method = partial(
getattr(df.groupby(cols)["key"], method), **kwargs
)

def time_dtype_as_group(self, dtype, method, application, ncols):
def time_dtype_as_group(self, dtype, method, application, ncols, engine):
self.as_group_method()

def time_dtype_as_field(self, dtype, method, application, ncols):
def time_dtype_as_field(self, dtype, method, application, ncols, engine):
self.as_field_method()


@@ -532,8 +591,12 @@ class GroupByCythonAgg:
[
"sum",
"prod",
"min",
"max",
# TODO: uncomment min/max
# Currently, min/max implemented very inefficiently
# because it re-uses the Window min/max kernel
# so it will time out ASVs
# "min",
# "max",
"mean",
"median",
"var",
@@ -554,6 +617,22 @@ def time_frame_agg(self, dtype, method):
self.df.groupby("key").agg(method)


class GroupByNumbaAgg(GroupByCythonAgg):
    """
    Benchmarks specifically targeting our numba aggregation algorithms
    (using a big enough dataframe with simple key, so a large part of the
    time is actually spent in the grouped aggregation).
    """

    def setup(self, dtype, method):
        if method in _numba_unsupported_methods:
            raise NotImplementedError
        super().setup(dtype, method)

    def time_frame_agg(self, dtype, method):
        self.df.groupby("key").agg(method, engine="numba")


class GroupByCythonAggEaDtypes:
"""
Benchmarks specifically targeting our cython aggregation algorithms
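
A note on the new skip condition in `GroupByMethods.setup`: in Python, `and` binds more tightly than `or`, so the condition as written groups as `(engine == "numba" and method in _numba_unsupported_methods) or ncols > 1 or application == "transformation" or dtype == "datetime"`. The diff does not say whether the last three checks are meant to apply to the cython engine as well; the in-code comment suggests they describe numba limitations only. A minimal sketch of the two readings, with hypothetical function names and the unsupported list passed in as a parameter:

```python
def skip_as_written(engine, method, ncols, application, dtype, unsupported):
    # Mirrors the grouping Python applies to the condition in the diff:
    # `and` is evaluated before the chain of `or` clauses.
    return (
        engine == "numba"
        and method in unsupported
        or ncols > 1
        or application == "transformation"
        or dtype == "datetime"
    )


def skip_numba_only(engine, method, ncols, application, dtype, unsupported):
    # Hypothetical variant: every restriction applies to numba only.
    return engine == "numba" and (
        method in unsupported
        or ncols > 1
        or application == "transformation"
        or dtype == "datetime"
    )
```

Under the first reading, a cython run with `ncols=5` is skipped as well; under the second, it still runs.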
26 changes: 26 additions & 0 deletions asv_bench/benchmarks/multiindex_object.py
@@ -396,4 +396,30 @@ def time_putmask_all_different(self):
self.midx.putmask(self.mask, self.midx_values_different)


class Append:
    params = ["datetime64[ns]", "int64", "string"]
    param_names = ["dtype"]

    def setup(self, dtype):
        N1 = 1000
        N2 = 500
        left_level1 = range(N1)
        right_level1 = range(N1, N1 + N1)

        if dtype == "datetime64[ns]":
            level2 = date_range(start="2000-01-01", periods=N2)
        elif dtype == "int64":
            level2 = range(N2)
        elif dtype == "string":
            level2 = tm.makeStringIndex(N2)
        else:
            raise NotImplementedError

        self.left = MultiIndex.from_product([left_level1, level2])
        self.right = MultiIndex.from_product([right_level1, level2])

    def time_append(self, dtype):
        self.left.append(self.right)


from .pandas_vb_common import setup # noqa: F401 isort:skip
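
The `Append` benchmark builds two product indexes whose first levels are disjoint (`range(N1)` and `range(N1, N1 + N1)`) and whose second level is shared, so each side has `N1 * N2` entries and the appended result has twice that, with no duplicates. A minimal standard-library sketch of that shape, using `itertools.product` as a stand-in for `MultiIndex.from_product` and much smaller sizes (both substitutions are assumptions for illustration, not how pandas stores a MultiIndex):

```python
from itertools import product

N1, N2 = 4, 3  # scaled down from 1000 and 500 in the benchmark

level2 = range(N2)
left = list(product(range(N1), level2))             # first level 0..N1-1
right = list(product(range(N1, N1 + N1), level2))   # first level N1..2*N1-1

# Appending concatenates the two sets of tuples; none overlap.
appended = left + right
```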