
Benchmark for indexing with .loc for sorted/unsorted DatetimeIndex #46193


Merged (1 commit) on Mar 2, 2022


@weikhor (Contributor) commented Mar 1, 2022

@jreback jreback added the Benchmark Performance (ASV) benchmarks label Mar 2, 2022
@jreback jreback added this to the 1.5 milestone Mar 2, 2022
@jreback jreback merged commit b221a80 into pandas-dev:main Mar 2, 2022
@jreback (Contributor) commented Mar 2, 2022

thanks @weikhor

@mroeschke (Member) commented

@weikhor this looks like it's failing on the main branch; could you push another PR to fix it?

2T02:52:46.5036611Z ##[error][ 20.22%] ··· indexing.DatetimeIndexIndexing.time_loc_sorted              failed
2022-03-02T02:52:46.5037899Z [ 20.22%] ···· Traceback (most recent call last):
2022-03-02T02:52:46.5038316Z                  File "/home/runner/work/pandas/pandas/pandas/core/indexes/base.py", line 3738, in get_loc
2022-03-02T02:52:46.5038698Z                    return self._engine.get_loc(casted_key)
2022-03-02T02:52:46.5039115Z                  File "pandas/_libs/index.pyx", line 516, in pandas._libs.index.DatetimeEngine.get_loc
2022-03-02T02:52:46.5039798Z                  File "pandas/_libs/index.pyx", line 545, in pandas._libs.index.DatetimeEngine.get_loc
2022-03-02T02:52:46.5040514Z                  File "pandas/_libs/index.pyx", line 197, in pandas._libs.index.IndexEngine._get_loc_duplicates
2022-03-02T02:52:46.5040908Z                KeyError: 1465628400000000000
2022-03-02T02:52:46.5041148Z                
2022-03-02T02:52:46.5041464Z                The above exception was the direct cause of the following exception:
2022-03-02T02:52:46.5041769Z                
2022-03-02T02:52:46.5042039Z                Traceback (most recent call last):
2022-03-02T02:52:46.5042561Z                  File "/home/runner/work/pandas/pandas/pandas/core/indexes/datetimes.py", line 680, in get_loc
2022-03-02T02:52:46.5042976Z                    return Index.get_loc(self, key, method, tolerance)
2022-03-02T02:52:46.5043366Z                  File "/home/runner/work/pandas/pandas/pandas/core/indexes/base.py", line 3740, in get_loc
2022-03-02T02:52:46.5043737Z                    raise KeyError(key) from err
2022-03-02T02:52:46.5044212Z                KeyError: Timestamp('2016-06-11 00:00:00-0700', tz='US/Pacific')
2022-03-02T02:52:46.5044492Z                
2022-03-02T02:52:46.5044816Z                The above exception was the direct cause of the following exception:
2022-03-02T02:52:46.5045122Z                
2022-03-02T02:52:46.5045378Z                Traceback (most recent call last):
2022-03-02T02:52:46.5045979Z                  File "/usr/share/miniconda/envs/pandas-dev/lib/python3.8/site-packages/asv/benchmark.py", line 1184, in main_run_server
2022-03-02T02:52:46.5046378Z                    main_run(run_args)
2022-03-02T02:52:46.5046943Z                  File "/usr/share/miniconda/envs/pandas-dev/lib/python3.8/site-packages/asv/benchmark.py", line 1058, in main_run
2022-03-02T02:52:46.5047358Z                    result = benchmark.do_run()
2022-03-02T02:52:46.5047909Z                  File "/usr/share/miniconda/envs/pandas-dev/lib/python3.8/site-packages/asv/benchmark.py", line 537, in do_run
2022-03-02T02:52:46.5048333Z                    return self.run(*self._current_params)
2022-03-02T02:52:46.5048912Z                  File "/usr/share/miniconda/envs/pandas-dev/lib/python3.8/site-packages/asv/benchmark.py", line 627, in run
2022-03-02T02:52:46.5049361Z                    samples, number = self.benchmark_timing(timer, min_repeat, max_repeat,
2022-03-02T02:52:46.5050123Z                  File "/usr/share/miniconda/envs/pandas-dev/lib/python3.8/site-packages/asv/benchmark.py", line 694, in benchmark_timing
2022-03-02T02:52:46.5050549Z                    timing = timer.timeit(number)
2022-03-02T02:52:46.5051075Z                  File "/usr/share/miniconda/envs/pandas-dev/lib/python3.8/timeit.py", line 177, in timeit
2022-03-02T02:52:46.5051442Z                    timing = self.inner(it, self.timer)
2022-03-02T02:52:46.5051836Z                  File "<timeit-src>", line 6, in inner
2022-03-02T02:52:46.5052235Z                  File "/home/runner/work/pandas/pandas/asv_bench/benchmarks/indexing.py", line 309, in time_loc_sorted
2022-03-02T02:52:46.5052701Z                    self.df_sort.loc["2016-6-11"]
2022-03-02T02:52:46.5053069Z                  File "/home/runner/work/pandas/pandas/pandas/core/indexing.py", line 1048, in __getitem__
2022-03-02T02:52:46.5053481Z                    return self._getitem_axis(maybe_callable, axis=axis)
2022-03-02T02:52:46.5053904Z                  File "/home/runner/work/pandas/pandas/pandas/core/indexing.py", line 1284, in _getitem_axis
2022-03-02T02:52:46.6076973Z                    return self._get_label(key, axis=axis)
2022-03-02T02:52:46.6077429Z                  File "/home/runner/work/pandas/pandas/pandas/core/indexing.py", line 1235, in _get_label
2022-03-02T02:52:46.6077820Z                    return self.obj.xs(label, axis=axis)
2022-03-02T02:52:46.6078203Z                  File "/home/runner/work/pandas/pandas/pandas/core/generic.py", line 3859, in xs
2022-03-02T02:52:46.6078543Z                    loc = index.get_loc(key)
2022-03-02T02:52:46.6078920Z                  File "/home/runner/work/pandas/pandas/pandas/core/indexes/datetimes.py", line 682, in get_loc
2022-03-02T02:52:46.6079763Z                    raise KeyError(orig_key) from err
2022-03-02T02:52:46.6080503Z                KeyError: '2016-6-11'
2022-03-02T02:52:46.6080638Z 
2022-03-02T02:52:46.6081608Z ##[error][ 20.27%] ··· ...xing.DatetimeIndexIndexing.time_loc_unsorted             failed
2022-03-02T02:52:46.6082751Z [ 20.27%] ···· Traceback (most recent call last):
2022-03-02T02:52:46.6083105Z                  File "/home/runner/work/pandas/pandas/pandas/core/indexes/base.py", line 3738, in get_loc
2022-03-02T02:52:46.6083465Z                    return self._engine.get_loc(casted_key)
2022-03-02T02:52:46.6083840Z                  File "pandas/_libs/index.pyx", line 516, in pandas._libs.index.DatetimeEngine.get_loc
2022-03-02T02:52:46.6084281Z                  File "pandas/_libs/index.pyx", line 545, in pandas._libs.index.DatetimeEngine.get_loc
2022-03-02T02:52:46.6084718Z                  File "pandas/_libs/index.pyx", line 203, in pandas._libs.index.IndexEngine._get_loc_duplicates
2022-03-02T02:52:46.6085181Z                  File "pandas/_libs/index.pyx", line 211, in pandas._libs.index.IndexEngine._maybe_get_bool_indexer
2022-03-02T02:52:46.6085606Z                  File "pandas/_libs/index.pyx", line 107, in pandas._libs.index._unpack_bool_indexer
2022-03-02T02:52:46.6085928Z                KeyError: 1465628400000000000
2022-03-02T02:52:46.6086134Z                
2022-03-02T02:52:46.6086427Z                The above exception was the direct cause of the following exception:
2022-03-02T02:52:46.6086705Z                
2022-03-02T02:52:46.6086931Z                Traceback (most recent call last):
2022-03-02T02:52:46.6087296Z                  File "/home/runner/work/pandas/pandas/pandas/core/indexes/datetimes.py", line 680, in get_loc
2022-03-02T02:52:46.6087679Z                    return Index.get_loc(self, key, method, tolerance)
2022-03-02T02:52:46.6088052Z                  File "/home/runner/work/pandas/pandas/pandas/core/indexes/base.py", line 3740, in get_loc
2022-03-02T02:52:46.6088379Z                    raise KeyError(key) from err
2022-03-02T02:52:46.6088793Z                KeyError: Timestamp('2016-06-11 00:00:00-0700', tz='US/Pacific')
2022-03-02T02:52:46.6089054Z                
2022-03-02T02:52:46.6089329Z                The above exception was the direct cause of the following exception:
2022-03-02T02:52:46.6089607Z                
2022-03-02T02:52:46.6089982Z                Traceback (most recent call last):
2022-03-02T02:52:46.6090549Z                  File "/usr/share/miniconda/envs/pandas-dev/lib/python3.8/site-packages/asv/benchmark.py", line 1184, in main_run_server
2022-03-02T02:52:46.6090904Z                    main_run(run_args)
2022-03-02T02:52:46.6091419Z                  File "/usr/share/miniconda/envs/pandas-dev/lib/python3.8/site-packages/asv/benchmark.py", line 1058, in main_run
2022-03-02T02:52:46.6091793Z                    result = benchmark.do_run()
2022-03-02T02:52:46.6092301Z                  File "/usr/share/miniconda/envs/pandas-dev/lib/python3.8/site-packages/asv/benchmark.py", line 537, in do_run
2022-03-02T02:52:46.6092695Z                    return self.run(*self._current_params)
2022-03-02T02:52:46.6093220Z                  File "/usr/share/miniconda/envs/pandas-dev/lib/python3.8/site-packages/asv/benchmark.py", line 627, in run
2022-03-02T02:52:46.6093651Z                    samples, number = self.benchmark_timing(timer, min_repeat, max_repeat,
2022-03-02T02:52:46.6094239Z                  File "/usr/share/miniconda/envs/pandas-dev/lib/python3.8/site-packages/asv/benchmark.py", line 694, in benchmark_timing
2022-03-02T02:52:46.6094625Z                    timing = timer.timeit(number)
2022-03-02T02:52:46.6095105Z                  File "/usr/share/miniconda/envs/pandas-dev/lib/python3.8/timeit.py", line 177, in timeit
2022-03-02T02:52:46.6095458Z                    timing = self.inner(it, self.timer)
2022-03-02T02:52:46.6095805Z                  File "<timeit-src>", line 6, in inner
2022-03-02T02:52:46.6096180Z                  File "/home/runner/work/pandas/pandas/asv_bench/benchmarks/indexing.py", line 306, in time_loc_unsorted
2022-03-02T02:52:46.6096708Z                    self.df.loc["2016-6-11"]
2022-03-02T02:52:46.6097052Z                  File "/home/runner/work/pandas/pandas/pandas/core/indexing.py", line 1048, in __getitem__
2022-03-02T02:52:46.6097415Z                    return self._getitem_axis(maybe_callable, axis=axis)
2022-03-02T02:52:46.6097804Z                  File "/home/runner/work/pandas/pandas/pandas/core/indexing.py", line 1284, in _getitem_axis
2022-03-02T02:52:46.6098159Z                    return self._get_label(key, axis=axis)
2022-03-02T02:52:46.6098502Z                  File "/home/runner/work/pandas/pandas/pandas/core/indexing.py", line 1235, in _get_label
2022-03-02T02:52:46.6098850Z                    return self.obj.xs(label, axis=axis)
2022-03-02T02:52:46.6099196Z                  File "/home/runner/work/pandas/pandas/pandas/core/generic.py", line 3859, in xs
2022-03-02T02:52:55.5493277Z                    loc = index.get_loc(key)
2022-03-02T02:52:55.5494144Z                  File "/home/runner/work/pandas/pandas/pandas/core/indexes/datetimes.py", line 682, in get_loc
2022-03-02T02:52:55.5494680Z                    raise KeyError(orig_key) from err
2022-03-02T02:52:55.5495421Z                KeyError: '2016-6-11'
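The three chained `KeyError`s in each traceback are the same label at different stages of the lookup: the string `'2016-6-11'`, the tz-aware `Timestamp` it is parsed into, and the integer `1465628400000000000` that `DatetimeEngine.get_loc` fails to find. The stdlib-only check below assumes that integer is nanoseconds since the Unix epoch, and uses the fixed UTC-7 offset printed in the traceback rather than a full `US/Pacific` zone. It shows the integer is exactly local midnight on 2016-06-11, i.e. the lookup fails because that day is not in the index at all:

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC-7 offset, as printed in the traceback (-0700 = PDT). A real
# US/Pacific zone would need zoneinfo/pytz; a fixed offset suffices here.
PDT = timezone(timedelta(hours=-7))

# Stage 2 of the error chain: Timestamp('2016-06-11 00:00:00-0700', ...)
local_midnight = datetime(2016, 6, 11, tzinfo=PDT)

# Stage 3: the same instant as integer nanoseconds since 1970-01-01 UTC
# (assumed to be how the engine represents the key internally).
epoch_ns = int(local_midnight.timestamp()) * 1_000_000_000

print(local_midnight.isoformat())
print(epoch_ns)
print(epoch_ns == 1465628400000000000)  # compare with the innermost KeyError
```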

@@ -290,12 +290,24 @@ def setup(self):
self.dti = dti
self.dti2 = dti2

index = np.random.choice(dti, 10000, replace=True)
Review comment (Member):

Could you make this a new benchmark class and make it deterministic (i.e. no randomness)?
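Why the random index makes the benchmark flaky rather than always failing: `np.random.choice(dti, 10000, replace=True)` samples with replacement, so any particular day can be left out, and `.loc["2016-6-11"]` then raises the `KeyError` from the CI log. Assuming `dti` is a daily range of 10,000 days starting 2016-01-01 (the line that builds it is not part of this hunk), the chance that 2016-06-11 is absent from one sample is (1 - 1/10000)^10000, about 0.368, which is close to 1/e. A stdlib sketch of that arithmetic, with a seeded Monte Carlo check (using Python's `random`, not NumPy, purely for illustration):

```python
import math
import random
from datetime import date

N_DAYS = 10_000   # assumption: len(dti), a daily range starting 2016-01-01
N_DRAWS = 10_000  # sample size passed to np.random.choice in the diff
TARGET = (date(2016, 6, 11) - date(2016, 1, 1)).days  # day offset of the key


def prob_missing(n_days: int, n_draws: int) -> float:
    """Closed form: chance that one fixed day is never drawn in
    n_draws uniform draws with replacement from n_days days."""
    return (1 - 1 / n_days) ** n_draws


def simulated_missing_rate(trials: int, seed: int = 0) -> float:
    """Fraction of simulated samples that do not contain TARGET."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(trials):
        sample = set(rng.choices(range(N_DAYS), k=N_DRAWS))
        misses += TARGET not in sample
    return misses / trials


p = prob_missing(N_DAYS, N_DRAWS)
print(f"closed form: {p:.4f}  (e**-1 = {math.exp(-1):.4f})")
print(f"simulated:   {simulated_missing_rate(trials=200):.4f}")
```

Roughly one CI run in three would therefore hit the missing-key error under these assumptions, which is why a deterministic index is preferable for a timing benchmark.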

@weikhor (Contributor, Author) commented Mar 2, 2022

Oh no, my bad. I'll make the change as soon as possible.
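A minimal sketch of what the requested change could look like: a separate benchmark class whose indexes are built without any randomness, so the label being looked up always exists. The class name `SortedAndUnsortedDatetimeIndexLoc`, the 10,000-day range, and the choice to duplicate every day (to keep the duplicate labels the random version had) are illustrative assumptions, not necessarily what the follow-up PR did.

```python
import pandas as pd


class SortedAndUnsortedDatetimeIndexLoc:
    """Hypothetical ASV benchmark class; the name is illustrative only."""

    def setup(self):
        # Assumed to match DatetimeIndexIndexing: 10,000 daily, tz-aware stamps.
        dti = pd.date_range("2016-01-01", periods=10000, tz="US/Pacific")

        # Sorted case: every day appears twice, in increasing order, so the
        # index is monotonic but (like the random version) has duplicates.
        self.df_sort = pd.DataFrame(index=dti.repeat(2), data={"a": 1})

        # Unsorted case: the whole range appended to itself. Same labels and
        # duplicates, but not monotonic, and no randomness involved.
        self.df_unsorted = pd.DataFrame(index=dti.append(dti), data={"a": 1})

    def time_loc_sorted(self):
        self.df_sort.loc["2016-6-11"]

    def time_loc_unsorted(self):
        self.df_unsorted.loc["2016-6-11"]
```

Because both indexes are built from the full range, the looked-up day is always present, so each run times the lookup itself instead of occasionally hitting the `KeyError` above.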

yehoshuadimarsky pushed a commit to yehoshuadimarsky/pandas that referenced this pull request Jul 13, 2022
Labels: Benchmark Performance (ASV) benchmarks
Linked issue: ASV: Add benchmark for indexing with .loc for sorted/unsorted DatetimeIndex

3 participants