PERF: perform a unique intersection with IntervalIndex if at least one side is unique #26711

Merged (1 commit) on Jun 10, 2019

Conversation

qwhelan
Contributor

@qwhelan qwhelan commented Jun 7, 2019

Resolves #26709

       before           after         ratio
     [3ff4f38f]       [30bd5369]
     <datetime_iter~1>
-       749±100μs         365±70μs     0.49  index_object.IntervalIndexMethod.time_intersection_duplicate(1000)
-        59.6±3ms       8.04±0.2ms     0.13  index_object.IntervalIndexMethod.time_intersection_duplicate(100000)
           failed       2.50±0.01s      n/a  index_object.IntervalIndexMethod.time_intersection_duplicate(10000000)
  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

@qwhelan qwhelan mentioned this pull request Jun 7, 2019
@qwhelan qwhelan force-pushed the unique_intersection branch from e69d711 to 5a231b8 Compare June 7, 2019 19:29
@TomAugspurger TomAugspurger added the Interval (Interval data type) and Performance (Memory or execution speed performance) labels Jun 7, 2019
@qwhelan qwhelan force-pushed the unique_intersection branch from 5a231b8 to ea74a35 Compare June 7, 2019 22:39
@makbigc
Contributor

makbigc commented Jun 8, 2019

@qwhelan Thanks for your quick response.

Maybe adding a check for duplicate NaN values on either side would resolve this:
if self.isna().sum() > 1 or other.isna().sum() > 1:

@jreback jreback added this to the 0.25.0 milestone Jun 8, 2019
@jreback
Contributor

jreback commented Jun 8, 2019

Can you merge master and try @makbigc's suggestion?

@qwhelan qwhelan force-pushed the unique_intersection branch from ea74a35 to 66d323b Compare June 10, 2019 02:13
@codecov

codecov bot commented Jun 10, 2019

Codecov Report

Merging #26711 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #26711      +/-   ##
==========================================
- Coverage   91.71%    91.7%   -0.02%     
==========================================
  Files         178      178              
  Lines       50740    50742       +2     
==========================================
- Hits        46538    46533       -5     
- Misses       4202     4209       +7

Flag         Coverage Δ
#multiple    90.3% <100%> (-0.01%) ⬇️
#single      41.21% <0%> (-0.11%) ⬇️

Impacted Files                     Coverage Δ
pandas/core/indexes/interval.py    96.11% <100%> (-0.32%) ⬇️
pandas/io/gbq.py                   78.94% <0%> (-10.53%) ⬇️
pandas/core/frame.py               96.88% <0%> (-0.12%) ⬇️
pandas/util/testing.py             90.84% <0%> (-0.11%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 0f3e8e8...66d323b.

@qwhelan
Contributor Author

qwhelan commented Jun 10, 2019

@makbigc Thanks, your suggestion works. The calls to .is_unique appear to give us that condition for free, so I've only added self.isna().sum() <= 1

@jreback Done and passing
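
For readers following the thread, the condition discussed above can be sketched roughly as follows. This is a minimal pure-Python illustration, not the code merged into pandas: the function names (`is_unique`, `nan_count`, `choose_intersection_path`), the path labels, the order of the checks, and the use of `None` as a stand-in for a missing interval are all assumptions made for the example. Only the `self.isna().sum() <= 1` guard and the "at least one side is unique" idea come from the discussion.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical stand-in for an interval: a (left, right) tuple,
# with None representing a missing (NaN) interval.
Interval = Optional[Tuple[float, float]]


def is_unique(intervals: Sequence[Interval]) -> bool:
    """Return True when no interval (including None) appears twice."""
    return len(set(intervals)) == len(intervals)


def nan_count(intervals: Sequence[Interval]) -> int:
    """Count the missing intervals, analogous to isna().sum()."""
    return sum(iv is None for iv in intervals)


def choose_intersection_path(left: Sequence[Interval],
                             right: Sequence[Interval]) -> str:
    """Pick an intersection strategy (illustrative labels only)."""
    if is_unique(left):
        # The left side has no duplicates: the fast path applies directly.
        return "unique"
    if is_unique(right) and nan_count(left) <= 1:
        # Only the right side is unique. Swapping the operands is treated
        # as safe here only if the left side has at most one missing value.
        return "unique_swapped"
    # Duplicates on both sides, or repeated missing values on the left.
    return "non_unique"


print(choose_intersection_path([(0, 1), (0, 1)], [(0, 1), (1, 2)]))
print(choose_intersection_path([None, None, (0, 1)], [None, (0, 1)]))
```

In this sketch a uniqueness check on the right side already rules out repeated missing values there, which mirrors the remark that `.is_unique` gives that half of the condition for free, so only the left side needs the explicit count.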

@jreback jreback merged commit 9473a1f into pandas-dev:master Jun 10, 2019
@jreback
Contributor

jreback commented Jun 10, 2019

thanks @qwhelan and @makbigc

@qwhelan qwhelan deleted the unique_intersection branch July 20, 2019 09:39
Labels
Interval (Interval data type), Performance (Memory or execution speed performance)
Development

Successfully merging this pull request may close these issues.

Broken benchmarks
4 participants