Fix makeIntIndex, benchmark get loc #19483

toobaz · 2018-01-31T23:45:46Z

tests passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff

makeIntIndex currently produces a monotonic index, and hence tests common to different classes actually test very different things (i.e. the sorted vs. unsorted code paths). I had to fix this (and to fix the tests which erroneously relied on the specific content of makeIntIndex(k)) in order to add meaningful performance tests for get_loc.
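To show why the ordering of the test index matters for `get_loc`: in current pandas (the 2018 code base may differ in detail), `Index.get_loc` on a non-unique index returns a slice when the index is monotonic and a boolean mask otherwise. A benchmark or test that only ever sees a sorted integer index therefore never reaches the second path. This is a minimal sketch using plain `pd.Index` values rather than `makeIntIndex`:

```python
import pandas as pd

# Same int64 values, same length; only the ordering differs.
sorted_idx = pd.Index([1, 2, 2, 3])    # monotonic increasing, non-unique
unsorted_idx = pd.Index([2, 1, 2, 3])  # not monotonic, non-unique

# On the monotonic index the duplicated label comes back as a slice of
# positions (found via searchsorted) ...
loc_sorted = sorted_idx.get_loc(2)
# ... while on the non-monotonic index it comes back as a boolean mask.
loc_unsorted = unsorted_idx.get_loc(2)

print(type(loc_sorted).__name__, loc_sorted.start, loc_sorted.stop)
print(loc_unsorted.tolist())
```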

jreback · 2018-01-31T23:59:08Z

pandas/tests/indexing/test_floats.py

@@ -206,9 +207,9 @@ def test_scalar_integer(self):
        # test how scalar float indexers work on int indexes

        # integer index
-        for index in [tm.makeIntIndex, tm.makeRangeIndex]:


rather than change this, wouldn't it be better to simply fix the make* ones? this is pretty much an anti-pattern and the reason we have the helpers.

The only way to "fix" makeIntIndex for this test to work is to leave it as it is, but as I stated, it limits comparability across index types, since others return unsorted stuff.

This test does something different than using the makeIntIndex to get "some index filled with integers": it assumes that the result has a specific content. I see this as an anti-pattern.
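The distinction being drawn is between a test that relies on the index's specific content (for example `index.get_loc(3) == 3`, which only holds when the index is `arange(k)`) and one that only relies on properties every suitable index has. A minimal sketch of the latter; `check_get_loc_roundtrip` is a hypothetical helper, not a pandas function:

```python
import numpy as np
import pandas as pd


def check_get_loc_roundtrip(index: pd.Index) -> None:
    """Content-agnostic check for a unique index: looking up each label must
    give back that label's position, whatever the values or their order."""
    assert index.is_unique
    for pos, label in enumerate(index):
        assert index.get_loc(label) == pos


# The same check runs unchanged on sorted, shuffled and range-based indexes.
shuffled = pd.Index(np.random.default_rng(0).permutation(10))
for idx in (pd.Index(np.arange(10)), shuffled, pd.RangeIndex(10)):
    check_get_loc_roundtrip(idx)
```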

why are you removing testing of RangeIndex? (yes you convert to it, but it's not very explicit)

It's just simpler, but OK, I can make it explicit

(done), ping

codecov · 2018-02-01T02:20:48Z

Codecov Report

Merging #19483 into master will decrease coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #19483      +/-   ##
==========================================
- Coverage   91.67%   91.67%   -0.01%     
==========================================
  Files         148      148              
  Lines       48543    48541       -2     
==========================================
- Hits        44502    44499       -3     
- Misses       4041     4042       +1

Flag	Coverage Δ
#multiple	`90.03% <100%> (-0.01%)`	⬇️
#single	`41.71% <100%> (ø)`	⬆️

Impacted Files	Coverage Δ
pandas/util/testing.py	`83.64% <100%> (-0.21%)`	⬇️
pandas/core/indexes/multi.py	`95.06% <0%> (-0.09%)`	⬇️
pandas/core/window.py	`96.32% <0%> (ø)`	⬆️
pandas/core/reshape/pivot.py	`96.97% <0%> (+0.62%)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Last update 4eb0cec...51d6911.

jreback · 2018-02-01T11:20:55Z

pandas/util/testing.py

@@ -1560,7 +1560,9 @@ def makeBoolIndex(k=10, name=None):


 def makeIntIndex(k=10, name=None):
-    return Index(lrange(k), name=name)
+    if k == 0:


why is this needed?

... because otherwise the index would still contain an element. But I'll reformulate it.
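The hunk above only shows the first line of the new `makeIntIndex` body (`if k == 0:`), so the PR's actual construction is not visible here. The following is a sketch of one possible way to build an unsorted integer index; `make_unsorted_int_index` and its `seed` parameter are hypothetical. With a permutation of `arange(k)`, the `k == 0` case yields an empty index without any special-casing; whether the PR's construction needed the check depends on code not shown here.

```python
import numpy as np
import pandas as pd


def make_unsorted_int_index(k=10, name=None, seed=None):
    """Hypothetical helper (not the pandas function): the k distinct integers
    0..k-1 in random order. permutation(0) is an empty array, so k == 0
    produces an empty index with no special case."""
    rng = np.random.default_rng(seed)
    return pd.Index(rng.permutation(k), name=name)
```

Note that for very small `k` a random permutation can come out sorted by chance (with probability 1/2 for `k == 2`), so tests that need a strictly unsorted index should not rely on small sizes.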

jreback · 2018-02-01T11:21:21Z

pandas/tests/indexing/test_floats.py

@@ -206,9 +207,9 @@ def test_scalar_integer(self):
        # test how scalar float indexers work on int indexes

        # integer index
-        for index in [tm.makeIntIndex, tm.makeRangeIndex]:


why are you removing testing of RangeIndex? (yes you convert to it, but it's not very explicit)

jreback · 2018-02-02T11:12:06Z

pandas/util/testing.py

@@ -1560,11 +1560,11 @@ def makeBoolIndex(k=10, name=None):


 def makeIntIndex(k=10, name=None):
-    return Index(lrange(k), name=name)


why are you changing this? sure this is an int index, but the ordering is not expected. Revert this, and if you really need it then make a fixture in the appropriate places

> the ordering is not expected.

Why do you expect an index of int (and not of float, str) to be sorted?

What don't you understand in the description of this PR?

Sure, I can write all the fixtures you like. It just doesn't make any sense.

because for testing using something like arange is the expected input.

this has nothing to do with the description. you are free to use sorted, unique, not-sorted whatever in a test, but that should be explicit. you are changing a global default.
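The alternative proposed here is that each test states the ordering it needs instead of inheriting it from a generator's default. A minimal sketch of such an opt-in, assuming a hypothetical helper `with_order` that works on an index of any dtype:

```python
import numpy as np
import pandas as pd


def with_order(index: pd.Index, order: str, seed: int = 0) -> pd.Index:
    """Hypothetical test helper: return ``index`` sorted or shuffled, so the
    ordering a test runs on is stated at the call site rather than inherited
    from whatever a make*Index generator happens to return."""
    if order == "sorted":
        return index.sort_values()
    if order == "shuffled":
        perm = np.random.default_rng(seed).permutation(len(index))
        return index.take(perm)
    raise ValueError(f"unknown order: {order!r}")
```

A test would then call, for example, `with_order(pd.Index(np.arange(10)), "shuffled")`, which makes the unsorted case visible in the test itself.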

> because for testing using something like arange is the expected input.

If you expect that, fine by me: shall I then change makeFloatIndex and makeStringIndex to return sorted indexes, so that when I shuffle them it is "explicit" that they are unsorted? I couldn't care less whether the standard test indexes are sorted or not, as long as they are all consistent and I don't need to add useless lines of code in tests.

Currently, not only is it not "explicit" whether tests run on sorted indexes or not: it actually depends on the type.

In general, when you run a test on different types you want to change only the type, not other characteristics of the index. Sure, there are some unavoidable differences (e.g. a RangeIndex is always sorted), but apart from those, the indexes should be as comparable as possible, so that tests really exercise the different code paths.
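One way to make indexes of different types comparable in the sense described here is to derive all of them from a single shared ordering, so that only the dtype varies between test cases. A minimal sketch under that assumption; the `indexes` dict and the `label_NN` string format are illustrative choices, not what pandas' helpers do:

```python
import numpy as np
import pandas as pd

# One shared random ordering for every index type.
positions = np.random.default_rng(42).permutation(10)

indexes = {
    "int": pd.Index(positions),
    "float": pd.Index(positions.astype("float64")),
    # Zero-padding keeps lexicographic order identical to numeric order.
    "str": pd.Index([f"label_{int(p):02d}" for p in positions]),
}

# Sortedness (here: the lack of it) is now the same for every type.
monotonic = {name: idx.is_monotonic_increasing for name, idx in indexes.items()}
print(monotonic)
```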

> because for testing using something like arange is the expected input.

Actually, the very concept of "expected input" (referring to content) is completely wrong in a test suite.

pls do it my way. you are changing a function that is public facing for no real reason.

> pls do it my way. you are changing a function that is public facing for no real reason.

Oh come on Jeff, the function is clearly not part of the API. I gave "real reasons"; you didn't. I even proposed an alternative, which you didn't even comment on... you're certainly not going to convince me by asking "pls do it my way".

By the way, you are a core dev, you can always merge and then change (e.g. the new asv tests), so if you don't like my PR and don't like to discuss... you don't actually need to waste your time here, just merge and fix to your liking.

jreback · 2018-02-04T15:39:12Z

pandas/util/testing.py

@@ -1560,11 +1560,11 @@ def makeBoolIndex(k=10, name=None):


 def makeIntIndex(k=10, name=None):
-    return Index(lrange(k), name=name)


pls do it my way. you are changing a function that is public facing for no real reason.

toobaz · 2018-02-05T13:43:37Z

@jreback You realize that the asv test is broken in that it never tests unsorted Int64Index, yeah?

jreback · 2018-02-05T13:45:29Z

and that change should be in the asv itself
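A sketch of what covering the missing case in the asv benchmark itself could look like, using asv's `params`/`param_names` convention and a parametrized `setup`. The class name, sizes and key choice are assumptions, not the benchmark that was merged; since `Int64Index` no longer exists in recent pandas, it uses `pd.Index` with an int64 array.

```python
import numpy as np
import pandas as pd


class GetLocIntIndex:
    """Hypothetical asv benchmark: time get_loc on int64 indexes that are
    sorted or unsorted, unique or non-unique."""

    params = [["sorted", "unsorted"], ["unique", "non_unique"]]
    param_names = ["order", "uniqueness"]

    def setup(self, order, uniqueness):
        n = 100_000
        values = np.arange(n, dtype="int64")
        if uniqueness == "non_unique":
            # Each of the first n // 2 labels appears twice.
            values = np.repeat(values[: n // 2], 2)
        if order == "unsorted":
            values = np.random.default_rng(0).permutation(values)
        self.index = pd.Index(values)
        # A label that is guaranteed to be present.
        self.key = int(values[len(values) // 2])

    def time_get_loc(self, order, uniqueness):
        self.index.get_loc(self.key)
```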

toobaz · 2018-02-05T13:52:14Z

fine

jorisvandenbossche · 2018-02-08T10:54:22Z

@jreback if you make changes to a PR before merging (which is OK, I also do that from time to time), please push to this branch on GitHub and merge here; otherwise it is really hard to see what has been merged (the merge script even has an option to do this).

Author: Pietro Battiston

Closes pandas-dev#19483 from toobaz/test_get_loc and squashes the following commits:

51d6911 [Pietro Battiston] TST: benchmark get_loc in various cases
d424f63 [Pietro Battiston] TST: produce unsorted integer index (consistently with other types)

jreback requested changes Jan 31, 2018

jreback added the Testing (pandas testing functions or related to the test suite) and Performance (memory or execution speed performance) labels Jan 31, 2018

jreback requested changes Feb 1, 2018

toobaz added 2 commits February 1, 2018 13:41

d424f63 TST: produce unsorted integer index (consistently with other types)
51d6911 TST: benchmark get_loc in various cases

toobaz force-pushed the test_get_loc branch from ae55b7d to 51d6911 February 1, 2018 12:43

jreback requested changes Feb 2, 2018

jreback requested changes Feb 4, 2018

jreback closed this in d5a7e7c Feb 5, 2018

jreback added this to the 0.23.0 milestone Feb 5, 2018

toobaz deleted the test_get_loc branch February 5, 2018 13:43

toobaz mentioned this pull request Feb 5, 2018: PERF: improve get_loc on unsorted, non-unique indexes #19539 (Merged)
