Skip to content

Fix makeIntIndex, benchmark get loc #19483

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

toobaz
Copy link
Member

@toobaz toobaz commented Jan 31, 2018

  • tests passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff

makeIntIndex currently produces a monotonic index, and hence tests common to different classes actually test very different things (i.e. the sorted vs. unsorted code paths). I had to fix this (and to fix the tests which erroneously relied on the specific content of makeIntIndex(k)) in order to add meaningful performance tests for get_loc.

@@ -206,9 +207,9 @@ def test_scalar_integer(self):
# test how scalar float indexers work on int indexes

# integer index
for index in [tm.makeIntIndex, tm.makeRangeIndex]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rather than change this, wouldn't it be better to simply fix the make* ones? this is pretty much an anti-patter and the reason we have the helpers.

Copy link
Member Author

@toobaz toobaz Feb 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only way to "fix" makeIntIndex for this test to work is to leave it as it is, but as I stated, it limits comparability across index types, since others return unsorted stuff.

This test does something different than using the makeIntIndex to get "some index filled with integers": it assumes that the result has a specific content. I see this as an anti-pattern.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are you removing testing of RangeIndex? (yes you convert to it, but its not very explicit)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just simpler, but OK, I can make it explicit

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(done), ping

@jreback jreback added Testing pandas testing functions or related to the test suite Performance Memory or execution speed performance labels Jan 31, 2018
@codecov
Copy link

codecov bot commented Feb 1, 2018

Codecov Report

Merging #19483 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #19483      +/-   ##
==========================================
- Coverage   91.67%   91.67%   -0.01%     
==========================================
  Files         148      148              
  Lines       48543    48541       -2     
==========================================
- Hits        44502    44499       -3     
- Misses       4041     4042       +1
Flag Coverage Δ
#multiple 90.03% <100%> (-0.01%) ⬇️
#single 41.71% <100%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/util/testing.py 83.64% <100%> (-0.21%) ⬇️
pandas/core/indexes/multi.py 95.06% <0%> (-0.09%) ⬇️
pandas/core/window.py 96.32% <0%> (ø) ⬆️
pandas/core/reshape/pivot.py 96.97% <0%> (+0.62%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4eb0cec...51d6911. Read the comment docs.

@@ -1560,7 +1560,9 @@ def makeBoolIndex(k=10, name=None):


def makeIntIndex(k=10, name=None):
return Index(lrange(k), name=name)
if k == 0:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this needed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... because otherwise the index comes with an element anyway. But I'll reformulate

@@ -206,9 +207,9 @@ def test_scalar_integer(self):
# test how scalar float indexers work on int indexes

# integer index
for index in [tm.makeIntIndex, tm.makeRangeIndex]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are you removing testing of RangeIndex? (yes you convert to it, but its not very explicit)

@@ -1560,11 +1560,11 @@ def makeBoolIndex(k=10, name=None):


def makeIntIndex(k=10, name=None):
return Index(lrange(k), name=name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are you changing this? sure this is an int index, but the ordering is not expected. Revert this, and if you really need it then make a fixture in the appropriate places

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the ordering is not expected.

Why do you expect an index of int (and not of float, str) to be sorted?

What don't you understand in the description of this PR?

Sure, I can write all the fixtures you like. It just doesn't make any sense.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because for testing using something like arange is the expected input.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this has nothing to do with the description. you are free to use sorted, unique, not-sorted whatever in a test, but that should be explicit. you are changing a global default.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because for testing using something like arange is the expected input.

If you expect that, fine to me: shall I then change makeFloatIndex and makeStringIndex to return sorted indexes, so that when I shuffle them it is "explicit" that they are unsorted? I couldn't care less if standard test indexes are sorted or not, as long as they all are and I don't need to add useless lines of codes in tests.

Currently not only it's not "explicit" if tests run on sorted indexes or not - it actually depends on the type.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general when you run a test on different types you want to change only the type. Not other characteristics of the index. Then sure, there are some unavoidable differences (e.g. a RangeIndex is sorted)... but apart from that, they should be as comparable as possible, so that tests can really test the different code paths.

Copy link
Member Author

@toobaz toobaz Feb 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because for testing using something like arange is the expected input.

Actually, the concept itself of "expected input" (referring to content) is completely wrong in a test suite.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls do it my way. you are changing a function that is public facing for no real reason.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls do it my way. you are changing a function that is public facing for no real reason.

Oh come on Jeff, the function is clearly not part of the API. I gave "real reasons", you didn't. I even proposed an alternative, which you didn't even comment... you're certainly not going to convince me by asking "pls do it my way".

By the way, you are a core dev, you can always merge and then change (e.g. the new asv tests), so if you don't like my PR and don't like to discuss... you don't actually need to waste your time here, just merge and fix to your liking.

@@ -1560,11 +1560,11 @@ def makeBoolIndex(k=10, name=None):


def makeIntIndex(k=10, name=None):
return Index(lrange(k), name=name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls do it my way. you are changing a function that is public facing for no real reason.

@jreback jreback closed this in d5a7e7c Feb 5, 2018
@jreback jreback added this to the 0.23.0 milestone Feb 5, 2018
@toobaz
Copy link
Member Author

toobaz commented Feb 5, 2018

@jreback You realize that the asv test is broken in that it never tests unsorted Int64Index, yeah?

@toobaz toobaz deleted the test_get_loc branch February 5, 2018 13:43
@jreback
Copy link
Contributor

jreback commented Feb 5, 2018

and that change should be in the asv itself

@toobaz
Copy link
Member Author

toobaz commented Feb 5, 2018

fine

@jorisvandenbossche
Copy link
Member

@jreback if you make changes to a PR before merging (which is OK, I also do that from time to time), please push to this branch on github and merge here, otherwise it is really un-transparent to see what has been merged (the merge script has even an option to do this).

harisbal pushed a commit to harisbal/pandas that referenced this pull request Feb 28, 2018
Author: Pietro Battiston <[email protected]>

Closes pandas-dev#19483 from toobaz/test_get_loc and squashes the following commits:

51d6911 [Pietro Battiston] TST: benchmark get_loc in various cases
d424f63 [Pietro Battiston] TST: produce unsorted integer index (consistently with other types)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Performance Memory or execution speed performance Testing pandas testing functions or related to the test suite
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants