Skip to content

Fix makeIntIndex, benchmark get loc #19483

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
17 changes: 17 additions & 0 deletions asv_bench/benchmarks/index_object.py
Original file line number Diff line number Diff line change
Expand Up @@ -147,6 +147,11 @@ def setup(self, dtype):
self.idx = getattr(tm, 'make{}Index'.format(dtype))(N)
self.array_mask = (np.arange(N) % 3) == 0
self.series_mask = Series(self.array_mask)
self.sorted = self.idx.sort_values()
half = N // 2
self.non_unique = self.idx[:half].append(self.idx[:half])
self.non_unique_sorted = self.sorted[:half].append(self.sorted[:half])
self.key = self.sorted[N // 4]

def time_boolean_array(self, dtype):
self.idx[self.array_mask]
Expand All @@ -163,6 +168,18 @@ def time_slice(self, dtype):
def time_slice_step(self, dtype):
self.idx[::2]

def time_get_loc(self, dtype):
self.idx.get_loc(self.key)

def time_get_loc_sorted(self, dtype):
self.sorted.get_loc(self.key)

def time_get_loc_non_unique(self, dtype):
self.non_unique.get_loc(self.key)

def time_get_loc_non_unique_sorted(self, dtype):
self.non_unique_sorted.get_loc(self.key)


class Float64IndexMethod(object):
# GH 13166
Expand Down
16 changes: 9 additions & 7 deletions pandas/tests/indexes/test_base.py
Original file line number Diff line number Diff line change
Expand Up @@ -830,15 +830,16 @@ def test_map_with_tuples(self):

# Test that returning a single tuple from an Index
# returns an Index.
boolean_index = tm.makeIntIndex(3).map(lambda x: (x,))
expected = Index([(0,), (1,), (2,)])
tm.assert_index_equal(boolean_index, expected)
idx = tm.makeIntIndex(3)
result = tm.makeIntIndex(3).map(lambda x: (x,))
expected = Index([(i,) for i in idx])
tm.assert_index_equal(result, expected)

# Test that returning a tuple from a map of a single index
# returns a MultiIndex object.
boolean_index = tm.makeIntIndex(3).map(lambda x: (x, x == 1))
expected = MultiIndex.from_tuples([(0, False), (1, True), (2, False)])
tm.assert_index_equal(boolean_index, expected)
result = idx.map(lambda x: (x, x == 1))
expected = MultiIndex.from_tuples([(i, i == 1) for i in idx])
tm.assert_index_equal(result, expected)

# Test that returning a single object from a MultiIndex
# returns an Index.
Expand Down Expand Up @@ -870,7 +871,8 @@ def test_map_tseries_indices_return_index(self):
def test_map_dictlike(self, mapper):
# GH 12756
expected = Index(['foo', 'bar', 'baz'])
result = tm.makeIntIndex(3).map(mapper(expected.values, [0, 1, 2]))
idx = tm.makeIntIndex(3)
result = idx.map(mapper(expected.values, idx))
tm.assert_index_equal(result, expected)

for name in self.indices.keys():
Expand Down
15 changes: 7 additions & 8 deletions pandas/tests/indexing/test_floats.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,8 @@

from warnings import catch_warnings
import numpy as np
from pandas import Series, DataFrame, Index, Float64Index
from pandas import (Series, DataFrame, Index, Float64Index, Int64Index,
RangeIndex)
from pandas.util.testing import assert_series_equal, assert_almost_equal
import pandas.util.testing as tm

Expand Down Expand Up @@ -206,9 +207,8 @@ def test_scalar_integer(self):
# test how scalar float indexers work on int indexes

# integer index
for index in [tm.makeIntIndex, tm.makeRangeIndex]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rather than change this, wouldn't it be better to simply fix the make* ones? this is pretty much an anti-patter and the reason we have the helpers.

Copy link
Member Author

@toobaz toobaz Feb 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only way to "fix" makeIntIndex for this test to work is to leave it as it is, but as I stated, it limits comparability across index types, since others return unsorted stuff.

This test does something different than using the makeIntIndex to get "some index filled with integers": it assumes that the result has a specific content. I see this as an anti-pattern.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are you removing testing of RangeIndex? (yes you convert to it, but its not very explicit)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just simpler, but OK, I can make it explicit

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(done), ping

for i in [Int64Index(range(5)), RangeIndex(5)]:

i = index(5)
for s in [Series(np.arange(len(i))),
DataFrame(np.random.randn(len(i), len(i)),
index=i, columns=i)]:
Expand Down Expand Up @@ -362,9 +362,9 @@ def test_slice_integer(self):
# these coerce to a like integer
# oob indicates if we are out of bounds
# of positional indexing
for index, oob in [(tm.makeIntIndex(5), False),
(tm.makeRangeIndex(5), False),
(tm.makeIntIndex(5) + 10, True)]:
for index, oob in [(Int64Index(range(5)), False),
(RangeIndex(5), False),
(Int64Index(range(5)) + 10, True)]:

# s is an in-range index
s = Series(range(5), index=index)
Expand Down Expand Up @@ -486,9 +486,8 @@ def f():
def test_slice_integer_frame_getitem(self):

# similar to above, but on the getitem dim (of a DataFrame)
for index in [tm.makeIntIndex, tm.makeRangeIndex]:
for index in [Int64Index(range(5)), RangeIndex(5)]:

index = index(5)
s = DataFrame(np.random.randn(5, 2), index=index)

def f(idxr):
Expand Down
4 changes: 2 additions & 2 deletions pandas/util/testing.py
Original file line number Diff line number Diff line change
Expand Up @@ -1560,11 +1560,11 @@ def makeBoolIndex(k=10, name=None):


def makeIntIndex(k=10, name=None):
return Index(lrange(k), name=name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are you changing this? sure this is an int index, but the ordering is not expected. Revert this, and if you really need it then make a fixture in the appropriate places

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the ordering is not expected.

Why do you expect an index of int (and not of float, str) to be sorted?

What don't you understand in the description of this PR?

Sure, I can write all the fixtures you like. It just doesn't make any sense.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because for testing using something like arange is the expected input.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this has nothing to do with the description. you are free to use sorted, unique, not-sorted whatever in a test, but that should be explicit. you are changing a global default.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because for testing using something like arange is the expected input.

If you expect that, fine to me: shall I then change makeFloatIndex and makeStringIndex to return sorted indexes, so that when I shuffle them it is "explicit" that they are unsorted? I couldn't care less if standard test indexes are sorted or not, as long as they all are and I don't need to add useless lines of codes in tests.

Currently not only it's not "explicit" if tests run on sorted indexes or not - it actually depends on the type.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general when you run a test on different types you want to change only the type. Not other characteristics of the index. Then sure, there are some unavoidable differences (e.g. a RangeIndex is sorted)... but apart from that, they should be as comparable as possible, so that tests can really test the different code paths.

Copy link
Member Author

@toobaz toobaz Feb 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because for testing using something like arange is the expected input.

Actually, the concept itself of "expected input" (referring to content) is completely wrong in a test suite.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls do it my way. you are changing a function that is public facing for no real reason.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls do it my way. you are changing a function that is public facing for no real reason.

Oh come on Jeff, the function is clearly not part of the API. I gave "real reasons", you didn't. I even proposed an alternative, which you didn't even comment... you're certainly not going to convince me by asking "pls do it my way".

By the way, you are a core dev, you can always merge and then change (e.g. the new asv tests), so if you don't like my PR and don't like to discuss... you don't actually need to waste your time here, just merge and fix to your liking.

return Index(lrange(1, k) + [0], name=name)[:k]


def makeUIntIndex(k=10, name=None):
return Index([2**63 + i for i in lrange(k)], name=name)
return Index([2**63 + i for i in (lrange(1, k) + [0])], name=name)[:k]


def makeRangeIndex(k=10, name=None):
Expand Down