-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENG/BUG: Changed NDFrame.shift to work with arbitrary axis/ndims. #4874
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
looks good...can you add a test matrix, e.g. cycle thru all axes for each object, then test if it works (ideally also with different types of axes, e.g. string, datetime, period, integer)? you can use the test harness in tests/test_generic.py (just leave the exisitng tests where they are), but need a comprehensive test .... |
@jreback So was thinking about the test matrix, was thinking something like this: targets = {}
targets['dataframe'] = tm.makeDataFrame
targets['time_dataframe'] = tm.makeTimeDataFrame
class ShiftBattery(object):
def test_shift(self):
obj = self._construct()
obj.shift(1)
def test_shift_negative(self):
obj = self._construct()
obj.shift(-3)
def setup_battery(targets, battery):
battery_name = battery.__name__
# create a unittest.TestCase subclass for each target
for target, maker in targets.items():
cls_name = "Test" + battery_name + '_' + target
cls = makeClass(cls_name, battery, maker)
globals()[cls_name] = cls
def makeClass(cls_name, battery, maker):
cls = type(cls_name, (unittest.TestCase, battery), {})
cls._construct = lambda self: maker()
return cls
setup_battery(targets, ShiftBattery) The idea being that we setup a bunch of targets with different configurations (dtypes, multiindex, etc) and then run them through a battery of tests that should be common to all of them. The output would look like:
yay nay? edit: modified code |
that's fancy! you can also just do it in a function, where you run thru the index types, something like this: https://github.com/pydata/pandas/pull/4850/files#L12R1557 but what you are doing is fine too (as you create separate tests so easy to see where stuff fails) |
Constructing test cases that dynamically makes the tests harder to follow |
I was apprehensive about the indirection too, but I couldn't get test generators/for loops to output enough useful info for each sub-test. Take this example: ind = pd.date_range(start="2000", freq="D", periods=10)
targets = {}
targets['int_series'] = lambda : pd.Series(range(10))
targets['bool_series'] = lambda : pd.Series(np.random.randn(10) > 0, index=ind)
targets['float_series'] = lambda : pd.Series(np.random.randn(10))
targets['string_series'] = lambda : pd.Series(list('asdfqwerzx'), index=ind)
class ShiftBattery(object):
def test_check(self):
obj = self._construct()
if obj.is_time_series:
assert False, "Don't support time series"
setup_battery(targets, ShiftBattery)
class TestFor(unittest.TestCase):
def test_for(self):
def check(obj):
if obj.is_time_series:
assert False, "Don't support time series"
for obj in targets.values():
check(obj())
class TestGenerator(object):
def test_for(self):
def check(obj):
if obj.is_time_series:
assert False, "Don't support time series"
for obj in targets.values():
yield check, obj() For loop:
Generator
Battery:
The Battery test is the only one that runs all tests and outputs meaningful nose output. Plus, the Battery doesn't have to know about the targets and can remain simple. |
I would prefer not to inject into globals, when you can use regular inheritance to do the same exact thing. class ShiftBattery(object):
def test_shift(self):
self.obj.shift(1)
def test_shift_negative(self):
self.obj.shift(-3)
class TestShiftDataFrame(ShiftBattery, unittest.TestCase):
def setUp(self):
self.obj = DataFrame(range(10))
class TestShiftPanel(ShiftBattery, unittest.TestCase):
def setUp(self):
self.obj = tm.makePanel() Sometimes it's nice to make it clear that something works the same way across all objects: class TestShift(unittest.TestCase):
def setUp(self):
self.objects = [tm.makePanel(), tm.makeCustomDataFrame(5, 5)]
def test_shift1(self):
for obj in self.objects:
obj.shift(1) |
As an added bonus, if later you realize that there's an edge case for Panel or Series or something, you can add a test just to that class. |
Alternatively, you can create a helper method for everything class TestShift(unittest.TestCase):
def _check_int_shift(self, obj):
self.obj.shift(1)
self.obj.shift(-3)
def test_int_series(self):
self._check_int_shift(self, int_series)
def test_bool_series(self):
self._check_bool_shift(self, bool_series) |
@jreback can you take a look at https://gist.github.com/dalejung/6705707. It's just a small util I've been using when generating tests, but I'm not sure if it should go into pandas and if so where. |
you could put that in that idiom is used in lots of locations in the library.... if you want to split this off, and then search and replace where something like this is used would be great here's some places its used now (might be more):
also in |
cool cool. Is slicer the correct term for this? I couldn't figure out what to call a tuple of slices. I like slicer. On Sep 25, 2013, at 4:51 PM, jreback [email protected] wrote:
|
you may want to change the args a bit as sometimes I just pass in an indexer directly for the axis in question (this is an internal function, so can prob just add a kw, or maybe make it the first arg) e.g.
|
You should change _missing to Ellipsis. Avoids an additional global.
|
Also, are you sure that works under Python 3? Some comparisons were |
Wouldn't using |
@dalejung I think it will produce the same thing if you set stop to None (because you no longer need to test for start <= 0) and if you pass start > 0, then you want to be slicing def slicer(start, stop=None, axis=None):
assert axis is not None, "Must pass an axis to slicer"
assert isinstance(axis, int), "Must pass an integer axis to slicer"
if isinstance(start, slice):
key = start
else:
if stop is None and start > 0:
stop, start = start, stop
key = slice(start, stop)
slices = [slice(None) for x in range(axis+1)]
slices[axis] = key
return tuple(slices) |
@jtratner this is a purleey internal method simply to make easy constructing multiple axis slices (rather than 'manually' doing this in places) |
@jtratner the original behaved a bit differently, slicer(3, axis=0) => (slice(None, 3, None,)
slicer(3, None, axis=0) => (slice(3, None, None),) # explict None doesn't flip. I'm not really tied to that behavior. Going back to missing sentinels, are we standardized on |
I'm weakly -1 on using Ellipsis. What's wrong with None? |
@dalejung you saw my comment above? this has to be able to take an indexer (that doesn't happen to be a slice) |
@dalejung Yep .. i see .. hm |
@cpcloud well, it's not meant for unimplemented paths. It's for setting default behaviors. def func(x=None):
# if x is missing, default to 10
if x is None:
x = 10
... The problem with the above setup is that you can't tell the difference between def func(x=_missing):
if x is _missing:
x = 10 handles both |
@dalejung how's this coming? waiting for axis slicer, IIRC? |
@dalejung hows this coming along? |
@dalejung want to rebase this? |
@dalejung this should be pretty easy to do now (the underlying pls rebase |
@dalejung still working on this? |
@jreback Actually, the underlying |
hmm I think the intent was to integrate Panel.shift |
closes #4867. First part of changing
Panel.shift
is to makeNDFrame.shift
work with any axis and number of dimensions.Until Panel uses it, the only case of this is
DataFrame.shift(axis='columns')
which is added as a test.