ENG/BUG: Changed NDFrame.shift to work with arbitrary axis/ndims. #4874

dalejung · 2013-09-18T14:47:33Z

closes #4867. First part of changing Panel.shift is to make NDFrame.shift work with any axis and number of dimensions.

Until Panel uses it, the only case of this is DataFrame.shift(axis='columns') which is added as a test.

jreback · 2013-09-18T14:52:26Z

looks good...can you add a test matrix, e.g. cycle thru all axes for each object, then test if it works (ideally also with different types of axes, e.g. string, datetime, period, integer)?

you can use the test harness in tests/test_generic.py (just leave the exisitng tests where they are), but need a comprehensive test ....

dalejung · 2013-09-19T22:12:05Z

@jreback So was thinking about the test matrix, was thinking something like this:

targets = {}
targets['dataframe'] = tm.makeDataFrame
targets['time_dataframe'] = tm.makeTimeDataFrame

class ShiftBattery(object):
    def test_shift(self):
        obj = self._construct()
        obj.shift(1)

    def test_shift_negative(self):
        obj = self._construct()
        obj.shift(-3)

def setup_battery(targets, battery):
    battery_name = battery.__name__
    # create a unittest.TestCase subclass for each target
    for target, maker in targets.items():
        cls_name = "Test" + battery_name + '_' + target
        cls = makeClass(cls_name, battery, maker)
        globals()[cls_name] = cls

def makeClass(cls_name, battery, maker):
    cls = type(cls_name, (unittest.TestCase, battery), {})
    cls._construct = lambda self: maker()
    return cls

setup_battery(targets, ShiftBattery)

The idea being that we setup a bunch of targets with different configurations (dtypes, multiindex, etc) and then run them through a battery of tests that should be common to all of them.

The output would look like:

test_shift (__main__.TestShiftBattery_dataframe) ... ok
test_shift_negative (__main__.TestShiftBattery_dataframe) ... ok
test_shift (__main__.TestShiftBattery_time_dataframe) ... ok
test_shift_negative (__main__.TestShiftBattery_time_dataframe) ... ok

yay nay?

edit: modified code

jreback · 2013-09-19T22:20:57Z

that's fancy! you can also just do it in a function, where you run thru the index types, something like this: https://github.com/pydata/pandas/pull/4850/files#L12R1557

but what you are doing is fine too (as you create separate tests so easy to see where stuff fails)

jtratner · 2013-09-19T22:35:34Z

Constructing test cases that dynamically makes the tests harder to follow
and harder to maintain. It's really important that test cases should be
simple to trace. If you can't just use a few for loops with arguments (or
helper functions), you could try is creating a baseclass like ShiftBase,
and then subclassing it with different setups for Series, Panel, etc. (or
whatever you're looking at).

dalejung · 2013-09-20T01:53:59Z

I was apprehensive about the indirection too, but I couldn't get test generators/for loops to output enough useful info for each sub-test. Take this example:

ind = pd.date_range(start="2000", freq="D", periods=10)
targets = {}
targets['int_series'] = lambda : pd.Series(range(10))
targets['bool_series'] = lambda : pd.Series(np.random.randn(10) > 0, index=ind)
targets['float_series'] = lambda : pd.Series(np.random.randn(10))
targets['string_series'] = lambda : pd.Series(list('asdfqwerzx'), index=ind)

class ShiftBattery(object):
    def test_check(self):
        obj = self._construct()
        if obj.is_time_series:
            assert False, "Don't support time series"

setup_battery(targets, ShiftBattery)

class TestFor(unittest.TestCase):
    def test_for(self):
        def check(obj):
            if obj.is_time_series:
                assert False, "Don't support time series"

        for obj in targets.values():
            check(obj())

class TestGenerator(object):
    def test_for(self):
        def check(obj):
            if obj.is_time_series:
                assert False, "Don't support time series"

        for obj in targets.values():
            yield check, obj()

For loop:

======================================================================
FAIL: test_for (pandas.tests.test_examples.TestFor)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/datacaliber/Dropbox/Projects/trading/python/externals/pandas/pandas/tests/test_examples.py", line 34, in test_for
    check(obj())
  File "/Users/datacaliber/Dropbox/Projects/trading/python/externals/pandas/pandas/tests/test_examples.py", line 31, in check
    assert False, "Don't support time series"
AssertionError: Don't support time series

Generator

======================================================================
FAIL: pandas.tests.test_examples.TestGenerator.test_for(2000-09-19    a
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/datacaliber/Dropbox/Projects/trading/python/externals/nose/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/Users/datacaliber/Dropbox/Projects/trading/python/externals/pandas/pandas/tests/test_examples.py", line 40, in check
    assert False, "Don't support time series"
AssertionError: Don't support time series

======================================================================
FAIL: pandas.tests.test_examples.TestGenerator.test_for(2000-09-19    False
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/datacaliber/Dropbox/Projects/trading/python/externals/nose/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/Users/datacaliber/Dropbox/Projects/trading/python/externals/pandas/pandas/tests/test_examples.py", line 40, in check
    assert False, "Don't support time series"
AssertionError: Don't support time series

Battery:

======================================================================
FAIL: test_check (pandas.tests.test_examples.TestShiftBattery_bool_series)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/datacaliber/Dropbox/Projects/trading/python/externals/pandas/pandas/tests/test_examples.py", line 23, in test_check
    assert False, "Don't support time series"
AssertionError: Don't support time series

======================================================================
FAIL: test_check (pandas.tests.test_examples.TestShiftBattery_string_series)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/datacaliber/Dropbox/Projects/trading/python/externals/pandas/pandas/tests/test_examples.py", line 23, in test_check
    assert False, "Don't support time series"
AssertionError: Don't support time series

----------------------------------------------------------------------

The Battery test is the only one that runs all tests and outputs meaningful nose output. Plus, the Battery doesn't have to know about the targets and can remain simple.

jtratner · 2013-09-20T02:29:07Z

I would prefer not to inject into globals, when you can use regular inheritance to do the same exact thing.

class ShiftBattery(object):
    def test_shift(self):
        self.obj.shift(1)

    def test_shift_negative(self):
        self.obj.shift(-3)

class TestShiftDataFrame(ShiftBattery, unittest.TestCase):
    def setUp(self):
        self.obj = DataFrame(range(10))

class TestShiftPanel(ShiftBattery, unittest.TestCase):
    def setUp(self):
        self.obj = tm.makePanel()

Sometimes it's nice to make it clear that something works the same way across all objects:

class TestShift(unittest.TestCase):
    def setUp(self):
        self.objects = [tm.makePanel(), tm.makeCustomDataFrame(5, 5)]
    def test_shift1(self):
        for obj in self.objects:
           obj.shift(1)

jtratner · 2013-09-20T02:30:08Z

As an added bonus, if later you realize that there's an edge case for Panel or Series or something, you can add a test just to that class.

jtratner · 2013-09-20T02:34:20Z

Alternatively, you can create a helper method for everything

class TestShift(unittest.TestCase):
    def _check_int_shift(self, obj):
        self.obj.shift(1)
        self.obj.shift(-3)
    def test_int_series(self):
        self._check_int_shift(self, int_series)
    def test_bool_series(self):
        self._check_bool_shift(self, bool_series)

dalejung · 2013-09-25T20:44:13Z

@jreback can you take a look at https://gist.github.com/dalejung/6705707. It's just a small util I've been using when generating tests, but I'm not sure if it should go into pandas and if so where.

jreback · 2013-09-25T20:51:55Z

@dalejung

you could put that in core/indexing.py.

that idiom is used in lots of locations in the library....

if you want to split this off, and then search and replace where something like this is used would be great

here's some places its used now (might be more):

./pandas/core/indexing.py:
  864 :         tup = [slice(None, None) for _ in range(self.ndim)]
./pandas/core/internals.py:
 2298 :         slicer = [slice(None, None) for _ in range(self.ndim)]
 2403 :         slicer = [slice(None, None) for _ in range(self.ndim)]
./pandas/io/pytables.py:
 3793 :         slicer = [ slice(None, None) ] * obj.ndim

also in pandas/tests/testing_index.py, called _axify

dalejung · 2013-09-25T21:10:00Z

cool cool. Is slicer the correct term for this? I couldn't figure out what to call a tuple of slices. I like slicer.

On Sep 25, 2013, at 4:51 PM, jreback [email protected] wrote:

@dalejung

you could put that in core/indexing.py.

that idiom is used in lots of locations in the library....

if you want to split this off, and then search and replace where something like this is used would be great

here's some places its used now (might be more):

./pandas/core/indexing.py:
864 : tup = [slice(None, None) for _ in range(self.ndim)]
./pandas/core/internals.py:
2298 : slicer = [slice(None, None) for _ in range(self.ndim)]
2403 : slicer = [slice(None, None) for _ in range(self.ndim)]
./pandas/io/pytables.py:
3793 : slicer = [ slice(None, None) ] * obj.ndim
—
Reply to this email directly or view it on GitHub.

jreback · 2013-09-25T21:13:11Z

_axis_slicer? or _slicer ok...

you may want to change the args a bit as sometimes I just pass in an indexer directly for the axis in question

(this is an internal function, so can prob just add a kw, or maybe make it the first arg)

e.g.

def _slicer(indexer=None, start=None, stop=None, axis=None)

jtratner · 2013-09-25T21:29:02Z

You should change _missing to Ellipsis. Avoids an additional global.
On Sep 25, 2013 5:13 PM, "jreback" [email protected] wrote:

_axis_slicer? or _slicer ok...

you may want to change the args a bit as sometimes I just pass in an
indexer directly for the axis in question

(this is an internal function, so can prob just add a kw, or maybe make it
the first arg)

e.g.

def _slicer(indexer=None, start=None, stop=None, axis=None)

—
Reply to this email directly or view it on GitHubhttps://github.com//pull/4874#issuecomment-25125266
.

jtratner · 2013-09-25T21:29:02Z

Also, are you sure that works under Python 3? Some comparisons were
disallowed in Py3

dalejung · 2013-09-25T22:07:46Z

Wouldn't using Ellipsis as a missing sentinel be confusing considering its only real use is in numerical python?

jtratner · 2013-09-25T22:33:43Z

@dalejung I think it will produce the same thing if you set stop to None (because you no longer need to test for start <= 0) and if you pass start > 0, then you want to be slicing [:3] so you swap. Is there ever a time where None would not have that meaning?

def slicer(start, stop=None, axis=None):
    assert axis is not None, "Must pass an axis to slicer"
    assert isinstance(axis, int), "Must pass an integer axis to slicer"
    if isinstance(start, slice):
       key = start
   else:
       if stop is None and start > 0:
           stop, start = start, stop
       key = slice(start, stop)
   slices = [slice(None) for x in range(axis+1)]
   slices[axis] = key
   return tuple(slices)

jreback · 2013-09-25T22:50:19Z

@jtratner this is a purleey internal method simply to make easy constructing multiple axis slices (rather than 'manually' doing this in places)

dalejung · 2013-09-25T23:20:43Z

@jtratner the original behaved a bit differently,

slicer(3, axis=0) => (slice(None, 3, None,)
slicer(3, None, axis=0) => (slice(3, None, None),) # explict None doesn't flip.

I'm not really tied to that behavior. Going back to missing sentinels, are we standardized on Ellipsis? AFAIK, its use is predicated on no one using it outside of numerical python. Wouldn't it be confusing with pandas, especially within an indexing related file.

cpcloud · 2013-09-25T23:30:33Z

I'm weakly -1 on using Ellipsis. What's wrong with None?

jreback · 2013-09-26T00:04:17Z

@dalejung you saw my comment above? this has to be able to take an indexer (that doesn't happen to be a slice)
(

dalejung · 2013-09-26T00:24:40Z

@jreback yup.

@cpcloud Using a missing sentinel is for when func(x) is semantically different than func(x, None). Basically when None is itself a valid argument.

cpcloud · 2013-09-26T00:26:38Z

@dalejung Yep .. i see .. hm NotImplemented could be a possibility

dalejung · 2013-09-26T01:13:36Z

@cpcloud well, it's not meant for unimplemented paths. It's for setting default behaviors.

def func(x=None):
    # if x is missing, default to 10
    if x is None:
        x = 10
    ...

The problem with the above setup is that you can't tell the difference between func() and func(None) which is a problem if None is valid for x. In the above, you can never pass in None and have it pass the check. In most cases, a regular if x is None: check is fine because None is not valid and is itself the missing sentinel.

def func(x=_missing):
    if x is _missing:
        x = 10

handles both None and allows default missing behavior. _missing is basically any object that's guaranteed to never be passed in; object(), Ellipsis, etc.

jreback · 2013-09-30T12:22:01Z

@dalejung how's this coming? waiting for axis slicer, IIRC?

jreback · 2013-10-11T11:45:46Z

@dalejung hows this coming along?

jreback · 2014-01-03T20:51:07Z

@dalejung want to rebase this?

jreback · 2014-02-16T21:51:09Z

@dalejung this should be pretty easy to do now (the underlying NDFrame.shift is fixed to handle axes better)....

pls rebase

jreback · 2014-03-09T15:07:18Z

@dalejung still working on this?

dalejung · 2014-03-09T22:44:51Z

@jreback Actually, the underlying NDFrame.shift change makes this PR redundant.

jreback · 2014-03-09T22:55:11Z

hmm

I think the intent was to integrate Panel.shift
which has special handling ATM

dalejung · 2014-03-11T20:12:19Z

@jreback Not you're right. I had split up the Panel.shift into two PRs since they addressed separate tangential issues. #6605 is the panel implementation.

dalejung mentioned this pull request Sep 27, 2013

Axis slicer #4994

Closed

jreback closed this Jan 3, 2014

jreback reopened this Jan 3, 2014

jreback added the Reshaping label Feb 16, 2014

TST: test that DataFrame.shift(axis='columns') works

9819be4

dalejung closed this Mar 9, 2014

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ENG/BUG: Changed NDFrame.shift to work with arbitrary axis/ndims. #4874

ENG/BUG: Changed NDFrame.shift to work with arbitrary axis/ndims. #4874

dalejung commented Sep 18, 2013

jreback commented Sep 18, 2013

dalejung commented Sep 19, 2013

jreback commented Sep 19, 2013

jtratner commented Sep 19, 2013

dalejung commented Sep 20, 2013

jtratner commented Sep 20, 2013

jtratner commented Sep 20, 2013

jtratner commented Sep 20, 2013

dalejung commented Sep 25, 2013

jreback commented Sep 25, 2013

dalejung commented Sep 25, 2013

jreback commented Sep 25, 2013

jtratner commented Sep 25, 2013

jtratner commented Sep 25, 2013

dalejung commented Sep 25, 2013

jtratner commented Sep 25, 2013

jreback commented Sep 25, 2013

dalejung commented Sep 25, 2013

cpcloud commented Sep 25, 2013

jreback commented Sep 26, 2013

dalejung commented Sep 26, 2013

cpcloud commented Sep 26, 2013

dalejung commented Sep 26, 2013

jreback commented Sep 30, 2013

jreback commented Oct 11, 2013

jreback commented Jan 3, 2014

jreback commented Feb 16, 2014

jreback commented Mar 9, 2014

dalejung commented Mar 9, 2014

jreback commented Mar 9, 2014

dalejung commented Mar 11, 2014

ENG/BUG: Changed NDFrame.shift to work with arbitrary axis/ndims. #4874

ENG/BUG: Changed NDFrame.shift to work with arbitrary axis/ndims. #4874

Conversation

dalejung commented Sep 18, 2013

jreback commented Sep 18, 2013

dalejung commented Sep 19, 2013

jreback commented Sep 19, 2013

jtratner commented Sep 19, 2013

dalejung commented Sep 20, 2013

jtratner commented Sep 20, 2013

jtratner commented Sep 20, 2013

jtratner commented Sep 20, 2013

dalejung commented Sep 25, 2013

jreback commented Sep 25, 2013

dalejung commented Sep 25, 2013

jreback commented Sep 25, 2013

jtratner commented Sep 25, 2013

jtratner commented Sep 25, 2013

dalejung commented Sep 25, 2013

jtratner commented Sep 25, 2013

jreback commented Sep 25, 2013

dalejung commented Sep 25, 2013

cpcloud commented Sep 25, 2013

jreback commented Sep 26, 2013

dalejung commented Sep 26, 2013

cpcloud commented Sep 26, 2013

dalejung commented Sep 26, 2013

jreback commented Sep 30, 2013

jreback commented Oct 11, 2013

jreback commented Jan 3, 2014

jreback commented Feb 16, 2014

jreback commented Mar 9, 2014

dalejung commented Mar 9, 2014

jreback commented Mar 9, 2014

dalejung commented Mar 11, 2014