
failing test_moments after centered moving window introduced #2490


Closed
ghost opened this issue Dec 11, 2012 · 42 comments
Labels
Testing pandas testing functions or related to the test suite

@ghost

ghost commented Dec 11, 2012

I'm getting sporadic failures in
FAIL: test_rolling_std (pandas.stats.tests.test_moments.TestMoments)
FAIL: test_rolling_sum (pandas.stats.tests.test_moments.TestMoments)
FAIL: test_rolling_kurt (pandas.stats.tests.test_moments.TestMoments)
FAIL: test_rolling_max (pandas.stats.tests.test_moments.TestMoments)
and related

git bisected to: 915d261

I've upgraded to statsmodels '0.5.0.dev-c9062e4', did a clean/develop
but am still getting failures.

any ideas? @changhiskhan

@changhiskhan
Contributor

Do you have the failing output handy? I'll take a look, thanks.


@ghost
Author

ghost commented Dec 11, 2012

======================================================================
FAIL: test_rolling_count (pandas.stats.tests.test_moments.TestMoments)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/user1/src/pandas/pandas/stats/tests/test_moments.py", line 46, in test_rolling_count
    fill_value=0)
  File "/home/user1/src/pandas/pandas/stats/tests/test_moments.py", line 320, in _check_moment_func
    fill_value=fill_value)
  File "/home/user1/src/pandas/pandas/stats/tests/test_moments.py", line 380, in _check_ndarray
    assert_almost_equal(result[1], expected[10])
  File "/home/user1/src/pandas/pandas/util/testing.py", line 124, in assert_almost_equal
    1, a / b, decimal=5, err_msg=err_msg(a, b), verbose=False)
  File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py", line 468, in assert_almost_equal
    raise AssertionError(msg)
AssertionError: 
Arrays are not almost equal to 5 decimals expected 13.00000 but got 1.00000

======================================================================
FAIL: test_rolling_mean (pandas.stats.tests.test_moments.TestMoments)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/user1/src/pandas/pandas/stats/tests/test_moments.py", line 49, in test_rolling_mean
    self._check_moment_func(mom.rolling_mean, np.mean)
  File "/home/user1/src/pandas/pandas/stats/tests/test_moments.py", line 320, in _check_moment_func
    fill_value=fill_value)
  File "/home/user1/src/pandas/pandas/stats/tests/test_moments.py", line 387, in _check_ndarray
    self.assert_(np.isnan(result[14]))
AssertionError: False is not true

======================================================================
FAIL: test_rolling_min (pandas.stats.tests.test_moments.TestMoments)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/user1/src/pandas/pandas/stats/tests/test_moments.py", line 173, in test_rolling_min
    self._check_moment_func(mom.rolling_min, np.min)
  File "/home/user1/src/pandas/pandas/stats/tests/test_moments.py", line 320, in _check_moment_func
    fill_value=fill_value)
  File "/home/user1/src/pandas/pandas/stats/tests/test_moments.py", line 387, in _check_ndarray
    self.assert_(np.isnan(result[14]))
AssertionError: False is not true

======================================================================
FAIL: test_rolling_std (pandas.stats.tests.test_moments.TestMoments)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/user1/src/pandas/pandas/stats/tests/test_moments.py", line 235, in test_rolling_std
    lambda x: np.std(x, ddof=1))
  File "/home/user1/src/pandas/pandas/stats/tests/test_moments.py", line 320, in _check_moment_func
    fill_value=fill_value)
  File "/home/user1/src/pandas/pandas/stats/tests/test_moments.py", line 387, in _check_ndarray
    self.assert_(np.isnan(result[14]))
AssertionError: False is not true

----------------------------------------------------------------------

@wesm
Member

wesm commented Dec 11, 2012

I have no idea, and I can't reproduce it (it doesn't seem to have anything to do with the newly introduced scikits.timeseries testing dependency). I'm going to cut the beta and let you guys debug.

@ghost
Author

ghost commented Dec 11, 2012

can either of you post the output of the updated ci/print_versions.py from here?

@wesm
Member

wesm commented Dec 11, 2012

INSTALLED VERSIONS
------------------
Python: 2.7.2.final.0
Cython: 0.17.2
Numpy: 1.6.1
Scipy: 0.10.0
statsmodels: 0.4.3
scikits.timeseries: 0.91.3
dateutil: 2.1
pytz: 2012d
PyTables: 2.3.1
matplotlib: 1.2.0
openpyxl: 1.6.1
xlrd: 0.7.1
xlwt: 0.7.2
sqlalchemy: 0.7.1


have stats

@changhiskhan
Contributor

INSTALLED VERSIONS
------------------

Python: 2.7.3.final.0
Cython: 0.18-pre
Numpy: 1.6.1
Scipy: 0.10.1
statsmodels: 0.5.0.dev-7720064
scikits.timeseries: 0.91.3
dateutil: 1.5
pytz: 2011n
PyTables: 2.3.1
matplotlib: 1.1.0
openpyxl: 1.6.1
xlrd: 0.7.6
xlwt: 0.7.3
sqlalchemy: 0.7.6

have stats

@ghost
Author

ghost commented Dec 11, 2012

yeah, nothing there.

INSTALLED VERSIONS
------------------
Python: 2.7.3.candidate.2
Cython: 0.17.2
Numpy: 1.6.2
Scipy: 0.10.1
statsmodels: 0.5.0.dev-c9062e4
scikits.timeseries: 0.91.3
dateutil: 1.5
pytz: 2012d
PyTables: 2.4.0
matplotlib: 1.1.1rc2
openpyxl: 1.5.8
xlrd: 0.6.1
xlwt: 0.7.4
sqlalchemy: 0.7.8

tried in a virtualenv, and also a fresh clone - still failing.
no failures on travis either. hrmm.

@changhiskhan
Contributor

When you say "sporadic", how often is it occurring? Are you running the multi-processing test suite or regular?

@ghost
Author

ghost commented Dec 11, 2012

test_fast or nosetests pandas.stats.tests.test_moments both produce failures.
it fails almost always, but which specific tests fail changes between runs; occasionally
it passes without any failure.

here's a break on the failed comparison in test_rolling_median -> _check_moment_func -> _check_ndarray:

> /home/user1/src/pandas/pandas/stats/tests/test_moments.py(387)_check_ndarray()
-> self.assert_(np.isnan(result[14]))
(Pdb) l
382                     self.assert_(np.isnan(result[-9:]).all())
383                 else:
384                     self.assert_((result[-9:] == 0).all())
385                 if has_min_periods:
386                     self.assert_(np.isnan(expected[23]))
387  ->                 self.assert_(np.isnan(result[14]))
388                     self.assert_(np.isnan(expected[-5]))
389                     self.assert_(np.isnan(result[-14]))
390     
391         def _check_structures(self, func, static_comp,
392                               has_min_periods=True, has_time_rule=True,
(Pdb) result
array([        nan,         nan,         nan,         nan,  0.37457507,
        0.30156993,  0.22856479,  0.30628624,  0.22856479,         nan,
               nan,         nan,         nan,  0.37457507,  0.30156993,
        0.22856479,  0.30628624,  0.22856479,         nan,         nan,
               nan,         nan,  0.37457507,  0.30156993,  0.22856479,
        0.30628624,  0.22856479,         nan,         nan,         nan,
               nan,  0.37457507,  0.30156993,  0.22856479,  0.30628624,
        0.22856479,         nan,         nan,         nan,         nan,
               nan,         nan,         nan,         nan,         nan,
               nan,         nan,         nan,         nan,         nan])
(Pdb) expected
array([        nan,         nan,         nan,         nan,         nan,
               nan,         nan,         nan,         nan,         nan,
               nan,         nan,         nan,         nan,         nan,
               nan,         nan,         nan,         nan,         nan,
               nan,         nan,         nan,         nan,  0.06942868,
        0.05451945,  0.06942868,  0.14899674,  0.22856479,  0.30156993,
        0.14899674,  0.30156993,  0.30156993,  0.30156993,  0.14899674,
        0.30156993,  0.30156993,  0.37929138,  0.37929138,  0.37929138,
        0.37457507,  0.30156993,  0.22856479,  0.30628624,  0.22856479,
               nan,         nan,         nan,         nan,         nan])

@changhiskhan
Contributor

Try replacing _center_window in moments.py with this:

def _center_window(rs, window, axis):
    offset = int((window - 1) / 2.)
    if isinstance(rs, (Series, DataFrame, Panel)):
        rs = rs.shift(-offset, axis=axis)
    else:
        rs_indexer = [slice(None)] * rs.ndim
        rs_indexer[axis] = slice(None, -offset)

        lead_indexer = [slice(None)] * rs.ndim
        lead_indexer[axis] = slice(offset, None)

        na_indexer = [slice(None)] * rs.ndim
        na_indexer[axis] = slice(-offset, None)

        rs[tuple(rs_indexer)] = rs[tuple(lead_indexer)]
        rs[tuple(na_indexer)] = np.nan
    return rs

@ghost
Author

ghost commented Dec 11, 2012

no effect.

@changhiskhan
Contributor

The result looks completely mangled. center=True should just be a shift operation. If you look at moments.py, the only difference is that _center_window is called after computing the results.

Can you pdb into _center_window and see what's going on?
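The claim that center=True is just a shift can be checked on plain arrays. A minimal sketch, assuming a simple trailing-window mean as the underlying moment and offset = (window - 1) // 2 (the same value as int((window - 1) / 2.) in _center_window); `trailing_mean`, `centered_mean` and `shift_left` are hypothetical helpers, not pandas functions. For odd windows the centered window is symmetric; for even windows this offset puts the extra point on the left.

```python
import numpy as np


def trailing_mean(values, window):
    """Right-aligned moving mean: out[i] averages values[i - window + 1 : i + 1].

    The first window - 1 entries have no full window and are NaN.
    """
    values = np.asarray(values, dtype=float)
    out = np.full(len(values), np.nan)
    for end in range(window - 1, len(values)):
        out[end] = values[end - window + 1:end + 1].mean()
    return out


def centered_mean(values, window):
    """Centered moving mean computed directly.

    With offset = (window - 1) // 2, out[i] averages
    values[i - (window - 1 - offset) : i + offset + 1].
    """
    values = np.asarray(values, dtype=float)
    offset = (window - 1) // 2
    out = np.full(len(values), np.nan)
    for i in range(len(values)):
        start = i - (window - 1 - offset)
        stop = i + offset + 1
        if start >= 0 and stop <= len(values):
            out[i] = values[start:stop].mean()
    return out


def shift_left(values, offset):
    """Move values `offset` places toward lower indices and NaN-fill the tail.

    Writes into a fresh output array, so source and destination never overlap.
    """
    n = len(values)
    out = np.full(n, np.nan)
    if offset < n:
        out[:n - offset] = values[offset:]
    return out


data = np.arange(10, dtype=float) ** 2
window = 5
offset = (window - 1) // 2
# Centering the trailing result by a shift reproduces the direct centered
# computation, including where the NaNs fall (assert_allclose treats NaN == NaN).
np.testing.assert_allclose(shift_left(trailing_mean(data, window), offset),
                           centered_mean(data, window))
```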

@ghost
Author

ghost commented Dec 11, 2012

rs at entry to _center_window in test_rolling_apply:

[        nan         nan         nan         nan         nan         nan
         nan         nan         nan         nan         nan         nan
         nan         nan         nan         nan         nan         nan
         nan         nan         nan         nan         nan         nan
  0.21432634  0.25726326  0.26078034  0.31504732  0.24231212  0.24028765
  0.28524033  0.25578755  0.22841107  0.18792591  0.19638916  0.25554294
  0.32495223  0.14671714  0.12936415  0.15640616  0.219897    0.26083415
  0.2166916   0.19919935  0.22116124         nan         nan         nan
         nan         nan]

rs returned from _center_window in test_rolling_apply:

[        nan         nan         nan         nan  0.219897    0.26083415
  0.2166916   0.19919935  0.22116124         nan         nan         nan
         nan  0.219897    0.26083415  0.2166916   0.19919935  0.22116124
         nan         nan         nan         nan  0.219897    0.26083415
  0.2166916   0.19919935  0.22116124         nan         nan         nan
         nan  0.219897    0.26083415  0.2166916   0.19919935  0.22116124
         nan         nan         nan         nan         nan         nan
         nan         nan         nan         nan         nan         nan
         nan         nan]

@ghost
Author

ghost commented Dec 11, 2012

is this the proper behaviour for shift?

mkdf(10,1,r_idx_type='dt')
Out[10]: 
C0         C_l0_g0
R0                
2000-01-03    R0C0
2000-01-04    R1C0
2000-01-05    R2C0
2000-01-06    R3C0
2000-01-07    R4C0
2000-01-10    R5C0
2000-01-11    R6C0
2000-01-12    R7C0
2000-01-13    R8C0
2000-01-14    R9C0

a=mkdf(10,1,r_idx_type='dt')
a.shift()
Out[12]: 
C0         C_l0_g0
R0                
2000-01-03     NaN
2000-01-04    R0C0
2000-01-05    R1C0
2000-01-06    R2C0
2000-01-07    R3C0
2000-01-10    R4C0
2000-01-11    R5C0
2000-01-12    R6C0
2000-01-13    R7C0
2000-01-14    R8C0

@changhiskhan
Contributor

yes, but _center_window implements shift for ndarrays separately.

take a look at the following:

rs[tuple(rs_indexer)]
rs[tuple(na_indexer)]
rs[tuple(lead_indexer)]

And also what each *_indexer is
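Because _center_window shifts ndarrays itself instead of calling shift, here is a minimal sketch of an ndarray shift along an arbitrary axis. It builds the same kind of slice-tuple indexers, but writes into a freshly allocated output, so it never reads from the region it is writing. `shift_along_axis` is a hypothetical helper, not a numpy or pandas API, and it assumes a float result so that NaN can be the fill value.

```python
import numpy as np


def shift_along_axis(arr, periods, axis=0, fill_value=np.nan):
    """Return a new float array shifted by `periods` along `axis`.

    Negative `periods` moves values toward lower indices (like shift(-k));
    vacated positions get `fill_value`. The input is only read, never written.
    """
    arr = np.asarray(arr, dtype=float)
    out = np.full(arr.shape, fill_value, dtype=float)
    n = arr.shape[axis]
    k = abs(periods)
    if k >= n:
        return out  # everything shifted out of range
    dst = [slice(None)] * arr.ndim
    src = [slice(None)] * arr.ndim
    # Using n - k instead of -k keeps periods == 0 correct (slice(None, -0) is empty).
    if periods >= 0:
        dst[axis] = slice(k, None)
        src[axis] = slice(None, n - k)
    else:
        dst[axis] = slice(None, n - k)
        src[axis] = slice(k, None)
    out[tuple(dst)] = arr[tuple(src)]
    return out


m = np.arange(6.0).reshape(2, 3)
print(shift_along_axis(m, -1, axis=1))
```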

@ghost
Author

ghost commented Dec 11, 2012

i think that's ok. here's a shorter example:

        print(1, rs[tuple(rs_indexer)])
        print(2, rs[tuple(na_indexer)])
        print(3, rs[tuple(lead_indexer)])
        print(4, rs_indexer)
        print(5, na_indexer)
        print(6, lead_indexer)
======================================================================
FAIL: test_rolling_max (pandas.stats.tests.test_moments.TestMoments)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/user1/src/pandas/pandas/stats/tests/test_moments.py", line 182, in test_rolling_max
    self._check_moment_func(mom.rolling_max, np.max)
  File "/home/user1/src/pandas/pandas/stats/tests/test_moments.py", line 320, in _check_moment_func
    fill_value=fill_value)
  File "/home/user1/src/pandas/pandas/stats/tests/test_moments.py", line 387, in _check_ndarray
    self.assert_(np.isnan(result[14]))
AssertionError: False is not true
-------------------- >> begin captured stdout << ---------------------
(1, array([        nan,         nan,         nan,         nan,  1.34631239,
        1.34631239,  1.34631239,  1.34631239,  1.34631239,         nan,
               nan,         nan,         nan,  1.34631239,  1.34631239,
        1.34631239,  1.34631239,  1.34631239,         nan,         nan,
               nan,         nan,  1.34631239,  1.34631239,  1.34631239,
        1.34631239,  1.34631239,         nan,         nan,         nan,
               nan,  1.34631239,  1.34631239,  1.34631239,  1.34631239,
        1.34631239,         nan,         nan,         nan,         nan,
               nan]))
(2, array([ nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan]))
(3, array([        nan,         nan,         nan,         nan,  1.34631239,
        1.34631239,  1.34631239,  1.34631239,  1.34631239,         nan,
               nan,         nan,         nan,  1.34631239,  1.34631239,
        1.34631239,  1.34631239,  1.34631239,         nan,         nan,
               nan,         nan,  1.34631239,  1.34631239,  1.34631239,
        1.34631239,  1.34631239,         nan,         nan,         nan,
               nan,         nan,         nan,         nan,         nan,
               nan,         nan,         nan,         nan,         nan,
               nan]))
(4, [slice(None, -9, None)])
(5, [slice(-9, None, None)])
(6, [slice(9, None, None)])

--------------------- >> end captured stdout << ----------------------

----------------------------------------------------------------------

@changhiskhan
Contributor

nosetests bug?

@ghost
Author

ghost commented Dec 11, 2012

λ nosetests --version
nosetests version 1.2.1

@changhiskhan
Contributor

same.

@changhiskhan
Contributor

Ok, try replacing _center_window with the following:

def _center_window(rs, window, axis):
    offset = int((window - 1) / 2.)
    if isinstance(rs, (Series, DataFrame, Panel)):
        rs = rs.shift(-offset, axis=axis)
    elif rs.ndim == 1:
        rs[:-offset] = rs[offset:]
        rs[-offset:] = np.nan
    else:
        rs_indexer = [slice(None)] * rs.ndim
        rs_indexer[axis] = slice(None, -offset)

        lead_indexer = [slice(None)] * rs.ndim
        lead_indexer[axis] = slice(offset, None)

        na_indexer = [slice(None)] * rs.ndim
        na_indexer[axis] = slice(-offset, None)

        rs[tuple(rs_indexer)] = rs[tuple(lead_indexer)]
        rs[tuple(na_indexer)] = np.nan
    return rs

@ghost
Author

ghost commented Dec 11, 2012

no effect with

def _center_window(rs, window, axis):
    offset = int((window - 1) / 2.)
    if isinstance(rs, (Series, DataFrame, Panel)):
        rs = rs.shift(-offset, axis=axis)
    elif rs.ndim == 1:
        rs[:-offset] = rs[offset:]
        rs[-offset:] = np.nan
    else:
        rs_indexer = [slice(None)] * rs.ndim
        rs_indexer[axis] = slice(None, -offset)

        lead_indexer = [slice(None)] * rs.ndim
        lead_indexer[axis] = slice(offset, None)

        na_indexer = [slice(None)] * rs.ndim
        na_indexer[axis] = slice(-offset, None)

        rs[rs_indexer] = rs[lead_indexer]
        rs[na_indexer] = np.nan
    return rs

@ghost
Author

ghost commented Dec 11, 2012

unless you've sniffed something, let's put it away for a day or two and see
if lightning strikes.

@changhiskhan
Contributor

Yeah, I even installed numpy 1.6.2 just in case but could not repro


@changhiskhan
Contributor

have you tried diagnosing again? Hate to waste more time on this, but it really bothers me that you're running into it and we can't repro.
What's your OS? 32/64-bit?

Our Windows binaries are built on a Jenkins CI box that covers Python 2.6-3.2, both 32- and 64-bit, and they all passed there. We've also tested on 64-bit OS X and Ubuntu between my setup and Wes's.

@wesm
Member

wesm commented Dec 14, 2012

Yaroslav just e-mailed me; he has run into this too. Maybe that will provide some insight.

@ghost
Author

ghost commented Dec 14, 2012

No I haven't.
I'm on 64bit debian wheezy testing. It fails on 2.6 and 3.2 as well.
not 3.3 though, strangely enough.

@yarikoptic
Contributor

FWIW, I've now also hit the "Arrays are not almost equal to 5 decimals expected 13.00000 but got 1.00000" failure on amd64 wheezy.

@wesm
Copy link
Member

wesm commented Dec 14, 2012

I could debug this pretty quickly if I can get access to a box where it fails

@yarikoptic
Contributor

yep... let me update/check on the sparc box (meanwhile you could email me your public ssh key and desired login name as a continuation of the "private" thread we already had ;) )

@ghost
Author

ghost commented Dec 14, 2012

fair enough.
if you don't zero in on it, we can always call random.seed() and checkpoint through the steps until
we find the point of difference between good/bad boxes.
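One way to do the seed-and-checkpoint approach: reseed numpy's global RNG before generating the test data, print a short digest of each intermediate array on both boxes, and diff the logs. A minimal sketch; `make_test_data` and `checkpoint` are hypothetical helpers, the seed is arbitrary, and the cumsum step is only a stand-in for a real intermediate result.

```python
import hashlib

import numpy as np


def make_test_data(seed=12345, n=50):
    """Reseed numpy's global RNG so every run starts from the same input."""
    np.random.seed(seed)
    return np.random.randn(n)


def checkpoint(label, arr):
    """Print and return a short digest of an array's raw bytes.

    On two boxes that should agree bit for bit, the first label whose digest
    differs marks where the runs diverge.
    """
    raw = np.ascontiguousarray(arr).tobytes()
    digest = hashlib.sha1(raw).hexdigest()[:12]
    print(label, digest)
    return digest


data = make_test_data()
checkpoint("input", data)
checkpoint("after-step-1", np.cumsum(data))  # stand-in for a real intermediate step
```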

@ghost
Author

ghost commented Dec 14, 2012

weird, I just started getting consistent test_fperr_robustness failures as well, that's new.
maybe a recent debian package update. travis is still green.

@yarikoptic
Contributor

peek at /var/log/dpkg.log if you need to discover what got updated any
time recently


@ghost
Author

ghost commented Dec 14, 2012

nothing jumps out - no libc change that I can see.
I did roll back from 2.7.3 final to the wheezy 2.7.3~rc2-2.1,
which was originally just an attempt to isolate the problem.

maybe test_fperr_robustness triggers only on rc2.
In any case, the others (test_rolling_x) trigger on both.

@changhiskhan
Contributor

The error has to be in _center_window. Can you step to where rs gets mutated and see what the mutation does (last two statements before the return)?

@ghost
Author

ghost commented Dec 14, 2012

*edit: submit button slipped, edited*
I placed a random.seed call in setUp() to make the data deterministic and verified.
after testing various points in the code, I narrowed the problem down to _center_window,
as chang realized.

I noticed strange phenomena, such as adding print statements drastically altering the probability
of pass/fail (sometimes one way, sometimes the other).

I then modified _center_window to:

def _center_window(rs, window, axis):
    # print( rs,window,axis)
    offset = int((window - 1) / 2.)
    if isinstance(rs, (Series, DataFrame, Panel)):
        rs = rs.shift(-offset, axis=axis)
    else:
        rs_indexer = [slice(None) for x in range(rs.ndim)] # waldo1
        print( axis,offset)
        rs_indexer[axis] = slice(None, -offset)

        lead_indexer = [slice(None) for x in range(rs.ndim)]
        lead_indexer[axis] = slice(offset, None)

        na_indexer =  [slice(None) for x in range(rs.ndim)]
        na_indexer[axis] = slice(-offset, None)

        x = rs[tuple(lead_indexer)]
        print( "x",x)
        print( "idx",rs_indexer)
        print("rs-before", rs)
        y=list(rs_indexer)
        print( "y",y)
        rs[y] = x #waldo2
        print("rs-after", rs)
        rs[tuple(na_indexer)] = np.nan
    return rs

I inserted a division by zero into test_rolling_count so that the test always fails and nose
shows the captured output, then ran nose in a loop on that specific test until I caught two runs:
one failing on the assertion, and another failing only because of the division by zero.
Compare the outputs:

======================================================================
FAIL: test_rolling_count (pandas.stats.tests.test_moments.TestMoments)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/user1/src/pandas/pandas/stats/tests/test_moments.py", line 50, in test_rolling_count
    fill_value=0)
  File "/home/user1/src/pandas/pandas/stats/tests/test_moments.py", line 332, in _check_moment_func
    fill_value=fill_value)
  File "/home/user1/src/pandas/pandas/stats/tests/test_moments.py", line 400, in _check_ndarray
    assert_almost_equal(result[1], expected[10])
  File "/home/user1/src/pandas/pandas/util/testing.py", line 124, in assert_almost_equal
    1, a / b, decimal=5, err_msg=err_msg(a, b), verbose=False)
  File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py", line 468, in assert_almost_equal
    raise AssertionError(msg)
AssertionError: 
Arrays are not almost equal to 5 decimals expected 13.00000 but got 1.00000
-------------------- >> begin captured stdout << ---------------------
(0, 9)
('x', array([  0.,   1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,  10.,
        11.,  12.,  13.,  14.,  15.,  16.,  17.,  18.,  19.,  20.,  20.,
        20.,  20.,  20.,  20.,  20.,  20.,  20.,  20.,  20.,  19.,  18.,
        17.,  16.,  15.,  14.,  13.,  12.,  11.,  10.]))
('idx', [slice(None, -9, None)])
('rs-before', array([  0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   1.,
         2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,  10.,  11.,  12.,
        13.,  14.,  15.,  16.,  17.,  18.,  19.,  20.,  20.,  20.,  20.,
        20.,  20.,  20.,  20.,  20.,  20.,  20.,  19.,  18.,  17.,  16.,
        15.,  14.,  13.,  12.,  11.,  10.]))
('y', [slice(None, -9, None)])
('rs-after', array([ 14.,  13.,  12.,  11.,  19.,  18.,  17.,  16.,  15.,  14.,  13.,
        12.,  11.,  19.,  18.,  17.,  16.,  15.,  14.,  13.,  12.,  11.,
        19.,  18.,  17.,  16.,  15.,  14.,  13.,  12.,  11.,  19.,  18.,
        17.,  16.,  15.,  14.,  13.,  12.,  11.,  10.,  18.,  17.,  16.,
        15.,  14.,  13.,  12.,  11.,  10.]))

--------------------- >> end captured stdout << ----------------------

----------------------------------------------------------------------
Ran 1 test in 0.006s

FAILED (failures=1)
E
======================================================================
ERROR: test_rolling_count (pandas.stats.tests.test_moments.TestMoments)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/user1/src/pandas/pandas/stats/tests/test_moments.py", line 50, in test_rolling_count
    fill_value=0)
  File "/home/user1/src/pandas/pandas/stats/tests/test_moments.py", line 332, in _check_moment_func
    fill_value=fill_value)
  File "/home/user1/src/pandas/pandas/stats/tests/test_moments.py", line 412, in _check_ndarray
    1/0
ZeroDivisionError: integer division or modulo by zero
-------------------- >> begin captured stdout << ---------------------
(0, 9)
('x', array([  0.,   1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,  10.,
        11.,  12.,  13.,  14.,  15.,  16.,  17.,  18.,  19.,  20.,  20.,
        20.,  20.,  20.,  20.,  20.,  20.,  20.,  20.,  20.,  19.,  18.,
        17.,  16.,  15.,  14.,  13.,  12.,  11.,  10.]))
('idx', [slice(None, -9, None)])
('rs-before', array([  0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   1.,
         2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,  10.,  11.,  12.,
        13.,  14.,  15.,  16.,  17.,  18.,  19.,  20.,  20.,  20.,  20.,
        20.,  20.,  20.,  20.,  20.,  20.,  20.,  19.,  18.,  17.,  16.,
        15.,  14.,  13.,  12.,  11.,  10.]))
('y', [slice(None, -9, None)])
('rs-after', array([  0.,   1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,  10.,
        11.,  12.,  13.,  14.,  15.,  16.,  17.,  18.,  19.,  20.,  20.,
        20.,  20.,  20.,  20.,  20.,  20.,  20.,  20.,  20.,  19.,  18.,
        17.,  16.,  15.,  14.,  13.,  12.,  11.,  10.,  18.,  17.,  16.,
        15.,  14.,  13.,  12.,  11.,  10.]))

--------------------- >> end captured stdout << ----------------------

Look at the code line marked "waldo2".
Note that rs-after differs between the two runs even though none of the inputs to "waldo2" change.

I suspected that the line previously used at "waldo1", which was

[slice(None)] * rs.ndim

was actually creating multiple references to the same object, and therefore changed it to a list comprehension
to be sure. no effect, though.

so it looks like the line at "waldo2", originally:

rs[tuple(rs_indexer)] = rs[tuple(lead_indexer)]

is not deterministic even though its inputs are, or else the inputs only appear deterministic and
something else is going on.
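For reference, `[slice(None)] * rs.ndim` does put the same slice object in every slot, but that cannot cause this bug: slice objects are immutable, and `rs_indexer[axis] = ...` rebinds one list slot rather than mutating the shared object. A small check showing that both ways of building the indexer select the same data:

```python
import numpy as np

ndim = 3
shared = [slice(None)] * ndim                 # every slot is the same slice object
fresh = [slice(None) for _ in range(ndim)]    # distinct (but equal) slice objects
assert shared[0] is shared[1] is shared[2]

# Item assignment replaces one slot; the other slots keep the original object.
shared[1] = slice(None, -9)
fresh[1] = slice(None, -9)
assert shared == fresh == [slice(None), slice(None, -9), slice(None)]

arr = np.arange(2 * 20 * 4).reshape(2, 20, 4)
assert np.array_equal(arr[tuple(shared)], arr[tuple(fresh)])
print(arr[tuple(shared)].shape)  # (2, 11, 4)
```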

ghost pushed a commit that referenced this issue Dec 14, 2012
pandas issue #2490
numpy/numpy#324

manifests only on certain distros, probably libc-dependent.
debian wheezy, libc 2.13-37 is affected
ubuntu precise 2.15-0ubuntu10.2 is not

These are not necessarily the earliest/latest package
revisions affected/not affected respectively.
@ghost
Author

ghost commented Dec 14, 2012

Upgrading to numpy 1.7.0b2 fixes the problem; this jibes with my not experiencing the issue with
3.3, which uses 1.7.0b2.

I bisected things down on numpy to this commit,
which came from numpy/numpy#324.

The PR fixes a bug having to do with overlapping source and destination memcpy, which
looks like it matches the case of the line identified above.

The fact that it's a problem with memcpy also explains why debian wheezy shows the problem
while ubuntu precise does not: different libc.
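Why an overlapping copy can behave differently on different libcs: C's memcpy leaves the result undefined when source and destination overlap, so an implementation may copy in whatever order it likes, whereas memmove must behave as if it copied through a temporary buffer. The toy model below copies one element at a time in either direction. It is a simplification (real memcpy implementations copy in larger, implementation-specific chunks), but it shows how a left shift like `rs[:-offset] = rs[offset:]` goes wrong when the copy runs high-to-low: the result repeats every `offset` elements, qualitatively like the repeating rs-after output in the failing run above. The function names are hypothetical.

```python
def copy_forward(buf, dst, src, n):
    """Copy n elements from buf[src:] to buf[dst:], lowest index first."""
    for i in range(n):
        buf[dst + i] = buf[src + i]


def copy_backward(buf, dst, src, n):
    """Copy n elements from buf[src:] to buf[dst:], highest index first."""
    for i in reversed(range(n)):
        buf[dst + i] = buf[src + i]


def copy_via_temp(buf, dst, src, n):
    """memmove-like: snapshot the source first, so overlap cannot matter."""
    tmp = buf[src:src + n]  # list slicing makes a copy
    buf[dst:dst + n] = tmp


offset = 3
for copier in (copy_forward, copy_backward, copy_via_temp):
    buf = list(range(10))
    # Equivalent of rs[:-offset] = rs[offset:] on a 10-element buffer.
    copier(buf, 0, offset, len(buf) - offset)
    print(copier.__name__, buf)
# copy_forward [3, 4, 5, 6, 7, 8, 9, 7, 8, 9]
# copy_backward [9, 7, 8, 9, 7, 8, 9, 7, 8, 9]
# copy_via_temp [3, 4, 5, 6, 7, 8, 9, 7, 8, 9]
```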

Doesn't look like the fix was backported to 1.6.2.
pushed a workaround in 6cadd6c.

paging @rgommers, @certik
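To avoid depending on how numpy or libc handles an overlapping copy at all, the 1-D case of _center_window can copy the source slice into a separate buffer before writing it back. This is a minimal sketch under that assumption, not necessarily what the workaround in 6cadd6c does; `center_1d` is a hypothetical name, and `np.shares_memory` (numpy 1.11+, newer than the versions in this thread) is used only to show that the two views alias.

```python
import numpy as np


def center_1d(rs, offset):
    """Shift a 1-D float array `offset` places toward lower indices, in place,
    and NaN-fill the last `offset` entries (hypothetical helper).

    rs[offset:] and rs[:-offset] are views of the same buffer, so assigning one
    to the other is an overlapping copy. Copying the source first means the
    write never reads memory it has already overwritten.
    """
    n = len(rs)
    if offset <= 0:
        return rs
    if offset >= n:
        rs[:] = np.nan
        return rs
    rs[:n - offset] = rs[offset:].copy()
    rs[n - offset:] = np.nan
    return rs


rs = np.arange(12, dtype=float)
print(np.shares_memory(rs[:-3], rs[3:]))  # True: the two views overlap
print(center_1d(rs, 3))
```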

ghost closed this as completed Dec 14, 2012
ghost reopened this Dec 14, 2012
@certik
Contributor

certik commented Dec 14, 2012

@y-p, so what should be done on the numpy side?

@wesm
Member

wesm commented Dec 14, 2012

cc @yarikoptic so he's aware that we're looking at an upstream numpy bug in 1.6.x

@ghost
Author

ghost commented Dec 14, 2012

@certik , backport to maintenance/1.6.x? numpy/numpy@0920bed cherry-picks
cleanly on v1.6.2, and I verified that it fixes the issue.

@certik
Contributor

certik commented Dec 14, 2012

@y-p, would you mind sending a PR for the 1.6.x?

@changhiskhan
Contributor

@y-p thanks**1e6

@wesm
Member

wesm commented Dec 14, 2012

closing this bad boy if that works. thanks all


4 participants