Random bad asserts for stat ops when running tests. #6982

Closed
dalejung opened this issue Apr 27, 2014 · 11 comments · Fixed by #6985 or #6990
Labels
Testing pandas testing functions or related to the test suite
Comments

@dalejung
Contributor

======================================================================
FAIL: test_sum (pandas.tests.test_frame.TestDataFrame)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/virtualenv/python2.6_with_system_site_packages/lib/python2.6/site-packages/pandas/tests/test_frame.py", line 10590, in test_sum
    has_numeric_only=True, check_dtype=False, check_less_precise=True)
  File "/home/travis/virtualenv/python2.6_with_system_site_packages/lib/python2.6/site-packages/pandas/tests/test_frame.py", line 10780, in _check_stat_op
    check_less_precise=check_less_precise)  # HACK: win32
  File "/home/travis/virtualenv/python2.6_with_system_site_packages/lib/python2.6/site-packages/pandas/util/testing.py", line 513, in assert_series_equal
    assert_almost_equal(left.values, right.values, check_less_precise)
  File "testing.pyx", line 58, in pandas._testing.assert_almost_equal (pandas/src/testing.c:2554)
  File "testing.pyx", line 93, in pandas._testing.assert_almost_equal (pandas/src/testing.c:1796)
  File "testing.pyx", line 140, in pandas._testing.assert_almost_equal (pandas/src/testing.c:2387)
AssertionError: expected 0.00144 but got 0.00144
@dalejung
Contributor Author

@jreback
It looks like the issue is that nanops specifies dtype_max when calling the ops. ff7bb2c seems to be the culprit.

In the case that I found, the unit test is checking a float32 frame against that same frame upcast to float64.

http://nbviewer.ipython.org/gist/anonymous/11349526
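
A minimal sketch of the kind of mismatch described above, using made-up random data rather than the frame from the failing test. Per the NumPy documentation, `sum` on a float32 array accumulates in float32 by default, so the result can differ in the low digits from the sum of the same values upcast to float64, even when both print identically at low precision (as in the `expected 0.00144 but got 0.00144` message):

```python
import numpy as np

# Made-up data for illustration; not the frame used in the pandas test suite.
rng = np.random.RandomState(0)
values32 = rng.randn(10000).astype(np.float32)
values64 = values32.astype(np.float64)  # the same numbers, upcast

sum32 = values32.sum()  # float32 input, float32 result
sum64 = values64.sum()  # float64 input, float64 result

# At low print precision the two sums can look the same ...
print("%.5g vs %.5g" % (sum32, sum64))
# ... while the full representations and the difference show the gap.
print(repr(float(sum32)), repr(float(sum64)))
print(abs(float(sum32) - float(sum64)))
```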

@jreback
Contributor

jreback commented Apr 27, 2014

it could be, but this error has actually been around for a while

the issue is that np.sum is used as the comparison, and it should be passing .sum(dtype='float32') in this case

(the actual pandas routines are correct after the fix above)

@jreback jreback added this to the 0.14.0 milestone Apr 27, 2014
@jreback
Contributor

jreback commented Apr 27, 2014

pls submit a PR for this if you can (you can pass lambda x: np.sum(x, dtype='float32') instead of np.sum, I think). This is sort of a 'numpy' issue, really, as np.sum is really doing the wrong thing
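
A rough sketch of what such a replacement reducer could look like. The function names here are hypothetical and only for illustration; how the reducer is wired into the test helper is not shown in this thread. The point is simply that `np.sum` takes the array as its first argument and an optional `dtype` that sets both the accumulator and the result type:

```python
import numpy as np

# Hypothetical reducers (names are illustrative, not from the pandas tests).
def sum_as_float32(x):
    # Accumulate and return in float32, matching a float32 frame.
    return np.sum(x, dtype='float32')

def sum_as_float64(x):
    # Accumulate and return in float64, regardless of the input dtype.
    return np.sum(x, dtype='float64')

data = np.arange(1000, dtype='float32') / 7  # made-up float32 data

result32 = sum_as_float32(data)
result64 = sum_as_float64(data)
print(result32.dtype, result64.dtype)
```

Either reducer can then stand in wherever a plain `np.sum` was being used as the reference, so that the reference and the value under test accumulate in the same precision.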

@jreback
Contributor

jreback commented Apr 27, 2014

@dalejung I put up #6985, I *think* this should fix it... can you reproduce reliably?

@jreback
Contributor

jreback commented Apr 27, 2014

@dalejung if you notice any more pls lmk

@jreback
Contributor

jreback commented Apr 28, 2014

@dalejung not sure my fix actually fixed this....!

@dalejung
Contributor Author

dalejung commented May 2, 2014

@jreback Hey, the last PR fixes this reliably. Just to clarify, if I have an array of float32, using a float64 accumulator is the correct behavior?

@jreback
Contributor

jreback commented May 2, 2014

you can get away with the default accumulator on 64-bit systems because the default actually IS float64; on 32-bit it breaks, but it will STILL work as long as it doesn't overflow.

so you need an overflow on 32-bit to fail, BUT using a 64-bit accumulator is always safe, I think (and that's what I did)
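
One way to read the overflow concern, sketched with made-up values: this illustrates the range of the float32 dtype itself rather than anything specific to 32-bit platforms. Two values near the float32 maximum (about 3.4e38) overflow to `inf` when accumulated in float32, while a float64 accumulator keeps the sum finite:

```python
import numpy as np

# Made-up values close to the float32 maximum (roughly 3.4e38).
big = np.array([3e38, 3e38], dtype=np.float32)

# Accumulating in float32 exceeds the representable range.
with np.errstate(over='ignore'):  # silence the overflow RuntimeWarning
    total32 = big.sum()

# Requesting a float64 accumulator keeps the result finite.
total64 = big.sum(dtype=np.float64)

print(total32, total64)
```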

@dalejung
Contributor Author

dalejung commented May 2, 2014

hm, I wasn't even thinking about overflow :/. I was more concerned about the output being different based on the accumulator. Not sure which output is technically correct. Like, the float64 is obviously more correct but I wasn't sure if there was an expectation of using float32 throughout the process.

@jreback
Contributor

jreback commented May 2, 2014

I think it should be the same (though precision could affect it), so they could be slightly different if accumulating really small numbers (that barely fit in float32). I would just always use float64, unless you have a really good reason.
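
A small sketch of how precision can change the result, with made-up numbers chosen so the effect is easy to see. Near 1e8 the spacing between adjacent float32 values is 8, so adding a few ones to 1e8 in float32 is lost to rounding, whereas a float64 accumulator keeps them:

```python
import numpy as np

# Made-up data: one large value followed by a few small ones.
values = np.array([1e8, 1.0, 1.0, 1.0], dtype=np.float32)

total32 = values.sum()                  # float32 accumulator: the ones are absorbed
total64 = values.sum(dtype=np.float64)  # float64 accumulator: the ones are kept

print(float(total32), float(total64))
```

This is the sort of gap that separates the two accumulators; whether it matters depends on the data and on how tightly the results are compared.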

@dalejung
Contributor Author

dalejung commented May 2, 2014

Agreed. I always use float64 throughout so this is the first time I've given it any thought.

2 participants