
Failing interpolation test #5174


Closed
cpcloud opened this issue Oct 10, 2013 · 42 comments · Fixed by #5177 or #5362
Labels
Bug Numeric Operations Arithmetic, Comparison, and Logical operations Testing pandas testing functions or related to the test suite

Comments

@cpcloud
Member

cpcloud commented Oct 10, 2013

$ nosetests pandas/tests/test_generic.py:TestSeries.test_interp_quad
F
======================================================================
FAIL: test_interp_quad (pandas.tests.test_generic.TestSeries)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/phillip/Documents/code/py/pandas/pandas/tests/test_generic.py", line 339, in test_interp_quad
    assert_series_equal(result, expected)
  File "/home/phillip/Documents/code/py/pandas/pandas/util/testing.py", line 452, in assert_series_equal
    assert_attr_equal('dtype', left, right)
  File "/home/phillip/Documents/code/py/pandas/pandas/util/testing.py", line 369, in assert_attr_equal
    assert_equal(left_attr,right_attr,"attr is not equal [{0}]" .format(attr))
  File "/home/phillip/Documents/code/py/pandas/pandas/util/testing.py", line 354, in assert_equal
    assert a == b, "%s: %r != %r" % (msg.format(a,b), a, b)
AssertionError: attr is not equal [dtype]: dtype('int64') != dtype('float64')

----------------------------------------------------------------------
Ran 1 test in 0.041s

FAILED (failures=1)
@cpcloud
Member Author

cpcloud commented Oct 10, 2013

cc @TomAugspurger

@TomAugspurger
Contributor

Does this make any sense? A float block with array of items with int dtype?

ipdb> self
SingleBlockManager
Items: Int64Index([1, 2, 3, 4], dtype=int64)
FloatBlock: 4 dtype: float64

I'm in core.internals.apply

@jreback
Contributor

jreback commented Oct 10, 2013

Items are the 'index', so that is right.

@jreback
Contributor

jreback commented Oct 10, 2013

@cpcloud where do you see this failing? I can't repro on 64 or 32-bit

@jreback
Contributor

jreback commented Oct 10, 2013

spoke too soon!

@jreback
Contributor

jreback commented Oct 10, 2013

@TomAugspurger that test just needs to have the expected be int64; otherwise it looks fine. As an FYI, we may need some tests that don't infer dtypes (e.g. set downcast=False so that the results are not inferred).

@TomAugspurger
Contributor

So are you saying to change expected to expected = Series([1, 4, 9, 16], index=[1, 2, 3, 4]) (int type)? Because that fails for me. I'm trying to figure out why the result [1., 4., 9., 16.] doesn't get downcast for me right now.

@jreback
Contributor

jreback commented Oct 10, 2013

@TomAugspurger you may also want to put some more creative logic in there for inference. Since we know that only floats/ints are going to be coming in, you could always try inferring ints, so that you get ints where possible and the floats stay floats.
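A minimal sketch of that inference idea, assuming float64 input and that only values which are exactly integral should become ints. `infer_int_if_possible` is a hypothetical helper for illustration, not pandas' internal downcasting code.

```python
import numpy as np


def infer_int_if_possible(values):
    """Hypothetical helper: return an int64 copy of a float array when every
    value is exactly integral; otherwise return the floats unchanged."""
    values = np.asarray(values, dtype=np.float64)
    # NaN/inf cannot be represented as int64, so keep floats.
    if not np.all(np.isfinite(values)):
        return values
    candidate = values.astype(np.int64)
    # Exact comparison: any fractional part, however tiny, keeps the floats.
    if np.array_equal(candidate, values):
        return candidate
    return values


print(infer_int_if_possible([1.0, 4.0, 9.0, 16.0]).dtype)  # int64
print(infer_int_if_possible([1.5, 4.0]).dtype)             # float64
```

Because the comparison is exact, a single value like 9.0000000000000036 keeps the whole array as float64.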

@jreback
Contributor

jreback commented Oct 10, 2013

@TomAugspurger the result IS downcast (to int64); it's the expected that is float64

@TomAugspurger
Contributor

@TomAugspurger the result IS downcast (to int64); it's the expected that is float64

Not for me:

In [5]: result = sq.interpolate(method='quadratic')
In [6]: result
Out[6]: 
1     1
2     4
3     9
4    16
dtype: float64

Can you clear this up for me? I think this is where things aren't going the same way. b is the float block with the NaN interpolated.

ipdb> !b
FloatBlock: 4 dtype: float64
ipdb> !b.values
array([  1.,   4.,   9.,  16.])
ipdb> !b.downcast(downcast)[0].values  # should be ints?
array([  1.,   4.,   9.,  16.])
ipdb> downcast
'infer'

That's in pandas/core/internals.py(337)_maybe_downcast()

I'll dig a bit deeper.

@TomAugspurger
Contributor

umm... in /pandas/core/common.py(1064)_possibly_downcast_to_dtype():

ipdb> result
array([  1.,   4.,   9.,  16.])
ipdb> result.astype(dtype)
array([ 1,  4,  8, 16])
ipdb> dtype
dtype('int64')
ipdb> 

But back in the interpreter:

In [10]: a.astype(np.int64)
Out[10]: array([ 1,  4,  9, 16])

In [11]: a = np.array([1., 4., 9., 16.])

In [12]: a.astype(np.int64)
Out[12]: array([ 1,  4,  9, 16])
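An illustration of how an array can print as `9.` yet cast to 8, assuming the interpolated value on the failing machine sat just below 9 (the thread doesn't show the exact value there). It uses `np.nextafter` to get the largest float64 below 9; print formatting depends on the NumPy version, and the output comments assume a modern NumPy.

```python
import numpy as np

# Largest float64 strictly below 9.0 (9 minus one unit in the last place).
just_below_nine = np.nextafter(9.0, 0.0)
a = np.array([1.0, 4.0, just_below_nine, 16.0])

# Array printing uses 8 digits of precision by default, so the third
# element is displayed as "9." even though it is not exactly 9.
print(a)                    # [ 1.  4.  9. 16.]
print(repr(float(a[2])))    # 8.999999999999998
# astype(np.int64) drops the fractional part, so the "9" becomes 8.
print(a.astype(np.int64))   # [ 1  4  8 16]
```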

@jreback
Contributor

jreback commented Oct 10, 2013

This is a precision issue

array([  1.,   4.,   9.,  16.])
(Pdb) p result[0]
1.0
(Pdb) p result[1]
4.0
(Pdb) p result[2]
9.0000000000000036
(Pdb) p result[3]
16.0

thus this array is NOT equal to array([1,4,9,16])

and thus should not be downcast (though you can make a case that it is close 'enough' to be)....

(Pdb) result == new_result
array([ True,  True, False,  True], dtype=bool)
(Pdb) result.round(8) == new_result
array([ True,  True,  True,  True], dtype=bool)

should we round when trying to downcast to int?
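The comparison in that pdb session can be reproduced with the value printed there. This assumes `new_result` is the int64 array produced by `astype`, which the session doesn't show.

```python
import numpy as np

# The values from the pdb session; the third one is 9 plus about 3.6e-15.
result = np.array([1.0, 4.0, 9.0000000000000036, 16.0])
new_result = result.astype(np.int64)

exact = result == new_result             # elementwise exact comparison
rounded = result.round(8) == new_result  # compare after rounding to 8 decimals

print(exact)    # [ True  True False  True]
print(rounded)  # [ True  True  True  True]
```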

@jreback
Contributor

jreback commented Oct 10, 2013

I think I should just use allclose with the default tolerances (rtol=1e-5, atol=1e-8).....
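A sketch of what an allclose-based check could look like, using NumPy's default tolerances. `downcast_if_close` is a hypothetical name, not pandas' `_possibly_downcast_to_dtype`, and the candidate here comes from a plain `astype`, which truncates.

```python
import numpy as np


def downcast_if_close(values, rtol=1e-5, atol=1e-8):
    """Hypothetical allclose-based downcast: cast float values to int64
    when the cast result is within tolerance of the originals."""
    values = np.asarray(values, dtype=np.float64)
    if not np.all(np.isfinite(values)):
        return values
    # Plain astype truncates toward zero; no rounding happens here.
    candidate = values.astype(np.int64)
    if np.allclose(values, candidate, rtol=rtol, atol=atol):
        return candidate
    return values


print(downcast_if_close([1.0, 4.0, 9.0000000000000036, 16.0]).dtype)    # int64
print(downcast_if_close([1.0, 4.0, np.nextafter(9.0, 0.0), 16.0]).dtype)  # float64
```

Under this sketch, a value just below 9 is simply not downcast, because its truncated candidate is 8 and the allclose check fails; the result stays float64 instead of silently holding a wrong int.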

@TomAugspurger
Contributor

Fair enough. And users can override that with s.interpolate(…, infer=False) right? Where would the necessary changes need to be made?

@TomAugspurger
Contributor

Or were you saying allclose for the test?

@jreback
Contributor

jreback commented Oct 10, 2013

yes... they can specify infer=False to turn off downcasting. I am going to put up a PR that uses allclose to figure out whether the values are downcastable, so I'm going to change your test.

@jreback
Contributor

jreback commented Oct 10, 2013

See #5177; I think that should do it.

@jreback
Contributor

jreback commented Oct 10, 2013

@TomAugspurger see if you think you need tests with infer=False (you may not)....

@yarikoptic
Contributor

Just hit this on v0.12.0-993-gda89834:

======================================================================
FAIL: test_interp_quad (pandas.tests.test_generic.TestSeries)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/tests/test_generic.py", line 483, in test_interp_quad
    assert_series_equal(result, expected)
  File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/util/testing.py", line 416, in assert_series_equal
    assert_attr_equal('dtype', left, right)
  File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/util/testing.py", line 399, in assert_attr_equal
    assert_equal(left_attr,right_attr,"attr is not equal [{0}]" .format(attr))
  File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/util/testing.py", line 382, in assert_equal
    assert a == b, "%s: %r != %r" % (msg.format(a,b), a, b)
AssertionError: attr is not equal [dtype]: dtype('float64') != dtype('int64')

@jreback
Contributor

jreback commented Oct 28, 2013

@yarikoptic can you show ci/print_versions?

@jreback jreback reopened this Oct 28, 2013
@yarikoptic
Contributor

$> ci/print_versions.py 

INSTALLED VERSIONS
------------------
Python: 2.7.5.final.0
OS: Linux 3.9-1-amd64 #1 SMP Debian 3.9.8-1 x86_64
byteorder: little
LC_ALL: None
LANG: en_US

pandas: 0.12.0.dev-09e62f5
Cython: 0.19.1
Numpy: 1.7.1
Scipy: 0.12.0
statsmodels: 0.6.0.dev-d11bf99
    patsy: 0.1.0+dev
scikits.timeseries: Not installed
dateutil: 1.5
pytz: 2012c
bottleneck: Not installed
PyTables: 2.4.0
    numexpr: 2.0.1
matplotlib: 1.3.1
openpyxl: 1.6.1
xlrd: 0.9.2
xlwt: 0.7.4
xlsxwriter: Not installed
sqlalchemy: 0.8.2
lxml: 3.2.0
bs4: 4.2.1
html5lib: 0.95-dev
bigquery: Not installed
apiclient: 1.2

@jtratner
Contributor

I've seen this consistently on OSX for the last week and a half as well.

@jreback
Contributor

jreback commented Oct 28, 2013

What kind of machine/Linux kernel is this?

@jreback
Contributor

jreback commented Oct 28, 2013

@jtratner can you see what this does?

In [11]: np.allclose(np.array([9.0005]),np.array([9.]))
Out[11]: False

In [12]: np.allclose(np.array([9.00005]),np.array([9.]))
Out[12]: True

Maybe we need to pass an argument (a tolerance) there.
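`np.allclose(a, b)` tests `|a - b| <= atol + rtol * |b|` elementwise, with defaults `rtol=1e-5` and `atol=1e-8`, which explains the two results above. The sketch below recomputes that bound by hand; it assumes the "argument" in question is a tolerance such as `rtol`.

```python
import numpy as np

a1, a2, b = np.array([9.0005]), np.array([9.00005]), np.array([9.0])


def within_default_tolerance(a, b, rtol=1e-5, atol=1e-8):
    # Same test np.allclose applies elementwise: |a - b| <= atol + rtol * |b|
    return bool(np.all(np.abs(a - b) <= atol + rtol * np.abs(b)))


# With the defaults the allowed gap at 9.0 is 1e-8 + 1e-5 * 9 = 9.001e-5,
# so 9.0005 (gap 5e-4) fails and 9.00005 (gap 5e-5) passes.
print(np.allclose(a1, b), within_default_tolerance(a1, b))  # False False
print(np.allclose(a2, b), within_default_tolerance(a2, b))  # True True

# A looser relative tolerance accepts 9.0005 as well.
print(np.allclose(a1, b, rtol=1e-4))  # True
```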

@jtratner
Contributor

yes, won't have access until tonight.


@TomAugspurger
Contributor

@jreback

In [1]: np.allclose(np.array([9.0005]),np.array([9.]))
Out[1]: False

In [2]: np.allclose(np.array([9.00005]),np.array([9.]))
Out[2]: True

But I haven't sorted out my failing scipy tests due to precision errors, so I'm not sure how reliable my results are.

@jreback
Contributor

jreback commented Oct 28, 2013

@TomAugspurger are you seeing this failure as well?

@TomAugspurger
Contributor

Yep. The way I had it written originally (with expected as a float) passed on my system.

Should I change expected to a float and set infer=False for this test?

@jreback
Contributor

jreback commented Oct 28, 2013

Can you step through and see why it's not coercing? (It does on my system and on Travis.)

Put a breakpoint in com._possibly_downcast_to_dtype; it SHOULD coerce to int64 in the existing test. Let me know where it returns.

@TomAugspurger
Contributor

I think that's what I posted up here: when it tried to downcast the result, the 9 got flipped to an 8.

Let me know if you were asking for something different.

@jreback
Contributor

jreback commented Oct 28, 2013

@TomAugspurger ahh... I see. That is very odd. Why would numpy flip the float 9 to an 8 (and only on Mac)?

I guess let's just change the test, e.g. use infer=False and compare against a float expected. Can you do a quick PR for that?

@jreback
Contributor

jreback commented Oct 28, 2013

@yarikoptic the fix @TomAugspurger put in should fix this problem. Please let us know, and of course report any other issues.

@jtratner
Contributor

Are you sure it's not a broader downcasting problem? And if it's a numpy issue, we should give them a heads-up. In particular, we should test with numpy 1.6 and also Python 3 + numpy 1.7.

@jreback
Contributor

jreback commented Oct 28, 2013

See Tom's example from above.

I think it's a numpy bug (maybe only on Mac, or on Linux setups that are similar).

It's the astype that fails on precision (I think).

@jtratner
Contributor

I saw that; it just wasn't clear whether the repr sometimes rounds values or something...

@jtratner
Contributor

Also, the error itself seems confusing, given that it's not reproducible in the interpreter...

@jreback
Contributor

jreback commented Oct 28, 2013

I think something like 8.99999995 is getting astyped to 8. Maybe I should round to ~5 decimal places first, then astype.
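A sketch of the round-first idea, assuming 5 decimal places as suggested above. `round_then_downcast` is hypothetical and not necessarily how the eventual fix was written.

```python
import numpy as np


def round_then_downcast(values, decimals=5):
    """Hypothetical helper: round to a few decimals before casting so that
    values like 8.99999995 become 9 instead of being truncated to 8."""
    values = np.asarray(values, dtype=np.float64)
    if not np.all(np.isfinite(values)):
        return values
    rounded = np.round(values, decimals)
    candidate = rounded.astype(np.int64)
    # Only downcast if the rounded values really are whole numbers.
    if np.array_equal(candidate, rounded):
        return candidate
    return values


vals = np.array([1.0, 4.0, 8.99999995, 16.0])
print(vals.astype(np.int64).tolist())     # [1, 4, 8, 16]  (plain truncation)
print(round_then_downcast(vals).tolist())  # [1, 4, 9, 16]
```

The `decimals` argument sets how much noise is tolerated: a value within roughly `0.5 * 10**-decimals` of an integer rounds onto it, while anything farther keeps the array as float64.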

@jtratner
Contributor

So astype just floors it?

@jreback
Contributor

jreback commented Oct 28, 2013

Probably.

I will put together something we can test with.

@TomAugspurger
Contributor

I wasn't sure what to title this issue. You're right that it's probably a numpy issue. But for our part, it's more just a reminder to remove the fix I just put in on the interpolation test. I'll try to test it tomorrow.

-Tom


@jtratner
Contributor

@jreback yep, you're right about the issue, so rounding would resolve it.

In [25]: arr = np.array([8.5, 8.6, 8.7, 8.8, 8.9999999999995])

In [26]: arr
Out[26]: array([ 8.5,  8.6,  8.7,  8.8,  9. ])

In [27]: arr.astype(int)
Out[27]: array([8, 8, 8, 8, 8])
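On the "does astype just floor it?" question: a float-to-int `astype` truncates toward zero, which matches floor only for non-negative values. A quick comparison, with negative values added for contrast (they are not from the thread):

```python
import numpy as np

arr = np.array([8.5, 8.6, 8.7, 8.8, 8.9999999999995, -8.5, -8.9999999999995])

print(arr.astype(int).tolist())            # [8, 8, 8, 8, 8, -8, -8]  truncation toward zero
print(np.floor(arr).astype(int).tolist())  # [8, 8, 8, 8, 8, -9, -9]  floor
print(np.rint(arr).astype(int).tolist())   # [8, 9, 9, 9, 9, -8, -9]  round half to even
```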

@jreback
Contributor

jreback commented Oct 28, 2013

Yep, it's an easy fix (I think I had it at one point), but I took it out as I assumed astype rounded rather than floored.

5 participants