sym_diff failure on 3.4? #6444

dsm054 · 2014-02-22T17:42:08Z

test_symmetric_diff is failing after trying out pandas on a fresh 3.4 pull. It passes on 3.3 with the same numpy trunk version. It seems to have to do with the nan section:

>>> import pandas as pd
>>> from pandas import Index
>>> import numpy as np
>>> idx1 = Index([1, 2, np.nan])
>>> idx2 = Index([0, 1, np.nan])
>>> result = idx1.sym_diff(idx2)
>>> expected = Index([0.0, np.nan, 2.0, np.nan])  # oddness with nans
>>> nans = pd.isnull(expected)
>>> result
Float64Index([0.0, nan, nan, 2.0], dtype='object')
>>> expected
Float64Index([0.0, nan, 2.0, nan], dtype='object')
>>> nans
array([False,  True, False,  True], dtype=bool)
>>> result[nans]
Float64Index([nan, 2.0], dtype='object')

Version info (basically a fresh 3.4 build with only numpy dev installed):

>>> pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.4.0.candidate.1
python-bits: 32
OS: Linux
OS-release: 3.8.0-35-generic
machine: i686
processor: i686
byteorder: little
LC_ALL: None
LANG: en_CA.UTF-8

pandas: 0.13.1-282-g9564ead
Cython: None
numpy: 1.9.0.dev-2d6ea6e
scipy: None
statsmodels: None
IPython: None
sphinx: None
patsy: None
scikits.timeseries: None
dateutil: 2.2
pytz: 2013.9
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
bq: None
apiclient: None
rpy2: None
sqlalchemy: None
pymysql: None
psycopg2: None


jreback · 2014-02-22T17:46:47Z

cc @TomAugspurger

Yeah... Tom will have a look when he tries out 3.4.

It has to do with nan ordering. Weird.

jreback · 2014-02-22T17:48:31Z

@dsm054 you can take a look if you'd like as well!

Right now I only test 3.4 on Windows, and this has been coming up since this was merged in:

#6016

dsm054 · 2014-02-22T17:53:50Z

Not sure how I missed that in the search. Should we close this as a dup?

jreback · 2014-02-22T17:58:04Z

wasn't a 'direct' issue, just a comment...so this is fine

TomAugspurger · 2014-02-22T17:59:45Z

I haven't been able to get a 3.4 virtualenv running yet. Does the machine with 3.4 have a newer version of numpy as well?


dsm054 · 2014-02-22T18:00:43Z

What's the "expected" (= desired, really) behaviour for sorting an array with nans?

jreback · 2014-02-22T18:01:49Z

@TomAugspurger you can use numpy 1.8.

jreback · 2014-02-22T18:07:52Z

Hmm... if you sort with a stable sort (mergesort), it should leave the nans alone (and you can't tell if 2 nans switch places). Though IIRC there was an issue with the sort ordering...

dsm054 · 2014-02-22T23:45:13Z

I had another look at this and I'm starting to think there's nothing wrong here. It's simply that in Python 3.3 we have

>>> set(Float64Index([0.0, np.nan, np.nan, 2.0], dtype='object'))
{0.0, nan, 2.0, nan}

and in 3.4 we have

>>> set(Float64Index([0.0, np.nan, np.nan, 2.0], dtype='object'))
{0.0, nan, nan, 2.0}

and we were never promised otherwise. If sorting preserves nan location (although annoyingly, the presence of nans breaks the sorting even of the non-nan elements in both lists and ndarrays), then since sets are unordered, we have no reason to expect our expected result. If we want to impose Series-style "push nans to the back" sorting, we can, but right now I think it's just that the test is too sensitive.
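A minimal sketch of the underlying behaviour in plain Python (not code from this thread, and using ordinary floats rather than an Index): because NaN never compares equal to anything, two separate NaN objects can both sit in one set, and nothing specifies the order in which the set yields them.

```python
import math

nan_a = float("nan")
nan_b = float("nan")

# NaN never compares equal to anything, including another NaN
nans_equal = nan_a == nan_b

# so two distinct NaN objects can both be members of one set;
# the iteration order of that set is an implementation detail
s = {0.0, nan_a, 2.0, nan_b}
set_size = len(s)
nan_count = sum(math.isnan(x) for x in s)

print(nans_equal, set_size, nan_count)
print(list(s))  # order may differ between Python versions

# every comparison against NaN is False, so sorted() gives no
# ordering guarantee for the non-NaN elements either
print(sorted([2.0, nan_a, 1.0]))
```

The last two prints deliberately carry no expected output, since that ordering is exactly what cannot be relied on.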

TomAugspurger · 2014-02-23T00:25:43Z

Agreed. I think I'll rewrite the test to make sure the count of the nans is correct, and ignore the order.
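One way such an order-insensitive check could look, as a sketch only: this is plain Python, not the change that went into #6453, and the helper names `split_nans` and `same_ignoring_nan_order` are made up for illustration.

```python
import math

def split_nans(values):
    """Return (sorted non-NaN values, number of NaNs)."""
    non_nan = sorted(v for v in values if not math.isnan(v))
    nan_count = sum(1 for v in values if math.isnan(v))
    return non_nan, nan_count

def same_ignoring_nan_order(result, expected):
    # Equal if the non-NaN values match and the NaN counts match,
    # regardless of where the NaNs ended up.
    return split_nans(result) == split_nans(expected)

nan = float("nan")
result = [0.0, nan, nan, 2.0]    # ordering seen on 3.4
expected = [0.0, nan, 2.0, nan]  # ordering seen on 3.3
print(same_ignoring_nan_order(result, expected))
```

Sorting only the non-NaN values sidesteps the problem that comparisons against NaN are always False.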

jreback · 2014-02-23T00:40:25Z

ok

I suspect maybe 3.4 changed some sort of hashing scheme, though.
E.g. hashes are now more pseudo-random, whereas in 3.3 that is not turned on.

This is some sort of security thing, I think.

You can probably test this by seeing if the ordering of dict keys is the same in 3.3 vs 3.4.
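A rough sketch of such a check, to be run under each interpreter and compared by eye. It uses a set rather than dict keys, since dicts in current Python versions preserve insertion order and would hide any difference. The key list is arbitrary.

```python
import sys

print(sys.version_info[:2])

# sys.hash_info.algorithm only exists on 3.4 and later,
# hence the getattr with a default
print(getattr(sys.hash_info, "algorithm", "n/a (pre-3.4)"))

# str hashes are salted per process unless PYTHONHASHSEED is set,
# so this order can change between runs and between versions
keys = ["a", "b", "c", "d", "e"]
print(list(set(keys)))

# hashes of ordinary (non-NaN) floats are not salted
float_hashes = [hash(x) for x in (0.0, 1.0, 2.0)]
print(float_hashes)
```

Setting `PYTHONHASHSEED` to the same fixed value for both runs removes the per-process salt, which makes it easier to tell a version difference from run-to-run noise.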

jreback · 2014-02-23T00:48:02Z

http://bugs.python.org/issue19183

is prob the culprit

dsm054 · 2014-02-23T01:51:07Z

I haven't been following the Hash Randomization Wars(tm), but if we're implicitly relying on a fixed set order, we've made a wrong step regardless of whether or not we actually observe a failure.

jreback · 2014-02-23T02:01:54Z

The issue is that a Float64Index doesn't sort the same when it has NaNs, and the set operations sort the results.

Need to fix this test so that it doesn't rely on the index sort order.
Need to doc that NaNs in a Float64Index can change the order.
Think about whether NaNs should actually not be used in a Float64Index, and a NaT-like value used to represent them instead.

jreback · 2014-02-23T02:08:46Z

related is #6194

jreback added Testing labels Feb 22, 2014

jreback added this to the 0.14.0 milestone Feb 22, 2014

TomAugspurger mentioned this issue Feb 23, 2014

BUG/TST: sorting of NaNs on sym_diff #6453

Merged

jreback closed this as completed in #6453 Feb 23, 2014

pijucha mentioned this issue Jun 26, 2016

BUG/PERF: Sort mixed-int in Py3, fix Index.difference #13514

Closed

6 tasks
