-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: union should not try to sort inplace because platform impls differ as to when sorting occurs for objects that cannot be compared #5266
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@jtratner Can you try this out? Thanks |
running a vbench ... likely a hit since this will now copy |
Actually, nothing huge:
|
Why not just sort on id if it can't sort on anything else? I.e.:
That way you don't have to copy for an edge case. |
It has to be dtype object already. |
I guess that's not necessarily stable/repeatable tho |
and it will copy on that kind of slice |
Definitely - I was just thinking you would put that in the except clause with TypeErrors so you're generally not copying where possible. |
But sorting by id is bad anyways, since it's not really deterministic between calls with non-interned values. |
But this does appear to fix the issue. 😄 |
the whole idea of catching a failed inplace sort is botched |
wait, no never mind. popped up again on a different thing now. |
arg we need mac testing |
can u put up the error |
Is there something in the test suite that's dependent upon randomness or order of instantiation? Weirdly it passes with FAIL: pandas.computation.tests.test_eval:TestAlignment.test_basic_series_frame_alignment('numexpr', 'python')
/usr/local/bin/vim +872 pandas/computation/tests/test_eval.py # check_basic_series_frame_alignment
testit(r_idx_type, c_idx_type, index_name)
/usr/local/bin/vim +867 pandas/computation/tests/test_eval.py # testit
assert_frame_equal(res, expected)
/usr/local/bin/vim +484 pandas/util/testing.py # assert_frame_equal
assert_index_equal(left.columns, right.columns)
/usr/local/bin/vim +364 pandas/util/testing.py # assert_index_equal
right.dtype))
AssertionError: [index] left [object Index([u'98ןן2הנ50פ', 2000-01-03 00:00:00, 2000-01-04 00:00:00, 2000-01-05 00:00:00, 2000-01-06 00:00:00, 2000-01-07 00:00:00, u'אוז91כיע5צ', u'י09לירג6אש', u'יא0ק0פלר31', u'ם8327יכסצנ', u'ןעץמףרן43ץ', u'קזהאצ5כצון'], dtype='object')], right [Index([u'98ןן2הנ50פ', 2000-01-04 00:00:00, 2000-01-05 00:00:00, 2000-01-06 00:00:00, u'י09לירג6אש', 2000-01-03 00:00:00, u'אוז91כיע5צ', u'יא0ק0פלר31', 2000-01-07 00:00:00, u'ם8327יכסצנ', u'ןעץמףרן43ץ', u'קזהאצ5כצון'], dtype='object') object] |
let me see ... |
Okay, now I have figured it out - it fails if you run it with "-A 'not network and not slow'". must be something to do with test ordering. |
that repr is screwed up too |
is that just a mistype ? or are those two consecutive commas a bug? |
Nope, that's wrong too, it just intermittently fails now - very confusing... I wonder if you bumped up the size of the objects it's generating - I bet you'd see it on your computer (double or triple them just for testing) |
sweet i now get the runtimewarning |
which of course makes sense |
I can't reproduce your most recent error. |
It only occurs about 60% of the time for me - I finally figured out that it was occuring semi-randomly by running it endlessly: from subprocess import call
[call(['nosetests', 'pandas.computation.tests.test_eval:TestAlignment.test_basic_series_frame_alignment']) for i in range(30)] |
@cpcloud what is this test even supposed to test?
|
@cpcloud how about this for union/sorting generally:
And then in the test suite, don't bother comparing the result if the index and column types aren't compatible. There's no point in spending a lot of time trying to test undefined behavior and figuring out why some routes produce different, still-undefined behavior. So I think this would just be checking that column and index are either equal or are an integer/float combo. |
Just that the operation in
There's only a "common sense" notion of "index and column types that aren't compatible" since indices are converted to object dtype if they are not the same dtype. The idea for these tests was just to make sure that operations that don't make sense do the same thing no matter how you're evaluating an expression. I suspect the reason they are happening differently is because Python space uses This is an actual issue because the result should be the same (it is on Linux AFAICT) but Mac OS seems to do something a bit different with joining as opposed to combining objects with Does that all make sense? |
@jtratner Can you show me the output of the following: import numpy as np
import pandas as pd
from pandas.util.testing import makeCustomDataframe as mkdf
f = lambda *args, **kwargs: np.random.randn()
df = mkdf(10, 7, data_gen_f=f, r_idx_type='i', c_idx_type='dt')
s = pd.Series(np.random.randn(5), df.index[:5])
x, y = df.align(s, axis=1)
print(y) |
That totally makes sense - - I was just thinking that there isn't really a Unfortunately, I just had to take my main mac for service so I have to get Maybe @TomAugspurger can take a look? |
@cpcloud status on this? |
want to try to fix this? |
Yep kind of fell by the wayside a bit. Bring on the version issues!! |
just had a duh moment, no need for copying. since we have access to the separate indexes we can copmare the first elements of each with the default sorting comparator ( |
YES this finally passed |
@jreback this passing locally can you review pls when u get a chance? thx |
by locally i mean passes on travis but hasn't been run on master |
looks ok |
well travis is randomly failing |
why is this continually failing? https://travis-ci.org/pydata/pandas/jobs/26625395 it passes every time on cpcloud/pandas branch but keeps not "finding" pandas |
…er as to when sorting occurs for objects that cannot be compared
any direct comparioson with int ndarrays? (e.g. need to explicity set |
looks ok otherwise |
There's comparison between single element ints and all other types in the case of union of non monotonic indices. Is that what you mean? |
But by that point they are object arrays. Not sure if that matters re casting the int types |
nvm looks ok then |
BUG: union should not try to sort inplace because platform impls differ as to when sorting occurs for objects that cannot be compared
on python 33-32, and 34-64 on windows (I may not have installed on some of the other py3 versions) looks like Odd that this doesn't break on travis though
|
ah darn ok. that should be caught by the |
i can repro this on linux yay |
i see ... if your indices are already mixed then there's nothing you can do...i'll just return results unsorted if |
yeh for sure just bail out (leave unsorted) |
closes #5039