BUG: DataFrame.equals should not care about block order (GH #9330) #9745
lgtm, pls add a release note!
5e1da3c to cd6dc68
Worth adding a test that round-trips to HDF (as in the example issue)?
Yeah, makes sense. [I'm hoping that using |
Put it in io/tests/test_pytables.py. Yes, I think you have to use dtype.name. The problem (and you may want to include a categorical) is that a categorical dtype is not safe to compare to a numpy dtype, but its name is. A side issue: if you have 2 categorical blocks (categoricals are block-separated, one per block), then I am not sure how they sort.
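A minimal sketch of the dtype.name point, not taken from the PR itself: the names are plain strings, so they can be compared or sorted without ever comparing a categorical dtype object against a numpy dtype object. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# One categorical dtype and one plain numpy dtype.
cat_dtype = pd.Series(["a", "b", "a"], dtype="category").dtype
float_dtype = np.dtype("float64")

# The .name attribute is an ordinary string on both kinds of dtype,
# so sorting on it never compares the dtype objects directly.
names = sorted([cat_dtype.name, float_dtype.name])
print(names)  # ['category', 'float64']
```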
Urf. Getting the categorical stuff right is going to be a bit tricky. Can I assume that NaN will never be an element in a category? I would have thought it would be disallowed entirely but I don't understand the output when you build one containing one.
@dsm054 you can have a nan in a category
Blek. What's the right way to handle putting the categorical blocks -- which won't coalesce even if the categories are the same -- in a canonical order?
So, I would compare all 'regular' blocks (e.g. non-Categorical/non-Sparse) like you are doing. But the Categorical/Sparse blocks need special handling. I think you can order them by the mgr block order. That order maps to how the data is represented in the actual DataFrame, and so will be the same on both sides. You could actually do this for all blocks, I think.
518feaf to 8bde0f8
8bde0f8 to 18f25e4
@jreback: okay, I've got a version which passes both my tests and the OP's original case, by sorting on a (block type name, mgr_locs) tuple. I don't understand how the mgr blocks work well enough to judge that side of things, unfortunately, so I don't know whether matching non-consolidated blocks can be in the wrong order even after this. Currently passing everywhere except for what looks like an unrelated library build error; probably it'll work the next time it rebuilds.
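For readers following along, the sort-key idea can be sketched in plain Python. This is not the PR's code and does not touch pandas internals: Block is a hypothetical stand-in holding a dtype name, the column positions it covers (mgr_locs), and its values, and canonical_key and blocks_equal are illustrative names.

```python
from collections import namedtuple

# Hypothetical stand-in for an internal block (assumption, not pandas API):
# a dtype name, the column positions it occupies, and its values.
Block = namedtuple("Block", ["dtype_name", "mgr_locs", "values"])


def canonical_key(block):
    # Sort on (block type name, column positions), as described above.
    return (block.dtype_name, tuple(block.mgr_locs))


def blocks_equal(left, right):
    # Compare two block lists after putting both into canonical order.
    if len(left) != len(right):
        return False
    left_sorted = sorted(left, key=canonical_key)
    right_sorted = sorted(right, key=canonical_key)
    return all(
        canonical_key(l) == canonical_key(r) and l.values == r.values
        for l, r in zip(left_sorted, right_sorted)
    )


float_block = Block("float64", [0], [[1.0, 2.0]])
cat_block_1 = Block("category", [1], [["x", "y"]])
cat_block_2 = Block("category", [2], [["y", "x"]])

stored = [float_block, cat_block_1, cat_block_2]
reloaded = [cat_block_2, float_block, cat_block_1]  # same blocks, shuffled

print(blocks_equal(stored, reloaded))  # True
print(stored == reloaded)              # False: naive comparison is order-sensitive
```

Because the two categorical blocks share a type name, the mgr_locs part of the key is what keeps them in a stable relative order.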
Hmm, can you add a test with 2 categoricals (to your mixed test case)?
Beyond |
Sorry, missed that part of the test.
Closes #9330 and another version of the same problem here on SO by canonicalizing the block order during an equals comparison. Tested at the block manager level and above it at the frame level.
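A frame-level illustration, as a sketch rather than the PR's actual test: two frames with identical labels, dtypes and values are built along different paths, which may leave their internal blocks in different orders (whether they really differ depends on the pandas version). With the fix in place, equals should return True either way.

```python
import pandas as pd

# Mixed-dtype frame, including a categorical column.
left = pd.DataFrame({
    "i": [1, 2],
    "f": [1.0, 2.0],
    "s": ["x", "y"],
    "c": pd.Categorical(["u", "v"]),
})

# Rebuild the same frame from single-column pieces in a different order,
# then restore the original column order.
pieces = [left[["c"]], left[["s"]], left[["f"]], left[["i"]]]
right = pd.concat(pieces, axis=1)
right = right[list(left.columns)]

# Same labels, dtypes and values, so the frames compare equal.
print(left.equals(right))
```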