Skip to content

ENH: support MultiIndex and tuple hashing #15224

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 7 commits into from

Conversation

jreback
Copy link
Contributor

@jreback jreback commented Jan 25, 2017

on top of #15216

closes #15227

@jreback
Copy link
Contributor Author

jreback commented Jan 25, 2017

cc @mikegraham

I separated this from the prior PR.

yes my goal here is to effectively emulate hash(tuple) but doing it efficiently (e.g. we already have the arrays in a packed format, turning into python objects, just to compute a tuple hash is very inefficient).

@jreback
Copy link
Contributor Author

jreback commented Jan 25, 2017

@mikegraham
ok I incorporated your commit (with a slight modification) and now passes tests!

@jreback jreback force-pushed the mi_hash2 branch 3 times, most recently from 3bd237f to 157dcd2 Compare January 25, 2017 21:26
@codecov-io
Copy link

codecov-io commented Jan 25, 2017

Current coverage is 86.32% (diff: 100%)

Merging #15224 into master will increase coverage by 0.01%

@@             master     #15224   diff @@
==========================================
  Files           139        139          
  Lines         51096      51140    +44   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          44103      44148    +45   
+ Misses         6993       6992     -1   
  Partials          0          0          

Powered by Codecov. Last update be32852...8b1d3f9

@jreback jreback added this to the 0.20.0 milestone Jan 25, 2017
@mikegraham
Copy link
Contributor

There were a couple memory improvements in 187573b over the commit as you cherry-picked it, I think.

@jreback
Copy link
Contributor Author

jreback commented Jan 25, 2017

@mikegraham ahh ok, I will pick-again. thanks!

@jreback
Copy link
Contributor Author

jreback commented Jan 26, 2017

@mikegraham ok fixed.

Note that this obviously returns uint64, so it is not quite compat with the tuplehashing in python?

@mikegraham
Copy link
Contributor

I stole the algorithm, but it's not compatible. I also iterate over the members in forward error rather than reverse...I don't think that is substantive.

@jreback
Copy link
Contributor Author

jreback commented Jan 26, 2017

cc @mikegraham any final comments?

@jorisvandenbossche

@mikegraham
Copy link
Contributor

mikegraham commented Jan 26, 2017

We might want to do something about this behavior.

>>> pandas.tools.hashing.hash_pandas_object(pandas.DataFrame(), index=False)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas/tools/hashing.py", line 91, in hash_pandas_object
    h = _combine_hash_arrays(hashes, num_items)
  File "pandas/tools/hashing.py", line 19, in _combine_hash_arrays
    first = next(arrays)
StopIteration```

@jreback
Copy link
Contributor Author

jreback commented Jan 26, 2017

@mikegraham updated

@jreback jreback closed this in c67486f Jan 27, 2017
AnkurDedania pushed a commit to AnkurDedania/pandas that referenced this pull request Mar 21, 2017
closes pandas-dev#15227

Author: Jeff Reback <[email protected]>
Author: Mike Graham <mikegraham2gmail.com>

Closes pandas-dev#15224 from jreback/mi_hash2 and squashes the following commits:

8b1d3f9 [Jeff Reback] not correctly hashing categorical in a MI
48a2402 [Jeff Reback] support for mixed type arrays
58f682d [Jeff Reback] memory optimization
0c13df7 [Mike Graham] Steal the algorithm used to combine hashes from tupleobject.c
e8dd607 [Jeff Reback] add hash_tuples
44e9c7d [Mike Graham] wipSteal the algorithm used to combine hashes from tupleobject.c
e507c4a [Jeff Reback] ENH: support MultiIndex and tuple hashing
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants