Hashable DataFrames #3882

hayd · 2013-06-13T11:27:13Z

See this SO answer, they want to use memoisation.

OP points out this gets different results (presumably it hashes off id):

hash(pd.DataFrame([1,2,3]))

Should they be hashable or should hash raise? (does it defeat the point of hashing if hashing is expensive?) cc @cpcloud
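To make the memoisation problem concrete, here is a minimal sketch in plain Python. `Frame` and `expensive` are hypothetical stand-ins (not pandas code); `Frame` simply inherits the identity-based `__hash__` and `__eq__` of `object`, which is the behaviour presumed above.

```python
class Frame:
    """Hypothetical stand-in for a DataFrame with the default identity-based hash."""

    def __init__(self, values):
        self.values = list(values)


calls = []   # records every real computation
cache = {}   # memo table keyed on the frame object itself


def expensive(frame):
    # Memoise on the frame, which relies on hash(frame) and frame == other.
    if frame not in cache:
        calls.append(frame)
        cache[frame] = sum(frame.values)
    return cache[frame]


a = Frame([1, 2, 3])
b = Frame([1, 2, 3])   # same contents, different object

expensive(a)
expensive(b)   # cache miss: b hashes and compares by identity, not by contents
expensive(a)   # cache hit: same object as the first call
```

Two frames with identical contents never share a cache entry, so the memo table only helps when the very same object is passed again.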


cpcloud · 2013-06-13T12:36:17Z

i've been thinking about this off and on. a somewhat related issue is that of the empty frame, i.e., DataFrame(). i think the ~~PandasObject~~ NDFrame should raise in all cases since that's what numpy does (overriding where it makes sense and is useful). i guess u could have the empty DataFrame be hashable but that seems like it's not worth the effort it would take to do, who needs to hash empty DataFrames?

jreback · 2013-06-13T12:41:52Z

series raises on __hash__, as should all NDFrame: because they are mutable, hashing is meaningless. OTOH, indexes are hashable, as they are immutable
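The mutability argument can be illustrated with a small sketch. `MutableBox` is a hypothetical class (not a pandas type) that hashes by contents, which is what a naive "hashable DataFrame" might do.

```python
class MutableBox:
    """Hypothetical mutable container that hashes and compares by contents."""

    def __init__(self, values):
        self.values = list(values)

    def __eq__(self, other):
        return isinstance(other, MutableBox) and self.values == other.values

    def __hash__(self):
        return hash(tuple(self.values))


box = MutableBox([1, 2, 3])
lookup = {box: "cached"}

# Mutate the key after it has been stored in the dict.
box.values.append(4)

# The entry was filed under the hash of the old contents but now compares
# equal only to the new contents, so neither probe finds it.
found_by_old_contents = MutableBox([1, 2, 3]) in lookup
found_by_new_contents = MutableBox([1, 2, 3, 4]) in lookup
```

The entry is still in the dict but can no longer be reached by value, which is the sense in which hashing a mutable object is meaningless.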

cpcloud · 2013-06-13T13:06:01Z

Indexes are currently not hashable, since they try to hash the underlying ndarray.

jreback · 2013-06-13T13:12:52Z

yes..you are right...oh well my argument is bad then!

hayd · 2013-06-13T13:28:05Z

Ah, you're right, I didn't even check series, it's just DataFrame which should raise.

Easy fix (raise in __hash__ for generics), PR on the way.

cpcloud · 2013-06-13T15:12:38Z

could implement this for indices...thoughts?

cpcloud · 2013-06-13T15:13:49Z

in that case u should probably hash the name, number of levels, class, and dtype
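A rough sketch of that idea, assuming a hypothetical `SketchIndex` class with `name`, `nlevels` and `dtype` attributes (this is not the pandas implementation). Only the metadata goes into the hash, so it stays cheap; `__eq__` still compares the values.

```python
class SketchIndex:
    """Hypothetical index-like object that hashes metadata only."""

    def __init__(self, values, name=None, dtype="int64", nlevels=1):
        self.values = tuple(values)
        self.name = name
        self.dtype = dtype
        self.nlevels = nlevels

    def __eq__(self, other):
        # Equality still looks at the values, not just the metadata.
        return (
            type(self) is type(other)
            and self.values == other.values
            and self.name == other.name
            and self.dtype == other.dtype
            and self.nlevels == other.nlevels
        )

    def __hash__(self):
        # Cheap hash: name, number of levels, class and dtype.
        return hash((self.name, self.nlevels, type(self), self.dtype))


left = SketchIndex([1, 2, 3], name="id")
right = SketchIndex([4, 5, 6], name="id")
seen = {left, right}
```

The trade-off is that indexes with the same metadata but different values always collide, so dict and set lookups fall back to `__eq__` to tell them apart.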

jreback · 2013-06-13T15:17:52Z

still have the mutability issue

though I suppose if the user accepts this it would be nice to deal with it

I would table to 0.12 for now

hayd · 2013-06-13T15:27:48Z

So, at the moment I've put this in NDFrame.

Maybe it should go in PandasObject, and then have objects which should hash override it (like if we can get indices to hash using that clever method). Are there any besides Index/MultiIndex?

cpcloud · 2013-06-13T15:44:17Z

i vote for default to not hashable. better to alert the user to non-hashability rather than possibly giving misleading ideas about the hashability of things

jreback · 2013-06-13T15:45:39Z

agree....not hashability is/should be default until we change API

hayd · 2013-06-13T15:58:26Z

ok I've moved it to PandasObject, removes repeated code too. :)
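The arrangement discussed here (unhashable by default in the base class, with an override where hashing makes sense) might look roughly like the sketch below. `BaseObject`, `Frame` and `FrozenLabels` are hypothetical names and the error message is illustrative wording, not the text used in the PR.

```python
class BaseObject:
    """Hypothetical base class: not hashable unless a subclass opts in."""

    def __hash__(self):
        # Illustrative message; the actual wording in pandas may differ.
        raise TypeError(
            f"{type(self).__name__!r} objects are mutable and cannot be hashed"
        )


class Frame(BaseObject):
    """Mutable container: inherits the raising __hash__."""

    def __init__(self, values):
        self.values = list(values)


class FrozenLabels(BaseObject):
    """Immutable labels: overrides __hash__ because hashing is safe here."""

    def __init__(self, labels):
        self._labels = tuple(labels)

    def __eq__(self, other):
        return isinstance(other, FrozenLabels) and self._labels == other._labels

    def __hash__(self):
        return hash(self._labels)


try:
    hash(Frame([1, 2, 3]))
    frame_hash_raised = False
except TypeError:
    frame_hash_raised = True

labels_hash = hash(FrozenLabels(["a", "b"]))
```

Putting the raising `__hash__` in one base class avoids repeating it in each subclass, and any type that is genuinely immutable can still opt back in explicitly.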

hayd mentioned this issue Jun 13, 2013

FIX hash of DataFrame raises Typerror #3884

Merged

hayd closed this as completed in #3884 Jun 13, 2013
