strange dtype behaviour as function of series length #7332
Comments
This is really odd; it happens with basically anything in the _TYPE_MAP. I think it's easiest just to hash by the name instead of by the object. Maybe there's some kind of translation issue with the hashing (e.g. _TYPE_MAP is actually populated with numpy C dtype definitions).
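To make the "hash by the name" idea concrete, here is a minimal sketch in plain Python. It does not touch the real _TYPE_MAP: make_scalar_type, map_by_object and map_by_name are hypothetical names, and the two classes only stand in for two distinct type objects that both call themselves int32.

```python
def make_scalar_type(name):
    # Hypothetical helper: every call builds a brand-new class object,
    # even when the name is the same.
    return type(name, (int,), {})

int32_a = make_scalar_type("int32")  # stands in for the type stored in the map
int32_b = make_scalar_type("int32")  # stands in for the look-alike type

# Keyed by object: classes hash and compare by identity.
map_by_object = {int32_a: "integer"}
# Keyed by name: only the string has to match.
map_by_name = {int32_a.__name__: "integer"}

found_by_object = int32_b in map_by_object      # False
found_by_name = int32_b.__name__ in map_by_name  # True
```

The trade-off is that a name-keyed lookup also accepts any unrelated type that happens to share the name, so it fits a fastpath check better than a strict equality test.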
For me it doesn't seem to happen in the same way with int64, but I agree that using the name should work here. The problem is that as long as a Series can wind up with dtypes which look like the numpy dtypes but aren't equal to the numpy versions, it's hard to trust dtype checks anywhere in the code. Where we're only doing a fastpath check we should still get the right answer, but we could have unexpected and hard-to-track-down coercion issues elsewhere.
yep...going to bench and fix, should be pretty straightforward
@jreback: will do, but I admit to still being a little puzzled about what's going on. I don't understand how we wind up with I wasn't able to come up with a pure numpy demo, but I think there might be one. (PS: note that my issues were with int32, not int64, so it's probably build-dependent.)
gr8. I agree. somehow the actual object
well the hash on the np.dtype is DIFFERENT, really really odd
more test.py
hmm....i'll bet numexpr has a different hash for its maybe no guarantees on that (although it IS odd)
@dsm054 should these be the same (the bottom result)?
For the floats the same though
cross posted to numexpr: numpy:
side issue, I think you have a nice soln for this: http://stackoverflow.com/questions/24044492/python-pandas-transforming-moving-values-from-diagonal?noredirect=1#comment37072272_24044492 (maybe add to cookbook)?
If it's a numexpr thing, that might explain why I couldn't find a purely numpy-based example. And yes, I'd argue that the two objects should be equal so as not to drive end users bonkers, and two equal objects have to have the same hash or dictionaries won't work.
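The point about equality and hashing can be shown with a small plain-Python sketch. BrokenDtype and ConsistentDtype are hypothetical stand-ins, not numpy or numexpr classes; the only assumption is that two objects compare equal while their hashes may or may not agree.

```python
class BrokenDtype:
    """Hypothetical: equality ignores `origin`, but the hash depends on it."""

    def __init__(self, name, origin):
        self.name = name
        self.origin = origin

    def __eq__(self, other):
        return isinstance(other, BrokenDtype) and self.name == other.name

    def __hash__(self):
        # Violates the rule that equal objects must hash equal.
        return self.origin


class ConsistentDtype(BrokenDtype):
    """Hypothetical fix: hash only what __eq__ compares."""

    def __hash__(self):
        return hash(self.name)


# Equal objects with different hashes: the dict lookup misses.
broken_map = {BrokenDtype("int32", origin=0): "integer"}
broken_hit = BrokenDtype("int32", origin=1) in broken_map  # False

# Equal objects with equal hashes: the dict lookup succeeds.
fixed_map = {ConsistentDtype("int32", origin=0): "integer"}
fixed_hit = ConsistentDtype("int32", origin=1) in fixed_map  # True
```

The lookup misses in the first case because a dict compares stored hashes before it ever calls `__eq__`, so the equality of the two objects is never consulted.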
Found when tracking down what was going on with this question about performance.
First the case that makes sense:
Now let's increase the length of the series.
.. wait, what?
We've now got a new numpy.int32 type floating around, not equal to the one in numpy. The crossover seems to be at 10k:
ISTM that this failure to recognise the dtype as being in _TYPE_MAP prevents the early exit from being taken in infer_dtype upon recognition that it's an integer dtype, and that slows things down considerably.
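For anyone trying to reproduce this, a small diagnostic along the following lines may help. It is only a sketch: compare_scalar_types is a hypothetical helper, not part of numpy or pandas, and it is run here on a plain numpy array, where everything is expected to match. To investigate the report, pass in the `.dtype.type` of the Series result that misbehaves.

```python
import numpy as np


def compare_scalar_types(observed, reference=np.int32):
    """Hypothetical helper: report how two scalar type objects relate."""
    return {
        "same_object": observed is reference,
        "equal": observed == reference,
        "same_hash": hash(observed) == hash(reference),
        "same_name": observed.__name__ == reference.__name__,
    }


# Baseline on plain numpy: the array's scalar type is numpy.int32 itself.
baseline = compare_scalar_types(np.arange(3, dtype=np.int32).dtype.type)
```

A report where same_name is true but same_object or same_hash is false would match the symptom described above: a type that looks like numpy.int32 yet is not found in a dict keyed by the numpy object.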