-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: Make lib.maybe_convert_objects work with uint64 #4845
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: Make lib.maybe_convert_objects work with uint64 #4845
Conversation
unit64 should already be preserved in block manager |
it's not, snippet from form_block: elif issubclass(v.dtype.type, np.integer):
if v.dtype == np.uint64:
# HACK #2355 definite overflow
if (v > 2 ** 63 - 1).any():
object_items.append((i, k, v))
continue
int_items.append((i, k, v)) |
you know what, i think it's because of the pathway through, so I identified the wrong error here. Sorry, still getting used to the internals. Series(np.array([5], dtype='uint64')) comes out as uint |
here's the problem: >>> import sys
>>> uint_arr = np.array([sys.maxint + 5], dtype='uint64')
>>> ser = Series(uint_arr)
>>> ser.dtype
dtype('uint64')
>>> df = DataFrame([uint_arr])
>>> df.dtypes
0 object
dtype: object |
But it does work if not in a list: In [16]: df = DataFrame(uint_arr)
In [18]: df.dtypes
Out[18]:
0 uint64
dtype: object Time to go a-searching. |
the problem is in interleave_dtypes it's a bit tricky |
@jreback well, you can just take that set of lines out and it works...but then it overflows under certain ops What's supposed to happen if you do this? ser = Series(np.array([5], dtype='uint64'))
ser - 10 Should pandas handle that for you and try to promote? or should it just overflow? |
Right now it ends up with (on current master) 0 18446744073709551611
Name: 0, dtype: uint64 |
The test suite implies that, at least with dataframe, it's not supposed to overflow (subtracting a uint column such that it goes negative) : #4414 ish in pandas/tests/test_frame.py
# vs mix int
if op in ['add','sub','mul']:
result = getattr(self.mixed_int, op)(2 + self.mixed_int)
exp = f(self.mixed_int, 2 + self.mixed_int)
# overflow in the uint
dtype = None
if op in ['sub']:
dtype = dict(B = 'object', C = None) |
uint64 are very funny |
Well, right now overflow doesn't bubble up. And it doesn't bubble up in numpy either. I'd vote to not bubble up on overflow like numpy. |
@jreback this should pass now. Behavior is to allow overflow, keep dtype as uint64. (which I believe is same behavior as numpy) |
I'd like to review this before merging...I had to deal with this myself in a different context recently |
Yeah, definitely, wasn't planning to merge until you looked at it. |
added some more test cases just to make sure I covered all the branches that interact with uint64. |
Okay. I don't think that returning uint64 for the case that you have all positive integers is a good idea. Here's why:
Instead, I recommend that PyLong objects only be bounds checked to see if they are over INT64_MAX, in other words:
not so bad right? Just have to be careful with doing a
if something exceeds INT64_MAX, you should enter another function that attempts to put the integers into a uint64 array -- if it cannot the original object array is returned. pls ping me again when this is ready to have a look at again! |
thanks for the feedback -- I agree that uint64 is tricky with all the overflow issues, but I'm confused about the difference between what you're suggesting and how the changes work now (or maybe you're just suggesting different internal architecture). Right now, it only tries to make
whereas if it's all integers, comes out as integer
But what about the case where you end up with a long > INT64_MAX? Still 'uint64'? Or try for something like That said, it might just make sense to wrap the entire thing in a |
When it's greater than uint64 max (and not negative, etc.)
remove extraneous typecheck against uint64 better dtype checks in test_frame update tests to reflect actual use of uint64, etc
pushed the issue & PR to 0.14 |
@jtratner, this is in limbo. are you waiting for input from wes or can this go forward? |
I don't remember :P I'll take another look at what I wrote up and the |
@jtratner closing for now...pls reopen if you are continuing work on this |
Adds handling for uint64 objects during conversion. When negative numbers and uint64 are detected, we then convert the result to object. Picks up where pandas-devgh-4845 left off. Closes pandas-devgh-4471.
Adds handling for uint64 objects during conversion. When negative numbers and uint64 are detected, we then convert the result to object. Picks up where pandas-devgh-4845 left off. Closes pandas-devgh-4471.
Adds handling for uint64 objects during conversion. When negative numbers and uint64 are detected, we then convert the result to object. Picks up where pandas-devgh-4845 left off. Closes pandas-devgh-4471.
Adds handling for uint64 objects during conversion. When negative numbers and uint64 are detected, we then convert the result to object. Picks up where pandas-devgh-4845 left off. Closes pandas-devgh-4471.
Adds handling for uint64 objects during conversion. When negative numbers and uint64 are detected, we then convert the result to object. Picks up where pandas-devgh-4845 left off. Closes pandas-devgh-4471.
Adds handling for `uint64` objects during conversion. When negative numbers and `uint64` are detected, we then convert the result to `object`. Picks up where #4845 left off. Closes #4471. Author: gfyoung <[email protected]> Closes #14916 from gfyoung/convert-objects-uint64 and squashes the following commits: ed325cd [gfyoung] BUG: Convert uint64 in maybe_convert_objects
Adds handling for `uint64` objects during conversion. When negative numbers and `uint64` are detected, we then convert the result to `object`. Picks up where pandas-dev#4845 left off. Closes pandas-dev#4471. Author: gfyoung <[email protected]> Closes pandas-dev#14916 from gfyoung/convert-objects-uint64 and squashes the following commits: ed325cd [gfyoung] BUG: Convert uint64 in maybe_convert_objects
Fixes #4471.
Only when it's greater than uint64 max (and not negative, etc.)