BUG: Make lib.maybe_convert_objects work with uint64 #4845

jtratner · 2013-09-15T15:32:48Z

Only when a value is greater than the int64 max (and not negative, etc.)

jreback · 2013-09-15T15:37:11Z

uint64 should already be preserved in block manager
you don't need a separate type

jtratner · 2013-09-15T15:46:44Z

it's not; here's a snippet from form_blocks:

        elif issubclass(v.dtype.type, np.integer):
            if v.dtype == np.uint64:
                # HACK #2355 definite overflow
                if (v > 2 ** 63 - 1).any():
                    object_items.append((i, k, v))
                    continue
            int_items.append((i, k, v))
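To make the effect of that branch concrete, here is a minimal pure-Python sketch of the routing it performs. Plain lists of ints stand in for uint64 arrays, and `split_uint64_columns` is a hypothetical name used only for illustration, not a pandas function.

```python
INT64_MAX = 2 ** 63 - 1  # largest value an int64 can hold


def split_uint64_columns(columns):
    """Route columns the way the quoted branch does (illustrative only).

    `columns` maps a column name to a list of non-negative Python ints
    standing in for a uint64 array.
    """
    int_items, object_items = [], []
    for name, values in columns.items():
        # Mirrors `(v > 2 ** 63 - 1).any()`: any value past INT64_MAX
        # sends the whole column to the object bucket.
        if any(v > INT64_MAX for v in values):
            object_items.append(name)
        else:
            int_items.append(name)
    return int_items, object_items


int_items, object_items = split_uint64_columns(
    {"small": [1, 2, 3], "big": [5, 2 ** 63 + 15]}
)
print(int_items, object_items)
```

A column with even one value above INT64_MAX lands in the object bucket, which matches the object dtype reported later in the thread.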

jtratner · 2013-09-15T15:47:53Z

you know what, I think it's because of the pathway through, so I identified the wrong error here. Sorry, still getting used to the internals.

Series(np.array([5], dtype='uint64')) comes out as uint64

jtratner · 2013-09-15T15:50:55Z

here's the problem:

>>> import sys
>>> uint_arr = np.array([sys.maxint + 5], dtype='uint64')
>>> ser = Series(uint_arr)
>>> ser.dtype
dtype('uint64')
>>> df = DataFrame([uint_arr])
>>> df.dtypes
0    object
dtype: object

jtratner · 2013-09-15T15:51:55Z

But it does work if not in a list:

In [16]: df = DataFrame(uint_arr)
In [18]: df.dtypes
Out[18]:
0    uint64
dtype: object

Time to go a-searching.

jreback · 2013-09-15T16:18:55Z

the problem is in interleave_dtypes; it's a bit tricky

jtratner · 2013-09-15T16:21:47Z

@jreback well, you can just take that set of lines out and it works...but then it overflows under certain ops

What's supposed to happen if you do this?

ser = Series(np.array([5], dtype='uint64'))
ser - 10

Should pandas handle that for you and try to promote? or should it just overflow?

jtratner · 2013-09-15T16:22:50Z

Right now it ends up with (on current master)

0    18446744073709551611
Name: 0, dtype: uint64
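The value above is ordinary unsigned 64-bit wraparound, i.e. arithmetic modulo 2**64. A small pure-Python check of that arithmetic, where `wrap_uint64` is a hypothetical helper and not part of numpy or pandas:

```python
UINT64_MODULUS = 2 ** 64


def wrap_uint64(value):
    """Reduce an exact Python int to the value a uint64 would store."""
    return value % UINT64_MODULUS


wrapped = wrap_uint64(5 - 10)
print(wrapped)  # 18446744073709551611
```

So 5 - 10 in uint64 is 2**64 - 5, which is the number shown in the Series output.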

jtratner · 2013-09-15T16:24:51Z

The test suite implies that, at least with dataframe, it's not supposed to overflow (subtracting a uint column such that it goes negative) :

# around line 4414 in pandas/tests/test_frame.py
            # vs mix int
            if op in ['add','sub','mul']:
                result = getattr(self.mixed_int, op)(2 + self.mixed_int)
                exp = f(self.mixed_int, 2 + self.mixed_int)

                # overflow in the uint
                dtype = None
                if op in ['sub']:
                    dtype = dict(B = 'object', C = None)

jreback · 2013-09-15T16:40:35Z

uint64 are very funny
the problem is you just want overflow to bubble up if it happens (which may not be happening)

jtratner · 2013-09-15T16:59:53Z

Well, right now overflow doesn't bubble up. And it doesn't bubble up in numpy either. I'd vote to not bubble up on overflow like numpy.

jtratner · 2013-09-15T17:03:52Z

@jreback this should pass now. Behavior is to allow overflow and keep the dtype as uint64 (which I believe is the same behavior as numpy).

wesm · 2013-09-15T21:01:21Z

I'd like to review this before merging...I had to deal with this myself in a different context recently

jtratner · 2013-09-15T21:02:26Z

Yeah, definitely, wasn't planning to merge until you looked at it.

jtratner · 2013-09-15T23:00:14Z

added some more test cases just to make sure I covered all the branches that interact with uint64.

wesm · 2013-09-17T22:36:38Z

Okay. I don't think that returning uint64 for the case that you have all positive integers is a good idea. Here's why:

In [1]: np.array([1,2 ,3,4,5], dtype=np.uint64) - 5
Out[1]: 
array([18446744073709551612, 18446744073709551613, 18446744073709551614,
       18446744073709551615,                    0], dtype=uint64)

Instead, I recommend that PyLong objects only be bounds checked to see if they are over INT64_MAX, in other words:

In [2]: np.iinfo(np.int64).max
Out[2]: 9223372036854775807

In [3]: np.iinfo(np.int64).max + 1
Out[3]: 9223372036854775808L

not so bad right? Just have to be careful with doing a PyLong_Check(obj) and only checking for overflow in such cases. maybe you want to also write a unit test containing numpy.uint64 scalar values so you know you catch that one, since they will not typecheck as PyLong

In [5]: np.uint64.mro()
Out[5]: 
[numpy.uint64,
 numpy.unsignedinteger,
 numpy.integer,
 numpy.number,
 numpy.generic,
 object]

if something exceeds INT64_MAX, you should enter another function that attempts to put the integers into a uint64 array -- if it cannot the original object array is returned.
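One possible reading of that two-stage suggestion, sketched in pure Python under stated assumptions: the input is a list of Python ints, the function returns a dtype name rather than an array, and `pick_integer_dtype` is a hypothetical name. This is not the code in `lib.maybe_convert_objects`.

```python
INT64_MIN = -(2 ** 63)
INT64_MAX = 2 ** 63 - 1
UINT64_MAX = 2 ** 64 - 1


def pick_integer_dtype(values):
    """Return 'int64', 'uint64', or 'object' for a list of Python ints."""
    # Stage 1: everything fits in int64, so uint64 is never considered.
    if all(INT64_MIN <= v <= INT64_MAX for v in values):
        return "int64"
    # Stage 2: something is outside int64, so attempt uint64. This only
    # works if no value is negative and none exceeds UINT64_MAX.
    if all(0 <= v <= UINT64_MAX for v in values):
        return "uint64"
    # Neither integer dtype can hold the data; keep the object array.
    return "object"


print(pick_integer_dtype([1, 2, 3]))
print(pick_integer_dtype([5, 3, 2 ** 63 + 15]))
print(pick_integer_dtype([-1, 2 ** 63]))
```

Under this reading, all-positive small integers stay int64, which avoids the surprising wraparound on subtraction shown earlier in this comment.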

pls ping me again when this is ready to have a look at again!

jtratner · 2013-09-17T23:08:55Z

thanks for the feedback -- I agree that uint64 is tricky with all the overflow issues, but I'm confused about the difference between what you're suggesting and how the changes work now (or maybe you're just suggesting different internal architecture). Right now, it only tries to make uint if it overflows.

>>> lib.maybe_convert_objects(np.array([5, 3, 2**63 + 15], dtype=object))
array([                  5,                   3, 9223372036854775823], dtype=uint64)

whereas if all the values fit in int64, it comes out as int64

>>> lib.maybe_convert_objects(np.array([1, 2, 3], dtype=object))
array([1, 2, 3])
>>> _.dtype
dtype('int64')

But what about the case where you end up with a long > INT64_MAX? Still 'uint64'? Or try for something like doubledouble or longdouble or something?

That said, it might just make sense to wrap the entire thing in a try/except clause: try to put longs in int64, and if it fails with an OverflowError, then try to make an array of uint64 instead (which would simplify the looping code too by removing the try suite from every iteration).
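A rough sketch of that try/except shape, using the standard library's `array` module as a stand-in for numpy arrays so that it runs without dependencies. `convert_with_fallback` is a hypothetical name, and returning a plain list in the last branch is an assumption standing in for "leave the object array alone".

```python
from array import array


def convert_with_fallback(values):
    """Try signed 64-bit storage, then unsigned, else return the values as-is."""
    try:
        # 'q' is a signed 64-bit integer; too-large ints raise OverflowError.
        return array("q", values)
    except OverflowError:
        pass
    try:
        # 'Q' is an unsigned 64-bit integer; negative or too-large ints
        # raise OverflowError here as well.
        return array("Q", values)
    except OverflowError:
        # Neither representation works, so hand back the original values.
        return list(values)


small = convert_with_fallback([1, 2, 3])
big = convert_with_fallback([5, 3, 2 ** 63 + 15])
mixed = convert_with_fallback([-1, 2 ** 63])
print(small.typecode, big.typecode, type(mixed).__name__)
```

The conversion is attempted on the whole sequence at once, so the per-element loop needs no try block of its own.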

jreback · 2013-09-30T12:21:05Z

pushed the issue & PR to 0.14

ghost · 2014-01-03T01:41:29Z

@jtratner, this is in limbo. are you waiting for input from wes or can this go forward?

jtratner · 2014-01-03T02:04:24Z

I don't remember :P I'll take another look at what I wrote up and the thread. Thanks for the ping :)

jreback · 2014-02-18T20:04:50Z

@jtratner closing for now...pls reopen if you are continuing work on this

Adds handling for `uint64` objects during conversion. When negative numbers and `uint64` are detected, we then convert the result to `object`. Picks up where #4845 left off. Closes #4471.

Author: gfyoung. Closes #14916 from gfyoung/convert-objects-uint64 and squashes the following commits:

ed325cd [gfyoung] BUG: Convert uint64 in maybe_convert_objects

jtratner added 3 commits September 17, 2013 19:22

BUG: lib.maybe_convert_objects work with uint64

86cd657

When it's greater than uint64 max (and not negative, etc.)

TST: Add test cases for lib.maybe_convert_objects directly

4d64644

remove extraneous typecheck against uint64 better dtype checks in test_frame update tests to reflect actual use of uint64, etc

Try using a separate uint64 function

672fccd

jreback closed this Feb 18, 2014

gfyoung mentioned this pull request Dec 19, 2016

BUG: Convert uint64 in maybe_convert_objects #14916

Closed
