
pd.DataFrame converts np.uint64 greater than 2**63-1 to objects #11846


Closed
ily9000 opened this issue Dec 15, 2015 · 3 comments
Labels
Dtype Conversions (Unexpected or buggy dtype conversions)

Comments


ily9000 commented Dec 15, 2015

My question is based on this post I found:
http://stackoverflow.com/questions/34283319/pandas-converts-large-unsigned-integers-to-object-types

The maximum value for np.uint64 is 2**64 - 1, so why does pandas convert np.uint64 values greater than 2**63 - 1 to objects when converting arrays to DataFrames? pd.to_numeric() also fails to convert these values back to np.uint64.
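To make the threshold in the question concrete, here is a minimal sketch using only NumPy. It shows that 2**63 - 1 is exactly the int64 maximum, so the affected values are the ones that fit in uint64 but not in int64. That pandas routes these values through an int64 code path is an assumption suggested by the threshold; the thread does not confirm the cause.

```python
import numpy as np

# Largest values representable by each 64-bit integer dtype.
int64_max = int(np.iinfo(np.int64).max)    # signed maximum
uint64_max = int(np.iinfo(np.uint64).max)  # unsigned maximum

# A uint64 array holding a value one past the int64 maximum.
big = np.array([2**63], dtype=np.uint64)

print(int64_max == 2**63 - 1)    # the threshold named in the issue title
print(uint64_max == 2**64 - 1)   # the uint64 maximum named in the question
print(int(big[0]) > int64_max)   # this value cannot be stored as int64
```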

ily9000 changed the title from "pandas convert np.uint64 greater than 2**63-1 to objects" to "pandas converts np.uint64 greater than 2**63-1 to objects" Dec 15, 2015
ily9000 changed the title from "pandas converts np.uint64 greater than 2**63-1 to objects" to "pd.DataFrame converts np.uint64 greater than 2**63-1 to objects" Dec 15, 2015
Contributor

jreback commented Dec 15, 2015

This is a dupe of #4471.

Essentially, uint64 support in pandas is not well tested and is buggy.

Pull requests are for sure welcome. I'm not really sure what the actual use cases in the wild are for this type of integer, as IMHO it doesn't buy you much.

jreback closed this as completed Dec 15, 2015
jreback added the Dtype Conversions (Unexpected or buggy dtype conversions) label Dec 15, 2015

aoz commented Mar 22, 2016

@jreback FYI: I've just hit this issue while working on a real life, commercial project. My largish (few GB) datasets consist of uint64 elements, runs failed on some of them, and I tracked down the issue to this. Trying to find a workaround...

Contributor

pwaller commented May 16, 2016

@jreback Re: you can't think of a use case — unfortunately I'm importing a dataset which already has uint64's in it. Not being able to use them is quite a big inconvenience, as I keep hitting this, even if I work around it!

I think this is really surprising behaviour, but I guess that's because DataFrame's storage is somehow different from that of a Series in a way I don't understand...

In [1]: import pandas as pd, numpy as np

In [2]: s = pd.Series(np.array([2**64 - 1]))

In [3]: s.dtype
Out[3]: dtype('uint64')

In [4]: df = pd.DataFrame({"s": s})

In [5]: df.dtypes
Out[5]: 
s    object
dtype: object
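Since the behaviour depends on the installed pandas version, the session above can be wrapped in a small check. This is only a sketch: the function name `dataframe_preserves_uint64` is hypothetical and not part of pandas, and no particular result is claimed here because it varies by version.

```python
import numpy as np
import pandas as pd


def dataframe_preserves_uint64():
    """Return True if a uint64 Series keeps its dtype inside a DataFrame.

    Hypothetical helper for illustration; it repeats the steps from the
    session above with an explicit uint64 dtype on the input array.
    """
    s = pd.Series(np.array([2**64 - 1], dtype=np.uint64))
    df = pd.DataFrame({"s": s})
    return bool(df["s"].dtype == np.dtype("uint64"))


result = dataframe_preserves_uint64()
print("DataFrame keeps uint64:", result)
```

A result of False would correspond to the object column shown in Out[5].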
