BUG: invalid coercion raises in groupby uniques #14758

Closed
jreback opened this issue Nov 28, 2016 · 6 comments
Labels
Dtype Conversions (unexpected or buggy dtype conversions), Groupby
Milestone

Comments

@jreback
Contributor

jreback commented Nov 28, 2016

import pandas as pd
df = pd.DataFrame([[1, 2], [3, 4], [5, 6]], dtype=object) + 9223372036854775807
df.groupby(0).sum()

...
C:\Anaconda\envs\bmc3\lib\site-packages\pandas\core\groupby.py in labels(self)
   2252     def labels(self):
   2253         if self._labels is None:
-> 2254             self._make_labels()
   2255         return self._labels
   2256

C:\Anaconda\envs\bmc3\lib\site-packages\pandas\core\groupby.py in _make_labels(self)
   2264         if self._labels is None or self._group_index is None:
   2265             labels, uniques = algos.factorize(self.grouper, sort=self.sort)
-> 2266             uniques = Index(uniques, name=self.name)
   2267             self._labels = labels
   2268             self._group_index = uniques

C:\Anaconda\envs\bmc3\lib\site-packages\pandas\indexes\base.py in __new__(cls, data, dtype, copy, name, fastpath, tupleize_cols, **kwargs)
    234                 if inferred == 'integer':
    235                     from .numeric import Int64Index
--> 236                     return Int64Index(subarr.astype('i8'), copy=copy,
    237                                       name=name)
    238                 elif inferred in ['floating', 'mixed-integer-float']:

OverflowError: int too big to convert

We are inferring int when this should be 'object' (or just not integer):

In [36]: pd.lib.infer_dtype(np.array([9223372036854775808, 9223372036854775810, 9223372036854775812], dtype=object))
Out[36]: 'integer'
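
The overflow in the traceback can be reproduced without pandas. The following is a minimal stdlib sketch, not the pandas code path: it treats packing into 8 signed bytes as an analogue of the `astype('i8')` cast, and the names `INT64_MAX`, `signed_ok`, and `unsigned_ok` are introduced here for illustration only.

```python
INT64_MAX = 2**63 - 1   # 9223372036854775807, the offset used in the report

value = 1 + INT64_MAX   # first group key in the failing example

# Packing into 8 signed bytes stands in for an int64 cast (an analogy only).
try:
    value.to_bytes(8, "little", signed=True)
    signed_ok = True
except OverflowError:
    signed_ok = False

# The same value fits in 8 unsigned bytes, i.e. within the uint64 range.
unsigned_bytes = value.to_bytes(8, "little", signed=False)
unsigned_ok = int.from_bytes(unsigned_bytes, "little", signed=False) == value
```

Here `signed_ok` ends up `False` and `unsigned_ok` ends up `True`: the keys are valid integers, just not valid int64 values.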
@jreback jreback added the Difficulty Intermediate, Dtype Conversions, and Groupby labels Nov 28, 2016
@jreback jreback added this to the Next Major Release milestone Nov 28, 2016
@jreback
Contributor Author

jreback commented May 31, 2017

This works on master, if someone wants to add a test.

@jreback jreback modified the milestones: 0.20.2, Next Major Release May 31, 2017
@jreback
Contributor Author

jreback commented May 31, 2017

cc @gfyoung

@jreback jreback modified the milestones: 0.21.0, 0.20.2 May 31, 2017
@gfyoung
Member

gfyoung commented May 31, 2017

> We are inferring int when this should be 'object' (or just not integer):

Why should we be inferring object? The values are perfectly acceptable as uint64; this is yet another example of how we don't have great support for uint64.

@gfyoung
Member

gfyoung commented May 31, 2017

I believe the correct return dtype should be uint64, as illustrated by the fact that we get an int when adding a smaller number:

>>> df = pd.DataFrame([[1, 2], [3, 4], [5, 6]], dtype=object) + 9
>>> df.groupby(0).sum().dtypes
1    int64
dtype: object
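
The range reasoning behind that argument can be sketched in plain Python. This is an illustration only, not how pandas infers dtypes: `narrowest_integer_dtype` is a hypothetical helper, and the string results are just labels for the 64-bit ranges involved.

```python
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1
UINT64_MAX = 2**64 - 1

def narrowest_integer_dtype(values):
    """Hypothetical helper: name the narrowest 64-bit dtype holding all values."""
    lo, hi = min(values), max(values)
    if INT64_MIN <= lo and hi <= INT64_MAX:
        return "int64"
    if 0 <= lo and hi <= UINT64_MAX:
        return "uint64"
    return "object"  # no 64-bit integer dtype can hold every value

keys = [1, 3, 5]  # the group keys from the example
small = narrowest_integer_dtype([k + 9 for k in keys])
large = narrowest_integer_dtype([k + INT64_MAX for k in keys])
huge = narrowest_integer_dtype([k + UINT64_MAX for k in keys])
```

Under these assumptions `small` is `"int64"`, `large` is `"uint64"`, and `huge` is `"object"`, which matches the three cases discussed in this thread.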

@jreback
Contributor Author

jreback commented May 31, 2017

This was before uint64 support; that's why it works now.

Just need a test to verify.

I bet this will break with a number out of range for uint64, though (but maybe that's a separate issue).

@gfyoung
Member

gfyoung commented Jun 11, 2017

> This was before uint64 support; that's why it works now.

Ah, yes, confirmed. I'll add this as a test.

> I bet this will break with a number out of range for uint64, though (but maybe that's a separate issue).

That's probably a separate issue. Handling numbers beyond what we can capably hold is going to be very difficult in any case.
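
For values past the uint64 maximum, one possible fallback (an assumption here, not something this thread settles) is to keep the keys as arbitrary-precision Python ints. A minimal pure-Python sketch of a grouped sum, using a hypothetical `group_sum` helper, suggests the grouping logic itself does not care about key magnitude; the difficulty lies in fitting such values into fixed-width dtypes.

```python
from collections import defaultdict

UINT64_MAX = 2**64 - 1

def group_sum(rows):
    """Hypothetical helper: sum each row's value, grouped by the row's key."""
    totals = defaultdict(int)
    for key, value in rows:
        totals[key] += value
    return dict(totals)

offset = UINT64_MAX  # pushes every key and value past the uint64 range
rows = [(k + offset, v + offset) for k, v in [(1, 2), (3, 4), (1, 6)]]
result = group_sum(rows)
```

With these inputs `result` has two groups, and the group keyed by `1 + UINT64_MAX` holds the exact sum of its two values.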

gfyoung added a commit to forking-repos/pandas that referenced this issue Jun 11, 2017
gfyoung added a commit to forking-repos/pandas that referenced this issue Jun 12, 2017