BUG/DOC: DataFrame.values return type when uint64 is mixed with signed int types #10364

kawochen · 2015-06-16T12:44:52Z

DataFrame.values' doc mentions type promotion but leaves out that when uint64 is mixed with signed int types, the return type is int64, which can be surprising. Perhaps consider returning dtype(object) in internals._interleaved_dtype when uint64 and signed ints are present?
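A minimal sketch for checking the behaviour described above on whatever pandas version is installed. The frame and its column names (`unsigned`, `signed`) are made up for illustration. According to later comments in this thread, versions before 0.19.0 reported int64 here and newer ones report float64.

```python
import numpy as np
import pandas as pd

# Hypothetical two-column frame: one uint64 column, one int64 column.
df = pd.DataFrame(
    {
        "unsigned": np.array([1, 2], dtype=np.uint64),
        "signed": np.array([-1, -2], dtype=np.int64),
    }
)

# The per-column dtypes are preserved inside the frame ...
column_dtypes = df.dtypes.to_dict()
print(column_dtypes)

# ... but .values has to pick a single dtype for the combined 2-D array.
values_dtype = df.values.dtype
print(values_dtype)
```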

jreback · 2015-06-17T10:39:34Z

so this could certainly have an expanded doc-string (and maybe a link to np.find_common_type in the Notes section, though it uses custom logic to avoid always upcasting to object).

Certainly _interleaved_dtype could be improved, though support for uint64 is pretty limited in general (e.g. using it in numerics will automatically make this object). So in reality it is a type to be avoided, unless you really have large ints.

Author: Sašo Stanovnik <[email protected]>

Closes #13917 from sstanovnik/fix-multitype-series-slice and squashes the following commits:

8c7d1ea [Sašo Stanovnik] Colon to comma.
057d56b [Sašo Stanovnik] Wording and code organization fixes.
926ca1e [Sašo Stanovnik] Fix a derp.
442b8c1 [Sašo Stanovnik] Whatsnew, issue tag, test reordering.
8d675ad [Sašo Stanovnik] Add tests for common dtypes, raises check for pandas ones.
eebcb23 [Sašo Stanovnik] Moved multitype tests to sparse/tests/test_multitype.py
ac790d7 [Sašo Stanovnik] Modify .values docs to process issue #10364.
2104948 [Sašo Stanovnik] Factor the common type discovery to an internal function.
6782bc7 [Sašo Stanovnik] Revert default argument change.
93d2de6 [Sašo Stanovnik] Modified the whatsnew message.
33973a5 [Sašo Stanovnik] Additional multitype tests.
114217e [Sašo Stanovnik] Infer dtype instead of forcing float in SparseArray.
c7fb0f2 [Sašo Stanovnik] Use numpy to determine common dtypes.
fb6237c [Sašo Stanovnik] Add a whatsnew note.
2e833fa [Sašo Stanovnik] BUG: multi-type sparse slicing fixes and improvements

petehuang · 2016-12-28T16:32:27Z

This was addressed via #13917 and should be closed. Thanks to @sstanovnik, @sinhrks and @jreback for the fix and @kawochen for the issue!

jreback · 2017-02-17T18:02:48Z

@gfyoung just noticed this is a uint64 issue!

gfyoung · 2017-02-17T18:29:49Z

Hmm they seem to be everywhere 😀. Should double check that it's resolved.

gfyoung · 2017-02-18T04:07:05Z

@jreback : Okay, this is not resolved as I had thought. So when we call DataFrame.values, it defaults to float64 when int and uint are encountered together. I think we're sticking with object now?

jorisvandenbossche · 2017-02-18T11:58:28Z

This was indeed addressed in the linked PR, test has changed here: https://github.com/pandas-dev/pandas/pull/13917/files#diff-137c96159899927aedd9b37f0f7dddf8R117

Returning float is following the numpy rules.

> I think we're sticking with object now?

What do you mean exactly? That it should return object? Before (< 0.19.0), it returned int64; now it returns float64, which I think is correct, so this issue can just be closed.

Current DataFrame.values docs also seem to correctly mention the float return type (minus a typo :-)):

In [75]: pd.DataFrame.values?
Type:        property
String form: <property object at 0x7f7253dd2408>
Docstring:  
Numpy representation of NDFrame

Notes
-----
The dtype will be a lower-common-denominator dtype (implicit
upcasting); that is to say if the dtypes (even of numeric types)
are mixed, the one that accommodates all will be chosen. Use this
with care if you are not dealing with the blocks.

e.g. If the dtypes are float16 and float32, dtype will be upcast to
float32.  If dtypes are int32 and uint8, dtype will be upcast to
int32. By numpy.find_common_type convention, mixing int64 and uint64
will result in a flot64 dtype.
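The three pairings named in the docstring can be checked against NumPy directly. This is only a sketch: it uses `np.promote_types` rather than `np.find_common_type` (which newer NumPy releases have deprecated and then removed), on the assumption that both give the same answer for these plain numeric dtypes.

```python
import numpy as np

# Dtype pairs taken from the docstring above.
pairs = [
    (np.float16, np.float32),
    (np.int32, np.uint8),
    (np.int64, np.uint64),
]

# Map each pair of dtype names to the smallest dtype NumPy considers
# safe for holding values of both.
promoted = {
    (a.__name__, b.__name__): np.promote_types(a, b) for a, b in pairs
}

for (a, b), result in promoted.items():
    print(f"{a} + {b} -> {result}")
```

No 64-bit integer dtype can hold both the full int64 range and the full uint64 range, which is why the last pair falls through to float64.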

jorisvandenbossche · 2017-02-18T12:03:24Z

Fixed by #13917

gfyoung · 2017-02-18T17:33:06Z

@jorisvandenbossche : The reason why I asked about object is because that is what we return when we call something like to_numeric and we have mixed uint and signed int. I'm always wary of float because it destroys precision on the uint values, that's all.

I agree the documentation is very clear on the behavior, but whether it should be consistent with how we handle other uint and int situations is a different matter.
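The precision concern can be illustrated with plain NumPy. A rough sketch with two arbitrarily chosen large values: float64 carries a 53-bit significand, so uint64 values above 2**53 are not all representable, whereas an object array keeps the exact Python integers.

```python
import numpy as np

# Two values that fit in uint64 but not exactly in float64.
big = np.array([2**64 - 1, 2**53 + 1], dtype=np.uint64)

# Casting to float64 rounds each value to the nearest representable float.
as_float = big.astype(np.float64)
round_trip = [int(x) for x in as_float]

# Casting to object keeps exact Python ints, at the cost of speed.
as_object = big.astype(object)

for original, rounded in zip(as_object, round_trip):
    print(f"{original} -> {rounded} (error {rounded - original})")
```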

jreback added the Docs, Dtype Conversions (unexpected or buggy dtype conversions), and Difficulty Novice labels Jun 17, 2015

jreback added this to the Someday milestone Jun 17, 2015

sstanovnik mentioned this issue Aug 9, 2016 in "BUG: multi-type SparseDataFrame fixes and improvements" #13917 (closed)

jorisvandenbossche modified the milestones: 0.19.0, Someday Feb 18, 2017

jorisvandenbossche closed this as completed Feb 18, 2017
