BUG/DOC: DataFrame.values return type when uint64 is mixed with signed int types #10364

Closed
kawochen opened this issue Jun 16, 2015 · 8 comments
Labels
Docs Dtype Conversions Unexpected or buggy dtype conversions
Milestone

Comments

@kawochen
Contributor

DataFrame.values' doc mentions type promotion but leaves out that when uint64 is mixed with signed int types, the return type is int64, which can be surprising. Perhaps consider returning dtype(object) in internals._interleaved_dtype when uint64 and signed ints are present?
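A minimal sketch for reproducing the report, assuming pandas and NumPy are installed. The column names `u` and `i` are made up for illustration, and no expected output is shown because the resulting dtype depends on the pandas version (the comments below discuss int64 before 0.19.0 and float64 afterwards).

```python
import numpy as np
import pandas as pd

# One uint64 column and one signed int64 column (hypothetical names).
df = pd.DataFrame({
    "u": np.array([2**63], dtype="uint64"),  # does not fit in int64
    "i": np.array([-1], dtype="int64"),      # does not fit in uint64
})

# .values interleaves both columns into a single 2-D array,
# so one common dtype has to be chosen for all of them.
values = df.values

print(df.dtypes.to_dict())  # per-column dtypes are preserved in the frame
print(values.dtype)         # the common dtype picked for the combined array
```

Comparing `values.dtype` across pandas versions is a quick way to see which promotion rule is in effect.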

@jreback
Contributor

jreback commented Jun 17, 2015

so this could certainly have an expanded doc-string (and maybe a link to np.find_common_type in the Notes section, though it uses custom logic to avoid always upcasting to object).

Certainly _interleaved_dtype could be improved, though support for uint64 is pretty limited in general (e.g. using it in numerics will automatically make this object). So in reality it is a type to be avoided, unless you really, really have large ints.

jreback added the Docs, Dtype Conversions, and Difficulty Novice labels Jun 17, 2015
jreback added this to the Someday milestone Jun 17, 2015
jreback pushed a commit that referenced this issue Aug 10, 2016
Author: Sašo Stanovnik

Closes #13917 from sstanovnik/fix-multitype-series-slice and squashes the following commits:

8c7d1ea [Sašo Stanovnik] Colon to comma.
057d56b [Sašo Stanovnik] Wording and code organization fixes.
926ca1e [Sašo Stanovnik] Fix a derp.
442b8c1 [Sašo Stanovnik] Whatsnew, issue tag, test reordering.
8d675ad [Sašo Stanovnik] Add tests for common dtypes, raises check for pandas ones.
eebcb23 [Sašo Stanovnik] Moved multitype tests to sparse/tests/test_multitype.py
ac790d7 [Sašo Stanovnik] Modify .values docs to process issue #10364.
2104948 [Sašo Stanovnik] Factor the common type discovery to an internal function.
6782bc7 [Sašo Stanovnik] Revert default argument change.
93d2de6 [Sašo Stanovnik] Modified the whatsnew message.
33973a5 [Sašo Stanovnik] Additional multitype tests.
114217e [Sašo Stanovnik] Infer dtype instead of forcing float in SparseArray.
c7fb0f2 [Sašo Stanovnik] Use numpy to determine common dtypes.
fb6237c [Sašo Stanovnik] Add a whatsnew note.
2e833fa [Sašo Stanovnik] BUG: multi-type sparse slicing fixes and improvements
@petehuang
Contributor

This was addressed via #13917 and should be closed. Thanks to @sstanovnik, @sinhrks and @jreback for the fix and @kawochen for the issue!

@jreback
Contributor

jreback commented Feb 17, 2017

@gfyoung just noticed this is a uint64 issue!

@gfyoung
Member

gfyoung commented Feb 17, 2017

Hmm they seem to be everywhere 😀. Should double check that it's resolved.

@gfyoung
Member

gfyoung commented Feb 18, 2017

@jreback : Okay, this is not resolved as I had thought. So when we call DataFrame.values, it defaults to float64 when int and uint are encountered together. I think we're sticking with object now?

@jorisvandenbossche
Member

This was indeed addressed in the linked PR, test has changed here: https://github.com/pandas-dev/pandas/pull/13917/files#diff-137c96159899927aedd9b37f0f7dddf8R117

Returning float is following the numpy rules.

I think we're sticking with object now?

What do you mean exactly? That it should return object? Before (< 0.19.0), it returned int64; now it returns float64, which I think is correct, so this issue can just be closed.

Current DataFrame.values docs also seem to correctly mention the float return type (minus a typo :-)):

In [75]: pd.DataFrame.values?
Type:        property
String form: <property object at 0x7f7253dd2408>
Docstring:  
Numpy representation of NDFrame

Notes
-----
The dtype will be a lower-common-denominator dtype (implicit
upcasting); that is to say if the dtypes (even of numeric types)
are mixed, the one that accommodates all will be chosen. Use this
with care if you are not dealing with the blocks.

e.g. If the dtypes are float16 and float32, dtype will be upcast to
float32.  If dtypes are int32 and uint8, dtype will be upcast to
int32. By numpy.find_common_type convention, mixing int64 and uint64
will result in a flot64 dtype.
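
The promotion rules quoted in the docstring can be checked directly in NumPy. This is a sketch that assumes NumPy is installed and uses `np.result_type` rather than `np.find_common_type`, since the latter is deprecated in recent NumPy releases; the variable names are only illustrative.

```python
import numpy as np

# The three cases named in the docstring, expressed as dtype promotion.
floats = np.result_type(np.float16, np.float32)  # wider float wins
small = np.result_type(np.int32, np.uint8)       # int32 can hold every uint8
mixed = np.result_type(np.int64, np.uint64)      # no integer dtype holds both

print(floats, small, mixed)  # float32 int32 float64

# The same rule applies when arrays of the two dtypes are combined.
combined = np.concatenate([
    np.array([1], dtype="uint64"),
    np.array([-1], dtype="int64"),
])
print(combined.dtype)  # float64
```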

jorisvandenbossche modified the milestones: 0.19.0, Someday Feb 18, 2017
@jorisvandenbossche
Member

Fixed by #13917

@gfyoung
Member

gfyoung commented Feb 18, 2017

@jorisvandenbossche : The reason why I asked about object is because that is what we return when we call something like to_numeric and we have mixed uint and signed int. I'm always wary of float because it destroys precision on the uint values, that's all.

I agree the documentation is very clear on the behavior, but whether it should be consistent with how we handle other uint and int situations is a different matter.
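
The precision concern with float can be illustrated without pandas at all. The sketch below uses only plain Python integers and floats (a Python float is an IEEE 754 double, the same representation as float64); the variable names are illustrative.

```python
# float64 has a 53-bit significand, so large 64-bit integers
# cannot all be represented exactly.
big = 2**64 - 1            # largest value a uint64 can hold
as_float = float(big)      # rounded to the nearest representable double
roundtrip = int(as_float)  # convert back to an exact integer

print(big)              # 18446744073709551615
print(roundtrip)        # 18446744073709551616
print(roundtrip - big)  # 1

# The loss already starts just above 2**53.
print(int(float(2**53 + 1)) == 2**53 + 1)  # False
```

Values up to 2**53 survive the round trip unchanged, so the loss only affects the "really large ints" case where uint64 is most likely to be in use.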

5 participants