[BUG] loc does not preserve datatype with a single element #36247


Closed
glyg opened this issue Sep 9, 2020 · 7 comments
Labels
Bug, Dtype Conversions (Unexpected or buggy dtype conversions)

Comments

@glyg
Contributor

glyg commented Sep 9, 2020

Opening a new issue as per @Dr-Irv's suggestion on #31763.
Sorry, the issue formatting got lost.

import pandas as pd
import numpy as np

not_preserved = pd.DataFrame(np.arange(24).reshape((8, 3)), columns=list("ABC"))
not_preserved['F'] = np.random.random(8)

print("Mixed datatype DataFrame")
print(not_preserved.dtypes)

print("Single value loc: ")
print(not_preserved.loc[0, ['A', 'B', 'C']].dtypes)

print("Single value through slice: ")
print(not_preserved.loc[0:0, ['A', 'B', 'C']].dtypes)

print("The columns are cast to float:")
print(not_preserved.loc[0, ['A', 'B', 'C']])

# This prints:
# Mixed datatype DataFrame
# A      int64
# B      int64
# C      int64
# F    float64
# dtype: object
# Single value loc: 
# float64
# Single value through slice: 
# A    int64
# B    int64
# C    int64
# dtype: object

# The columns are cast to float:
# A    0.0  
# B    1.0
# C    2.0

As a test:

# A single cell is fine

assert isinstance(not_preserved.loc[0, "A"], np.int64) # passes

# Compare dtypes with == rather than relying on object identity
assert not_preserved.loc[0:0, ["A", "B"]].dtypes["A"] == np.dtype('int64') # passes
assert not_preserved.loc[0, ["A", "B"]].dtypes == np.dtype('int64') # AssertionError

This is with pandas 1.0.3, but it was already the case in 0.24.1; it seems that what changed was the behavior of a replace method downstream in my code.
Originally posted by @glyg in #31763 (comment)

@phofl
Member

phofl commented Sep 9, 2020

Hi, could you please provide your expected output?

@glyg
Contributor Author

glyg commented Sep 9, 2020

Hi, I'm editing the issue for clarity, sorry :D

@phofl added the Bug and Dtype Conversions (Unexpected or buggy dtype conversions) labels Sep 9, 2020
@glyg
Contributor Author

glyg commented Sep 9, 2020

Sorry, the issue formatting got mangled by the quote mechanism. I think it's a pure pandas problem, so the full version info might not be necessary.

@phofl
Member

phofl commented Sep 9, 2020

The resulting Series of `print(not_preserved.loc[0, ['A', 'B', 'C']].dtypes)` is converted to a dtype compatible with all columns. There is no attempt to cast it back when only a subset of columns is selected. You cannot keep the dtype of every column if they are mixed, like int, float and object, so I think this is intended?
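
To make this explanation concrete, here is a minimal sketch of the intermediate step it describes. The frame mirrors the one in the report; the names `df`, `row`, `row_dtype` and `subset_dtype` are only illustrative. Reading `.loc[0, [...]]` as "take the row, then pick labels" is the interpretation given in this thread, not a claim about pandas internals.

```python
import numpy as np
import pandas as pd

# Same shape as the frame in the report: three integer columns, one float column.
df = pd.DataFrame(np.arange(24).reshape((8, 3)), columns=list("ABC"))
df["F"] = np.random.random(8)

# A full row has to hold integer and float values in a single Series,
# so it takes one common dtype (float64 for an int/float mix).
row = df.loc[0]
row_dtype = row.dtype

# Picking labels out of that row Series does not cast anything back.
subset_dtype = row[["A", "B", "C"]].dtype

print(row_dtype, subset_dtype)
```

Both dtypes are the same here because the cast happens when the row is built, before any column labels are applied.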

@glyg
Contributor Author

glyg commented Sep 10, 2020

Ok, so the idea is that the row term of the loc is applied first, and then the second one, selecting the A, B, C columns from the resulting Series? I had it in reverse: first getting A, B and C, which are int64 columns, and then returning the first row as a Series. Feel free to close the issue; I guess it's enough that it's here if someone has the same (admittedly corner-case) problem.
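
For readers who expected the "columns first" order described here, the following sketch shows one way to get it explicitly. It is a workaround under the assumptions of this thread, not an official recommendation; `cols`, `cols_then_row` and `one_row_frame` are hypothetical names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(24).reshape((8, 3)), columns=list("ABC"))
df["F"] = np.random.random(8)

cols = ["A", "B", "C"]

# Select the integer columns first; the sub-frame no longer contains the
# float column, so its first row keeps an integer dtype.
cols_then_row = df[cols].loc[0]

# A row slice returns a one-row DataFrame, which keeps per-column dtypes.
one_row_frame = df.loc[0:0, cols]

print(cols_then_row.dtype)
print(one_row_frame.dtypes)
```

The first form returns a Series and the second a DataFrame, so which one fits depends on what the downstream code expects.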

@jreback
Contributor

jreback commented Sep 10, 2020

this is a duplicate issue - there are several discussions about this

if mixing float / int we can't easily do much about this as numpy upcasts

add in an object column for example and a row selection will upcast to the compatible dtype (which would be object)

so actually we could change this by returning an object dtype Series if any mixed types - there is a perf concern here though - would have to check and see how big of a deal this is (and it's hard to tell in the real world)

as i said i think there are some open issues about this
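
The object-column case mentioned in this comment can be sketched as follows. The frame and the names `df`, `row` and `numeric_row` are made up for illustration, and the string column is given an explicit object dtype so the example does not depend on a particular pandas default for strings.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one integer, one float and one object column.
df = pd.DataFrame({"A": np.arange(3), "F": np.random.random(3)})
df["S"] = pd.Series(["x", "y", "z"], dtype=object)

# With an object column present, the only dtype that fits the whole row is object.
row = df.loc[0]

# Restricting to the numeric columns before taking the row gives float64 instead.
numeric_row = df[["A", "F"]].loc[0]

print(row.dtype, numeric_row.dtype)
```

This only illustrates the upcasting rule; it says nothing about the performance concern raised above.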

@TomAugspurger
Contributor

Agreed that we have other issues about this.


4 participants