Skip to content

BUG: Bug in loc did not change dtype when complete column was assigned #37749

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 40 commits into from
Closed
Show file tree
Hide file tree
Changes from 13 commits
Commits
Show all changes
40 commits
Select commit Hold shift + click to select a range
6450a2c
BUG: Bug in loc did not change dtype when complete columne was assigned
phofl Nov 10, 2020
1599c5c
Fix list comprehension issue
phofl Nov 10, 2020
4d39612
Fix import order
phofl Nov 10, 2020
f9f37cb
Add test
phofl Nov 11, 2020
5cf355b
Merge branch 'master' of https://github.com/pandas-dev/pandas into 20635
phofl Nov 11, 2020
8d203f9
Change dtype for 32 bit
phofl Nov 11, 2020
e35e009
Implement fix and add new test
phofl Nov 11, 2020
4c391da
Merge branch 'master' of https://github.com/pandas-dev/pandas into 20635
phofl Nov 13, 2020
71fbf9f
Add new column
phofl Nov 13, 2020
babcd38
Run black
phofl Nov 13, 2020
caa6046
Parametrize tests
phofl Nov 13, 2020
8b95236
Merge branch 'master' of https://github.com/pandas-dev/pandas into 20635
phofl Nov 13, 2020
3b98ee0
Adress review comments
phofl Nov 14, 2020
f9b8a59
Change whatsnew wording
phofl Nov 14, 2020
4bef38e
Simplify tests
phofl Nov 14, 2020
27ea3e2
Fix related issue
phofl Nov 15, 2020
f94277b
Add issues
phofl Nov 15, 2020
279e812
Merge branch 'master' of https://github.com/pandas-dev/pandas into 20635
phofl Nov 15, 2020
d5f6150
Move import
phofl Nov 15, 2020
706dc6a
Delete line
phofl Nov 15, 2020
66d4b4e
Fix return value
phofl Nov 15, 2020
fa25075
Move and rename tests
phofl Nov 17, 2020
3c06ba6
Fix failing test
phofl Nov 17, 2020
a33659c
Merge branch 'master' of https://github.com/pandas-dev/pandas into 20635
phofl Nov 17, 2020
0f556c4
Fix pre commit
phofl Nov 17, 2020
181e62a
Merge branch 'master' of https://github.com/pandas-dev/pandas into 20635
phofl Nov 17, 2020
b759ac9
Remove import
phofl Nov 17, 2020
a353930
Fix test
phofl Nov 17, 2020
d28e1e1
Add test
phofl Nov 17, 2020
1aa8522
Adress review comments
phofl Nov 20, 2020
1bc0d46
Fix test
phofl Nov 21, 2020
61aab16
Merge branch 'master' of https://github.com/pandas-dev/pandas into 20635
phofl Nov 21, 2020
14fe5a8
Move test
phofl Nov 21, 2020
26b5d6f
Fix test
phofl Nov 21, 2020
913ffea
Merge branch 'master' of https://github.com/pandas-dev/pandas into 20635
phofl Nov 22, 2020
e6e22f3
Fix bug with series to cell
phofl Nov 22, 2020
23f6f3b
Merge branch 'master' of https://github.com/pandas-dev/pandas into 20635
phofl Nov 22, 2020
99b87c9
Merge branch 'master' of https://github.com/pandas-dev/pandas into 20635
phofl Dec 23, 2020
f97a252
Move whatsnew
phofl Dec 23, 2020
700ce6c
Merge branch 'master' of https://github.com/pandas-dev/pandas into 20635
phofl Feb 13, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.2.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -477,6 +477,7 @@ Indexing
- Bug in :meth:`DataFrame.loc` returned requested key plus missing values when ``loc`` was applied to single level from :class:`MultiIndex` (:issue:`27104`)
- Bug in indexing on a :class:`Series` or :class:`DataFrame` with a :class:`CategoricalIndex` using a listlike indexer containing NA values (:issue:`37722`)
- Bug in :meth:`DataFrame.xs` ignored ``droplevel=False`` for columns (:issue:`19056`)
- Bug in :meth:`DataFrame.loc` did not preserve dtype of new values, when complete columns was assigned (:issue:`20635`)

Missing
^^^^^^^
Expand Down
9 changes: 9 additions & 0 deletions pandas/core/indexing.py
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@

from pandas.core.dtypes.common import (
is_array_like,
is_dtype_equal,
is_hashable,
is_integer,
is_iterator,
Expand Down Expand Up @@ -1550,6 +1551,14 @@ def _setitem_with_indexer(self, indexer, value):
val = list(value.values()) if isinstance(value, dict) else value
blk = self.obj._mgr.blocks[0]
take_split_path = not blk._can_hold_element(val)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be the else condtiion

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, value can be anything from int, float to numpy array. I think this check is only necessary if we have Series or DataFrame. Maybe with an array?

if not take_split_path:
if isinstance(value, ABCSeries):
take_split_path = not (is_dtype_equal(value.dtype, blk.dtype))
elif isinstance(value, ABCDataFrame):
dtypes = list(value.dtypes.unique())
take_split_path = not (
len(dtypes) == 1 and is_dtype_equal(dtypes[0], blk.dtype)
)

# if we have any multi-indexes that have non-trivial slices
# (not null slices) then we must take the split path, xref
Expand Down
35 changes: 35 additions & 0 deletions pandas/tests/frame/indexing/test_setitem.py
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@
date_range,
notna,
period_range,
to_datetime,
)
import pandas._testing as tm
from pandas.core.arrays import SparseArray
Expand Down Expand Up @@ -298,6 +299,40 @@ def test_iloc_setitem_bool_indexer(self, klass):
expected = DataFrame({"flag": ["x", "y", "z"], "value": [2, 3, 4]})
tm.assert_frame_equal(df, expected)

@pytest.mark.parametrize("dtype", ["int64", "Int64"])
def test_setitem_complete_columns_different_dtypes(self, dtype):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you make these test_setitem_null_slice_foo_bar (or possibly test_loc_setitem_null_slice_foo_bar per comment below)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I don't understand it. I should rename the tests to test_loc_setitem_null_slice_foo_bar?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one would be test_setitem_null_slice_different_dtypes, the next one would be test_setitem_null_slice_single_column_series_value_different_dtype

basically the goal is to establish a pattern in test naming for indexing tests with test_{method_name}{key_description}...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thx for the explanation. Could we maybe document this somewhere a bit more extensively?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we maybe document this somewhere a bit more extensively?

at some point i hope. the pattern isnt exactly in place yet

# GH: 20635
df = DataFrame({"A": ["a", "b"], "B": ["1", "2"], "C": ["3", "4"], "D": [1, 2]})
rhs = df.loc[:, ["B", "C"]].astype("int64").astype(dtype)
df.loc[:, ["B", "C"]] = rhs
expected = DataFrame({"A": ["a", "b"], "B": [1, 2], "C": [3, 4], "D": [1, 2]})
expected[["B", "C"]] = expected[["B", "C"]].astype(dtype)
tm.assert_frame_equal(df, expected)

def test_setitem_single_column_as_series_different_dtype(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment about (add to the original frame) and test Int64

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

# GH: 20635
df = DataFrame({"A": ["a", "b"], "B": ["1", "2"], "C": ["3", "4"]})
df.loc[:, "C"] = df.loc[:, "C"].astype("int64")
expected = DataFrame({"A": ["a", "b"], "B": ["1", "2"], "C": [3, 4]})
tm.assert_frame_equal(df, expected)

def test_setitem_conversion_to_datetime(self):
# GH: 20511
df = DataFrame(
[["2015-01-01", "2016-01-01"], ["2016-01-01", "2015-01-01"]],
columns=["date0", "date1"],
)
df.iloc[:, [0]] = df.iloc[:, [0]].apply(
lambda x: to_datetime(x, errors="coerce")
)
expected = DataFrame(
{
"date0": [to_datetime("2015-01-01"), to_datetime("2016-01-01")],
"date1": ["2016-01-01", "2015-01-01"],
}
)
tm.assert_frame_equal(df, expected)


class TestDataFrameSetItemSlicing:
def test_setitem_slice_position(self):
Expand Down
8 changes: 8 additions & 0 deletions pandas/tests/series/indexing/test_setitem.py
Original file line number Diff line number Diff line change
Expand Up @@ -210,6 +210,14 @@ def test_setitem_nan_casts(self):
ser[:5] = np.nan
tm.assert_series_equal(ser, expected)

@pytest.mark.parametrize("dtype", ["int64", "Int64"])
def test_setitem_assigning_different_dtype(self, dtype):
# GH: 20635
ser = Series(["3", "4"], name="A")
ser.loc[:] = ser.loc[:].astype("int64").astype(dtype)
expected = Series([3, 4], name="A", dtype=dtype)
tm.assert_series_equal(ser, expected)


class TestSetitemWithExpansion:
def test_setitem_empty_series(self):
Expand Down