ENH: dtype-unaware (empty) objects ("any" dtype) #48110

Closed
wants to merge 11 commits into from
3 changes: 2 additions & 1 deletion pandas/core/indexing.py
@@ -2117,7 +2117,8 @@ def _setitem_with_indexer_missing(self, indexer, value):
curr_dtype = getattr(curr_dtype, "numpy_dtype", curr_dtype)
new_dtype = maybe_promote(curr_dtype, value)[0]
else:
new_dtype = None
if isinstance(self.obj.dtype, object):
Member

I don't think this is a good idea. This breaks the current behavior, and the special case for object looks a bit weird. What happens if you assign a different dtype? That would still change the dtype, which is inconsistent.

Also, you have to keep the fall-through clause.

Member

Also, won't this isinstance be True for everything?
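The point above can be checked with a short sketch. It assumes only public pandas API (`pandas.api.types.is_object_dtype`); the variable names are illustrative and not taken from the PR. Every Python value is an instance of `object`, so the `isinstance` check cannot separate dtypes, whereas comparing the dtype itself can.

```python
import pandas as pd
from pandas.api.types import is_object_dtype

float_dtype = pd.Series(dtype=float).dtype    # float64
object_dtype = pd.Series(dtype=object).dtype  # object

# Every Python value is an instance of `object`, so this is True for any dtype.
always_true = [isinstance(d, object) for d in (float_dtype, object_dtype)]

# Comparing the dtype itself, or using the public helper, does discriminate.
by_equality = [d == object for d in (float_dtype, object_dtype)]
by_helper = [is_object_dtype(d) for d in (float_dtype, object_dtype)]

print(always_true)  # [True, True]
print(by_equality)  # [False, True]
print(by_helper)    # [False, True]
```

Either of the last two checks would be one way to express "the Series has object dtype"; which one fits the pandas code base best is a choice for the maintainers.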

Contributor Author

@weikhor weikhor Aug 17, 2022

> Also, won't this isinstance be True for everything?

Yes. I should not do it like this.

Contributor Author

> I don't think this is a good idea. This breaks the current behavior, and the special case for object looks a bit weird. What happens if you assign a different dtype? That would still change the dtype, which is inconsistent.
>
> Also, you have to keep the fall-through clause.

Per issue #19647, when an element is added to an empty object Series, the Series should keep the object dtype that the user set.

import pandas as pd

s = pd.Series(dtype='object')
s.loc['myint'] = 1
s.loc['myfloat'] = 2.

Expected output:

myint      1.0
myfloat    2.0
dtype: object

If I use a different dtype, for example:

s = pd.Series(dtype=float)
s.loc['myint'] = 1.0
s.loc['myfloat'] = 2

Output:

myint      1.0
myfloat    2.0
dtype: float64

Member

What happens if the empty Series has some dtype other than object? That is not caught by your current change.

On the other hand, if the dtype is the default, it should still change; that is not covered here.

Contributor Author

@weikhor weikhor Aug 18, 2022

> What happens if the empty Series has some dtype other than object? That is not caught by your current change.

If the empty Series has a dtype other than object, for example:

import pandas as pd

s = pd.Series(dtype=float)
s.loc['myint'] = 1.0
s.loc['myfloat'] = 2
print(s)

The output is:

myint      1.0
myfloat    2.0
dtype: float64

new_dtype = self.obj.dtype

new_values = Series([value], dtype=new_dtype)._values

15 changes: 15 additions & 0 deletions pandas/tests/series/indexing/test_setitem.py
@@ -1665,3 +1665,18 @@ def test_setitem_empty_mask_dont_upcast_dt64():
ser.mask(mask, "foo", inplace=True)
assert ser.dtype == dti.dtype # no-op -> dont upcast
tm.assert_series_equal(ser, orig)


def test_setitem_on_series_dtype_object():
# GH#19647
result = Series(dtype="object")
result.loc["int"] = 1
result.loc["float"] = 2.0
expected = Series(data=[1, 2.0], index=["int", "float"]).astype("object")
tm.assert_series_equal(result, expected)

result = Series()
result.loc["int"] = 1
result.loc["float"] = 2.0
expected = Series(data=[1, 2.0], index=["int", "float"]).astype("float")
tm.assert_series_equal(result, expected)
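
One detail of the expected value in the first test case may be worth checking. The sketch below uses only public pandas API, with illustrative variable names that are not part of the PR. It shows that building a float64 Series and then calling `.astype("object")` stores two Python floats, while passing `dtype="object"` at construction keeps the original `int`.

```python
import pandas as pd

# Built as float64 first, then cast: the 1 has already become 1.0.
cast_after = pd.Series([1, 2.0], index=["int", "float"]).astype("object")

# Built as object from the start: each element keeps its Python type.
object_first = pd.Series([1, 2.0], index=["int", "float"], dtype="object")

cast_types = [type(v).__name__ for v in cast_after]
first_types = [type(v).__name__ for v in object_first]

print(cast_types)   # ['float', 'float']
print(first_types)  # ['int', 'float']
```

If the intent of GH#19647 is that the stored elements keep their types, constructing the expected Series with `dtype="object"` directly may express that more precisely; this is a suggestion, not something the PR states.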