-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
DF.__setitem__ creates extension column when given extension scalar #34875
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 29 commits
0ec5911
9336955
01fb076
5c8b356
2c1f640
d509bf4
e231bb1
18ed043
a6b18f4
cbc29be
2f79822
0f9178e
bfa18fb
e7e9a48
291eb2d
3a788ed
38d7ce5
a5e8df5
5e439bd
6cc7959
7e27a6e
90a8570
39b2984
7a01041
03e528b
7bb9553
a3be9a6
3a92164
c93a847
966283a
f2aea7b
6495a36
42e7afa
8343df3
a50a42c
3452c20
6f3fb51
b95cdfc
6830fde
6653ef8
c73a2de
100f334
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -75,6 +75,7 @@ | |
from pandas.core.dtypes.cast import ( | ||
cast_scalar_to_array, | ||
coerce_to_dtypes, | ||
construct_1d_arraylike_from_scalar, | ||
find_common_type, | ||
infer_dtype_from_scalar, | ||
invalidate_string_dtypes, | ||
|
@@ -515,24 +516,38 @@ def __init__( | |
else: | ||
mgr = init_dict({}, index, columns, dtype=dtype) | ||
else: | ||
try: | ||
arr = np.array(data, dtype=dtype, copy=copy) | ||
except (ValueError, TypeError) as err: | ||
exc = TypeError( | ||
"DataFrame constructor called with " | ||
f"incompatible data and dtype: {err}" | ||
) | ||
raise exc from err | ||
if not dtype: | ||
dtype, _ = infer_dtype_from_scalar(data, pandas_dtype=True) | ||
|
||
if arr.ndim == 0 and index is not None and columns is not None: | ||
values = cast_scalar_to_array( | ||
(len(index), len(columns)), data, dtype=dtype | ||
) | ||
mgr = init_ndarray( | ||
values, index, columns, dtype=values.dtype, copy=False | ||
) | ||
if is_extension_array_dtype(dtype): | ||
justinessert marked this conversation as resolved.
Show resolved
Hide resolved
|
||
if index is None or columns is None: | ||
justinessert marked this conversation as resolved.
Show resolved
Hide resolved
|
||
raise ValueError("DataFrame constructor not properly called!") | ||
|
||
values = [ | ||
construct_1d_arraylike_from_scalar(data, len(index), dtype) | ||
for _ in range(len(columns)) | ||
] | ||
mgr = arrays_to_mgr(values, columns, index, columns, dtype=None) | ||
else: | ||
raise ValueError("DataFrame constructor not properly called!") | ||
try: | ||
justinessert marked this conversation as resolved.
Show resolved
Hide resolved
|
||
arr = np.array(data, dtype=dtype, copy=copy) | ||
except (ValueError, TypeError) as err: | ||
exc = TypeError( | ||
"DataFrame constructor called with " | ||
f"incompatible data and dtype: {err}" | ||
) | ||
raise exc from err | ||
|
||
if arr.ndim == 0 and index is not None and columns is not None: | ||
values = cast_scalar_to_array( | ||
(len(index), len(columns)), data, dtype=dtype | ||
) | ||
|
||
mgr = init_ndarray( | ||
values, index, columns, dtype=values.dtype, copy=False | ||
jreback marked this conversation as resolved.
Show resolved
Hide resolved
|
||
) | ||
else: | ||
raise ValueError("DataFrame constructor not properly called!") | ||
|
||
NDFrame.__init__(self, mgr) | ||
|
||
|
@@ -3730,7 +3745,13 @@ def reindexer(value): | |
infer_dtype, _ = infer_dtype_from_scalar(value, pandas_dtype=True) | ||
|
||
# upcast | ||
value = cast_scalar_to_array(len(self.index), value) | ||
if is_extension_array_dtype(infer_dtype): | ||
value = construct_1d_arraylike_from_scalar( | ||
value, len(self.index), infer_dtype | ||
) | ||
else: | ||
value = cast_scalar_to_array(len(self.index), value) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can you not do this inside cast_scalar_to_array (pass in the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you clarify what you mean by "this"? Your last comments were to move this whole if/else logic out of There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. instead of adding this if/then clause, can you change There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. exactly the reason to make the change. we are almost certainly going to need to do this in other places and this is a natural fit. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. To be more specific, it would be easy if There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We cannot "simply move the if_extenseion array case into cast_scalar_arry", because this is a specific case where There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. https://github.com/pandas-dev/pandas/blob/master/pandas/core/dtypes/cast.py#L1492 you just need to move the if_extension_dtype into this function. I don't think there is ambiguity or anything. This returns exactly 1 ndarray. I am not sure where 2D even matters here at all. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. A single ndarray can be 2 dimensional, and in this case it often is, for example, when it is used here Doing this in
this There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. "This returns exactly 1 ndarray" is technically true, but it can be a 2d ndarray For example values = np.empty((2,3), dtype=int)
values.fill(2) would give you a numpy array like
|
||
|
||
value = maybe_cast_to_datetime(value, infer_dtype) | ||
|
||
# return internal types directly | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since we had to change some other tests, I think we need to break this to a new section and show the changes from before. E.g. construction of a multi-column df now is object if we don't have unform datetimes? (we need to be very clear what is the change here since we had to change some tests)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Realized that I added this to the wrong file, so I'm moving this addition to
v1.1.0.rst
. But can you clarify what you would like me to do here? I'm not totally sure based on your comment.I think that this line correctly describes the change. Are you asking to also include an example, such as
The example you gave, where a datetime column has multiple different timezones, this was always an object column.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok tell you what. let's create an issue to refactor this (frame constructor). That's what I mainly have a problem with, we are adding if/then ALL over the place for extension types rather than a proper refactor.
So ok with merging this (just one small change on the naming in the tests). And please create an issue (and if you want / can refactor would be great) as a followup.