Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
DF.__setitem__ creates extension column when given extension scalar #34875
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DF.__setitem__ creates extension column when given extension scalar #34875
Changes from 39 commits
0ec5911
9336955
01fb076
5c8b356
2c1f640
d509bf4
e231bb1
18ed043
a6b18f4
cbc29be
2f79822
0f9178e
bfa18fb
e7e9a48
291eb2d
3a788ed
38d7ce5
a5e8df5
5e439bd
6cc7959
7e27a6e
90a8570
39b2984
7a01041
03e528b
7bb9553
a3be9a6
3a92164
c93a847
966283a
f2aea7b
6495a36
42e7afa
8343df3
a50a42c
3452c20
6f3fb51
b95cdfc
6830fde
6653ef8
c73a2de
100f334
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you not do this inside cast_scalar_to_array (pass in the
infer_dtype
optionally)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you clarify what you mean by "this"? Your last comments were to move this whole if/else logic out of
cast_scalar_to_array
, which I do agree made for a much cleaner solutionThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of adding this if/then clause, can you change
cast_scalar_to_array
to just handle if the dtype is an extension_array? (and obviously pass infer_dtype here)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cast_scalar_to_array
only deals with numpy arrays. Initially this PR added handling extension dtypes in there as well, but after discussion we actually moved it out because it gave a lot of complications (eg EAs don't support 2D arrays, whilecast_scalar_to_array
generally does).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
exactly the reason to make the change. we are almost certainly going to need to do this in other places and this is a natural fit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be more specific, it would be easy if
cast_scalar_array
was simply returning a single 1d array. But in most cases it is not, it is returning a 2d array.Since there's no good way to make a 2d Extension array, I handled this by returning a list of Extension arrays. Is this what your asking I go back to doing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We cannot "simply move the if_extenseion array case into cast_scalar_arry", because this is a specific case where
cast_scalar_array
is receiving an int to itsshape
parameter. But in general,shape
is a tuple to define the shape of the 2d array.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://github.com/pandas-dev/pandas/blob/master/pandas/core/dtypes/cast.py#L1492
you just need to move the if_extension_dtype into this function. I don't think there is ambiguity or anything. This returns exactly 1 ndarray. I am not sure where 2D even matters here at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A single ndarray can be 2 dimensional, and in this case it often is, for example, when it is used here
Doing this in
cast_scalar_to_array
would NOT work without a larger refactor.this
shape
parameter is usually a tuple not an int like it is here, soconstruct_1d_arraylike_from_scalar
would fail whenshape=(2, 5)
for example (since extension arrays are only 1d)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"This returns exactly 1 ndarray" is technically true, but it can be a 2d ndarray
For example
would give you a numpy array like