-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: concat of Series of EA and other dtype fails #20840
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: concat of Series of EA and other dtype fails #20840
Conversation
Codecov Report
@@ Coverage Diff @@
## master #20840 +/- ##
==========================================
+ Coverage 91.77% 91.78% +<.01%
==========================================
Files 153 153
Lines 49280 49313 +33
==========================================
+ Hits 45229 45261 +32
- Misses 4051 4052 +1
Continue to review full report at Codecov.
|
pandas/core/dtypes/concat.py
Outdated
@@ -175,8 +175,8 @@ def is_nonempty(x): | |||
return _concat_sparse(to_concat, axis=axis, typs=typs) | |||
|
|||
extensions = [is_extension_array_dtype(x) for x in to_concat] | |||
if any(extensions): | |||
to_concat = [np.atleast_2d(x.astype('object')) for x in to_concat] | |||
if any(extensions) and axis == 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so what happens on axis=0? coerced toobject
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right (everything is upcast).
@@ -64,6 +64,11 @@ def test_concat_mixed_dtypes(self, data): | |||
expected = pd.concat([df1.astype('object'), df2.astype('object')]) | |||
self.assert_frame_equal(result, expected) | |||
|
|||
result = pd.concat([df1['A'], df2['A']]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this just for axis=1? I would make this a separate test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The bug is just with axis=0
, the default, which this is testing.
For axis=1
there's not any up-casting required.
Though I don't see any tests for concat(..., axis='columns')
with the extension array. Can you add some Joris? concat EA and EA, mix of EA and non-EA, aligned and no aligned. Or should I?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, thought we already had some, but indeed, apparently not. Will add some.
pandas/core/dtypes/concat.py
Outdated
if any(extensions): | ||
to_concat = [np.atleast_2d(x.astype('object')) for x in to_concat] | ||
if any(extensions) and axis == 1: | ||
to_concat = [np.atleast_2d(x.astype('object')) for x in to_concat] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason you added this extra indent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nope, sorry that was from a previous iteration where I had a second if
:-)
I may have missed some, only glanced briefly in extension/base/reshape.py
…On Sat, Apr 28, 2018 at 7:07 AM, Joris Van den Bossche < ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In pandas/tests/extension/base/reshaping.py
<#20840 (comment)>:
> @@ -64,6 +64,11 @@ def test_concat_mixed_dtypes(self, data):
expected = pd.concat([df1.astype('object'), df2.astype('object')])
self.assert_frame_equal(result, expected)
+ result = pd.concat([df1['A'], df2['A']])
Hmm, thought we already had some, but indeed, apparently not. Will add
some.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#20840 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABQHIsIGGdaqV7zfq5-4kAZFoJZW9j_zks5ttFudgaJpZM4TqFIC>
.
|
@@ -175,7 +175,7 @@ def is_nonempty(x): | |||
return _concat_sparse(to_concat, axis=axis, typs=typs) | |||
|
|||
extensions = [is_extension_array_dtype(x) for x in to_concat] | |||
if any(extensions): | |||
if any(extensions) and axis == 1: | |||
to_concat = [np.atleast_2d(x.astype('object')) for x in to_concat] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this very strange that you need to add axis=1 here, I don't think this routine should be call at all if axis=1. This seems like it should be handled at a higher level
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this very strange that you need to add axis=1 here, I don't think this routine should be call at all if axis=1
axis=1
is used all over _concat_compat
. Why do you think it's strange?
For the specific case from #20832 it is axis=0
, but we don't want to reshape to 2d.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the special casing non-consolidateble blocks with axis=1
is indeed an existing pattern.
For categoricals:
pandas/pandas/core/dtypes/concat.py
Lines 220 to 221 in 563a6ad
if axis == 1: | |
return res.reshape(1, len(res)) |
for datetimetz:
pandas/pandas/core/dtypes/concat.py
Lines 453 to 454 in 563a6ad
if axis == 1: | |
x = np.atleast_2d(x) |
Note that this is all rather confusing, as axis=1
in the "concatting code" actually means axis=0
in user facing code (because we store the 1D data in 2D objects):
pandas/pandas/core/reshape/concat.py
Lines 312 to 315 in 563a6ad
# Need to flip BlockManager axis in the DataFrame special case | |
self._is_frame = isinstance(sample, DataFrame) | |
if self._is_frame: | |
axis = 1 if axis == 0 else 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right this should be changed to use _get_block_manager_axis
FYI (but can be in future).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right this should be changed to use _get_block_manager_axis FYI (but can be in future).
I don't think so, _get_block_manager_axis
is for a full Series/DataFrame, the logic here is dtype-dependent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does your respsone have to do with my comment? this is exactly what _get_block_managed_axis does
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, my response was not very well phrased, but still, then you will have to explain your comment better :-)
I understood your comment as: "we should use _get_block_manager_axis
here ", and AFAIK, _get_block_manager_axis
lets you convert an axis of 0/1 to the appropriate number for blocks (so for DataFrame it switches 0 and 1). But here, I don't convert any axis
argument, I reshape data depending on the axis
value (and depending on the dtype).
So I don't fully understand where in the above code lines I would use _get_block_manager_axis
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
diff --git a/pandas/core/reshape/concat.py b/pandas/core/reshape/concat.py
index 6e564975f..6570e7a83 100644
--- a/pandas/core/reshape/concat.py
+++ b/pandas/core/reshape/concat.py
@@ -312,7 +312,7 @@ class _Concatenator(object):
# Need to flip BlockManager axis in the DataFrame special case
self._is_frame = isinstance(sample, DataFrame)
if self._is_frame:
- axis = 1 if axis == 0 else 0
+ axis = sample._get_block_manager_axis(axis)
self._is_series = isinstance(sample, Series)
if not 0 <= axis <= sample.ndim:
is a more correct usage, though prob doesn't simplify code much. this is spagetti anyhow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is spagetti anyhow.
On that we certainly agree! :-)
Closes #20832