BUG: concat of Series of EA and other dtype fails #20840

jorisvandenbossche · 2018-04-27T09:43:18Z

Closes #20832

codecov · 2018-04-27T10:33:36Z

Codecov Report

Merging #20840 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #20840      +/-   ##
==========================================
+ Coverage   91.77%   91.78%   +<.01%     
==========================================
  Files         153      153              
  Lines       49280    49313      +33     
==========================================
+ Hits        45229    45261      +32     
- Misses       4051     4052       +1

Flag	Coverage Δ
#multiple	`90.17% <100%> (ø)`	⬆️
#single	`41.88% <100%> (ø)`	⬆️

Impacted Files	Coverage Δ
pandas/core/dtypes/concat.py	`99.18% <100%> (ø)`	⬆️
pandas/util/testing.py	`84.59% <0%> (-0.21%)`	⬇️
pandas/core/arrays/base.py	`83.95% <0%> (ø)`	⬆️
pandas/api/extensions/__init__.py	`100% <0%> (ø)`	⬆️
pandas/core/series.py	`93.99% <0%> (ø)`	⬆️
pandas/core/internals.py	`95.59% <0%> (+0.01%)`	⬆️
pandas/core/indexing.py	`93.14% <0%> (+0.05%)`	⬆️
pandas/core/dtypes/missing.py	`92.94% <0%> (+0.08%)`	⬆️
pandas/core/algorithms.py	`94.48% <0%> (+0.08%)`	⬆️
pandas/core/dtypes/cast.py	`88.06% <0%> (+0.2%)`	⬆️
... and 1 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 6cacdde...964a5ff.

jreback · 2018-04-27T10:35:06Z

pandas/core/dtypes/concat.py

@@ -175,8 +175,8 @@ def is_nonempty(x):
        return _concat_sparse(to_concat, axis=axis, typs=typs)

    extensions = [is_extension_array_dtype(x) for x in to_concat]
-    if any(extensions):
-        to_concat = [np.atleast_2d(x.astype('object')) for x in to_concat]
+    if any(extensions) and axis == 1:


so what happens on axis=0? coerced to object?

Right (everything is upcast).
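To make the upcast concrete, here is a minimal user-level sketch. It assumes `pd.Categorical` as a stand-in for an extension array (the PR's own tests use test-only array types), paired with an object-dtype Series so that no common dtype other than object exists.

```python
import pandas as pd

# Stand-in extension-array Series (assumption: Categorical instead of a custom EA).
ea_series = pd.Series(pd.Categorical([1, 2]))
# A Series of another dtype that shares no common dtype with the categories.
obj_series = pd.Series(["x"], dtype=object)

# axis=0 is the default: the values are stacked end to end and upcast.
result = pd.concat([ea_series, obj_series])
print(result.dtype)
print(result.tolist())
```

The result is still a one-dimensional Series; only the dtype changes.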

jreback · 2018-04-27T10:35:35Z

pandas/tests/extension/base/reshaping.py

@@ -64,6 +64,11 @@ def test_concat_mixed_dtypes(self, data):
        expected = pd.concat([df1.astype('object'), df2.astype('object')])
        self.assert_frame_equal(result, expected)

+        result = pd.concat([df1['A'], df2['A']])


is this just for axis=1? I would make this a separate test.

The bug is just with axis=0, the default, which this is testing.

For axis=1 there's not any up-casting required.

Though I don't see any tests for concat(..., axis='columns') with the extension array. Can you add some, Joris? concat EA and EA, mix of EA and non-EA, aligned and not aligned. Or should I?

Hmm, thought we already had some, but indeed, apparently not. Will add some.
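A rough sketch of the cases requested above, again assuming `pd.Categorical` as a stand-in for an extension array; the variable names are illustrative and are not the tests added in this PR.

```python
import pandas as pd

ea = pd.Series(pd.Categorical(["a", "b", "c"]), name="ea")
aligned = pd.Series([1, 2, 3], name="ints")
# Same values on a shifted index, so the two indexes only partly overlap.
unaligned = pd.Series([1, 2, 3], index=[1, 2, 3], name="shifted")

# Aligned: each column keeps its own dtype, no upcast to object.
same_index = pd.concat([ea, aligned], axis="columns")

# Not aligned: the result uses the union of the indexes and fills gaps with NaN.
outer = pd.concat([ea, unaligned], axis="columns")

print(same_index.dtypes)
print(outer)
```

With axis='columns' each input becomes its own column, which is why no common dtype has to be found.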

TomAugspurger · 2018-04-28T10:59:27Z

pandas/core/dtypes/concat.py

-    if any(extensions):
-        to_concat = [np.atleast_2d(x.astype('object')) for x in to_concat]
+    if any(extensions) and axis == 1:
+            to_concat = [np.atleast_2d(x.astype('object')) for x in to_concat]


Any reason you added this extra indent?

nope, sorry that was from a previous iteration where I had a second if :-)

TomAugspurger · 2018-04-28T12:13:09Z

I may have missed some, only glanced briefly in extension/base/reshaping.py


jreback · 2018-04-28T13:47:12Z

pandas/core/dtypes/concat.py

@@ -175,7 +175,7 @@ def is_nonempty(x):
        return _concat_sparse(to_concat, axis=axis, typs=typs)

    extensions = [is_extension_array_dtype(x) for x in to_concat]
-    if any(extensions):
+    if any(extensions) and axis == 1:
        to_concat = [np.atleast_2d(x.astype('object')) for x in to_concat]


I find this very strange that you need to add axis=1 here, I don't think this routine should be called at all if axis=1. This seems like it should be handled at a higher level

I find this very strange that you need to add axis=1 here, I don't think this routine should be called at all if axis=1

axis=1 is used all over _concat_compat. Why do you think it's strange?

For the specific case from #20832 it is axis=0, but we don't want to reshape to 2d.
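The effect of the guard can be sketched with plain NumPy. `upcast_for_concat` below is a hypothetical helper written for illustration, not the pandas function; it only mirrors the patched branch, where the reshape to 2D is applied when the internal axis is 1.

```python
import numpy as np

def upcast_for_concat(arrays, axis):
    """Hypothetical stand-in for the patched branch; not pandas code."""
    # Everything is upcast to object regardless of axis.
    as_object = [np.asarray(a).astype(object) for a in arrays]
    if axis == 1:
        # Internal axis=1: blocks are 2D, so each 1D array becomes a single row.
        as_object = [np.atleast_2d(a) for a in as_object]
    return np.concatenate(as_object, axis=axis)

flat = upcast_for_concat([[1, 2], ["a"]], axis=0)  # Series case: stays 1D
rows = upcast_for_concat([[1, 2], ["a"]], axis=1)  # block case: one 2D row
print(flat.shape, rows.shape)
```

Without the `axis == 1` check, the Series case would also be reshaped to 2D, which is the situation the comment above says should be avoided.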

Yes, special-casing non-consolidatable blocks with axis=1 is indeed an existing pattern.

For categoricals:

pandas/pandas/core/dtypes/concat.py
Lines 220 to 221 in 563a6ad

    if axis == 1:
        return res.reshape(1, len(res))

for datetimetz:

pandas/pandas/core/dtypes/concat.py
Lines 453 to 454 in 563a6ad

    if axis == 1:
        x = np.atleast_2d(x)

Note that this is all rather confusing, as axis=1 in the "concatting code" actually means axis=0 in user-facing code (because we store the 1D data in 2D objects):

pandas/pandas/core/reshape/concat.py
Lines 312 to 315 in 563a6ad

    # Need to flip BlockManager axis in the DataFrame special case
    self._is_frame = isinstance(sample, DataFrame)
    if self._is_frame:
        axis = 1 if axis == 0 else 0

right this should be changed to use _get_block_manager_axis FYI (but can be in future).

right this should be changed to use _get_block_manager_axis FYI (but can be in future).

I don't think so, _get_block_manager_axis is for a full Series/DataFrame, the logic here is dtype-dependent

what does your response have to do with my comment? this is exactly what _get_block_manager_axis does

OK, my response was not very well phrased, but still, then you will have to explain your comment better :-)

I understood your comment as: "we should use _get_block_manager_axis here", and AFAIK, _get_block_manager_axis lets you convert an axis of 0/1 to the appropriate number for blocks (so for DataFrame it switches 0 and 1). But here, I don't convert any axis argument, I reshape data depending on the axis value (and depending on the dtype).
So I don't fully understand where in the above code lines I would use _get_block_manager_axis
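The distinction drawn here can be sketched in a few lines. `block_manager_axis` is a hypothetical pure-Python stand-in for the axis conversion described above (it does not call the private pandas method), and `needs_2d_reshape` is an equally hypothetical name for the decision made in the dtype-specific concat code.

```python
def block_manager_axis(axis, ndim):
    """Hypothetical stand-in: map a user-facing axis to the internal block axis."""
    if not 0 <= axis < ndim:
        raise ValueError("axis {} is out of range for ndim {}".format(axis, ndim))
    # For a DataFrame (ndim=2) this swaps 0 and 1; for a Series (ndim=1) it is a no-op.
    return ndim - 1 - axis

def needs_2d_reshape(internal_axis):
    """Hypothetical: the concat helpers reshape data, they do not convert the axis."""
    return internal_axis == 1

frame_axis = block_manager_axis(0, ndim=2)   # user-facing axis=0 on a DataFrame
series_axis = block_manager_axis(0, ndim=1)  # user-facing axis=0 on a Series
print(frame_axis, needs_2d_reshape(frame_axis))
print(series_axis, needs_2d_reshape(series_axis))
```

The first function translates an axis number; the second only reads the already-translated value to decide on a reshape, which is the part this PR touches.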

diff --git a/pandas/core/reshape/concat.py b/pandas/core/reshape/concat.py
index 6e564975f..6570e7a83 100644
--- a/pandas/core/reshape/concat.py
+++ b/pandas/core/reshape/concat.py
@@ -312,7 +312,7 @@ class _Concatenator(object):
         # Need to flip BlockManager axis in the DataFrame special case
         self._is_frame = isinstance(sample, DataFrame)
         if self._is_frame:
-            axis = 1 if axis == 0 else 0
+            axis = sample._get_block_manager_axis(axis)
 
         self._is_series = isinstance(sample, Series)
         if not 0 <= axis <= sample.ndim:

is a more correct usage, though prob doesn't simplify code much. this is spaghetti anyhow.

this is spaghetti anyhow.

On that we certainly agree! :-)

BUG: concat of Series of EA and other dtype fails

c230f2e

jorisvandenbossche added Bug ExtensionArray Extending pandas with custom dtypes or arrays. labels Apr 27, 2018

jorisvandenbossche added this to the 0.23.0 milestone Apr 27, 2018

jreback requested changes Apr 27, 2018

TomAugspurger reviewed Apr 28, 2018

fix indent

0dfa346

add test for concat axis=1

964a5ff

jreback requested changes Apr 28, 2018

TomAugspurger approved these changes Apr 28, 2018

jreback approved these changes Apr 29, 2018

jreback merged commit 6322043 into pandas-dev:master Apr 29, 2018

jorisvandenbossche deleted the ea-concat-mixed branch April 30, 2018 06:53
