Skip to content

BUG: concat of Series of EA and other dtype fails #20840

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Apr 29, 2018
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion pandas/core/dtypes/concat.py
Original file line number Diff line number Diff line change
Expand Up @@ -175,7 +175,7 @@ def is_nonempty(x):
return _concat_sparse(to_concat, axis=axis, typs=typs)

extensions = [is_extension_array_dtype(x) for x in to_concat]
if any(extensions):
if any(extensions) and axis == 1:
to_concat = [np.atleast_2d(x.astype('object')) for x in to_concat]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this very strange that you need to add axis=1 here, I don't think this routine should be call at all if axis=1. This seems like it should be handled at a higher level

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this very strange that you need to add axis=1 here, I don't think this routine should be call at all if axis=1

axis=1 is used all over _concat_compat. Why do you think it's strange?

For the specific case from #20832 it is axis=0, but we don't want to reshape to 2d.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the special casing non-consolidateble blocks with axis=1 is indeed an existing pattern.

For categoricals:

if axis == 1:
return res.reshape(1, len(res))

for datetimetz:

if axis == 1:
x = np.atleast_2d(x)

Note that this is all rather confusing, as axis=1 in the "concatting code" actually means axis=0 in user facing code (because we store the 1D data in 2D objects):

# Need to flip BlockManager axis in the DataFrame special case
self._is_frame = isinstance(sample, DataFrame)
if self._is_frame:
axis = 1 if axis == 0 else 0

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right this should be changed to use _get_block_manager_axis FYI (but can be in future).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right this should be changed to use _get_block_manager_axis FYI (but can be in future).

I don't think so, _get_block_manager_axis is for a full Series/DataFrame, the logic here is dtype-dependent

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does your respsone have to do with my comment? this is exactly what _get_block_managed_axis does

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, my response was not very well phrased, but still, then you will have to explain your comment better :-)

I understood your comment as: "we should use _get_block_manager_axis here ", and AFAIK, _get_block_manager_axis lets you convert an axis of 0/1 to the appropriate number for blocks (so for DataFrame it switches 0 and 1). But here, I don't convert any axis argument, I reshape data depending on the axis value (and depending on the dtype).
So I don't fully understand where in the above code lines I would use _get_block_manager_axis

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

diff --git a/pandas/core/reshape/concat.py b/pandas/core/reshape/concat.py
index 6e564975f..6570e7a83 100644
--- a/pandas/core/reshape/concat.py
+++ b/pandas/core/reshape/concat.py
@@ -312,7 +312,7 @@ class _Concatenator(object):
         # Need to flip BlockManager axis in the DataFrame special case
         self._is_frame = isinstance(sample, DataFrame)
         if self._is_frame:
-            axis = 1 if axis == 0 else 0
+            axis = sample._get_block_manager_axis(axis)
 
         self._is_series = isinstance(sample, Series)
         if not 0 <= axis <= sample.ndim:

is a more correct usage, though prob doesn't simplify code much. this is spagetti anyhow.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is spagetti anyhow.

On that we certainly agree! :-)


if not nonempty:
Expand Down
25 changes: 25 additions & 0 deletions pandas/tests/extension/base/reshaping.py
Original file line number Diff line number Diff line change
Expand Up @@ -64,6 +64,31 @@ def test_concat_mixed_dtypes(self, data):
expected = pd.concat([df1.astype('object'), df2.astype('object')])
self.assert_frame_equal(result, expected)

result = pd.concat([df1['A'], df2['A']])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this just for axis=1? I would make this a separate test.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bug is just with axis=0, the default, which this is testing.

For axis=1 there's not any up-casting required.

Though I don't see any tests for concat(..., axis='columns') with the extension array. Can you add some Joris? concat EA and EA, mix of EA and non-EA, aligned and no aligned. Or should I?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, thought we already had some, but indeed, apparently not. Will add some.

expected = pd.concat([df1['A'].astype('object'),
df2['A'].astype('object')])
self.assert_series_equal(result, expected)

def test_concat_columns(self, data, na_value):
df1 = pd.DataFrame({'A': data[:3]})
df2 = pd.DataFrame({'B': [1, 2, 3]})

expected = pd.DataFrame({'A': data[:3], 'B': [1, 2, 3]})
result = pd.concat([df1, df2], axis=1)
self.assert_frame_equal(result, expected)
result = pd.concat([df1['A'], df2['B']], axis=1)
self.assert_frame_equal(result, expected)

# non-aligned
df2 = pd.DataFrame({'B': [1, 2, 3]}, index=[1, 2, 3])
expected = pd.DataFrame({
'A': data._from_sequence(list(data[:3]) + [na_value]),
'B': [np.nan, 1, 2, 3]})
result = pd.concat([df1, df2], axis=1)
self.assert_frame_equal(result, expected)
result = pd.concat([df1['A'], df2['B']], axis=1)
self.assert_frame_equal(result, expected)

def test_align(self, data, na_value):
a = data[:3]
b = data[2:5]
Expand Down
4 changes: 4 additions & 0 deletions pandas/tests/extension/category/test_categorical.py
Original file line number Diff line number Diff line change
Expand Up @@ -67,6 +67,10 @@ class TestConstructors(base.BaseConstructorsTests):


class TestReshaping(base.BaseReshapingTests):
@pytest.mark.skip(reason="Unobserved categories preseved in concat.")
def test_concat_columns(self, data, na_value):
pass

@pytest.mark.skip(reason="Unobserved categories preseved in concat.")
def test_align(self, data, na_value):
pass
Expand Down