BUG: Allow concat to take string axis names #14389

Closed
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.19.1.txt
@@ -44,4 +44,5 @@ Bug Fixes


- Bug in ``pd.concat`` where names of the ``keys`` were not propagated to the resulting ``MultiIndex`` (:issue:`14252`)
- Bug in ``pd.concat`` where ``axis`` cannot take string parameters ``'rows'`` or ``'columns'`` (:issue:`14369`)
- Bug in ``MultiIndex.set_levels`` where illegal level values were still set after raising an error (:issue:`13754`)
36 changes: 36 additions & 0 deletions pandas/tests/frame/test_combine_concat.py
@@ -347,6 +347,42 @@ def test_concat_named_keys(self):
names=[None, None]))
assert_frame_equal(concatted_unnamed, expected_unnamed)

def test_concat_axis_parameter(self):
# GH 14369
df1 = pd.DataFrame({'A': [0.1, 0.2]}, index=range(2))
df2 = pd.DataFrame({'A': [0.3, 0.4]}, index=range(2))

expected_index = pd.DataFrame(
{'A': [0.1, 0.2, 0.3, 0.4]}, index=[0, 1, 0, 1])
concatted_index = pd.concat([df1, df2], axis='index')
assert_frame_equal(concatted_index, expected_index)

concatted_row = pd.concat([df1, df2], axis='rows')
Member

Probably already tested elsewhere, but can you add for completeness in this test also the index=0 case and check output of that as well? (and the same for axis=1 below)

Contributor Author

What does this mean? I looked through the concat parameters (as well as some other tests) and didn't see index=0. Do you mean joining dataframes without indices?

Member

Whoops, sorry, I meant axis=0 in addition to axis='index'

Contributor Author

Okay, that makes sense! Tests added.

assert_frame_equal(concatted_row, expected_index)

concatted_0 = pd.concat([df1, df2], axis=0)
assert_frame_equal(concatted_0, expected_index)

expected_columns = pd.DataFrame(
[[0.1, 0.3], [0.2, 0.4]], index=[0, 1], columns=['A', 'A'])
concatted_columns = pd.concat([df1, df2], axis='columns')
assert_frame_equal(concatted_columns, expected_columns)

concatted_1 = pd.concat([df1, df2], axis=1)
assert_frame_equal(concatted_1, expected_columns)

series1 = pd.Series([0.1, 0.2])
series2 = pd.Series([0.3, 0.4])

expected_row_series = pd.Series(
[0.1, 0.2, 0.3, 0.4], index=[0, 1, 0, 1])
concatted_row_series = pd.concat([series1, series2], axis='rows')
assert_series_equal(concatted_row_series, expected_row_series)

# A Series has no 'columns' axis
with assertRaisesRegexp(ValueError, 'No axis named columns'):
pd.concat([series1, series2], axis='columns')


class TestDataFrameCombineFirst(tm.TestCase, TestData):

6 changes: 5 additions & 1 deletion pandas/tools/merge.py
@@ -1283,7 +1283,7 @@ def concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False,
argument, unless it is passed, in which case the values will be
selected (see below). Any None objects will be dropped silently unless
they are all None in which case a ValueError will be raised
- axis : {0, 1, ...}, default 0
+ axis : {0/'index', 1/'columns'}, default 0
The axis to concatenate along
join : {'inner', 'outer'}, default 'outer'
How to handle indexes on other axis(es)
@@ -1411,6 +1411,10 @@ def __init__(self, objs, axis=0, join='outer', join_axes=None,
sample = objs[0]
self.objs = objs

# Check for string axis parameter
if isinstance(axis, str):
axis = objs[0]._get_axis_number(axis)
Member

This will not work when concatenating Series with axis='columns', as this axis is not defined for Series.

Can you also add a test for such case?

Member

@jreback Does there already exist a utility function similar to _get_axis_number that does not depend on the actual object?

Contributor Author

I've added the test. Thanks for pointing that out!

Member

What I wanted to say is that the code will not work in that case, but I think it should work (which means that we cannot use _get_axis_number).
I am not sure there is already some functionality that does this for you, so you will have to mimic what is in _get_axis_number. You can maybe use DataFrame._AXIS_NUMBERS/NAMES.

Contributor

iirc it's a class method so something like

NDFrame._get_axis_number(axis) should work

Member

That would indeed have been useful, but it is not a class method (the attributes like _AXIS_NAMES are also only defined in the subclasses). And I don't think we have this logic already somewhere else (as in other methods you can just use the instance method).

Contributor Author

Yeah, axis='columns' should work since axis=1 works for a Series. Unfortunately NDFrame._get_axis_number(axis) isn't a class method.

I've added a function in pandas.tools.util that emulates get_axis_number (in fact, it's mostly the same). I wasn't sure where to put this, but this function uses DataFrame._AXIS_NUMBERS/NAMES. This won't depend on specific objects and allows concat of Series to work with both columns (new functionality) and 1 (works in 0.19.0). I've updated existing tests and added new ones for the util function.

Does this work? It allows axis='columns' to be used for two Series though it gets all of the axis names from DataFrame.
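
For readers following the thread, here is a minimal sketch of what an object-independent axis lookup could look like. It is not the helper added to pandas.tools.util in this PR: the name `resolve_axis` and the hard-coded alias table are assumptions made for illustration, whereas the real helper is described above as taking its names from DataFrame._AXIS_NUMBERS/NAMES.

```python
# Hypothetical helper, not the pandas implementation: map an axis argument
# to 0 or 1 without needing a DataFrame or Series instance.
_AXIS_ALIASES = {0: 0, 1: 1, 'index': 0, 'rows': 0, 'columns': 1}


def resolve_axis(axis):
    """Return the integer axis for an integer or string axis name."""
    try:
        return _AXIS_ALIASES[axis]
    except (KeyError, TypeError):
        # Unknown names (or unhashable arguments) are rejected the same way.
        raise ValueError('No axis named {0}'.format(axis))


print(resolve_axis('rows'))
print(resolve_axis('columns'))
print(resolve_axis(1))
```

Because the lookup does not go through `objs[0]`, the same call resolves 'columns' whether the objects being concatenated are DataFrames or Series.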


# Need to flip BlockManager axis in the DataFrame special case
self._is_frame = isinstance(sample, DataFrame)
if self._is_frame: