Skip to content

BUG: Allow concat to take string axis names #14389

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

brandonmburroughs
Copy link
Contributor

@brandonmburroughs brandonmburroughs commented Oct 10, 2016

Continued in #14416


This uses _get_axis_number to convert a string axis parameter to an integer. This will allow the concat method to use any other aliases given to dataframes axes in the future while not disrupting other uses of axis in the concat method. If this pattern works well, it could be used for the other merge functions also.

Adding tests for concat string axis names

Fixing pep8 related spacing
@jorisvandenbossche jorisvandenbossche added Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Oct 10, 2016
@jorisvandenbossche jorisvandenbossche added this to the 0.19.1 milestone Oct 10, 2016
Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@brandonmburroughs Nice PR! I added a few comments, mostly small nitpicks, apart from the axis problem with Series as first object

@@ -44,4 +44,5 @@ Bug Fixes


- Bug in ``pd.concat`` where names of the ``keys`` were not propagated to the resulting ``MultiIndex`` (:issue:`14252`)
- Bug in ``pd.concat`` where ``axis`` cannot take string parameters ``rows`` or ``columns (:issue:`14369`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing closing backticks after "columns"
Small other thing: I would also quote the parameters inside the backticks (like 'rows' ) to make clear it are string arguments

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. Added!

df2 = pd.DataFrame({'A': [0.3, 0.4]}, index=range(2))
expected_row = pd.DataFrame(
{'A': [0.1, 0.2, 0.3, 0.4]}, index=[0, 1, 0, 1])
concatted_row = pd.concat([df1, df2], axis='rows')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably already tested elsewhere, but can you add for completeness in this test also the index=0 case and check output of that as well? (and the same for axis=1 below)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this mean? I looked through the concat parameters (as well as some other tests) and didn't see index=0. Do you mean joining dataframes without indices?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, sorry, I meant axis=0 in addition to axis='index'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, that makes sense! Tests added.

@@ -1283,7 +1283,7 @@ def concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False,
argument, unless it is passed, in which case the values will be
selected (see below). Any None objects will be dropped silently unless
they are all None in which case a ValueError will be raised
axis : {0, 1, ...}, default 0
axis : {0, 1, 'rows', 'columns', ...}, default 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make this something like {0/'index', 1/'columns'} to make it clear which are equivalent?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that makes sense. Done.

@@ -1411,6 +1411,10 @@ def __init__(self, objs, axis=0, join='outer', join_axes=None,
sample = objs[0]
self.objs = objs

# Check for string axis parameter
if isinstance(axis, str):
axis = objs[0]._get_axis_number(axis)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will not work in case of concatting different Serieses with axis='columns', as this axis is not defined for Series

Can you also add a test for such case?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback Does there already exist a utility function similar to _get_axis_number that does not depend on the actual object?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added the test. Thanks for pointing that out!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I wanted to say is that the code will not work in that case, but I think it should work (which means that we cannot use _get_axis_number ..)
I am not sure there is already some functionality that does this for you, so you will have to mimic what is in _get_axis_numbers. You maybe can use DataFrame._AXIS_NUMBERS/NAMES

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iirc it's a class method so something like

NDFrame._get_axis_number(axis) should work

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would have been indeed have been useful, but it is not a class method .. (the attributes like _AXIS_NAMES are also only defined in the subclasses). And I don't think we have this logic already somewhere else (as in other methods you can just used the instance method)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, axis='columns' should work since axis=1 works for a Series. Unfortunately NDFrame._get_axis_number(axis) isn't a class method.

I've added a function in pandas.tools.util that emulates get_axis_number (in fact, it's mostly the same). I wasn't sure where to put this, but this function uses DataFrame._AXIS_NUMBERS/NAMES. This won't depend on specific objects and allows concat of Series to work with both columns (new functionality) and 1 (works in 0.19.0). I've updated existing tests and added new ones for the util function.

Does this work? It allows axis='columns' to be used for two Series though it gets all of the axis names from DataFrame.

@codecov-io
Copy link

codecov-io commented Oct 10, 2016

Current coverage is 85.26% (diff: 100%)

Merging #14389 into master will decrease coverage by <.01%

@@             master     #14389   diff @@
==========================================
  Files           140        140          
  Lines         50634      50636     +2   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          43173      43174     +1   
- Misses         7461       7462     +1   
  Partials          0          0          

Powered by Codecov. Last update d98e982...3f08b07

@brandonmburroughs
Copy link
Contributor Author

Thanks for the feedback!

assert_frame_equal(concatted_index, expected_index)

expected_0 = pd.DataFrame(
{'A': [0.1, 0.2, 0.3, 0.4]}, index=[0, 1, 0, 1])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

redefining those expected frames is not needed. They should be the same for all different ways to specify one of the two axis.
This will also make the test code a bit shorter

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah, that totally makes sense. I got rid of the duplicates.

@@ -1283,7 +1283,7 @@ def concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False,
argument, unless it is passed, in which case the values will be
selected (see below). Any None objects will be dropped silently unless
they are all None in which case a ValueError will be raised
axis : {0, 1, ...}, default 0
axis : {0/'index'/'rows', 1/'columns'}, default 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would leave out the 'rows'. I think it is more there as a backcompat alias, in most other places we only mention 'index'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, sounds good. I've updated it.

@brandonmburroughs
Copy link
Contributor Author

By the way, I can squash all of these commits into one once we get everything finalized.

@jorisvandenbossche
Copy link
Member

By the way, I can squash all of these commits into one once we get everything finalized.

Squashing is not really needed anymore, we squash when merging anyway.

@brandonmburroughs
Copy link
Contributor Author

My most recent commit ( brandonmburroughs/pandas@24eb09d ) hasn't shown up. It seems like the appveyor tests never responded back to Github that it finished. Maybe that has something to do with it? Any ideas on how to fix this and get the latest commit to show up?

@jorisvandenbossche
Copy link
Member

Hmm, not sure why. We moved from pydata org to pandas-dev org, but I would not expect that to cause this issue. But it's maybe a reason the appveyor status is not correct (as it in fact already completed the build)

@brandonmburroughs brandonmburroughs mentioned this pull request Oct 13, 2016
4 tasks
@jorisvandenbossche jorisvandenbossche modified the milestones: No action, 0.19.1 Oct 13, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

concat with axis='rows'
4 participants