-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: Allow concat to take string axis names #14389
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: Allow concat to take string axis names #14389
Conversation
Adding tests for concat string axis names Fixing pep8 related spacing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@brandonmburroughs Nice PR! I added a few comments, mostly small nitpicks, apart from the axis problem with Series as first object
@@ -44,4 +44,5 @@ Bug Fixes | |||
|
|||
|
|||
- Bug in ``pd.concat`` where names of the ``keys`` were not propagated to the resulting ``MultiIndex`` (:issue:`14252`) | |||
- Bug in ``pd.concat`` where ``axis`` cannot take string parameters ``rows`` or ``columns (:issue:`14369`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing closing backticks after "columns"
Small other thing: I would also quote the parameters inside the backticks (like 'rows'
) to make clear it are string arguments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. Added!
df2 = pd.DataFrame({'A': [0.3, 0.4]}, index=range(2)) | ||
expected_row = pd.DataFrame( | ||
{'A': [0.1, 0.2, 0.3, 0.4]}, index=[0, 1, 0, 1]) | ||
concatted_row = pd.concat([df1, df2], axis='rows') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably already tested elsewhere, but can you add for completeness in this test also the index=0
case and check output of that as well? (and the same for axis=1 below)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this mean? I looked through the concat
parameters (as well as some other tests) and didn't see index=0
. Do you mean joining dataframes without indices?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops, sorry, I meant axis=0
in addition to axis='index'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, that makes sense! Tests added.
@@ -1283,7 +1283,7 @@ def concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False, | |||
argument, unless it is passed, in which case the values will be | |||
selected (see below). Any None objects will be dropped silently unless | |||
they are all None in which case a ValueError will be raised | |||
axis : {0, 1, ...}, default 0 | |||
axis : {0, 1, 'rows', 'columns', ...}, default 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you make this something like {0/'index', 1/'columns'}
to make it clear which are equivalent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that makes sense. Done.
@@ -1411,6 +1411,10 @@ def __init__(self, objs, axis=0, join='outer', join_axes=None, | |||
sample = objs[0] | |||
self.objs = objs | |||
|
|||
# Check for string axis parameter | |||
if isinstance(axis, str): | |||
axis = objs[0]._get_axis_number(axis) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will not work in case of concatting different Serieses with axis='columns'
, as this axis is not defined for Series
Can you also add a test for such case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback Does there already exist a utility function similar to _get_axis_number
that does not depend on the actual object?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added the test. Thanks for pointing that out!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I wanted to say is that the code will not work in that case, but I think it should work (which means that we cannot use _get_axis_number
..)
I am not sure there is already some functionality that does this for you, so you will have to mimic what is in _get_axis_numbers
. You maybe can use DataFrame._AXIS_NUMBERS/NAMES
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
iirc it's a class method so something like
NDFrame._get_axis_number(axis) should work
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would have been indeed have been useful, but it is not a class method .. (the attributes like _AXIS_NAMES
are also only defined in the subclasses). And I don't think we have this logic already somewhere else (as in other methods you can just used the instance method)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, axis='columns'
should work since axis=1
works for a Series
. Unfortunately NDFrame._get_axis_number(axis)
isn't a class method.
I've added a function in pandas.tools.util
that emulates get_axis_number
(in fact, it's mostly the same). I wasn't sure where to put this, but this function uses DataFrame._AXIS_NUMBERS/NAMES
. This won't depend on specific objects and allows concat
of Series
to work with both columns
(new functionality) and 1
(works in 0.19.0). I've updated existing tests and added new ones for the util function.
Does this work? It allows axis='columns'
to be used for two Series
though it gets all of the axis names from DataFrame
.
Current coverage is 85.26% (diff: 100%)@@ master #14389 diff @@
==========================================
Files 140 140
Lines 50634 50636 +2
Methods 0 0
Messages 0 0
Branches 0 0
==========================================
+ Hits 43173 43174 +1
- Misses 7461 7462 +1
Partials 0 0
|
Thanks for the feedback! |
assert_frame_equal(concatted_index, expected_index) | ||
|
||
expected_0 = pd.DataFrame( | ||
{'A': [0.1, 0.2, 0.3, 0.4]}, index=[0, 1, 0, 1]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
redefining those expected frames is not needed. They should be the same for all different ways to specify one of the two axis.
This will also make the test code a bit shorter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yeah, that totally makes sense. I got rid of the duplicates.
@@ -1283,7 +1283,7 @@ def concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False, | |||
argument, unless it is passed, in which case the values will be | |||
selected (see below). Any None objects will be dropped silently unless | |||
they are all None in which case a ValueError will be raised | |||
axis : {0, 1, ...}, default 0 | |||
axis : {0/'index'/'rows', 1/'columns'}, default 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would leave out the 'rows'. I think it is more there as a backcompat alias, in most other places we only mention 'index'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, sounds good. I've updated it.
By the way, I can squash all of these commits into one once we get everything finalized. |
Squashing is not really needed anymore, we squash when merging anyway. |
My most recent commit ( brandonmburroughs/pandas@24eb09d ) hasn't shown up. It seems like the appveyor tests never responded back to Github that it finished. Maybe that has something to do with it? Any ideas on how to fix this and get the latest commit to show up? |
Hmm, not sure why. We moved from pydata org to pandas-dev org, but I would not expect that to cause this issue. But it's maybe a reason the appveyor status is not correct (as it in fact already completed the build) |
Continued in #14416
git diff upstream/master | flake8 --diff
This uses
_get_axis_number
to convert a stringaxis
parameter to an integer. This will allow theconcat
method to use any other aliases given to dataframes axes in the future while not disrupting other uses ofaxis
in theconcat
method. If this pattern works well, it could be used for the other merge functions also.