BUG: Allow concat to take string axis names #14389

brandonmburroughs · 2016-10-10T21:58:56Z

Continued in #14416

closes concat with axis='rows' #14369
tests added / passed
passes git diff upstream/master | flake8 --diff
whatsnew entry

This uses _get_axis_number to convert a string axis parameter to an integer. It allows concat to accept any other aliases given to DataFrame axes in the future without disrupting other uses of axis inside concat. If this pattern works well, it could also be used for the other merge functions.
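As a rough illustration of the approach described above, the sketch below uses a stand-in class rather than pandas itself. `FrameLike`, its `_AXIS_NUMBERS` and `_AXIS_ALIASES` tables, and `normalize_axis` are hypothetical names chosen for this example; only the idea of delegating string axes to the first object's `_get_axis_number` comes from this PR.

```python
class FrameLike:
    """Hypothetical stand-in for a two-dimensional pandas object."""

    # Assumed lookup tables; the real pandas internals may differ.
    _AXIS_NUMBERS = {"index": 0, "columns": 1}
    _AXIS_ALIASES = {"rows": 0}

    def _get_axis_number(self, axis):
        # Map aliases such as 'rows' to their number first,
        # then handle canonical names and plain integers.
        axis = self._AXIS_ALIASES.get(axis, axis)
        if axis in self._AXIS_NUMBERS:
            return self._AXIS_NUMBERS[axis]
        if axis in self._AXIS_NUMBERS.values():
            return axis
        raise ValueError("No axis named {0!r}".format(axis))


def normalize_axis(objs, axis):
    """Convert a string axis to an integer using the first object."""
    # Only strings are converted; integers pass through untouched,
    # so existing callers that pass axis=0 or axis=1 are unaffected.
    if isinstance(axis, str):
        axis = objs[0]._get_axis_number(axis)
    return axis
```

With this sketch, `normalize_axis([FrameLike(), FrameLike()], 'rows')` and `normalize_axis([FrameLike(), FrameLike()], 0)` resolve to the same axis number, while the result still depends on what the first object knows about its own axes.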

Adding tests for concat string axis names
Fixing pep8 related spacing

jorisvandenbossche

@brandonmburroughs Nice PR! I added a few comments, mostly small nitpicks, apart from the axis problem when a Series is the first object.

jorisvandenbossche · 2016-10-10T22:17:20Z

doc/source/whatsnew/v0.19.1.txt

@@ -44,4 +44,5 @@ Bug Fixes


 - Bug in ``pd.concat`` where names of the ``keys`` were not propagated to the resulting ``MultiIndex`` (:issue:`14252`)
+- Bug in ``pd.concat`` where ``axis`` cannot take string parameters ``rows`` or ``columns (:issue:`14369`)


Missing closing backticks after "columns".
Another small thing: I would also quote the parameters inside the backticks (like 'rows') to make clear that they are string arguments.

Good idea. Added!

jorisvandenbossche · 2016-10-10T22:18:38Z

pandas/tests/frame/test_combine_concat.py

+        df2 = pd.DataFrame({'A': [0.3, 0.4]}, index=range(2))
+        expected_row = pd.DataFrame(
+            {'A': [0.1, 0.2, 0.3, 0.4]}, index=[0, 1, 0, 1])
+        concatted_row = pd.concat([df1, df2], axis='rows')


This is probably already tested elsewhere, but for completeness can you also add the index=0 case to this test and check its output as well? (and the same for axis=1 below)

What does this mean? I looked through the concat parameters (as well as some other tests) and didn't see index=0. Do you mean joining dataframes without indices?

Whoops, sorry, I meant axis=0 in addition to axis='index'

Okay, that makes sense! Tests added.

jorisvandenbossche · 2016-10-10T22:24:21Z

pandas/tools/merge.py

@@ -1283,7 +1283,7 @@ def concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False,
        argument, unless it is passed, in which case the values will be
        selected (see below). Any None objects will be dropped silently unless
        they are all None in which case a ValueError will be raised
-    axis : {0, 1, ...}, default 0
+    axis : {0, 1, 'rows', 'columns', ...}, default 0


Can you make this something like {0/'index', 1/'columns'} to make it clear which are equivalent?

Yes, that makes sense. Done.

jorisvandenbossche · 2016-10-10T22:32:12Z

pandas/tools/merge.py

@@ -1411,6 +1411,10 @@ def __init__(self, objs, axis=0, join='outer', join_axes=None,
            sample = objs[0]
        self.objs = objs

+        # Check for string axis parameter
+        if isinstance(axis, str):
+            axis = objs[0]._get_axis_number(axis)


This will not work when concatenating several Series with axis='columns', as this axis is not defined for Series.

Can you also add a test for such a case?

@jreback Does there already exist a utility function similar to _get_axis_number that does not depend on the actual object?

I've added the test. Thanks for pointing that out!

What I wanted to say is that the code will not work in that case, but I think it should work (which means that we cannot use _get_axis_number).
I am not sure there is already some functionality that does this for you, so you will have to mimic what is in _get_axis_number. You can maybe use DataFrame._AXIS_NUMBERS/NAMES.

iirc it's a class method so something like

NDFrame._get_axis_number(axis) should work

That would indeed have been useful, but it is not a class method (the attributes like _AXIS_NAMES are also only defined in the subclasses). And I don't think we already have this logic somewhere else (in other methods you can just use the instance method).

Yeah, axis='columns' should work since axis=1 works for a Series. Unfortunately NDFrame._get_axis_number(axis) isn't a class method.

I've added a function in pandas.tools.util that emulates _get_axis_number (in fact, it's mostly the same). I wasn't sure where to put this, but this function uses DataFrame._AXIS_NUMBERS/NAMES. It doesn't depend on specific objects and allows concat of Series to work with both 'columns' (new functionality) and 1 (works in 0.19.0). I've updated existing tests and added new ones for the util function.

Does this work? It allows axis='columns' to be used for two Series though it gets all of the axis names from DataFrame.
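The limitation discussed in this thread, and the object-independent workaround, can be sketched without pandas. Everything below is hypothetical: `SeriesLike`, `FRAME_AXIS_NUMBERS`, and `get_axis_number` are illustrative names, not the helper actually added to pandas.tools.util, and the lookup table is an assumed copy of the DataFrame axis names and aliases.

```python
class SeriesLike:
    """Hypothetical one-dimensional object that only knows 'index'."""

    _AXIS_NUMBERS = {"index": 0}

    def _get_axis_number(self, axis):
        # An instance-based lookup can only see this object's own axes.
        if axis in self._AXIS_NUMBERS:
            return self._AXIS_NUMBERS[axis]
        raise ValueError(
            "No axis named {0!r} for {1}".format(axis, type(self).__name__)
        )


# Assumed frame-level table, used no matter what is being concatenated.
FRAME_AXIS_NUMBERS = {0: 0, "index": 0, "rows": 0, 1: 1, "columns": 1}


def get_axis_number(axis):
    """Resolve an axis name or number without consulting any object."""
    try:
        return FRAME_AXIS_NUMBERS[axis]
    except (KeyError, TypeError):
        # KeyError: unknown name; TypeError: unhashable input such as a list.
        raise ValueError("No axis named {0!r}".format(axis))
```

In this sketch `SeriesLike()._get_axis_number('columns')` raises a ValueError, whereas `get_axis_number('columns')` resolves the name because it never asks the objects being concatenated. The trade-off raised in the question above still applies: all axis names come from the frame-level table, even when only Series are involved.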

codecov-io · 2016-10-10T22:36:21Z

Current coverage is 85.26% (diff: 100%)

Merging #14389 into master will decrease coverage by <.01%

@@             master     #14389   diff @@
==========================================
  Files           140        140          
  Lines         50634      50636     +2   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          43173      43174     +1   
- Misses         7461       7462     +1   
  Partials          0          0

Powered by Codecov. Last update d98e982...3f08b07

brandonmburroughs · 2016-10-11T00:18:49Z

Thanks for the feedback!

jorisvandenbossche · 2016-10-12T09:19:37Z

pandas/tests/frame/test_combine_concat.py

+        assert_frame_equal(concatted_index, expected_index)
+
+        expected_0 = pd.DataFrame(
+            {'A': [0.1, 0.2, 0.3, 0.4]}, index=[0, 1, 0, 1])


Redefining those expected frames is not needed. They should be the same for all the different ways of specifying one of the two axes.
This will also make the test code a bit shorter.

Oh yeah, that totally makes sense. I got rid of the duplicates.

jorisvandenbossche · 2016-10-12T09:21:02Z

pandas/tools/merge.py

@@ -1283,7 +1283,7 @@ def concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False,
        argument, unless it is passed, in which case the values will be
        selected (see below). Any None objects will be dropped silently unless
        they are all None in which case a ValueError will be raised
-    axis : {0, 1, ...}, default 0
+    axis : {0/'index'/'rows', 1/'columns'}, default 0


I would leave out the 'rows'. I think it is there more as a backward-compatibility alias; in most other places we only mention 'index'.

Okay, sounds good. I've updated it.

brandonmburroughs · 2016-10-13T00:46:34Z

By the way, I can squash all of these commits into one once we get everything finalized.

jorisvandenbossche · 2016-10-13T07:02:50Z

By the way, I can squash all of these commits into one once we get everything finalized.

Squashing is not really needed anymore, we squash when merging anyway.

brandonmburroughs · 2016-10-13T13:14:48Z

My most recent commit ( brandonmburroughs/pandas@24eb09d ) hasn't shown up. It seems like the AppVeyor tests never reported back to GitHub that they finished. Maybe that has something to do with it? Any ideas on how to fix this and get the latest commit to show up?

jorisvandenbossche · 2016-10-13T13:40:46Z

Hmm, not sure why. We moved from the pydata org to the pandas-dev org, but I would not expect that to cause this issue. It may be the reason the AppVeyor status is incorrect, though (as the build has in fact already completed).

BUG: Allow concat to take string axis names

584ebd2

Adding tests for concat string axis names
Fixing pep8 related spacing

jorisvandenbossche added Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Oct 10, 2016

jorisvandenbossche added this to the 0.19.1 milestone Oct 10, 2016

jorisvandenbossche reviewed Oct 10, 2016

brandonmburroughs added 2 commits October 10, 2016 19:58

Updating documentation

a6694b9

Adding ValueError test for concat Series axis 'columns'

cf3f998

Adding concat tests for axis 0, 1, and 'index'

3f08b07

jorisvandenbossche reviewed Oct 12, 2016

brandonmburroughs added 2 commits October 12, 2016 18:49

Removing duplicate expected dfs

fdd5260

Updating documentation for concat

64702fb

brandonmburroughs closed this Oct 13, 2016

brandonmburroughs mentioned this pull request Oct 13, 2016

Concat with axis rows #14416

Closed

4 tasks

jorisvandenbossche modified the milestones: No action, 0.19.1 Oct 13, 2016
