Skip to content

BUG: align with broadcast_axis, #13194 #13198

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 9 commits into from

Conversation

pfrcks
Copy link
Contributor

@pfrcks pfrcks commented May 17, 2016

Closes #13194

  • Implemented the easy fix
  • Future work : To implement broadcast axis remaining functionality.
  • All tests in test_generic.py passed.

@jreback
Copy link
Contributor

jreback commented May 17, 2016

tests

we r on 0.18.2

@pfrcks
Copy link
Contributor Author

pfrcks commented May 17, 2016

Hi,

I seem to have made the documentation in the wrong file. I will correct it and push the changes.
Rest all is OK, right?
This is my first pull request to pandas, sorry for the noob errors.

Also, have I missed something?

@pfrcks
Copy link
Contributor Author

pfrcks commented May 17, 2016

@jreback I have corrected the documentation to the proper version file. Is this everything?
I have ran the tests also.
Am I missing something?

@jreback
Copy link
Contributor

jreback commented May 17, 2016

you need to add tests which fail before this patch and pass after

@kdebrab
Copy link
Contributor

kdebrab commented May 17, 2016

Apart from the indispensable unit test, I'd propose to move the lines:

if axis is not None:
    axis = self._get_axis_number(axis)

to right before the if broadcast_axis == 1 block, such that broadcasting will work (correctly) also in case align is called with axis='index' instead of axis=0.

Thanks!

@codecov-io
Copy link

codecov-io commented May 17, 2016

Current coverage is 84.11% (diff: 100%)

Merging #13198 into master will not change coverage

@@             master     #13198   diff @@
==========================================
  Files           138        138          
  Lines         50388      50388          
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
  Hits          42386      42386          
  Misses         8002       8002          
  Partials          0          0          

Powered by Codecov. Last update 4e4a7d9...1b5bbde

@pfrcks
Copy link
Contributor Author

pfrcks commented May 17, 2016

@jreback @kdebrab I've included the unit test and have also moved the location of the axis check.
Can you please take a look?

@jreback jreback changed the title BUG: GH13194 fixed BUG: align with broadcast_axis, #13194 May 17, 2016
@jreback jreback added Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels May 17, 2016
@@ -1874,6 +1874,52 @@ def test_pipe_panel(self):
with tm.assertRaises(ValueError):
result = wp.pipe((f, 'y'), x=1, y=1)

def test_align_joins(self):
df = DataFrame(np.array([[1., 2.], [3., 4.]]), columns=list('AB'))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add the issue number as a comment here

rename to test_align_broadcast_axis

@pfrcks
Copy link
Contributor Author

pfrcks commented May 17, 2016

@jreback Have made the required changes.
Please comment if something's lacking.


# For 'inner' join
result = df.align(ts, join='inner', axis=0, broadcast_axis=1)
result1 = DataFrame(result[0])
Copy link
Member

@sinhrks sinhrks May 19, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not needed. Test returned obj (result[0] and [1]) as it is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sinhrks I'm not sure that I understand. Can you please elaborate?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pls remove DataFrame(result..) like below.

        result = df.align(...)
        expected1 = ...
        expected2 = ...
        assert_frame_equal(result[0], expected1)
        assert_frame_equal(result[1], expected2)

@sinhrks
Copy link
Member

sinhrks commented May 19, 2016

Can u add tests with:

  • alternately different Index to clheck left and right join.
  • Series.align(DataFrame)

@pfrcks
Copy link
Contributor Author

pfrcks commented May 19, 2016

@sinhrks Sure. It's late night here right now. I will do this in the morning.

@pfrcks
Copy link
Contributor Author

pfrcks commented May 20, 2016

@sinhrks I've made some changes. Please take a look and comment. This is what you meant right?

@@ -149,3 +149,4 @@ Bug Fixes
- Bug in ``NaT`` - ``Period`` raises ``AttributeError`` (:issue:`13071`)
- Bug in ``Period`` addition raises ``TypeError`` if ``Period`` is on right hand side (:issue:`13069`)
- Bug in ``pd.set_eng_float_format()`` that would prevent NaN's from formatting (:issue:`11981`)
- Fixed the bug in ``DataFrame.align()`` which was giving wrong output when supplied with the ``join`` argument. Earlier, upon supplying value of join argument as any of the four('outer', 'inner', 'left', 'right' ), align() was giving erraneous output. Align with broadcast_axis specified was using 'inner' join consistently irrespective of the value of 'join' provided when aligning dataframe and series on other axis. The problem was identified to be in pandas/core/generic.py and has been subsequently fixed. (:issue:`13194`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Single line of comments pls. these are read by the users and don't need this kind of explanation.

@pfrcks
Copy link
Contributor Author

pfrcks commented May 20, 2016

@jreback Done. Kindly check and comment.

df = DataFrame(np.array([[1., 2.], [3., 4.]]), columns=list('AB'))
ts = Series([5., 6., 7.])

result = df.align(ts, join='right', axis=0, broadcast_axis=1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

try one of these with broadcast_axis='columns' and one with broadcast_axis='index' as well

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left, right = df.align(....) and same below

ts = Series([5., 6., 7.])

result = df.align(ts, join='right', axis=0, broadcast_axis=1)
expected1 = DataFrame(np.array([[1., 2.], [3., 4.],
Copy link
Contributor

@jreback jreback May 20, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

call these expected_left and expected_right (and same below)

@pfrcks
Copy link
Contributor Author

pfrcks commented May 20, 2016

@jreback

    right = df.align(ts, join='right', axis=1, broadcast_axis='index')
    expected1_right = DataFrame([[pd.np.nan, pd.np.nan, pd.np.nan],
                                 [pd.np.nan, pd.np.nan, pd.np,nan]],
                                columns=[0, 1, 2])
    expected2_right = Series([5., 6., 7.])
    assert_frame_equal(right[0], expected1_right)
    assert_series_equal(right[1], expected2_right)  

While trying to run the above code, nosetests gives the error

AssertionError: Series.index are different

    Series.index classes are not equivalent
    [left]:  Index([u'A', u'B'], dtype='object')
    [right]: RangeIndex(start=0, stop=2, step=1)

However, upon running in the interpreter

right = df.align(ts, join='right', axis=1, broadcast_axis='index')
right[1]
0 5.0
1 6.0
2 7.0
dtype: float64
right[1].index
RangeIndex(start=0, stop=3, step=1)
expected2_right = Series([5., 6., 7.])
expected2_right.index
RangeIndex(start=0, stop=3, step=1)

We find that both indexes are same.
I am currently in pandas_dev environment.
Is this something assert_series_frame is doing?

self.assertTrue(result[1].equals(expected2))

# Series.align(DataFrame) tests, 'inner' join
result = ts.align(df, join='inner', axis=0, broadcast_axis=1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can make all tests more comprehensive as "inner" and "outer" outputs the same, and "left" and "right" outputs inversed results.

result = df.align(series, ...)
self.assert_frame_equal(result[0], expected1)
self.assert_frame_equal(result[1], expected2)
result = series.align(df, ...)
self.assert_frame_equal(result[0], expected2)
self.assert_frame_equal(result[1], expected1)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missed @jreback suggestion for variable names:

left, right = df.align(series, ...)
self.assert_frame_equal(left, expected_left)
self.assert_frame_equal(right, expected_right)
left, right = series.align(df, ...)
self.assert_frame_equal(right, expected_left)
self.assert_frame_equal(left, expected_right)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback Can you please look into the issue I have reported above.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a single Ping is enough then you need to wait for comments

@jreback
Copy link
Contributor

jreback commented Sep 9, 2016

can you rebase / update?

@pfrcks
Copy link
Contributor Author

pfrcks commented Sep 11, 2016

@jreback I'm not sure I understand. I have rebased my_branch with master.

@jreback
Copy link
Contributor

jreback commented Sep 11, 2016

doesn't look like it
you need to push

@pfrcks
Copy link
Contributor Author

pfrcks commented Sep 11, 2016

@jreback
I have used the following commands:

git checkout my_branch
git rebase master
git push -u origin my_branch

@pfrcks pfrcks closed this Sep 11, 2016
@pfrcks pfrcks reopened this Sep 11, 2016
@pfrcks
Copy link
Contributor Author

pfrcks commented Sep 11, 2016

Sorry I closed my mistake

@jreback
Copy link
Contributor

jreback commented Sep 11, 2016

got rebase origin/master

@jreback
Copy link
Contributor

jreback commented Sep 11, 2016

got -> git

@jreback
Copy link
Contributor

jreback commented Sep 11, 2016

you are rebasing to a local master which is not updated

@pfrcks
Copy link
Contributor Author

pfrcks commented Sep 11, 2016

@jreback

I followed the steps:

  1. git remote add ups https://github.com/pydata/pandas.git
  2. git fetch ups
  3. git checkout master
  4. git rebase ups/master

First, rewinding head to replay your work on top of it...
Applying: BUG: GH13194 fixed
Applying: DOC : Change in release notes
Using index info to reconstruct a base tree...
A doc/source/whatsnew/v0.18.2.txt
:19: trailing whitespace.

  • Fixed the bug in DataFrame.align() which was giving wrong output when supplied with the join argument. (:issue:13194)
    warning: 1 line adds whitespace errors.
    Falling back to patching base and 3-way merge...
    CONFLICT (modify/delete): doc/source/whatsnew/v0.18.2.txt deleted in HEAD and modified in DOC : Change in release notes. Version DOC : Change in release notes of doc/source/whatsnew/v0.18.2.txt left in tree.
    Failed to merge in the changes.
    Patch failed at 0002 DOC : Change in release notes
    The copy of the patch that failed is found in:
    /home/pfrcks/issues/pandas-pfrcks/.git/rebase-apply/patch
    When you have resolved this problem, run "git rebase --continue".
    If you prefer to skip this patch, run "git rebase --skip" instead.
    To check out the original branch and stop rebasing, run "git rebase --abort".

I removed the whitespace error and then synced the changes but still I am getting the same issue.

@jreback
Copy link
Contributor

jreback commented Nov 22, 2016

closing, but if you want to update / rebase. pls reopen

@jreback jreback closed this Nov 22, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

align with broadcast_axis specified always uses inner join when aligning dataframe and series on other axis.
5 participants