
API: Sort keys for DataFrame.assign #9818

Merged
merged 1 commit into from
Apr 7, 2015

TomAugspurger
Contributor

Closes #9777

Previously, the order in which .assign inserted new columns was arbitrary, because it followed the iteration order of the **kwargs dict. For predictability, we'll sort the keys alphabetically before inserting.
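A minimal sketch of the behavior described here, assuming all new values are computed first and then inserted in sorted key order. `assign_sorted` is a hypothetical stand-in, not the merged code. It is written as a separate function because newer pandas (0.23 and later, on Python 3.6+) inserts columns in keyword order instead of sorting, so calling `df.assign` today would not show the sort.

```python
import pandas as pd


def assign_sorted(df, **kwargs):
    """Hypothetical helper: return a copy of df with the new columns
    inserted in alphabetical order of their keyword names."""
    data = df.copy()
    # Evaluate every value first, against the not-yet-extended copy.
    results = {}
    for key, value in kwargs.items():
        results[key] = value(data) if callable(value) else value
    # Then insert in sorted key order, so the resulting column order no
    # longer depends on the iteration order of the kwargs dict.
    for key in sorted(results):
        data[key] = results[key]
    return data


df = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])
result = assign_sorted(df, D=lambda x: x.A + x.B, C=df.A * 10)
print(list(result.columns))  # ['A', 'B', 'C', 'D']
```

Computing everything before inserting anything means no value can depend on another key from the same call, which matches the "I don't think we should" position below.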

We need to be comfortable with this change, since we can't make the behavior configurable later with a keyword argument: every keyword passed to assign is interpreted as a new column name.
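Why an option can't be bolted on later, sketched with a plain dict standing in for a DataFrame: because new columns arrive through `**kwargs`, any keyword name is already a possible column name. The `sort` option and `assign_with_flag` function are hypothetical, used only to show the collision.

```python
def assign_with_flag(columns, sort=True, **kwargs):
    """Hypothetical signature with an extra `sort` option; `columns` is a
    plain dict of column name -> values standing in for a DataFrame."""
    new = dict(columns)
    keys = sorted(kwargs) if sort else list(kwargs)
    for key in keys:
        new[key] = kwargs[key]
    return new


# The caller wants a new column named "sort", but the keyword is captured
# by the option instead, so no such column is created.
out = assign_with_flag({'A': [1, 2]}, sort=[5, 6], B=[3, 4])
print(sorted(out))  # ['A', 'B']
```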

Technically, we could allow referencing a column defined within the same call to assign, as long as the referenced column sorts before the one that uses it: e.g. df.assign(C=lambda x: x.A + x.B, D=lambda x: x.C**2) would work, but df.assign(D=lambda x: x.A + x.B, C=lambda x: x.D**2) would not. But I don't think we should.
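To make the trade-off concrete, here is a sketch of the variant described above (and argued against): each key is evaluated and inserted one at a time in sorted order, so a callable can see columns added earlier in the same call. `assign_sequential` is hypothetical; it is not what this PR merges.

```python
import pandas as pd


def assign_sequential(df, **kwargs):
    """Hypothetical variant: evaluate and insert one key at a time, in
    sorted order, so a callable can use columns created earlier."""
    data = df.copy()
    for key in sorted(kwargs):
        value = kwargs[key]
        data[key] = value(data) if callable(value) else value
    return data


df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

# Works: 'C' sorts before 'D', so C already exists when D's lambda runs.
ok = assign_sequential(df, C=lambda x: x.A + x.B, D=lambda x: x.C ** 2)

# Fails: 'C' sorts first, but it needs 'D', which does not exist yet.
try:
    assign_sequential(df, D=lambda x: x.A + x.B, C=lambda x: x.D ** 2)
    failed = False
except AttributeError:
    failed = True
```

Whether a call succeeds would then hinge on the spelling of the column names, which is a strong argument for not allowing it.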

cc @mrocklin

@shoyer
Member

shoyer commented Apr 6, 2015

👍 This looks great to me!

@jreback jreback added this to the 0.16.1 milestone Apr 6, 2015
        assert_frame_equal(result, expected)

    def test_assign_alphabetical(self):
        df = DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])
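The excerpt shows only the first line of `test_assign_alphabetical`. A hedged sketch of what such a test might check, with the issue number as a comment as the reviewer asks. The column values are illustrative, and `assign_sorted` is a hypothetical stand-in because current pandas keeps keyword order rather than sorting.

```python
import pandas as pd
from pandas.testing import assert_frame_equal


def assign_sorted(df, **kwargs):
    # Hypothetical stand-in for the sorted-insert behavior discussed here.
    data = df.copy()
    for key in sorted(kwargs):
        data[key] = kwargs[key]
    return data


def test_assign_alphabetical():
    # GH 9777: new columns are inserted in sorted key order
    df = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])
    result = assign_sorted(df, D=df.A + df.B, C=df.A - df.B)
    expected = pd.DataFrame([[1, 2, -1, 3], [3, 4, -1, 7]],
                            columns=list('ABCD'))
    assert_frame_equal(result, expected)
```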
Contributor

put the issue number here as a comment

Contributor Author

✔️

Previously the order was arbitrary. For predictability,
we'll sort before inserting.
@jreback
Contributor

jreback commented Apr 6, 2015

agreed, you may want to add an example in the docs showing chained .assign calls (for example, when you have an evaluation order requirement), e.g.:

df = DataFrame({'A': ..., 'B': ...})
df.assign(C=df.A + df.B).assign(D=lambda x: x.C / x.A)
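A runnable version of the chained pattern, with made-up values for A and B. Each `.assign` returns a new DataFrame and a callable receives that frame, so the second call can use the `C` column created by the first. This ordering holds no matter how keys inside a single call are sorted.

```python
import pandas as pd

df = pd.DataFrame({'A': [1.0, 2.0, 4.0], 'B': [3.0, 2.0, 0.0]})

# The first call adds C; the lambda in the second call is applied to the
# frame returned by the first call, so x.C exists there.
result = df.assign(C=df.A + df.B).assign(D=lambda x: x.C / x.A)
print(result)
```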

@TomAugspurger
Contributor Author

I've got one like that in the warning box

        (df.assign(C = lambda x: x['A'] + x['B'])
           .assign(D = lambda x: x['A'] + x['C']))
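The warning-box snippet uses callables in both calls. One consequence, sketched below with made-up data: because neither value refers to a named `df`, the chain can start directly from a constructor or from an earlier method call.

```python
import pandas as pd

# Both values are lambdas, so nothing refers to a named frame and the
# chain can hang directly off the DataFrame constructor.
result = (pd.DataFrame({'A': [1, 2], 'B': [10, 20]})
          .assign(C=lambda x: x['A'] + x['B'])
          .assign(D=lambda x: x['A'] + x['C']))
print(result)
```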

@jorisvandenbossche
Member

@TomAugspurger small unrelated note: you pushed your branch to upstream instead of to your fork (origin). No problem for now, of course, but I think we should try not to do that for PRs.

@TomAugspurger
Contributor Author

@jorisvandenbossche are you talking about f00d6bb? I reorganized the whatsnew entry when merging that commit. Or I could be confused about what I did, but I think I pushed this branch to origin.

@jorisvandenbossche
Member

No, I was talking about this PR: you apparently pushed it to upstream (see the top of the PR; I noticed it because I did a fetch from upstream). But as I said, no problem, it can always happen! Just be sure to delete the branch after the PR is merged.

@TomAugspurger
Contributor Author

@jorisvandenbossche I see what you mean now. Sorry about that. I'll clean it up when this gets merged.

Speaking of which, any objections to merging?

@jreback
Contributor

jreback commented Apr 7, 2015

this looks ok, go ahead and merge

TomAugspurger pushed a commit that referenced this pull request Apr 7, 2015
API: Sort keys for DataFrame.assign
@TomAugspurger TomAugspurger merged commit 5d57aad into master Apr 7, 2015
@TomAugspurger TomAugspurger deleted the assign-sort branch April 7, 2015 22:37
Development

Successfully merging this pull request may close these issues.

Predictable order for columns in assign through sorting
4 participants