Skip to content

ENH: allow attrs to be propagated via pd.concat #42252

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 25 commits into from
Dec 5, 2021

Conversation

xiki-tempula
Copy link
Contributor

@xiki-tempula xiki-tempula commented Jun 26, 2021

@TomAugspurger @lithomas1 I have made a fix to this issue and I wonder if do you mind have a review, please? I will add the whatsnew when this PR is about to be merged.

@xiki-tempula xiki-tempula marked this pull request as draft June 26, 2021 12:19
@pep8speaks
Copy link

pep8speaks commented Jun 26, 2021

Hello @xiki-tempula! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-12-01 08:51:44 UTC

@xiki-tempula xiki-tempula marked this pull request as ready for review June 26, 2021 14:14
Copy link
Member

@ivanovmg ivanovmg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello, @xiki-tempula!
Thank you for the PR!
Please see my comments below.



def test_concat_retain_attrs():
'''Only retain the attrs when the attrs are the same across all
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please fix the docstrings?
They need to be in triple quotes and the summary must fit in one line.
If additional lines are necessary, then they can be added as well, separated by an empty line from the summary line.

@@ -784,3 +783,28 @@ def test_finalize_frame_series_name():
df = pd.DataFrame({"name": [1, 2]})
result = pd.Series([1, 2]).__finalize__(df)
assert result.name is None


def test_concat_retain_attrs():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose that these tests have to be placed in pandas/tests/reshape/concat/test_concat.py.
Is it correct?

@@ -762,7 +761,7 @@ def test_groupby_finalize(obj, method):
[
lambda x: x.agg(["sum", "count"]),
lambda x: x.transform(lambda y: y),
lambda x: x.apply(lambda y: y),
# lambda x: x.apply(lambda y: y), Fixed with #42252
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I understand what the changes in concat behavior have to do with this test.
Can you please clarify?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, the apply will use concat internally

concatenated = concat(results, keys=keys, axis=1, sort=False)
.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yet, I am not sure I understood the reason.
Maybe some other reviewers will catch the idea.

Copy link
Contributor Author

@xiki-tempula xiki-tempula Jun 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ivanovmg Sorry for the confusion. The test checks if apply will retain the attrs and mark this as failing as the attrs should get lost.

However, in this test, the apply calls concat, which will now preserve the attrs.
Though the function apply don't have a finaliser to ensure the pass of attrs, all the operations done inside apply will preserve the attrs. Thus, the final result will have attrs and thus, the test will no longer fail. So I have removed the mark as fail.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't comment out cases

@xiki-tempula
Copy link
Contributor Author

@ivalaginja Thanks for the review. I have fixed the tests.

Comment on lines 659 to 661
''' Retain the attrs during concat

Only retain the attrs when the attrs are the same across all dataframes.'''
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I should have mentioned that the quote char must be double quote, not single quote.
By the way, there is a detailed docstring guide https://pandas.pydata.org/pandas-docs/dev/development/contributing_docstring.html

Besides, it may be a good idea to remove docstrings and just post issue number # GH 41828.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ivanovmg Thanks for the review. I have replaced the doc string with github issue.

@lithomas1 lithomas1 added metadata _metadata, .attrs Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Jun 30, 2021
@lithomas1 lithomas1 requested a review from TomAugspurger June 30, 2021 15:08
@lithomas1 lithomas1 requested a review from jreback July 9, 2021 15:18
@github-actions
Copy link
Contributor

github-actions bot commented Aug 9, 2021

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Aug 9, 2021
@xiki-tempula
Copy link
Contributor Author

xiki-tempula commented Aug 9, 2021

@github-actions @jreback @TomAugspurger Would you mind have a review of this PR, please? Thank you.

@TomAugspurger
Copy link
Contributor

I won't have time this week.

@@ -5455,6 +5455,16 @@ def __finalize__(
object.__setattr__(self, name, getattr(other, name, None))

if method == "concat":
# Issue #41828, retain the attrs only if all NDFrame have the same
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is already handled above no? L5448

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback Thanks for the review. No.
So during concatenation, the other is a concatenator, which is not an NDFrame.
So it will not pass the if statement in L5447.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can remove the comment

@jreback
Copy link
Contributor

jreback commented Oct 4, 2021

can you merge master and will look

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a test for series concat as well

@@ -762,7 +761,7 @@ def test_groupby_finalize(obj, method):
[
lambda x: x.agg(["sum", "count"]),
lambda x: x.transform(lambda y: y),
lambda x: x.apply(lambda y: y),
# lambda x: x.apply(lambda y: y), Fixed with #42252
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't comment out cases

df2 = pd.DataFrame(data=d)
df2.attrs = {1: 1}
df = pd.concat([df1, df2])
assert df.attrs == {1: 1}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

construct the appropraiate expected value then use tm.assert_frame_equal, you can also directly compare the attrs for clarity

assert df.attrs == {1: 1}


def test_concat_drop_attrs():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

@xiki-tempula
Copy link
Contributor Author

@jreback Thanks very much for the review. I have updated the changes.

@xiki-tempula xiki-tempula requested a review from jreback October 12, 2021 10:16
@jreback
Copy link
Contributor

jreback commented Nov 28, 2021

can you merge master and will look

@xiki-tempula
Copy link
Contributor Author

@jreback Thanks for the help. I have merged the master.

df2 = DataFrame(data=d)
df2.attrs = {1: 1}
df = concat([df1, df2])
assert df.attrs[1] == 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use assert_equal (u can also compare attrs if u want)

though should prob add a flag to do this in the asset_ routines (i don't think we have this)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback Thank you for the review. I wonder if do you mean pandas._testing.assert_equal? It seems that pandas._testing.assert_equal cannot handle comparison between integers.

assert df.attrs[1] == 1


def test_concat_retain_attrs_series():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

paramterize over series and frame and use assert_equal

@xiki-tempula
Copy link
Contributor Author

@jreback Thanks for the review. I have used parameterized on the tests and added the test for the empty dataframe and series to the test.
I'm not quite sure of how to use the assert_equal to test integers so I have left it there.

@jreback jreback added this to the 1.4 milestone Dec 1, 2021
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, pls add a whatsnew note and ping on green

@xiki-tempula
Copy link
Contributor Author

xiki-tempula commented Dec 1, 2021

@jreback Thanks, I have the what's new updated. I'm not quite sure of "ping on green" means, if you are talking about the tests, they are 1 cancelled and 31 successful checks for the last commit f49f027.
There is one test failed due to timeout.

@jreback jreback merged commit 1cbe011 into pandas-dev:master Dec 5, 2021
@jreback
Copy link
Contributor

jreback commented Dec 5, 2021

thanks @xiki-tempula

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
metadata _metadata, .attrs Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ENH: retain attrs when concat dataframes
6 participants