-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: allow attrs to be propagated via pd.concat #42252
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hello @xiki-tempula! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2021-12-01 08:51:44 UTC |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello, @xiki-tempula!
Thank you for the PR!
Please see my comments below.
|
||
|
||
def test_concat_retain_attrs(): | ||
'''Only retain the attrs when the attrs are the same across all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please fix the docstrings?
They need to be in triple quotes and the summary must fit in one line.
If additional lines are necessary, then they can be added as well, separated by an empty line from the summary line.
@@ -784,3 +783,28 @@ def test_finalize_frame_series_name(): | |||
df = pd.DataFrame({"name": [1, 2]}) | |||
result = pd.Series([1, 2]).__finalize__(df) | |||
assert result.name is None | |||
|
|||
|
|||
def test_concat_retain_attrs(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose that these tests have to be placed in pandas/tests/reshape/concat/test_concat.py
.
Is it correct?
@@ -762,7 +761,7 @@ def test_groupby_finalize(obj, method): | |||
[ | |||
lambda x: x.agg(["sum", "count"]), | |||
lambda x: x.transform(lambda y: y), | |||
lambda x: x.apply(lambda y: y), | |||
# lambda x: x.apply(lambda y: y), Fixed with #42252 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure I understand what the changes in concat
behavior have to do with this test.
Can you please clarify?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, the apply
will use concat internally
Line 382 in be2efa9
concatenated = concat(results, keys=keys, axis=1, sort=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yet, I am not sure I understood the reason.
Maybe some other reviewers will catch the idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ivanovmg Sorry for the confusion. The test checks if apply
will retain the attrs
and mark this as failing as the attrs
should get lost.
However, in this test, the apply
calls concat
, which will now preserve the attrs
.
Though the function apply
don't have a finaliser to ensure the pass of attrs
, all the operations done inside apply
will preserve the attrs
. Thus, the final result will have attrs
and thus, the test will no longer fail. So I have removed the mark as fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't comment out cases
@ivalaginja Thanks for the review. I have fixed the tests. |
''' Retain the attrs during concat | ||
|
||
Only retain the attrs when the attrs are the same across all dataframes.''' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I should have mentioned that the quote char must be double quote, not single quote.
By the way, there is a detailed docstring guide https://pandas.pydata.org/pandas-docs/dev/development/contributing_docstring.html
Besides, it may be a good idea to remove docstrings and just post issue number # GH 41828
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ivanovmg Thanks for the review. I have replaced the doc string with github issue.
This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this. |
@github-actions @jreback @TomAugspurger Would you mind have a review of this PR, please? Thank you. |
I won't have time this week. |
pandas/core/generic.py
Outdated
@@ -5455,6 +5455,16 @@ def __finalize__( | |||
object.__setattr__(self, name, getattr(other, name, None)) | |||
|
|||
if method == "concat": | |||
# Issue #41828, retain the attrs only if all NDFrame have the same |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is already handled above no? L5448
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback Thanks for the review. No.
So during concatenation, the other
is a concatenator
, which is not an NDFrame
.
So it will not pass the if statement in L5447.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can remove the comment
can you merge master and will look |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a test for series concat as well
@@ -762,7 +761,7 @@ def test_groupby_finalize(obj, method): | |||
[ | |||
lambda x: x.agg(["sum", "count"]), | |||
lambda x: x.transform(lambda y: y), | |||
lambda x: x.apply(lambda y: y), | |||
# lambda x: x.apply(lambda y: y), Fixed with #42252 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't comment out cases
df2 = pd.DataFrame(data=d) | ||
df2.attrs = {1: 1} | ||
df = pd.concat([df1, df2]) | ||
assert df.attrs == {1: 1} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
construct the appropraiate expected value then use tm.assert_frame_equal
, you can also directly compare the attrs for clarity
assert df.attrs == {1: 1} | ||
|
||
|
||
def test_concat_drop_attrs(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
@jreback Thanks very much for the review. I have updated the changes. |
can you merge master and will look |
@jreback Thanks for the help. I have merged the master. |
df2 = DataFrame(data=d) | ||
df2.attrs = {1: 1} | ||
df = concat([df1, df2]) | ||
assert df.attrs[1] == 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use assert_equal (u can also compare attrs if u want)
though should prob add a flag to do this in the asset_ routines (i don't think we have this)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback Thank you for the review. I wonder if do you mean pandas._testing.assert_equal? It seems that pandas._testing.assert_equal cannot handle comparison between integers.
assert df.attrs[1] == 1 | ||
|
||
|
||
def test_concat_retain_attrs_series(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
paramterize over series and frame and use assert_equal
@jreback Thanks for the review. I have used parameterized on the tests and added the test for the empty dataframe and series to the test. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good, pls add a whatsnew note and ping on green
thanks @xiki-tempula |
@TomAugspurger @lithomas1 I have made a fix to this issue and I wonder if do you mind have a review, please? I will add the whatsnew when this PR is about to be merged.