BUG: concat doesn't preserve attrs (as alternative for append deprecated in 1.4) #45824

Open
3 tasks done
MarcoGorelli opened this issue Feb 4, 2022 · 8 comments
Labels
API - Consistency (Internal Consistency of API/Behavior), Enhancement, metadata (_metadata, .attrs)

Comments

@MarcoGorelli
Member

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas
df1 = pandas.DataFrame({"a": [1]})
df2 = pandas.DataFrame({"a": [2]})
df1.attrs["metadata-xy"] = 42

print(df1.append(df2).attrs)  # keeps the attrs of df1
print(pandas.concat([df1, df2]).attrs)  # no attrs in result

Issue Description

append preserves attrs, but concat doesn't

Originally reported here #35407 (comment)

Expected Behavior

df1.append(df2).attrs and pandas.concat([df1, df2]).attrs should probably match

Installed Versions

Replace this line with the output of pd.show_versions()

@MarcoGorelli MarcoGorelli added the Bug and Needs Triage (Issue that has not been reviewed by a pandas team member) labels Feb 4, 2022
@rhshadrach
Member

There is some discussion on what the behavior should be in #28283, but perhaps it is worthy of an issue on its own.

@asishm
Contributor

asishm commented Feb 4, 2022

related #41828

@lithomas1 lithomas1 added the API - Consistency (Internal Consistency of API/Behavior) and Enhancement labels and removed the Bug and Needs Triage (Issue that has not been reviewed by a pandas team member) labels Feb 5, 2022
@lithomas1
Member

I think concat keeps the attrs only when they match, by design. Re-labeling as enhancement.
#41828 (comment)
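
For readers skimming the thread, the rule described in this comment can be sketched in plain Python on the attrs dicts alone. This is only an illustration of the behavior as stated above, not pandas's actual code; `combine_attrs_if_equal` is a hypothetical helper name.

```python
from copy import deepcopy


def combine_attrs_if_equal(attrs_list):
    """Return a copy of the shared attrs if all inputs have equal attrs, else {}.

    Hypothetical helper: it only mirrors the rule described in the comment,
    it is not part of pandas.
    """
    if not attrs_list:
        return {}
    first = attrs_list[0]
    if all(attrs == first for attrs in attrs_list[1:]):
        return deepcopy(first)
    return {}


# Both inputs carry the same attrs: they are kept.
same = combine_attrs_if_equal([{"metadata-xy": 42}, {"metadata-xy": 42}])

# Only the first input has attrs (the case in the report): nothing is kept.
different = combine_attrs_if_equal([{"metadata-xy": 42}, {}])

print(same)
print(different)
```

Under this rule the original example returns empty attrs simply because df2.attrs is empty and therefore does not equal df1.attrs.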

@lithomas1 lithomas1 added the metadata (_metadata, .attrs) label Feb 5, 2022
@simonjayhawkins simonjayhawkins added this to the 1.4.1 milestone Feb 9, 2022
@simonjayhawkins simonjayhawkins changed the title BUG: concat doesn't preserve attrs BUG: concat doesn't preserve attrs (as alternative for append deprecated in 1.4) Feb 9, 2022
@simonjayhawkins simonjayhawkins modified the milestones: 1.4.1, 1.4.2 Feb 11, 2022
@simonjayhawkins
Member

moving to 1.4.3

@simonjayhawkins simonjayhawkins modified the milestones: 1.4.2, 1.4.3 Apr 1, 2022
@jonastieppo

Even append does not work well for that. append only adds the rows of the DataFrame passed as the argument to the DataFrame it is called on, so the result keeps the caller's attrs.

Hence, if you swap the order of the append call in your example, the attrs are not preserved, as the following code shows:

import pandas as pd
df1 = pd.DataFrame({"a": [1]})
df2 = pd.DataFrame({"a": [2]})
df1.attrs["metadata-xy"] = 42

print("No attributes preserved: ", df2.append(df1).attrs)

About concat:
As far as I can tell, concat combines the two objects, but for some reason attrs is not carried over to the result. I think propagating it would be a good thing to implement.

@simonjayhawkins
Member

Removing the milestone. As it is now late in the 1.4.x series of releases, any fixes are probably no longer suitable for backport.

@simonjayhawkins simonjayhawkins removed this from the 1.4.3 milestone Jun 22, 2022
@simonjayhawkins
Member

contributions and PRs welcome.

@simonjayhawkins simonjayhawkins added this to the Contributions Welcome milestone Jun 22, 2022
@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
@behrenhoff
Contributor

behrenhoff commented Nov 2, 2022

There is a related bug involving concat & attrs:

import pandas

a = pandas.DataFrame({"a": [1]})
b = pandas.DataFrame({"b": [2]})
a.attrs["x"] = pandas.DataFrame()
b.attrs["x"] = pandas.DataFrame()

print(pandas.concat([a, b]).attrs)

The concat raises an Exception:

...
lib/python3.10/site-packages/pandas/core/generic.py in __nonzero__(self)
   1524     @final
   1525     def __nonzero__(self) -> NoReturn:
-> 1526         raise ValueError(
   1527             f"The truth value of a {type(self).__name__} is ambiguous. "
   1528             "Use a.empty, a.bool(), a.item(), a.any() or a.all()."

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

If either a.attrs or b.attrs is removed, the concat works, but the resulting attrs are empty. If the resulting attrs are empty anyway, I don't understand why concat seems to be messing with the attrs at all. At the very least, the error message is misleading, and it took me a while to figure out what was wrong.
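
One plausible reading of the traceback, assuming concat compares the inputs' attrs dicts with `==`: dict equality calls `==` on the stored values and then asks for the truth value of the result, which fails when that result is not a plain boolean (as with DataFrames). The sketch below imitates this with hypothetical stand-in classes so it runs without pandas; it is not the pandas code path itself.

```python
class AmbiguousResult:
    """Stand-in for an element-wise comparison result with no single truth value."""

    def __bool__(self):
        raise ValueError("The truth value is ambiguous.")


class ElementwiseEq:
    """Hypothetical stand-in for an object whose == returns a non-boolean result."""

    def __eq__(self, other):
        return AmbiguousResult()


attrs_a = {"x": ElementwiseEq()}
attrs_b = {"x": ElementwiseEq()}

try:
    # Dict equality compares the values with == and then calls bool() on the result.
    attrs_a == attrs_b
    raised = False
except ValueError:
    raised = True

# With one side empty the dict lengths differ, so the values are never compared.
empty_vs_a = {} == attrs_a

print(raised)
print(empty_vs_a)
```

This would also be consistent with the observation that the error disappears as soon as one of the two attrs is removed.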

The question is of course: what is the expected behavior? I am not sure. Maybe an attrs: Literal["copy_first", "copy_last", "update", "reverse_update"] = "update" keyword argument to concat? I guess most useful would be the "update" which would set the resulting attrs to {**df1.attrs, **df2.attrs, ..., **dfN.attrs} but I can also imagine a situation where you would want {**dfN.attrs, ..., **df2.attrs, **df1.attrs} or just the attrs from the first or last DF.
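
To make the four proposed modes concrete, here is a minimal sketch that works on plain attrs dicts. `merge_attrs` and its `mode` argument are hypothetical names used only for illustration; no such keyword exists on pandas.concat.

```python
from typing import Any, Literal

AttrsMode = Literal["copy_first", "copy_last", "update", "reverse_update"]


def merge_attrs(attrs_list: list[dict[str, Any]], mode: AttrsMode = "update") -> dict[str, Any]:
    """Combine attrs dicts according to one of the modes proposed above (hypothetical)."""
    if not attrs_list:
        return {}
    if mode == "copy_first":
        return dict(attrs_list[0])
    if mode == "copy_last":
        return dict(attrs_list[-1])
    if mode == "update":
        ordered = attrs_list  # later inputs win on key conflicts
    elif mode == "reverse_update":
        ordered = attrs_list[::-1]  # earlier inputs win on key conflicts
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    merged: dict[str, Any] = {}
    for attrs in ordered:
        merged.update(attrs)
    return merged


first = {"source": "a", "unit": "m"}
second = {"source": "b"}

updated = merge_attrs([first, second], "update")
reverse_updated = merge_attrs([first, second], "reverse_update")
copied_first = merge_attrs([first, second], "copy_first")
copied_last = merge_attrs([first, second], "copy_last")

print(updated)
print(reverse_updated)
```

The two update modes differ only in which input wins when the same key appears more than once.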

Projects
None yet
Development

No branches or pull requests

8 participants