ENH: Pandas DataFrame.append and Series.append methods should get an inplace kwarg #14796
Comments
I am opposed to this for the exact reasons discussed in #2801: it would mislead users who might expect a performance benefit. |
Virtually all pandas methods return a new object, the exception being the indexing operations. Using Closing, though if someone thinks that we should add a signature like
|
In the case of a namedtuple which contains a Series object, the inplace approach would be a nice feature to have. Indeed, namedtuple objects by design give a library a way to expose data to a user while only allowing them to modify it in place. Consider the following dummy code:
from collections import namedtuple
from pandas import Series
# ----- Library part ------
sample_schema = {
    "name": str,
    "some_info": str,
    "content": Series
}
my_data_type = namedtuple("MyDataType", sample_schema.keys())
exposed_data = my_data_type(
    name="Library data",
    some_info="Modify the content as you want",
    content=Series({"a": 0})
)
# ----- User code part ------
series_to_be_appended = Series({"b": 0})
# This is forbidden: namedtuple fields cannot be reassigned (raises AttributeError)
exposed_data.content = exposed_data.content.append(series_to_be_appended)
# This would be allowed but is not implemented in Series
exposed_data.content.append(series_to_be_appended, inplace=True)
The I would think inplace methods are nice to have on any mutable object in general. |
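For readers who want to try the namedtuple case above, here is a minimal sketch of what works without an inplace keyword. It assumes pd.concat as the stand-in for append and uses the namedtuple _replace method; neither is suggested in the comment itself, and the variable names rebinding_allowed and updated are made up for illustration. Note that _replace builds a new tuple, so it does not give the in-place semantics asked for above.

```python
from collections import namedtuple

import pandas as pd

# Same shape as the dummy library data in the comment above.
MyDataType = namedtuple("MyDataType", ["name", "some_info", "content"])
exposed_data = MyDataType(
    name="Library data",
    some_info="Modify the content as you want",
    content=pd.Series({"a": 0}),
)
series_to_be_appended = pd.Series({"b": 0})

# Rebinding a namedtuple field is rejected by Python itself.
try:
    exposed_data.content = pd.concat([exposed_data.content, series_to_be_appended])
    rebinding_allowed = True
except AttributeError:
    rebinding_allowed = False

# Assumed workaround: build a new tuple whose content is the concatenated Series.
# The original exposed_data object is left unchanged.
updated = exposed_data._replace(
    content=pd.concat([exposed_data.content, series_to_be_appended])
)
```

Anything that still holds a reference to the original exposed_data keeps seeing the old Series, which is exactly the limitation the comment is pointing at.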
So the consensus among the maintainers is that it would be too confusing to have an I'd suggest removing the method from |
Agreeing here. Can still live with that. But not adding the possibility to specify |
Adding a usecase:
combined_dataframe = pd.DataFrame()
for dataframe in list_of_dataframes_read_from_csvs:
    combined_dataframe.append(dataframe, inplace=True)
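For comparison, here is a rough sketch of how the same use case can be written today with a single pd.concat call over the whole list. The three small frames are made-up stand-ins for the CSV data, and ignore_index=True is an assumption about the desired row labels, not something stated in the use case.

```python
import pandas as pd

# Hypothetical stand-ins for the frames read from CSV files.
list_of_dataframes_read_from_csvs = [
    pd.DataFrame({"x": [1, 2]}),
    pd.DataFrame({"x": [3]}),
    pd.DataFrame({"x": [4, 5]}),
]

# One concat over the whole list; ignore_index=True renumbers the rows from 0.
combined_dataframe = pd.concat(list_of_dataframes_read_from_csvs, ignore_index=True)
```

This avoids both the loop and the repeated reassignment, though it does not address the request for an inplace keyword as such.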
|
Problem description
Currently to append to a DataFrame, the following is the approach:
append is a DataFrame or Series method, and as such should be able to modify the DataFrame or Series in place. If in-place modification is not required, one may use concat or set the inplace kwarg to False. It will avoid an explicit assignment operation which is quite slow in Python, as we all know. Further, it will make the expected behavior similar to Python lists, and avoid questions such as these: 1, 2...
Additionally, at present, append is a full subset of concat, and as such it need not exist at all. Given the vast number of functions to append a DataFrame or Series to another in Pandas, it makes sense that each has its merits and demerits. Gaining an inplace kwarg will clearly distinguish append from concat, and simplify code.
I understand that this issue was raised in #2801 a long time ago. However, the conversation there deviated from the simplification offered by the inplace kwarg to performance enhancement. I (and many like me) am looking for ease of use, and not so much at performance. Also, we expect the data to fit in memory (which is a limitation even with the current version of append).
Expected Code