DOC: Data Editing Samples/Guide #35378

achapkowski opened this issue Jul 22, 2020 · 4 comments

@achapkowski

Location of the documentation

N/A

Documentation problem

There seems to be general community confusion over when to use iloc, loc, at, iat, or the other methods that allow you to update rows and columns. What is the best way to add a single new row? For multiple rows, should one use pd.concat or something else?

It would be nice to have a guide or doc page that points to the best practices for editing data in a DataFrame.
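
As a rough illustration of how the four accessors named above differ, here is a minimal sketch. The frame, labels, and values are made up for illustration and are not from the pandas docs.

```python
import pandas as pd

# Hypothetical example frame with string row labels
df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]}, index=["x", "y", "z"])

# loc: label-based; also accepts slices, lists, and boolean masks
df.loc["y", "B"] = 21

# iloc: integer-position-based
df.iloc[0, 0] = 100

# at: label-based access to a single scalar
df.at["z", "A"] = 300

# iat: position-based access to a single scalar
df.iat[2, 1] = 31

# loc with a boolean mask: update column B wherever A exceeds 50
df.loc[df["A"] > 50, "B"] = 0
```

In short, loc and at address by label, iloc and iat address by position, and at/iat are limited to one cell at a time.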

achapkowski added the Docs and Needs Triage labels on Jul 22, 2020
@rhshadrach
Member

@achapkowski
Author

These two sets of docs are relevant to the case, but if I need to do this:

import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3'],
                    'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']},
                   index=[0, 1, 2, 3])
# add this single row to df1
row = ['A4', 'B4', 'C4', 'D4']

What is the most efficient way to do this?

Do I create a DataFrame for the single row and concat it? Do I use append? One of the loc/iloc methods?
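
For reference, two of the options mentioned in the question can be sketched as follows. This is only an illustration of the mechanics, not a benchmark or a recommendation from the maintainers; the names by_loc, by_concat, and new_row are introduced here for the example.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3'],
                    'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']},
                   index=[0, 1, 2, 3])
row = ['A4', 'B4', 'C4', 'D4']

# Option 1: setting with enlargement via loc and a label that does not exist yet.
# Using len(...) as the new label assumes the index is 0..n-1.
by_loc = df1.copy()
by_loc.loc[len(by_loc)] = row

# Option 2: wrap the row in a one-row DataFrame and concatenate.
new_row = pd.DataFrame([row], columns=df1.columns)
by_concat = pd.concat([df1, new_row], ignore_index=True)
```

Both variants end up with the same five rows. The loc form modifies the frame in place, while pd.concat returns a new frame.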

@rhshadrach
Member

I think adding a section in the documentation discussing this operation makes sense; I couldn't find one on it and hopefully didn't miss anything. Two suggestions:

  • There has been much support for deprecating append (Deprecate Series / DataFrame.append #35407).
  • Regardless of the method used, it is more efficient to gather data up front (e.g. as a dictionary/list of dictionaries) and then create a single DataFrame rather than having many appends/concats/locs.
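
The second suggestion, collecting the data first and constructing the DataFrame once, might look like the sketch below. The loop and the column values are placeholders standing in for whatever produces the rows.

```python
import pandas as pd

# Accumulate plain Python dicts; appending to a list is cheap.
records = []
for i in range(5):
    records.append({"A": f"A{i}", "B": f"B{i}"})

# Build the DataFrame a single time from the collected records.
df = pd.DataFrame(records)
```

This avoids creating a new DataFrame on every iteration, which is what repeated concat or loc-enlargement calls inside a loop would do.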

rhshadrach added this to the Contributions Welcome milestone on Aug 1, 2020
rhshadrach removed the Needs Triage label on Aug 1, 2020
@achapkowski
Author

@rhshadrach thanks for pointing out the deprecation! There are so many issues, it's hard to keep track of them all.

I know the best practice is to have all your data in hand, but sometimes a DataFrame needs just a few more rows, and for large datasets (millions of rows) recreating the whole DataFrame is overkill.
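
For the "few extra rows on a large frame" case, one possible approach is to batch the new rows and call pd.concat once. The sketch below uses a 100,000-row stand-in for the large frame, and the column names id and value are invented for the example.

```python
import pandas as pd

# Stand-in for an existing large DataFrame
big = pd.DataFrame({"id": range(100_000), "value": 0.0})

# Collect the additional rows first, then build one small frame from them
extra_rows = [
    {"id": 100_000, "value": 1.5},
    {"id": 100_001, "value": 2.5},
]
extra = pd.DataFrame(extra_rows)

# A single concat; ignore_index=True renumbers the result 0..n-1
combined = pd.concat([big, extra], ignore_index=True)
```

Note that pd.concat still returns a new DataFrame, so the existing data is copied once. Batching the rows keeps it to one copy rather than one per added row.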

mroeschke removed this from the Contributions Welcome milestone on Oct 13, 2022