Skip to content

DOC: Improve the docstring of pandas.DataFrame.append() #20267

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Jul 8, 2018

Conversation

Triple0
Copy link
Contributor

@Triple0 Triple0 commented Mar 11, 2018

Checklist for the pandas documentation sprint (ignore this if you are doing
an unrelated PR):

  • PR title is "DOC: update the docstring"
  • The validation script passes: scripts/validate_docstrings.py <your-function-or-method>
  • The PEP8 style check passes: git diff upstream/master -u -- "*.py" | flake8 --diff
  • The html version looks good: python doc/make.py --single <your-function-or-method>
  • It has been proofread on language by another sprint participant

Please include the output of the validation script below between the "```" ticks:

################################################################################
##################### Docstring (pandas.DataFrame.append)  #####################
################################################################################

Append rows of `other` to the end of `caller`, returning a new object.

Columns not in other are added as new columns.

Parameters
----------
other : DataFrame or Series/dict-like object, or list of these
    The data to append.
ignore_index : boolean, default False
    If True, do not use the index labels.
verify_integrity : boolean, default False
    If True, raise ValueError on creating index with duplicates.

Returns
-------
appended : DataFrame

Notes
-----
If a list of dict/series is passed and the keys are all contained in
the DataFrame's index, the order of the columns in the resulting
DataFrame will be unchanged.

Iteratively appending rows to a DataFrame can be more computationally
intensive than a single concatenate. A better solution is to append
those rows to a list and then concatenate the list with the original
DataFrame all at once.

See also
--------
pandas.concat : General function to concatenate DataFrame, Series
    or Panel objects

Examples
--------

>>> df = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'))
>>> df
   A  B
0  1  2
1  3  4
>>> df2 = pd.DataFrame([[5, 6], [7, 8]], columns=list('AB'))
>>> df.append(df2)
   A  B
0  1  2
1  3  4
0  5  6
1  7  8

With `ignore_index` set to True:

>>> df.append(df2, ignore_index=True)
   A  B
0  1  2
1  3  4
2  5  6
3  7  8

The following, while not recommended methods for generating DataFrames,
show two ways to generate a DataFrame from multiple data sources.

Less efficient:

>>> df = pd.DataFrame(columns=['A'])
>>> for i in range(5):
...     df = df.append({'A': i}, ignore_index=True)
>>> df
   A
0  0
1  1
2  2
3  3
4  4

More efficient:

>>> pd.concat([pd.DataFrame([i], columns=['A']) for i in range(5)],
...           ignore_index=True)
   A
0  0
1  1
2  2
3  3
4  4

################################################################################
################################## Validation ##################################
################################################################################

If the validation script still gives errors, but you think there is a good reason
to deviate in this case (and there are certainly such cases), please state this
explicitly.

@@ -5054,7 +5054,9 @@ def infer(x):
def append(self, other, ignore_index=False, verify_integrity=False):
"""
Append rows of `other` to the end of this frame, returning a new
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to fit on a single line - can you consolidate?

object. Columns not in this frame are added as new columns.
object.

Columns not in this frame are added as new columns.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of "this frame" can you say "the caller"? Also while conceptually simple the usage of the word "Columns" mixes reference to the calling object and the object being appended, so differentiating between those would be preferable

object. Columns not in this frame are added as new columns.
Append rows of `other` to the end of `caller`, returning a new object.

Columns not in the caller are added as new columns.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better but for explicitness say "Columns in other not in caller are ..."

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!

Other comments:

  • In the Notes: " A better solution is to append those rows to a list and then concatenate the list with the original DataFrame all at once." -> can you add .. using :func:`pandas.concat` at the end of that sentence ?

  • for the Parameter section: 'boolean' -> 'bool'

@@ -5053,8 +5053,9 @@ def infer(x):

def append(self, other, ignore_index=False, verify_integrity=False):
"""
Append rows of `other` to the end of this frame, returning a new
object. Columns not in this frame are added as new columns.
Append rows of `other` to the end of `caller`, returning a new object.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'caller' is not needed to quote (it's not an argument).

Also given this is the docstring specific to DataFrame (not shared with Series), I think we can be more specific. So maybe "calling DataFrame" ? (or is it then too long ?)

object. Columns not in this frame are added as new columns.
Append rows of `other` to the end of `caller`, returning a new object.

Columns not in other are added as new columns.
Copy link
Member

@jorisvandenbossche jorisvandenbossche Mar 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not the columns in other, but in "calling DataFrame" (or more explicit: Columns in `other` that are not in the calling DataFrame are added as new columns

@jreback jreback added the Reshaping Concat, Merge/Join, Stack/Unstack, Explode label Mar 11, 2018
* left: use only keys from left frame, similar to a SQL left outer join;
preserve key order
preserve key order.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't need periods at the end of bullet points

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple minor edits. You can also clean up the Returns section where applicable to simply show that this returns a DataFrame

----------%s
right : DataFrame
----------
right : DataFrame or Series/dict-like object
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since Series and dict are distinct types I'd separate these out as DataFrame, Series or dict. Update for append as well

copy : boolean, default True
If False, do not copy data unnecessarily
If False, do not copy data unnecessarily.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally put built-ins in backticks, so `False` wherever applicable

@@ -199,7 +201,7 @@
dataset.
* "many_to_many" or "m:m": allowed, but does not result in checks.

.. versionadded:: 0.21.0
.. versionadded:: 0.21.0.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't put a period at the end of the versionadded directive

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Triple0 Can you try to limit the changes to a single docstring?
(I don't know to what extens other people are working on the merge / assign docstrings. It's fine to keep the changes you have in those, but please focus for the rest on doing edits to the append docstring)

# -----------------------------------------------------------------------
# DataFrame class


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Triple0 Can you undo those unrelated white-space changes?

@@ -2689,14 +2699,15 @@ def insert(self, loc, column, value, allow_duplicates=False):
allow_duplicates=allow_duplicates)

def assign(self, **kwargs):
r"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a reason for this 'r' (the \ should not be interpreted)


Keyword argument order is maintained for Python 3.6 and later.
Keyword argument order is maintained for Python 3.6 and later.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The indentation was needed for having it part of the versionchanged directive


See Also
--------
DataFrame.assign: For column(s)-on-DataFrame operations
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the docstring of DataFrame.assign I think? So then not needed to refer to itself

@WillAyd WillAyd merged commit c697ca6 into pandas-dev:master Jul 8, 2018
@WillAyd
Copy link
Member

WillAyd commented Jul 8, 2018

Thanks @Triple0 !

Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Docs Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants