DOC: Improve the docstring of pandas.DataFrame.append() #20267

Triple0 · 2018-03-11T01:14:47Z

Checklist for the pandas documentation sprint (ignore this if you are doing
an unrelated PR):

PR title is "DOC: update the docstring"
The validation script passes: scripts/validate_docstrings.py <your-function-or-method>
The PEP8 style check passes: git diff upstream/master -u -- "*.py" | flake8 --diff
The html version looks good: python doc/make.py --single <your-function-or-method>
It has been proofread on language by another sprint participant

Please include the output of the validation script below between the "```" ticks:

################################################################################
##################### Docstring (pandas.DataFrame.append)  #####################
################################################################################

Append rows of `other` to the end of `caller`, returning a new object.

Columns not in other are added as new columns.

Parameters
----------
other : DataFrame or Series/dict-like object, or list of these
    The data to append.
ignore_index : boolean, default False
    If True, do not use the index labels.
verify_integrity : boolean, default False
    If True, raise ValueError on creating index with duplicates.

Returns
-------
appended : DataFrame

Notes
-----
If a list of dict/series is passed and the keys are all contained in
the DataFrame's index, the order of the columns in the resulting
DataFrame will be unchanged.

Iteratively appending rows to a DataFrame can be more computationally
intensive than a single concatenate. A better solution is to append
those rows to a list and then concatenate the list with the original
DataFrame all at once.

See also
--------
pandas.concat : General function to concatenate DataFrame, Series
    or Panel objects

Examples
--------

>>> df = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'))
>>> df
   A  B
0  1  2
1  3  4
>>> df2 = pd.DataFrame([[5, 6], [7, 8]], columns=list('AB'))
>>> df.append(df2)
   A  B
0  1  2
1  3  4
0  5  6
1  7  8

With `ignore_index` set to True:

>>> df.append(df2, ignore_index=True)
   A  B
0  1  2
1  3  4
2  5  6
3  7  8

The following, while not recommended methods for generating DataFrames,
show two ways to generate a DataFrame from multiple data sources.

Less efficient:

>>> df = pd.DataFrame(columns=['A'])
>>> for i in range(5):
...     df = df.append({'A': i}, ignore_index=True)
>>> df
   A
0  0
1  1
2  2
3  3
4  4

More efficient:

>>> pd.concat([pd.DataFrame([i], columns=['A']) for i in range(5)],
...           ignore_index=True)
   A
0  0
1  1
2  2
3  3
4  4

################################################################################
################################## Validation ##################################
################################################################################

If the validation script still gives errors, but you think there is a good reason
to deviate in this case (and there are certainly such cases), please state this
explicitly.

WillAyd · 2018-03-11T01:30:37Z

pandas/core/frame.py

@@ -5054,7 +5054,9 @@ def infer(x):
    def append(self, other, ignore_index=False, verify_integrity=False):
        """
        Append rows of `other` to the end of this frame, returning a new


This needs to fit on a single line - can you consolidate?

WillAyd · 2018-03-11T01:32:32Z

pandas/core/frame.py

-        object. Columns not in this frame are added as new columns.
+        object.
+
+        Columns not in this frame are added as new columns.


Instead of "this frame" can you say "the caller"? Also while conceptually simple the usage of the word "Columns" mixes reference to the calling object and the object being appended, so differentiating between those would be preferable

WillAyd · 2018-03-11T02:21:12Z

pandas/core/frame.py

-        object. Columns not in this frame are added as new columns.
+        Append rows of `other` to the end of `caller`, returning a new object.
+
+        Columns not in the caller are added as new columns.


Better but for explicitness say "Columns in other not in caller are ..."

jorisvandenbossche

Thanks for the PR!

Other comments:

In the Notes: " A better solution is to append those rows to a list and then concatenate the list with the original DataFrame all at once." -> can you add .. using :func:`pandas.concat` at the end of that sentence ?
for the Parameter section: 'boolean' -> 'bool'

jorisvandenbossche · 2018-03-11T11:53:56Z

pandas/core/frame.py

@@ -5053,8 +5053,9 @@ def infer(x):

    def append(self, other, ignore_index=False, verify_integrity=False):
        """
-        Append rows of `other` to the end of this frame, returning a new
-        object. Columns not in this frame are added as new columns.
+        Append rows of `other` to the end of `caller`, returning a new object.


'caller' is not needed to quote (it's not an argument).

Also given this is the docstring specific to DataFrame (not shared with Series), I think we can be more specific. So maybe "calling DataFrame" ? (or is it then too long ?)

jorisvandenbossche · 2018-03-11T11:55:09Z

pandas/core/frame.py

-        object. Columns not in this frame are added as new columns.
+        Append rows of `other` to the end of `caller`, returning a new object.
+
+        Columns not in other are added as new columns.


It's not the columns in other, but in "calling DataFrame" (or more explicit: Columns in `other` that are not in the calling DataFrame are added as new columns

WillAyd · 2018-03-11T17:02:25Z

pandas/core/frame.py

    * left: use only keys from left frame, similar to a SQL left outer join;
-      preserve key order
+      preserve key order.


Shouldn't need periods at the end of bullet points

WillAyd

Couple minor edits. You can also clean up the Returns section where applicable to simply show that this returns a DataFrame

WillAyd · 2018-03-11T20:37:34Z

pandas/core/frame.py

----------%s
-right : DataFrame
+----------
+right : DataFrame or Series/dict-like object


Since Series and dict are distinct types I'd separate these out as DataFrame, Series or dict. Update for append as well

WillAyd · 2018-03-11T20:40:51Z

pandas/core/frame.py

 copy : boolean, default True
-    If False, do not copy data unnecessarily
+    If False, do not copy data unnecessarily.


Generally put built-ins in backticks, so `False` wherever applicable

WillAyd · 2018-03-11T20:41:35Z

pandas/core/frame.py

@@ -199,7 +201,7 @@
      dataset.
    * "many_to_many" or "m:m": allowed, but does not result in checks.

-    .. versionadded:: 0.21.0
+    .. versionadded:: 0.21.0.


Don't put a period at the end of the versionadded directive

jorisvandenbossche

@Triple0 Can you try to limit the changes to a single docstring?
(I don't know to what extens other people are working on the merge / assign docstrings. It's fine to keep the changes you have in those, but please focus for the rest on doing edits to the append docstring)

jorisvandenbossche · 2018-03-12T15:56:45Z

pandas/core/frame.py

 # -----------------------------------------------------------------------
 # DataFrame class

-


@Triple0 Can you undo those unrelated white-space changes?

jorisvandenbossche · 2018-03-12T15:58:41Z

pandas/core/frame.py

@@ -2689,14 +2699,15 @@ def insert(self, loc, column, value, allow_duplicates=False):
                          allow_duplicates=allow_duplicates)

    def assign(self, **kwargs):
-        r"""


There is a reason for this 'r' (the \ should not be interpreted)

jorisvandenbossche · 2018-03-12T16:02:14Z

pandas/core/frame.py


-            Keyword argument order is maintained for Python 3.6 and later.
+        Keyword argument order is maintained for Python 3.6 and later.


The indentation was needed for having it part of the versionchanged directive

jorisvandenbossche · 2018-03-12T16:02:47Z

pandas/core/frame.py

+
+        See Also
+        --------
+        DataFrame.assign: For column(s)-on-DataFrame operations


This is the docstring of DataFrame.assign I think? So then not needed to refer to itself

WillAyd · 2018-07-08T16:28:35Z

Thanks @Triple0 !

…0267)

Triple0 added 2 commits March 11, 2018 01:06

DOC: Improve the docstring of pandas.DataFrame.append()

4377e47

DOC: Improve the docstring of pandas.DataFrame.append()

01af9d1

WillAyd requested changes Mar 11, 2018

View reviewed changes

WillAyd reviewed Mar 11, 2018

View reviewed changes

DOC: Improve the docstring of pandas.DataFrame.append()

1ffcece

jorisvandenbossche added Docs and removed Docs labels Mar 11, 2018

DOC: Improve the docstring of pandas.DataFrame.assign()

6c3946a

jorisvandenbossche reviewed Mar 11, 2018

View reviewed changes

jreback added the Reshaping Concat, Merge/Join, Stack/Unstack, Explode label Mar 11, 2018

DOC: Improve the docstring of pandas.DataFrame.merge()

083afcc

WillAyd reviewed Mar 11, 2018

View reviewed changes

DOC: Improve the docstring of pandas.DataFrame.merge()

389d7ee

WillAyd requested changes Mar 11, 2018

View reviewed changes

jorisvandenbossche reviewed Mar 12, 2018

View reviewed changes

WillAyd added 4 commits July 8, 2018 11:11

DOC fixup

cb78862

Merge remote-tracking branch 'upstream/master' into dataframe_olalekan

df9e2a8

DOC fixup

023ed83

Updated copy description

cb9ee77

WillAyd merged commit c697ca6 into pandas-dev:master Jul 8, 2018

Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018

DOC: Improve the docstring of pandas.DataFrame.append() (pandas-dev#2…

d64c84f

…0267)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DOC: Improve the docstring of pandas.DataFrame.append() #20267

DOC: Improve the docstring of pandas.DataFrame.append() #20267

Triple0 commented Mar 11, 2018 •

edited

Loading

WillAyd Mar 11, 2018

WillAyd Mar 11, 2018

WillAyd Mar 11, 2018

jorisvandenbossche left a comment

jorisvandenbossche Mar 11, 2018

jorisvandenbossche Mar 11, 2018 •

edited

Loading

WillAyd Mar 11, 2018

WillAyd left a comment

WillAyd Mar 11, 2018

WillAyd Mar 11, 2018

WillAyd Mar 11, 2018

jorisvandenbossche left a comment

jorisvandenbossche Mar 12, 2018

jorisvandenbossche Mar 12, 2018

jorisvandenbossche Mar 12, 2018

jorisvandenbossche Mar 12, 2018

WillAyd commented Jul 8, 2018

		# -----------------------------------------------------------------------
		# DataFrame class


		Keyword argument order is maintained for Python 3.6 and later.
		Keyword argument order is maintained for Python 3.6 and later.

DOC: Improve the docstring of pandas.DataFrame.append() #20267

DOC: Improve the docstring of pandas.DataFrame.append() #20267

Conversation

Triple0 commented Mar 11, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jorisvandenbossche left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jorisvandenbossche Mar 11, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

WillAyd left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jorisvandenbossche left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

WillAyd commented Jul 8, 2018

Triple0 commented Mar 11, 2018 •

edited

Loading

jorisvandenbossche Mar 11, 2018 •

edited

Loading