
DOC: update the DataFrame.eval() docstring #20209


Merged
merged 7 commits into from
Mar 13, 2018


@StephenVoland (Contributor) commented Mar 10, 2018

Checklist for the pandas documentation sprint (ignore this if you are doing
an unrelated PR):

  • PR title is "DOC: update the docstring"
  • The validation script passes: scripts/validate_docstrings.py <your-function-or-method>
  • The PEP8 style check passes: git diff upstream/master -u -- "*.py" | flake8 --diff
  • The html version looks good: python doc/make.py --single <your-function-or-method>
  • It has been proofread on language by another sprint participant

Please include the output of the validation script below between the "```" ticks:

################################################################################
###################### Docstring (pandas.DataFrame.eval)  ######################
################################################################################

Evaluate a string describing operations on DataFrame columns.

Operates on columns only, not specific rows or elements. Because `eval`
can run arbitrary code, it can make you vulnerable to code injection if
you pass user input to this function.

Parameters
----------
expr : str
    The expression string to evaluate.
inplace : bool, default False
    If the expression contains an assignment, whether to perform the
    operation inplace and mutate the existing DataFrame. Otherwise,
    a new DataFrame is returned.

    .. versionadded:: 0.18.0
kwargs : dict
    See the documentation for :func:`~pandas.eval` for complete details
    on the keyword arguments accepted by
    :meth:`~pandas.DataFrame.query`.

Returns
-------
ndarray, scalar, or pandas object
    The result of the evaluation.

See Also
--------
DataFrame.query : Evaluates a boolean expression to query the columns
    of a frame.
DataFrame.assign : Can evaluate an expression or function to create new
    values for a column.
pandas.eval : Evaluate a Python expression as a string using various
    backends.

Notes
-----
For more details see the API documentation for :func:`~pandas.eval`.
For detailed examples see :ref:`enhancing performance with eval
<enhancingperf.eval>`.

Examples
--------
>>> df = pd.DataFrame({'A': range(1, 6), 'B': range(10, 0, -2)})
>>> df
   A   B
0  1  10
1  2   8
2  3   6
3  4   4
4  5   2
>>> df.eval('A + B')
0    11
1    10
2     9
3     8
4     7
dtype: int64

Assignment is allowed and by default the original DataFrame is not
modified.

>>> df.eval('C = A + B')
   A   B   C
0  1  10  11
1  2   8  10
2  3   6   9
3  4   4   8
4  5   2   7
>>> df
   A   B
0  1  10
1  2   8
2  3   6
3  4   4
4  5   2

Use inplace=True to modify the original DataFrame.

>>> df.eval('C = A + B', inplace=True)
>>> df
   A   B   C
0  1  10  11
1  2   8  10
2  3   6   9
3  4   4   8
4  5   2   7

################################################################################
################################## Validation ##################################
################################################################################

Docstring for "pandas.DataFrame.eval" correct. :)
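The `kwargs` entry says that extra keyword arguments are handed on to `pandas.eval`. A minimal sketch of that, reusing the docstring's example frame and assuming a pandas release where `engine='python'` (a documented `pandas.eval` option) is accepted:

```python
import pandas as pd

df = pd.DataFrame({'A': range(1, 6), 'B': range(10, 0, -2)})

# Extra keyword arguments go through to pandas.eval; here we ask for the
# pure-Python engine instead of the default (numexpr, when it is installed).
result = df.eval('A + B', engine='python')
print(result.tolist())  # [11, 10, 9, 8, 7]
```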

@joaoavf (Contributor) commented Mar 10, 2018

At the beginning of your PR description, there is a space in [x ]; there should be no space between 'x' and ']', as it leads to incorrect formatting.

@StephenVoland (Contributor Author)

Fixed.

3    8
4    7
dtype: int64
>>> df.eval('C = A + B')
Contributor


Could you add a line before this saying "Assignment is allowed and by default the original DataFrame is not modified."

2  3   6
3  4   4
4  5   2
>>> df.eval('C = A + B', inplace=True)
Contributor


And before this, say "Use inplace=True to modify the original DataFrame."
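A quick check of the two behaviours those example sentences describe, sketched against the docstring's example frame and assuming a recent pandas release (`new_df` and `ret` are just illustrative names):

```python
import pandas as pd

df = pd.DataFrame({'A': range(1, 6), 'B': range(10, 0, -2)})

# Without inplace, the assignment happens on a copy: a new DataFrame is
# returned and the original df keeps only columns A and B.
new_df = df.eval('C = A + B')
print(list(new_df.columns), list(df.columns))  # ['A', 'B', 'C'] ['A', 'B']

# With inplace=True, df itself gains column C and eval returns None.
ret = df.eval('C = A + B', inplace=True)
print(ret, df['C'].tolist())  # None [11, 10, 9, 8, 7]
```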

@joaoavf (Contributor) left a comment


Minor suggestions, very good overall.

"""
Evaluate expression in the context of the calling DataFrame instance.

Evaluates a string describing operations on dataframe columns. This
Contributor


dataframe --> DataFrame


Evaluates a string describing operations on dataframe columns. This
allows `eval` to run arbitrary code, but remember to sanitize your
inputs.
Contributor


Might be useful to state clearly that this will not evaluate expressions within elements of the DataFrame; it only evaluates columns.
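A small sketch of the column-only behaviour, using the docstring's example frame: names in the expression resolve to whole columns, so the result is computed for every row at once. The comparison against plain Series arithmetic is illustrative only, not part of the PR:

```python
import pandas as pd

df = pd.DataFrame({'A': range(1, 6), 'B': range(10, 0, -2)})

# 'A' refers to the entire column, so the expression yields one value per row.
doubled = df.eval('A * 2')

# Same values as ordinary vectorized Series arithmetic on the column.
print((doubled == df['A'] * 2).all())  # True
print(doubled.tolist())  # [2, 4, 6, 8, 10]
```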

@StephenVoland (Contributor Author)

All fixes made, thanks for the comments.

not specific rows or elements). This allows `eval` to run arbitrary
code, but remember to sanitize your inputs.

This function calls `pandas.eval()` and is likely to be slower than
Contributor


Remove this; it's not relevant.

Evaluate expression in the context of the calling DataFrame instance.

Evaluates a string describing operations on DataFrame columns (only,
not specific rows or elements). This allows `eval` to run arbitrary
Contributor


You can move the last sentence to the Notes section, and expand slightly on what 'sanitize' means for inputs.
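One possible way to expand on 'sanitize', sketched under the assumption that the untrusted part is a value rather than a column name: reference it with the `@` local-variable prefix that `DataFrame.eval` supports, instead of formatting it into the expression string. `rows_with_a_above` is a hypothetical helper; this reduces the risk for values but does nothing for expression strings taken wholesale from users.

```python
import pandas as pd

df = pd.DataFrame({'A': range(1, 6), 'B': range(10, 0, -2)})


def rows_with_a_above(frame, threshold):
    # Hypothetical helper. The untrusted value is looked up through "@",
    # so it is used as data; it is never spliced into the expression text
    # the way f"A > {threshold}" would be.
    mask = frame.eval('A > @threshold')
    return frame[mask]


print(rows_with_a_above(df, 3)['A'].tolist())  # [4, 5]
```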

@jreback jreback added the Reshaping and Numeric Operations labels Mar 10, 2018
@codecov bot commented Mar 12, 2018

Codecov Report

❗ No coverage uploaded for pull request base (master@faba7ef).
The diff coverage is n/a.


@@            Coverage Diff            @@
##             master   #20209   +/-   ##
=========================================
  Coverage          ?    91.7%           
=========================================
  Files             ?      150           
  Lines             ?    49165           
  Branches          ?        0           
=========================================
  Hits              ?    45087           
  Misses            ?     4078           
  Partials          ?        0
Flag        Coverage Δ
#multiple   90.09% <ø> (?)
#single     41.86% <ø> (?)

Impacted Files          Coverage Δ
pandas/core/frame.py    97.18% <ø> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update faba7ef...b1301d8.


Notes
-----
For more details see the API documentation for :func:`~pandas.eval`.
For detailed examples see :ref:`enhancing performance with eval
<enhancingperf.eval>`.

This function calls `pandas.eval()` and is likely to be slower than
Contributor


Is this meaningfully slower than pd.eval? We don't seem to do much before handing it off, so if you're doing anything meaningfully expensive in the eval, there shouldn't be much overhead. I'd prefer to remove this unless there's a significant difference.
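The reviewer's point can be checked informally: `DataFrame.eval` and `pandas.eval` give the same result for the docstring's example, and a rough `timeit` comparison shows whether any extra overhead matters. This is a sketch; `scope` is an illustrative name, and timings depend on the machine, the data size and whether numexpr is installed, so no figures are claimed here.

```python
import timeit

import pandas as pd

df = pd.DataFrame({'A': range(1, 6), 'B': range(10, 0, -2)})
# Pass df explicitly so pandas.eval can resolve the name "df".
scope = {'df': df}

# DataFrame.eval resolves column names against df before delegating.
via_frame = df.eval('A + B')
via_top_level = pd.eval('df.A + df.B', local_dict=scope)
print((via_frame == via_top_level).all())  # True

# Rough timing only; results vary between runs and environments.
t_frame = timeit.timeit(lambda: df.eval('A + B'), number=100)
t_top = timeit.timeit(lambda: pd.eval('df.A + df.B', local_dict=scope), number=100)
print(f'DataFrame.eval: {t_frame:.4f}s  pandas.eval: {t_top:.4f}s')
```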

pandas.DataFrame.query
pandas.DataFrame.assign
pandas.eval
pandas.DataFrame.query : Evaluates a boolean expression to query the
Contributor


I believe DataFrame.query and DataFrame.assign will work as-is. For pandas.eval, you will need to keep the pandas prefix.

@TomAugspurger TomAugspurger added this to the 0.23.0 milestone Mar 13, 2018
@TomAugspurger TomAugspurger merged commit 3c8d1d4 into pandas-dev:master Mar 13, 2018
Labels
Docs, Numeric Operations, Reshaping

4 participants