API: multi-line, not inplace eval #11149

Closed
62 changes: 56 additions & 6 deletions doc/source/enhancingperf.rst
@@ -570,18 +570,51 @@ prefix the name of the :class:`~pandas.DataFrame` to the column(s) you're
interested in evaluating.

In addition, you can perform assignment of columns within an expression.
This allows for *formulaic evaluation*. Only a single assignment is permitted.
The assignment target can be a new column name or an existing column name, and
it must be a valid Python identifier.
This allows for *formulaic evaluation*. The assignment target can be a
new column name or an existing column name, and it must be a valid Python
identifier.

.. versionadded:: 0.18.0

The ``inplace`` keyword determines whether the assignment is performed on the
original ``DataFrame`` or on a copy that is returned with the new column.

.. warning::

For backwards compatibility, ``inplace`` defaults to ``True`` if not
specified. This will change in a future version of pandas - if your
code depends on an inplace assignment you should update to explicitly
set ``inplace=True``.

.. ipython:: python

df = pd.DataFrame(dict(a=range(5), b=range(5, 10)))
df.eval('c = a + b')
df.eval('d = a + b + c')
df.eval('a = 1')
df.eval('c = a + b', inplace=True)
df.eval('d = a + b + c', inplace=True)
df.eval('a = 1', inplace=True)
Review comment (Contributor): show an example of inplace=False as well

df

When ``inplace`` is set to ``False``, a copy of the ``DataFrame`` with the
new or modified columns is returned and the original frame is unchanged.

.. ipython:: python

df
df.eval('e = a - c', inplace=False)
df
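
As a rough illustration of the copy-returning behavior described above, the sketch below compares ``df.eval(..., inplace=False)`` with ``DataFrame.assign``, which builds the same column without an expression string. The frame mirrors the one used in this section; the names ``with_eval`` and ``with_assign`` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": range(5), "b": range(5, 10)})

# inplace=False returns a new frame and leaves ``df`` untouched.
with_eval = df.eval("c = a + b", inplace=False)

# DataFrame.assign computes the same column from ordinary Python objects.
with_assign = df.assign(c=df["a"] + df["b"])

print(with_eval)
print(with_assign)
print(list(df.columns))  # the original frame has no ``c`` column
```

Passing ``inplace`` explicitly keeps the call independent of whichever default the installed pandas version uses.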

.. versionadded:: 0.18.0

As a convenience, multiple assignments can be performed by using a
multi-line string.

.. ipython:: python

df.eval("""
c = a + b
d = a + b + c
a = 1""", inplace=False)

The equivalent in standard Python would be

.. ipython:: python
@@ -592,6 +625,23 @@ The equivalent in standard Python would be
df['a'] = 1
df
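
The equivalence between a multi-line ``eval`` and step-by-step column assignment can be checked directly. This is a minimal sketch: the names ``expr``, ``multi`` and ``step`` are illustrative, and the expression is assembled without leading whitespace so that each line is a bare assignment.

```python
import pandas as pd

df = pd.DataFrame({"a": range(5), "b": range(5, 10)})

# Three assignments, evaluated one at a time in order.
expr = "\n".join(["c = a + b", "d = a + b + c", "a = 1"])
multi = df.eval(expr, inplace=False)

# The same steps written as ordinary column assignments on a copy.
step = df.copy()
step["c"] = step["a"] + step["b"]
step["d"] = step["a"] + step["b"] + step["c"]
step["a"] = 1

print(multi)
print((multi == step).all().all())
```

Because each line sees the result of the previous ones, ``d`` is computed from the new ``c`` and the original ``a``, and ``a`` is only overwritten at the end.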

.. versionadded:: 0.18.0

The ``query`` method gained the ``inplace`` keyword which determines
whether the query modifies the original frame.

.. ipython:: python

df = pd.DataFrame(dict(a=range(5), b=range(5, 10)))
df.query('a > 2')
df.query('a > 2', inplace=True)
df

.. warning::

Unlike with ``eval``, the default value for ``inplace`` for ``query``
is ``False``. This is consistent with prior versions of pandas.
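
A short sketch of the two ``query`` modes, assuming the same small frame as above; ``filtered`` and ``result`` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({"a": range(5), "b": range(5, 10)})

# Default behavior: a filtered copy is returned and ``df`` keeps all rows.
filtered = df.query("a > 2")
rows_before = len(df)

# inplace=True filters ``df`` itself and returns None.
result = df.query("a > 2", inplace=True)

print(filtered)
print(rows_before, len(df), result)
```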

Local Variables
~~~~~~~~~~~~~~~

47 changes: 46 additions & 1 deletion doc/source/whatsnew/v0.18.0.txt
@@ -295,15 +295,60 @@ Other API Changes

- ``.memory_usage`` now includes values in the index, as does memory_usage in ``.info`` (:issue:`11597`)

Changes to eval
^^^^^^^^^^^^^^^

In prior versions, new column assignments in an ``eval`` expression resulted
in an inplace change to the ``DataFrame``. (:issue:`9297`)

.. ipython:: python

df = pd.DataFrame({'a': np.linspace(0, 10, 5), 'b': range(5)})
df.eval('c = a + b')
df

In version 0.18.0, a new ``inplace`` keyword was added to choose whether the
assignment should be done inplace or return a copy.

.. ipython:: python

df
df.eval('d = c - b', inplace=False)
df
df.eval('d = c - b', inplace=True)
df

.. warning::

For backwards compatibility, ``inplace`` defaults to ``True`` if not specified.
This will change in a future version of pandas - if your code depends on an
inplace assignment you should update to explicitly set ``inplace=True``.

The ``inplace`` keyword parameter was also added to the ``query`` method.

.. ipython:: python

df.query('a > 5')
df.query('a > 5', inplace=True)
df

.. warning::

Note that the default value for ``inplace`` in a ``query``
is ``False``, which is consistent with prior versions.

``eval`` has also been updated to allow multi-line expressions for multiple
assignments. These expressions will be evaluated one at a time in order. Only
assignments are valid for multi-line expressions.

.. ipython:: python

df
df.eval("""
e = d + a
f = e - 22
g = f / 2.0""", inplace=True)
Review comment (Contributor): mention query added inplace

df
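
The "assignments only" rule for multi-line expressions can be sketched with the standard library. This is not pandas' parser: ``split_assignments`` is a hypothetical helper that uses ``ast`` to check that every non-empty line assigns to a single name, and unlike the code in this change it also strips surrounding whitespace from each line.

```python
import ast


def split_assignments(source):
    """Split a multi-line string into (target, expression) pairs.

    Hypothetical helper: it only mimics the rule that every line of a
    multi-line expression must be an assignment to a single name.
    """
    lines = [line.strip() for line in source.splitlines() if line.strip()]
    multi_line = len(lines) > 1
    pairs = []
    for line in lines:
        node = ast.parse(line).body[0]
        is_assignment = (
            isinstance(node, ast.Assign)
            and len(node.targets) == 1
            and isinstance(node.targets[0], ast.Name)
        )
        if not is_assignment:
            if multi_line:
                raise ValueError(
                    "every line of a multi-line expression must be an assignment"
                )
            # A single non-assignment line is a plain expression.
            pairs.append((None, line))
            continue
        # Everything after the first '=' is the right-hand side.
        pairs.append((node.targets[0].id, line.split("=", 1)[1].strip()))
    return pairs


print(split_assignments("e = d + a\nf = e - 22\ng = f / 2.0"))
```

A line such as ``d + 1`` inside a multi-line string raises ``ValueError`` here, mirroring the restriction stated above.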

.. _whatsnew_0180.deprecations:

@@ -410,7 +455,7 @@ Bug Fixes
- Bug in ``pd.read_clipboard`` and ``pd.to_clipboard`` functions not supporting Unicode; upgrade included ``pyperclip`` to v1.5.15 (:issue:`9263`)



- Bug in ``DataFrame.query`` containing an assignment (:issue:`8664`)



109 changes: 83 additions & 26 deletions pandas/computation/eval.py
@@ -3,11 +3,12 @@
"""Top level ``eval`` module.
"""

import warnings
import tokenize
from pandas.core import common as com
from pandas.computation.expr import Expr, _parsers, tokenize_string
from pandas.computation.scope import _ensure_scope
from pandas.compat import DeepChainMap, builtins
from pandas.compat import string_types
from pandas.computation.engines import _engines
from distutils.version import LooseVersion

@@ -138,7 +139,7 @@ def _check_for_locals(expr, stack_level, parser):

def eval(expr, parser='pandas', engine='numexpr', truediv=True,
local_dict=None, global_dict=None, resolvers=(), level=0,
target=None):
target=None, inplace=None):
"""Evaluate a Python expression as a string using various backends.

The following arithmetic operations are supported: ``+``, ``-``, ``*``,
@@ -196,6 +197,13 @@ def eval(expr, parser='pandas', engine='numexpr', truediv=True,
scope. Most users will **not** need to change this parameter.
target : a target object for assignment, optional, default is None
essentially this is a passed in resolver
inplace : bool, default True
If expression mutates, whether to modify object inplace or return
copy with mutation.

WARNING: inplace=None currently falls back to True, but
in a future version, will default to False. Use inplace=True
explicitly rather than relying on the default.
Review comment (Contributor): let's add .assign in the See also (as .assign is basically what df.eval('c=a+b',inplace=False) IS)


Returns
-------
@@ -214,29 +222,78 @@ def eval(expr, parser='pandas', engine='numexpr', truediv=True,
pandas.DataFrame.query
pandas.DataFrame.eval
"""
    expr = _convert_expression(expr)
    _check_engine(engine)
    _check_parser(parser)
    _check_resolvers(resolvers)
    _check_for_locals(expr, level, parser)

    # get our (possibly passed-in) scope
    level += 1
    env = _ensure_scope(level, global_dict=global_dict,
                        local_dict=local_dict, resolvers=resolvers,
                        target=target)

    parsed_expr = Expr(expr, engine=engine, parser=parser, env=env,
                       truediv=truediv)

    # construct the engine and evaluate the parsed expression
    eng = _engines[engine]
    eng_inst = eng(parsed_expr)
    ret = eng_inst.evaluate()

    # assign if needed
    if env.target is not None and parsed_expr.assigner is not None:
        env.target[parsed_expr.assigner] = ret
        return None
    first_expr = True
    if isinstance(expr, string_types):
        exprs = [e for e in expr.splitlines() if e != '']
    else:
        exprs = [expr]
    multi_line = len(exprs) > 1

    if multi_line and target is None:
        raise ValueError("multi-line expressions are only valid in the "
                         "context of data, use DataFrame.eval")

    first_expr = True
    for expr in exprs:
        expr = _convert_expression(expr)
        _check_engine(engine)
        _check_parser(parser)
        _check_resolvers(resolvers)
        _check_for_locals(expr, level, parser)

        # get our (possibly passed-in) scope
        level += 1
        env = _ensure_scope(level, global_dict=global_dict,
                            local_dict=local_dict, resolvers=resolvers,
                            target=target)

        parsed_expr = Expr(expr, engine=engine, parser=parser, env=env,
                           truediv=truediv)

        # construct the engine and evaluate the parsed expression
        eng = _engines[engine]
        eng_inst = eng(parsed_expr)
        ret = eng_inst.evaluate()

        if parsed_expr.assigner is None and multi_line:
            raise ValueError("Multi-line expressions are only valid"
                             " if all expressions contain an assignment")

        # assign if needed
        if env.target is not None and parsed_expr.assigner is not None:
            if inplace is None:
                warnings.warn(
                    "eval expressions containing an assignment currently "
                    "default to operating inplace.\nThis will change in "
                    "a future version of pandas, use inplace=True to "
                    "avoid this warning.",
                    FutureWarning, stacklevel=3)
                inplace = True

            # if returning a copy, copy only on the first assignment
            if not inplace and first_expr:
                target = env.target.copy()
            else:
                target = env.target

            target[parsed_expr.assigner] = ret

            if not resolvers:
                resolvers = ({parsed_expr.assigner: ret},)
            else:
                # existing resolver needs to be updated to handle the
                # case of mutating an existing column in the copy
                for resolver in resolvers:
                    if parsed_expr.assigner in resolver:
                        resolver[parsed_expr.assigner] = ret
                        break
                else:
                    resolvers += ({parsed_expr.assigner: ret},)

            ret = None
            first_expr = False

    if not inplace and inplace is not None:
        return target

    return ret
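
The ``inplace=None`` handling above follows a common deprecation pattern: ``None`` acts as a sentinel for "the caller did not choose", a ``FutureWarning`` is emitted, and the legacy default is applied. Below is a minimal standard-library sketch of that pattern, assuming a plain ``dict`` as a stand-in for the frame; ``apply_assignments`` is a hypothetical name, not part of pandas.

```python
import warnings


def apply_assignments(target, assignments, inplace=None):
    """Apply (name, value) pairs to ``target``, a dict standing in for a frame."""
    if inplace is None:
        # None is a sentinel meaning "caller did not choose": warn, then
        # fall back to the legacy in-place behavior.
        warnings.warn(
            "assignments currently default to operating inplace; "
            "pass inplace=True to avoid this warning.",
            FutureWarning,
            stacklevel=2,
        )
        inplace = True

    # Copy once up front when a copy is requested, then reuse that copy
    # for every subsequent assignment.
    result = target if inplace else dict(target)
    for name, value in assignments:
        result[name] = value
    return None if inplace else result


frame = {"a": 1, "b": 2}
copied = apply_assignments(frame, [("c", 3), ("d", 4)], inplace=False)
print(frame)
print(copied)
```

Using a sentinel rather than ``True`` as the signature default lets the function tell an explicit ``inplace=True`` apart from an omitted argument, so only callers relying on the default see the warning.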