Skip to content

REF: move roperators to pandas.core #40444

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Mar 23, 2021

Conversation

jorisvandenbossche
Copy link
Member

@jorisvandenbossche jorisvandenbossche commented Mar 15, 2021

This PR does 2 things:

  • change pandas.core.ops.roperator imports to all import the module and use roperator.<rop> instead of importing the actual functions (this makes the usage similar as for the non-reversed operators with import operator)
  • move roperator.py from pandas/core/ops to pandas/core, so it can be imported in pandas.core.computation.expressions without relying on the pandas.core.ops module, which in its turn means we can move the inline import of pandas.core.computation.expressions in pandas.core.ops.array_ops to a top-level import (better for performance for a function that is called repeatedly)

I think the first change is useful anyway (can also do it as a separate PR), for the second point, a possible alternative would be to move pandas.core.computation.expressions from the computation/ module to the ops/ module

One consequence of this, though: by always importing this in array_ops, it gets imported on import pandas time, and thus numexpr always gets imported at that point (while now it's only a delayed import when doing a first operation).

Now, I checked with python -X importtime -c "import numpy, pandas, numexpr", and after already having import numpy (and pandas), also import numexpr is only a small addition (most of the import of numexpr is import numpy).
Based on running that a few times, it seems to adds 1.5-2 % to the total pandas import time.

@jorisvandenbossche jorisvandenbossche added Numeric Operations Arithmetic, Comparison, and Logical operations Refactor Internal refactoring of code labels Mar 15, 2021
@jbrockmendel
Copy link
Member

possible alternative would be to move pandas.core.computation.expressions from the computation/ module to the ops/ module

i think this would be marginally nicer. not a big deal. might want to move expressions.check to eg compat.numexpr

@jorisvandenbossche
Copy link
Member Author

One additional consideration is that expressions.py also contains the where function, which doesn't necessarily fit in ops/ ?

@jorisvandenbossche
Copy link
Member Author

(and I still need to update the imports in conftest.py)

@jbrockmendel
Copy link
Member

One additional consideration is that expressions.py also contains the where function, which doesn't necessarily fit in ops/ ?

Thats a fair point. Either way is fine, really. Could reasonably split expressions.py into where/evaluate pieces, but its too early in the AM to go down this rabbit hole.

@jorisvandenbossche
Copy link
Member Author

Fixed the failing tests. This should be good now (assuming we leave expressions.py in /computation as is, for now)

@jorisvandenbossche jorisvandenbossche added this to the 1.3 milestone Mar 17, 2021
@jorisvandenbossche
Copy link
Member Author

One additional point here: by always importing this in array_ops, it gets imported on import pandas time, and thus numexpr always gets imported at that point (while now it's only a delayed import when doing a first operation).

Now, I checked with python -X importtime -c "import numpy, pandas, numexpr", and after already having import numpy (and pandas), also import numexpr is only a small addition (most of the import of numexpr is import numpy).
Based on running that a few times, it seems to adds 1.5-2 % to the total pandas import time.

Copy link
Member

@jbrockmendel jbrockmendel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM cc @jreback

@jreback
Copy link
Contributor

jreback commented Mar 23, 2021

move roperator.py from pandas/core/ops to pandas/core, so it can be imported in pandas.core.computation.expressions without relying on the pandas.core.ops module, which in its turn means we can move the inline import of pandas.core.computation.expressions in pandas.core.ops.array_ops to a top-level import (better for performance for a function that is called repeatedly)

can we reorg a bit then, this really should be in ops logically. (not blocking here, just a further thought).

cc @jbrockmendel @jorisvandenbossche

@jorisvandenbossche
Copy link
Member Author

See the discussion above, expressions.py also contains code for where, which doesn't fully fit into ops

@jreback jreback merged commit 2700775 into pandas-dev:master Mar 23, 2021
@jorisvandenbossche jorisvandenbossche deleted the ref-roperator-imports branch March 24, 2021 07:26
vladu pushed a commit to vladu/pandas that referenced this pull request Apr 5, 2021
JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Jul 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Numeric Operations Arithmetic, Comparison, and Logical operations Refactor Internal refactoring of code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants