PERF: tighter cython declarations, faster __iter__ #43872

Merged
merged 1 commit into from
Oct 5, 2021

jbrockmendel
Member

  • closes #xxxx
  • tests added / passed
  • Ensure all linting tests pass, see here for how to run them
  • whatsnew entry

Member

@mzeitlin11 mzeitlin11 left a comment

LGTM! Are the ndim=1 additions for readability or performance? Have always just assumed (but never checked) that cython infers it. If it helps perf we should do it everywhere - a quick regex shows lots of places we don't use ndim=1

@mzeitlin11 mzeitlin11 added Internals Related to non-user accessible pandas implementation Performance Memory or execution speed performance labels Oct 4, 2021
@mzeitlin11 mzeitlin11 added this to the 1.4 milestone Oct 4, 2021
Contributor

@jreback jreback left a comment

lgtm

@jreback
Contributor

jreback commented Oct 4, 2021

can you post benchmark changes in the top

@jbrockmendel
Member Author

> Are the ndim=1 additions for readability or performance? Have always just assumed (but never checked) that cython infers it.

When I was working on this I convinced myself that it made a difference, but I can no longer remember why. Maybe @da-woods can weigh in?

> can you post benchmark changes in the top

It's all going to be micro-perf that asvs don't pick up well

@da-woods
da-woods commented Oct 4, 2021

> Are the ndim=1 additions for readability or performance? Have always just assumed (but never checked) that cython infers it.
>
> When I was working on this I convinced myself that it made a difference, but I can no longer remember why. Maybe @da-woods can weigh in?

https://cython.readthedocs.io/en/latest/src/tutorial/numpy.html#efficient-indexing

> “ndim” keyword-only argument, if not provided then one-dimensional is assumed

I had a quick test of a simple example (looking at the annotated source) and the behaviour looks to match what the docs say. So I think there's no performance advantage to specifying ndim=1 but you may prefer to be explicit for readability reasons.
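
For illustration, here is a minimal sketch of the two spellings being compared. The function names `sum_implicit` and `sum_explicit` are hypothetical and not taken from the pandas diff. Per the Cython docs quoted above, both declarations should describe the same one-dimensional buffer.

```cython
# Hypothetical example module (not part of this PR), e.g. ndim_demo.pyx
cimport numpy as cnp
from numpy cimport int64_t, ndarray

cnp.import_array()


def sum_implicit(ndarray[int64_t] values):
    # ndim is omitted, so Cython assumes a one-dimensional buffer
    cdef:
        Py_ssize_t i, n = values.shape[0]
        int64_t total = 0

    for i in range(n):
        total += values[i]
    return total


def sum_explicit(ndarray[int64_t, ndim=1] values):
    # Same buffer type, with the dimensionality spelled out for the reader
    cdef:
        Py_ssize_t i, n = values.shape[0]
        int64_t total = 0

    for i in range(n):
        total += values[i]
    return total
```

One way to check this on a given Cython version is to compile with `cython -a` and compare the annotated output for the two functions. Assuming the docs hold, the indexing code should be the same, and passing a 2-D array to either function should be rejected when the buffer is acquired.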

@jbrockmendel
Member Author

Thanks @da-woods.

If the ndim=1 is extraneous, I'm OK with either the more concise or more explicit versions.

@jreback jreback merged commit 6599834 into pandas-dev:master Oct 5, 2021
@jbrockmendel jbrockmendel deleted the perf-cy branch October 5, 2021 01:39
gasparitiago pushed a commit to gasparitiago/pandas that referenced this pull request Oct 9, 2021
rhshadrach pushed a commit to rhshadrach/pandas that referenced this pull request Oct 10, 2021
4 participants