-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
CLN/STY: pandas/_libs/internals.pyx #32801
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CLN/STY: pandas/_libs/internals.pyx #32801
Conversation
pandas/_libs/internals.pyx
Outdated
int64_t cur_blkno | ||
Py_ssize_t i, start, stop, n, diff | ||
|
||
Py_ssize_t i, start = 0, stop, n = blknos.shape[0], diff, tot_len |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NOTE, I am adding here a variable definition (tot_len
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would put this on multiple lines, this becomes difficult to read IMO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since code style is subjective (when going beyond PEP8 / black), you also get some subjective comments from me ;)
pandas/_libs/internals.pyx
Outdated
|
||
return f'{type(self).__name__}({v})' | ||
v = self._as_slice if s is not None else self._as_array |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, I don't find this necessarily more readable ..
pandas/_libs/internals.pyx
Outdated
if s is not None: | ||
return slice_len(s) | ||
else: | ||
return len(self._as_array) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this change? I personally find an if/else pattern very clear
pandas/_libs/internals.pyx
Outdated
int64_t cur_blkno | ||
Py_ssize_t i, start, stop, n, diff | ||
|
||
Py_ssize_t i, start = 0, stop, n = blknos.shape[0], diff, tot_len |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would put this on multiple lines, this becomes difficult to read IMO
pandas/_libs/internals.pyx
Outdated
val = slice(start, None, step) | ||
else: | ||
val = slice(start, stop, step) | ||
val = slice(start, None, step) if stop < 0 else slice(start, stop, step) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
pandas/_libs/internals.pyx
Outdated
else: | ||
return iter(self._as_array) | ||
|
||
return iter(self._as_array) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jorisvandenbossche This is a very similar change to the change that was mentioned here, would you like me to revert that as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here (see comment below)
if s is not None: | ||
return s | ||
else: | ||
return self._as_array |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jorisvandenbossche This is a very similar change to the change that was mentioned here, would you like me to revert that as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear, my opinion is only a single one, but yes, I would personally revert this as well
pandas/_libs/internals.pyx
Outdated
else: | ||
val = self._as_array[loc] | ||
|
||
val = slice_getitem(s, loc) if s is not None else self._as_array[loc] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jorisvandenbossche This is a very similar change to the change that was mentioned here, would you like me to revert that as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is different from the other places where you left this comment, and this is the only one i would ask you to revert. The way it is currently written, I can look in coverage output to see if both branches are reached; I can't with this PR's version.
pandas/_libs/internals.pyx
Outdated
# np.arange(start, stop, step, dtype=np.int64) | ||
# NOTE: | ||
# this is the C-optimized equivalent of | ||
# `np.arange(start, stop, step, dtype=np.int64)` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not a deal-breaker but i dont see how this is an improvement. especially if the comment is going to be repeated in multiple places id rather it be 2 lines than 3
pandas/_libs/internals.pyx
Outdated
start = 0 | ||
cur_blkno = blknos[start] | ||
|
||
if group is False: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if group is False
should be marginally faster than if not group
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my benchmarking it's actually the other way around:
In [1]: foo = True
In [2]: %timeit foo is False
22.9 ns ± 0.157 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
In [3]: %timeit foo is False
23 ns ± 0.0388 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
In [4]: %timeit foo is False
22.7 ns ± 0.194 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
In [5]: %timeit not foo
20.7 ns ± 0.0589 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
In [6]: %timeit not foo
21.1 ns ± 0.0636 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
In [7]: %timeit not foo
20.6 ns ± 0.0497 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
In [1]: bar = False
In [2]: %timeit bar is False
28.5 ns ± 0.077 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
In [3]: %timeit bar is False
28.6 ns ± 0.0666 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
In [4]: %timeit bar is False
29.5 ns ± 0.203 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
In [5]: %timeit not bar
26.1 ns ± 0.217 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
In [6]: %timeit not bar
25.8 ns ± 0.0231 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
In [7]: %timeit not bar
26.7 ns ± 0.0831 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
look at the generated C code. The "is" should be a pointer comparison, whereas the other one should have to do some more work.
pandas/_libs/internals.pyx
Outdated
|
||
if n == 0: | ||
return | ||
|
||
start = 0 | ||
cur_blkno = blknos[start] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doing this here instead of above means we can avoid the assignment in the n==0 case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks fine. this is mostly moving things around, but ok.
pandas/_libs/internals.pyx
Outdated
return s | ||
|
||
raise TypeError("Not slice-like") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also this change I personally don't find an improvement in code flow
thanks @MomIsBestFriend |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff