Skip to content

CLN/STY: pandas/_libs/internals.pyx #32801

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Mar 26, 2020

Conversation

ShaharNaveh
Copy link
Member

@ShaharNaveh ShaharNaveh commented Mar 18, 2020

  • closes #xxxx
  • tests added / passed
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

int64_t cur_blkno
Py_ssize_t i, start, stop, n, diff

Py_ssize_t i, start = 0, stop, n = blknos.shape[0], diff, tot_len
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NOTE, I am adding here a variable definition (tot_len)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would put this on multiple lines, this becomes difficult to read IMO

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since code style is subjective (when going beyond PEP8 / black), you also get some subjective comments from me ;)


return f'{type(self).__name__}({v})'
v = self._as_slice if s is not None else self._as_array
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I don't find this necessarily more readable ..

if s is not None:
return slice_len(s)
else:
return len(self._as_array)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change? I personally find an if/else pattern very clear

int64_t cur_blkno
Py_ssize_t i, start, stop, n, diff

Py_ssize_t i, start = 0, stop, n = blknos.shape[0], diff, tot_len
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would put this on multiple lines, this becomes difficult to read IMO

val = slice(start, None, step)
else:
val = slice(start, stop, step)
val = slice(start, None, step) if stop < 0 else slice(start, stop, step)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

else:
return iter(self._as_array)

return iter(self._as_array)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorisvandenbossche This is a very similar change to the change that was mentioned here, would you like me to revert that as well?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here (see comment below)

if s is not None:
return s
else:
return self._as_array
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorisvandenbossche This is a very similar change to the change that was mentioned here, would you like me to revert that as well?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, my opinion is only a single one, but yes, I would personally revert this as well

else:
val = self._as_array[loc]

val = slice_getitem(s, loc) if s is not None else self._as_array[loc]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorisvandenbossche This is a very similar change to the change that was mentioned here, would you like me to revert that as well?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is different from the other places where you left this comment, and this is the only one i would ask you to revert. The way it is currently written, I can look in coverage output to see if both branches are reached; I can't with this PR's version.

@simonjayhawkins simonjayhawkins added Clean Code Style Code style, linting, code_checks labels Mar 18, 2020
# np.arange(start, stop, step, dtype=np.int64)
# NOTE:
# this is the C-optimized equivalent of
# `np.arange(start, stop, step, dtype=np.int64)`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not a deal-breaker but i dont see how this is an improvement. especially if the comment is going to be repeated in multiple places id rather it be 2 lines than 3

start = 0
cur_blkno = blknos[start]

if group is False:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if group is False should be marginally faster than if not group

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my benchmarking it's actually the other way around:

In [1]: foo = True                                                                                            

In [2]: %timeit foo is False                                                                                  
22.9 ns ± 0.157 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In [3]: %timeit foo is False                                                                                  
23 ns ± 0.0388 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In [4]: %timeit foo is False                                                                                  
22.7 ns ± 0.194 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In [5]: %timeit not foo                                                                                       
20.7 ns ± 0.0589 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In [6]: %timeit not foo                                                                                       
21.1 ns ± 0.0636 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In [7]: %timeit not foo                                                                                       
20.6 ns ± 0.0497 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)


In [1]: bar = False                                                                                           

In [2]: %timeit bar is False                                                                                  
28.5 ns ± 0.077 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In [3]: %timeit bar is False                                                                                  
28.6 ns ± 0.0666 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In [4]: %timeit bar is False                                                                                  
29.5 ns ± 0.203 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In [5]: %timeit not bar                                                                                       
26.1 ns ± 0.217 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In [6]: %timeit not bar                                                                                       
25.8 ns ± 0.0231 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In [7]: %timeit not bar                                                                                       
26.7 ns ± 0.0831 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

look at the generated C code. The "is" should be a pointer comparison, whereas the other one should have to do some more work.


if n == 0:
return

start = 0
cur_blkno = blknos[start]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doing this here instead of above means we can avoid the assignment in the n==0 case

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks fine. this is mostly moving things around, but ok.

return s

raise TypeError("Not slice-like")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also this change I personally don't find an improvement in code flow

@jreback jreback added this to the 1.1 milestone Mar 21, 2020
@jreback jreback merged commit 63c631f into pandas-dev:master Mar 26, 2020
@jreback
Copy link
Contributor

jreback commented Mar 26, 2020

thanks @MomIsBestFriend

@ShaharNaveh ShaharNaveh deleted the OCD-collections-internals branch March 26, 2020 01:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Clean Code Style Code style, linting, code_checks
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants