-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
CLN: some code cleanups in pandas/_libs/ #31808
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The only "big" changes I have done here, is taking this pattern: arr = ["a", "b", "c"]
for i in range(len(arr)):
print(arr[i]) and making it to look like this: arr = ["a", "b", "c"]
for value in arr:
print(value) When it comes to python, I can't see why would one use the first pattern, but when it comes to Cython, I don't know if there's some sort of optimization, that the first pattern gives that the second one don't |
@@ -448,7 +448,7 @@ cdef class BlockIndex(SparseIndex): | |||
ylen = y.blengths | |||
|
|||
# block may be split, but can't exceed original len / 2 + 1 | |||
max_len = int(min(self.length, y.length) / 2) + 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are there no cases where int
rounds up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, in order for int()
to round up it needs to be done with math.ceil
or with a "hack" like:
number // divider + (number % divider > 0)
For example:
number = 42.01
divider = 8
number / divider # 5.25125
number // divider # 5.0
int(number / divider) # 5
number // divider + (number % divider > 0) # 6.0
Then you will need to show that it gives similar results (eg show some timings of the changed code, or look at the generated C code). For example, with a simple test (but I don't know if that is fully equivalent with the changes you did, this was just a quick test), I see a big overhead when iterating directly through the values of an array. Eg check
vs
the second version is much slower (and with the annotation in the notebook you clearly see a lot of python interaction). But I am no cython expert, so I might also have done something wrong. |
pandas/_libs/tslibs/period.pyx
Outdated
check_dts_bounds, | ||
NPY_DATETIMEUNIT, | ||
NPY_FR_D, | ||
NPY_FR_us |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing trailing commas in some of these
if is_period_object(p): | ||
return p.freq | ||
if is_period_object(value): | ||
return value.freq |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for avoiding 1-character variable name
@MomIsBestFriend i'd advocate reverting the non-easy parts of this and getting the clear improvements in |
@jbrockmendel NP, I will do it over the weekend as soon as I have time. |
pandas/_libs/tslibs/period.pyx
Outdated
if row < 6: | ||
return 0 | ||
elif col < 6: | ||
if row < 6 or col < 6: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this need brackets to be done in the correct order?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jorisvandenbossche I have reverted this, as I was doing it wrong before, I haven't looked at the possibility that row can be let's say 12 and col be 2.
Reverted in 208bc03
pandas/_libs/tslibs/period.pyx
Outdated
@@ -289,17 +299,15 @@ cdef int64_t DtoB(npy_datetimestruct *dts, int roll_back, | |||
return DtoB_weekday(unix_date) | |||
|
|||
|
|||
cdef inline int64_t upsample_daytime(int64_t ordinal, | |||
asfreq_info *af_info) nogil: | |||
cdef inline int64_t upsample_daytime(int64_t ordinal, asfreq_info *af_info) nogil: | |||
if (af_info.is_end): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you wanted to remove the parens here, i wouldnt object
pandas/_libs/tslibs/strptime.pyx
Outdated
@@ -78,12 +77,10 @@ def array_strptime(object[:] values, object fmt, | |||
if fmt is not None: | |||
if '%W' in fmt or '%U' in fmt: | |||
if '%Y' not in fmt and '%y' not in fmt: | |||
raise ValueError("Cannot use '%W' or '%U' without " | |||
"day and year") | |||
raise ValueError("Cannot use '%W' or '%U' without day and year") | |||
if ('%A' not in fmt and '%a' not in fmt and '%w' not | |||
in fmt): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the in fmt
can go on the previous line, then parens removed
@MomIsBestFriend can you rebase and ping on green |
ping @jbrockmendel |
thanks @MomIsBestFriend |
* CLN: some code cleanups in pandas/_libs/ * Reverted "bint" REF: https://github.com/pandas-dev/pandas/pull/31808/files#r376721172 * Added trailing comma to imports REF: https://github.com/pandas-dev/pandas/pull/31808/files#r378523656 * Reverted bad code * Lint issues * Reverted wrong code REF: pandas-dev#31808 (comment) * Removed parens REF: pandas-dev#31808 (comment) * "in fmt" in prev line REF: pandas-dev#31808 (comment)
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff