CLN: some code cleanups in pandas/_libs/ #31808

ShaharNaveh · 2020-02-08T13:40:45Z

ShaharNaveh · 2020-02-08T13:45:37Z

The only "big" change I have made here is taking this pattern:

arr = ["a", "b", "c"]

for i in range(len(arr)):
    print(arr[i])

and making it look like this:

arr = ["a", "b", "c"]

for value in arr:
    print(value)

When it comes to Python, I can't see why one would use the first pattern, but when it comes to Cython, I don't know if there's some sort of optimization that the first pattern gives and the second one doesn't.

pandas/_libs/lib.pyx

jbrockmendel · 2020-02-08T17:07:35Z

pandas/_libs/sparse.pyx

@@ -448,7 +448,7 @@ cdef class BlockIndex(SparseIndex):
        ylen = y.blengths

        # block may be split, but can't exceed original len / 2 + 1
-        max_len = int(min(self.length, y.length) / 2) + 1


are there no cases where int rounds up?

No. int() truncates toward zero, so it never rounds a positive value up; rounding up has to be done with math.ceil or with a "hack" like:

number // divider + (number % divider > 0)

For example:

number = 42.01
divider = 8

number / divider  # 5.25125
number // divider  # 5.0
int(number / divider)  # 5
number // divider + (number % divider > 0)  # 6.0
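The same point can be checked for the max_len expression itself. The sketch below assumes the replacement line (not shown in the diff above) uses floor division; the helper names are hypothetical and exist only for this comparison.

```python
import math


def max_len_true_division(length: int, other_length: int) -> int:
    # Expression removed in the diff: true division, then int() truncation.
    return int(min(length, other_length) / 2) + 1


def max_len_floor_division(length: int, other_length: int) -> int:
    # Assumed replacement (not shown in the diff): integer floor division.
    return min(length, other_length) // 2 + 1


def ceil_div(number: int, divider: int) -> int:
    # Integer-only way to round a quotient up, same shape as the "hack" above.
    return number // divider + (number % divider > 0)


# The two max_len forms agree for every pair of small non-negative lengths.
for length in range(200):
    for other_length in range(200):
        assert max_len_true_division(length, other_length) == max_len_floor_division(
            length, other_length
        )

print(ceil_div(42, 8), math.ceil(42 / 8))
```

The two forms do diverge for negative operands (int(-3 / 2) is -1, while -3 // 2 is -2), which should not matter for lengths.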

pandas/_libs/tslibs/timezones.pyx

jorisvandenbossche · 2020-02-08T20:51:21Z

but when it comes to Cython, I don't know if there's some sort of optimization that the first pattern gives and the second one doesn't

Then you will need to show that it gives similar results (e.g., show some timings of the changed code, or look at the generated C code).

For example, with a simple test (I don't know whether it is fully equivalent to the changes you made; this was just a quick test), I see a big overhead when iterating directly through the values of an array. E.g., compare:

%%cython -a
import cython

import numpy as np
cimport numpy as np


@cython.wraparound(False)
@cython.boundscheck(False)
def sum1():
    cdef:
        np.int64_t[:] arr
        int n
        int val, total
    
    arr = np.random.randint(10, size=10000)
    
    n = len(arr)
    total = 0
    
    for i in range(n):
        val = arr[i]
        total += val
    
    return total

vs

%%cython -a
import cython

import numpy as np
cimport numpy as np


@cython.wraparound(False)
@cython.boundscheck(False)
def sum2():
    cdef:
        np.int64_t[:] arr
        np.int64_t val, total
    
    arr = np.random.randint(10, size=10000)
    
    n = len(arr)
    total = 0
    
    for val in arr:
        total += val
    
    return total

The second version is much slower (and with the annotation in the notebook you can clearly see a lot of Python interaction). But I am no Cython expert, so I might also have done something wrong.
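A minimal sketch of how such timings could be collected with the standard library, assuming a plain Python stand-in for the two loops. The function names and the array-based data are hypothetical, and pure Python numbers say nothing about the compiled Cython versions; the point is only the shape of the comparison.

```python
import timeit
from array import array

# Stand-in data: a typed stdlib array instead of the NumPy memoryview used above.
values = array("q", [i % 10 for i in range(10_000)])


def sum_by_index(arr):
    """Index-based loop, the shape of sum1."""
    total = 0
    for i in range(len(arr)):
        total += arr[i]
    return total


def sum_by_value(arr):
    """Direct iteration, the shape of sum2."""
    total = 0
    for val in arr:
        total += val
    return total


# Both loops must agree before their timings are worth comparing.
assert sum_by_index(values) == sum_by_value(values)

for func in (sum_by_index, sum_by_value):
    seconds = timeit.timeit(lambda: func(values), number=100)
    print(f"{func.__name__}: {seconds / 100 * 1e6:.1f} us per call")
```

For the Cython code itself, the same harness would call the compiled functions, alongside a look at the annotated output mentioned above.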

pandas/_libs/tslibs/period.pyx

jbrockmendel · 2020-02-12T21:30:40Z

pandas/_libs/tslibs/period.pyx

+    check_dts_bounds,
+    NPY_DATETIMEUNIT,
+    NPY_FR_D,
+    NPY_FR_us


missing trailing commas in some of these

jbrockmendel · 2020-02-12T21:31:16Z

pandas/_libs/tslibs/period.pyx

-            if is_period_object(p):
-                return p.freq
+            if is_period_object(value):
+                return value.freq


+1 for avoiding 1-character variable name

jbrockmendel · 2020-02-12T21:31:59Z

@MomIsBestFriend i'd advocate reverting the non-easy parts of this and getting the clear improvements in

ShaharNaveh · 2020-02-13T04:40:54Z

@jbrockmendel NP, I will do it over the weekend as soon as I have time.

REF: https://github.com/pandas-dev/pandas/pull/31808/files#r376721172

REF: https://github.com/pandas-dev/pandas/pull/31808/files#r378523656

jorisvandenbossche · 2020-02-15T08:39:50Z

pandas/_libs/tslibs/period.pyx

-    if row < 6:
-        return 0
-    elif col < 6:
+    if row < 6 or col < 6:


Doesn't this need brackets to be done in the correct order?

@jorisvandenbossche I have reverted this; I was doing it wrong before. I hadn't considered the possibility that row could be, say, 12 and col 2.

Reverted in 208bc03

REF: pandas-dev#31808 (comment)
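The reason for the revert can be illustrated with a small sketch. Only the "return 0" branch appears in the diff; the other return values below are placeholders, and the function names are hypothetical.

```python
def original_branches(row: int, col: int) -> int:
    if row < 6:
        return 0
    elif col < 6:
        return 1  # placeholder: the real elif body is not shown in the diff
    return 2  # placeholder for the remaining cases


def merged_condition(row: int, col: int) -> int:
    # The rewritten form sends both cases through the same branch.
    if row < 6 or col < 6:
        return 0
    return 2  # placeholder for the remaining cases


# row = 12, col = 2 is the case mentioned in the reply above.
print(original_branches(12, 2), merged_condition(12, 2))
```

With row = 12 and col = 2 the original code reaches the elif branch, while the merged condition returns 0, so the two forms are equivalent only if both branches return the same value.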

jbrockmendel · 2020-02-20T00:31:06Z

pandas/_libs/tslibs/period.pyx

@@ -289,17 +299,15 @@ cdef int64_t DtoB(npy_datetimestruct *dts, int roll_back,
    return DtoB_weekday(unix_date)


-cdef inline int64_t upsample_daytime(int64_t ordinal,
-                                     asfreq_info *af_info) nogil:
+cdef inline int64_t upsample_daytime(int64_t ordinal, asfreq_info *af_info) nogil:
    if (af_info.is_end):


if you wanted to remove the parens here, i wouldnt object

jbrockmendel · 2020-02-20T00:32:42Z

pandas/_libs/tslibs/strptime.pyx

@@ -78,12 +77,10 @@ def array_strptime(object[:] values, object fmt,
    if fmt is not None:
        if '%W' in fmt or '%U' in fmt:
            if '%Y' not in fmt and '%y' not in fmt:
-                raise ValueError("Cannot use '%W' or '%U' without "
-                                 "day and year")
+                raise ValueError("Cannot use '%W' or '%U' without day and year")
            if ('%A' not in fmt and '%a' not in fmt and '%w' not
                    in fmt):


the "in fmt" can go on the previous line, and then the parens can be removed

jbrockmendel · 2020-02-20T00:35:32Z

@MomIsBestFriend can you rebase and ping on green

REF: pandas-dev#31808 (comment)

ShaharNaveh · 2020-02-22T10:58:32Z

ping @jbrockmendel

jbrockmendel · 2020-02-22T15:45:38Z

thanks @MomIsBestFriend

* CLN: some code cleanups in pandas/_libs/
* Reverted "bint"
  REF: https://github.com/pandas-dev/pandas/pull/31808/files#r376721172
* Added trailing comma to imports
  REF: https://github.com/pandas-dev/pandas/pull/31808/files#r378523656
* Reverted bad code
* Lint issues
* Reverted wrong code
  REF: pandas-dev#31808 (comment)
* Removed parens
  REF: pandas-dev#31808 (comment)
* "in fmt" in prev line
  REF: pandas-dev#31808 (comment)

CLN: some code cleanups in pandas/_libs/

1ea71f8

jbrockmendel reviewed Feb 8, 2020

pandas/_libs/lib.pyx

jbrockmendel reviewed Feb 8, 2020

pandas/_libs/tslibs/timezones.pyx

WillAyd requested changes Feb 12, 2020

pandas/_libs/tslibs/period.pyx

jbrockmendel reviewed Feb 12, 2020

MomIsBestFriend added 6 commits February 14, 2020 16:02

Merge remote-tracking branch 'upstream/master' into CLN-libs

dcd8c38

Reverted "bint"

8634a37

REF: https://github.com/pandas-dev/pandas/pull/31808/files#r376721172

Added trailing comma to imports

362e1f7

REF: https://github.com/pandas-dev/pandas/pull/31808/files#r378523656

Reverted bad code

e151183

Merge remote-tracking branch 'upstream/master' into CLN-libs

2b631b5

Lint issues

b369447

jorisvandenbossche reviewed Feb 15, 2020

MomIsBestFriend added 2 commits February 15, 2020 12:17

Merge remote-tracking branch 'upstream/master' into CLN-libs

4f600f9

Reverted wrong code

208bc03

REF: pandas-dev#31808 (comment)

jbrockmendel reviewed Feb 20, 2020

MomIsBestFriend added 3 commits February 22, 2020 12:23

Merge remote-tracking branch 'upstream/master' into CLN-libs

4cbec3d

Removed parens

0b47e28

REF: pandas-dev#31808 (comment)

"in fmt" in prev line

eaf2d49

REF: pandas-dev#31808 (comment)

jbrockmendel merged commit 9e69040 into pandas-dev:master Feb 22, 2020

simonjayhawkins added this to the 1.1 milestone Feb 24, 2020

simonjayhawkins added the Clean label Feb 24, 2020

ShaharNaveh deleted the CLN-libs branch February 29, 2020 10:27
