Skip to content

Fix build warnings #27157

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 24 commits into from Jul 2, 2019
Merged
Show file tree
Hide file tree
Changes from 20 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion pandas/_libs/groupby.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -260,7 +260,7 @@ def group_shift_indexer(int64_t[:] out, const int64_t[:] labels,
int ngroups, int periods):
cdef:
Py_ssize_t N, i, j, ii
int offset, sign
int offset = 0, sign
int64_t lab, idxer, idxer_slot
int64_t[:] label_seen = np.zeros(ngroups, dtype=np.int64)
int64_t[:, :] label_indexer
Expand Down
2 changes: 1 addition & 1 deletion pandas/_libs/hashtable.pxd
Original file line number Diff line number Diff line change
Expand Up @@ -41,7 +41,7 @@ cdef class StringHashTable(HashTable):

cdef struct Int64VectorData:
int64_t *data
size_t n, m
Py_ssize_t n, m

cdef class Int64Vector:
cdef Int64VectorData *data
Expand Down
2 changes: 1 addition & 1 deletion pandas/_libs/hashtable.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -44,7 +44,7 @@ cdef int64_t NPY_NAT = util.get_nat()
_SIZE_HINT_LIMIT = (1 << 20) + 7


cdef size_t _INIT_VEC_CAP = 128
cdef ssize_t _INIT_VEC_CAP = 128

include "hashtable_class_helper.pxi"
include "hashtable_func_helper.pxi"
Expand Down
2 changes: 1 addition & 1 deletion pandas/_libs/hashtable_class_helper.pxi.in
Original file line number Diff line number Diff line change
Expand Up @@ -659,7 +659,7 @@ cdef class StringHashTable(HashTable):
int64_t[:] locs = np.empty(n, dtype=np.int64)

# these by-definition *must* be strings
vecs = <char **>malloc(n * sizeof(char *))
vecs = <const char **>malloc(n * sizeof(char *))
for i in range(n):
val = values[i]

Expand Down
2 changes: 1 addition & 1 deletion pandas/_libs/hashtable_func_helper.pxi.in
Original file line number Diff line number Diff line change
Expand Up @@ -241,7 +241,7 @@ def ismember_{{dtype}}({{scalar}}[:] arr, {{scalar}}[:] values):

# construct the table
n = len(values)
kh_resize_{{ttype}}(table, min(n, len(values)))
kh_resize_{{ttype}}(table, n)

{{if dtype == 'object'}}
for i in range(n):
Expand Down
2 changes: 1 addition & 1 deletion pandas/_libs/index.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -352,7 +352,7 @@ cdef class IndexEngine:

cdef Py_ssize_t _bin_search(ndarray values, object val) except -1:
cdef:
Py_ssize_t mid, lo = 0, hi = len(values) - 1
Py_ssize_t mid = 0, lo = 0, hi = len(values) - 1
Copy link
Author

@ghost ghost Jul 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a real bug I think. mid is referenced in the return value if len(values)=1 so that the main loop immediately falls through.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this would fail if len(values) == 0 right? Do you see a code sample to actually trigger that?

Copy link
Author

@ghost ghost Jul 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

quite possibly this is meant to be "caller checks". But that's out of the compiler's view.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea sure. Similar comment though - good to suppress compiler warning but if there's an actual error here (which I think would still happen without the warning) would be good to address

Copy link
Author

@ghost ghost Jul 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if values is empty, the loop falls through, util.get_value_at(values, mid) gets called, which calls validate_indexer which raises. I think it's ok.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which raises

cython code can have weird corner cases where exceptions get ignored. Adding a test for this case seems worthwhile.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting. example?

Copy link
Author

@ghost ghost Jul 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

huh. I think you've nailed something here. So it's not exactly weird corner cases, but cdef functions which return exception as if they were python functions.
https://cython.readthedocs.io/en/latest/src/userguide/language_basics.html#error-return-values

validate_indexer and get_value_at are both cdef. validate_indexer is tagged except -1, so any exception translates into a -1 retval. But the caller get_value_at doesn't check that and uses -1 to reference into the the array. in this case an empty one. but it doesn't have an except return value defines. So, I'm not sure what happens.

If an exception is detected in such a function, a warning message is 
printed and the exception is ignored.

OTOH, the index constructor refuses to create empty indexes, so an end-to-end test does not seem possible. and get_value_at is used all over the place.

Hmm. will have to think about this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In [18]: %%cython
    ...: 
    ...: cdef int foo() except -1:
    ...:     1/0
    ...:     cdef:
    ...:         int mid = 42
    ...:     return mid
    ...:     
    ...: cdef int bar():
    ...:     res = foo()
    ...:     return res
    ...:     
    ...: def baz():
    ...:     return bar()

In [19]: baz()
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
_cython_magic_a591cd36cd8e49116d6e370949ee24c2.pyx in _cython_magic_a591cd36cd8e49116d6e370949ee24c2.foo()

ZeroDivisionError: float division
Exception ignored in: '_cython_magic_a591cd36cd8e49116d6e370949ee24c2.bar'
Traceback (most recent call last):
  File "_cython_magic_a591cd36cd8e49116d6e370949ee24c2.pyx", line 3, in _cython_magic_a591cd36cd8e49116d6e370949ee24c2.foo
ZeroDivisionError: float division
Out[19]: 0

very interesting bug.

object pval

if hi >= 0 and val > util.get_value_at(values, hi):
Expand Down
8 changes: 4 additions & 4 deletions pandas/_libs/lib.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -477,7 +477,7 @@ def maybe_indices_to_slice(ndarray[int64_t] indices, int max_len):
def maybe_booleans_to_slice(ndarray[uint8_t] mask):
cdef:
Py_ssize_t i, n = len(mask)
Py_ssize_t start, end
Py_ssize_t start = 0, end = 0
bint started = 0, finished = 0

for i in range(n):
Expand Down Expand Up @@ -1631,7 +1631,7 @@ def is_datetime_with_singletz_array(values: ndarray) -> bool:
Doesn't check values are datetime-like types.
"""
cdef:
Py_ssize_t i, j, n = len(values)
Py_ssize_t i = 0, j, n = len(values)
object base_val, base_tz, val, tz

if n == 0:
Expand Down Expand Up @@ -1913,8 +1913,8 @@ def maybe_convert_objects(ndarray[object] objects, bint try_float=0,
ndarray[int64_t] ints
ndarray[uint64_t] uints
ndarray[uint8_t] bools
ndarray[int64_t] idatetimes
ndarray[int64_t] itimedeltas
int64_t[::1] idatetimes
int64_t[::1] itimedeltas
Seen seen = Seen()
object val
float64_t fval, fnan
Expand Down
11 changes: 6 additions & 5 deletions pandas/_libs/src/parser/tokenizer.c
Original file line number Diff line number Diff line change
Expand Up @@ -424,13 +424,13 @@ static void append_warning(parser_t *self, const char *msg) {

if (self->warn_msg == NULL) {
self->warn_msg = (char *)malloc(length + 1);
strncpy(self->warn_msg, msg, strlen(msg) + 1);
snprintf(self->warn_msg, length + 1, "%s", msg);
} else {
ex_length = strlen(self->warn_msg);
newptr = safe_realloc(self->warn_msg, ex_length + length + 1);
if (newptr != NULL) {
self->warn_msg = (char *)newptr;
strncpy(self->warn_msg + ex_length, msg, strlen(msg) + 1);
snprintf(self->warn_msg + ex_length, length + 1, "%s", msg);
}
}
}
Expand Down Expand Up @@ -1433,13 +1433,14 @@ PANDAS_INLINE void uppercase(char *p) {
int to_boolean(const char *item, uint8_t *val) {
char *tmp;
int i, status = 0;
int bufsize = sizeof(char) * (strlen(item) + 1);
size_t length0 = (strlen(item) + 1);
int bufsize = sizeof(char) * length0;

static const char *tstrs[1] = {"TRUE"};
static const char *fstrs[1] = {"FALSE"};

tmp = malloc(bufsize);
strncpy(tmp, item, bufsize);
snprintf(tmp, length0, "%s", item);
uppercase(tmp);

for (i = 0; i < 1; ++i) {
Expand Down Expand Up @@ -1815,7 +1816,7 @@ double round_trip(const char *p, char **q, char decimal, char sci, char tsep,
double r = PyOS_string_to_double(p, q, 0);
if (maybe_int != NULL) *maybe_int = 0;
if (PyErr_Occurred() != NULL) *error = -1;
else if (r == Py_HUGE_VAL) *error = Py_HUGE_VAL;
else if (r == Py_HUGE_VAL) *error = (int)Py_HUGE_VAL;
PyErr_Clear();
return r;
}
Expand Down