
[BLD] Fix remaining compile-time warnings #21940

Closed
wants to merge 10 commits into from

jbrockmendel
Member

With the exception of the Numpy-Deprecated-API-1.7 warnings that any Cython code produces, this fixes all remaining compiler warnings (on py27, that is; there's still a whole mess of them on py37).

Based on a little bit of profiling, it looked like npy_isnan(x) gives a 40% perf improvement over x != x. But profiling is hard, so who knows. The comment where npy_isnan is imported links to a discussion about it.
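For reference, the two NaN checks being compared give the same answer on IEEE 754 doubles; any speed difference comes from how the compiler lowers them, not from their semantics. A minimal sketch, assuming the C99 isnan macro from math.h as a stand-in for NumPy's npy_isnan (which, as far as we know, maps to the platform isnan where one is available but needs NumPy's headers); both function names are hypothetical.

```c
#include <math.h>
#include <stdbool.h>

/* NaN is the only IEEE 754 value that compares unequal to itself. */
static bool isnan_selfcmp(double x) {
    return x != x;
}

/* C99 classification macro from <math.h>, standing in for npy_isnan. */
static bool isnan_crt(double x) {
    return isnan(x) != 0;
}
```

One caveat: under -ffast-math (specifically -ffinite-math-only), GCC and Clang may assume NaNs never occur and fold either check to false, so neither spelling is safe in that mode.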

@jreback
Contributor

jreback commented Jul 17, 2018

Hmm, seems to be failing. But there are also many more checks, e.g. in windows.pyx (such as the error @chris-b1 is working on). Is there a reason not to fix those too?

@jreback
Contributor

jreback commented Jul 17, 2018

Further, I think adding a lint.sh check might make sense here (to keep these from creeping back in over time). IOW, we should have one and only one way to compare NaNs in Cython. Sure, documentation helps, but...

jreback added the Build (Library building on various platforms) label Jul 17, 2018
jreback mentioned this pull request Jul 17, 2018
@jbrockmendel
Member Author

> hmm, seems to be failing.

Yep, will try to track this down...

> but also there are many more checks

Yeah, I declared initial victory and opened the PR once I got to zero warnings on py27 on OSX.

> further I think adding a lint.sh check might make sense here

Not sure how this would work, but worth a shot. It seems like there might be a compiler flag (e.g. -Werror for gcc/clang) to set in setup.py to turn warnings into errors.

> IOW we should have 1 and only 1 way to compare nans in cython

This doesn't get rid of all the val != val occurrences, just the ones with numeric dtypes. The cases that are typed as object don't cause the warnings.

@jreback
Contributor

jreback commented Jul 17, 2018

> This doesn't get rid of all the val != val occurrences, just the ones with numeric dtypes. The cases that are typed as object don't cause the warnings.

OK, maybe we have two nicely named functions, then?

@codecov

codecov bot commented Jul 17, 2018

Codecov Report

Merging #21940 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master   #21940   +/-   ##
=======================================
  Coverage   91.96%   91.96%           
=======================================
  Files         166      166           
  Lines       50329    50329           
=======================================
  Hits        46287    46287           
  Misses       4042     4042

Flag        Coverage Δ
#multiple   90.36% <ø> (ø) ⬆️
#single     42.23% <ø> (ø) ⬆️

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 4b73f22...64d0032.

Contributor

jreback left a comment

lgtm. minor comments.

@@ -389,7 +389,11 @@ cdef class {{name}}HashTable(HashTable):
for i in range(n):
    val = values[i]

    {{if dtype == 'float64'}}
Contributor

maybe .startswith

@@ -271,7 +271,12 @@ def ismember_{{dtype}}({{scalar}}[:] arr, {{scalar}}[:] values, bint hasnans=0):
if k != table.n_buckets:
    result[i] = 1
else:
    {{if dtype == 'float64'}}
Contributor

same

@@ -28,6 +28,15 @@ This file is derived from NumPy 1.7. See NUMPY_LICENSE.txt
#define PyInt_AsLong PyLong_AsLong
#endif

// Silence "implicit declaration of function" warnings
Contributor

shouldn't these be in the .h?

Member Author

No, that would make them public. This particular fix will be superseded by #21962 anyway.
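A minimal sketch of the approach being defended here, assuming the warnings come from functions that are called before they are declared: put the prototype near the top of the .c file itself, marked static, so the function stays private to that translation unit instead of being declared in a shared header. C99 and later do not allow implicit function declarations, and GCC and Clang diagnose them under -Wimplicit-function-declaration. The functions the PR actually declares are not shown in this excerpt; parse_long and parse_or_default are hypothetical.

```c
#include <errno.h>
#include <stdlib.h>

/* File-local prototype: makes parse_long known before its first use
   without exporting it through a header. */
static int parse_long(const char *s, long *out);

/* Uses parse_long before its definition below; the prototype above
   is what keeps this from being an implicit declaration. */
long parse_or_default(const char *s, long fallback) {
    long value;
    return parse_long(s, &value) ? value : fallback;
}

/* Returns 1 and stores the value if s is a complete base-10 long,
   otherwise returns 0 and leaves *out untouched. */
static int parse_long(const char *s, long *out) {
    char *end;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (end == s || *end != '\0' || errno == ERANGE) {
        return 0;
    }
    *out = v;
    return 1;
}
```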

jreback added this to the 0.24.0 milestone Jul 18, 2018
@jreback
Contributor

jreback commented Jul 18, 2018

Is there any perf difference? (Maybe just run some of the groupby asv benchmarks.)

@chris-b1
Contributor

@jbrockmendel - assuming you're not on Windows, could you try this benchmark?
https://gist.github.com/chris-b1/0049c54ce8cf37257002ab41b278f7d9

# equality check
In [28]: %timeit isnan_inline(a)
21.6 ms ± 1.26 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# from numpy -> from math.h
In [29]: %timeit isnan_crt(a)
45.2 ms ± 279 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Using the math.h isnan ends up being a fair bit slower on Windows (roughly 2x in the timings above) because the call can't be inlined. I'm going to guess they are about the same for you?
https://godbolt.org/g/u8do1V

@jbrockmendel
Member Author

> assuming you're not on Windows, could you try this benchmark?

OSX, py3.7:

In [3]: %timeit isnan_inline(a)
13.3 ms ± 189 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [4]: %timeit isnan_crt(a)
13.2 ms ± 153 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Ubuntu, py2.7:

In [3]: %timeit isnan_inline(a)
100 loops, best of 3: 16.2 ms per loop

In [4]: %timeit isnan_crt(a)
100 loops, best of 3: 16.2 ms per loop

Pretty indistinguishable on these platforms. Which implementation do you suggest we go with?

See also: cython/cython#550

@jbrockmendel
Member Author

No perceptible change on Linux.

asv continuous -f 1.1 -E virtualenv master HEAD -b groupby
[...]
       before           after         ratio
     [4b73f22d]       [64d00320]
+       309±0.5μs          340±1μs     1.10  groupby.GroupByMethods.time_dtype_as_field('int', 'sum', 'transformation')
-         126±2μs        114±0.5μs     0.91  groupby.GroupByMethods.time_dtype_as_field('float', 'shift', 'transformation')
-       108±0.6μs       94.2±0.3μs     0.87  groupby.GroupByMethods.time_dtype_as_field('object', 'size', 'direct')

@chris-b1
Contributor

It probably needs to be something like this to pick the most efficient implementation on each platform.

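/* MSVC doesn't inline the CRT isnan, so use the self-comparison there */
#if defined(_MSC_VER)
    #define pd_isnan(x) ((x) != (x))
#else
    #define pd_isnan(x) npy_isnan(x)  /* from numpy/npy_math.h */
#endif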
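A self-contained sketch of how the macro above could be used, assuming the C99 isnan from math.h as a stand-in for npy_isnan so it compiles without NumPy's headers. count_nans is a hypothetical kernel, similar in spirit to the benchmarked isnan_inline/isnan_crt loops but not the code from the linked gist.

```c
#include <math.h>
#include <stddef.h>

#if defined(_MSC_VER)
    /* Per the Windows timings in this thread, MSVC did not inline the
       CRT isnan call, so fall back to the self-comparison there. */
    #define pd_isnan(x) ((x) != (x))
#else
    /* C99 isnan, standing in for npy_isnan in this sketch. */
    #define pd_isnan(x) isnan(x)
#endif

/* Counts the NaN entries in values[0..n). */
size_t count_nans(const double *values, size_t n) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (pd_isnan(values[i])) {
            count++;
        }
    }
    return count;
}
```

Either branch computes the same result; the split exists only because, as discussed here, whether the call gets inlined depends on the compiler.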

@jbrockmendel
Member Author

Are you confident that is the "correct" answer? If so I'll change it and we're done. Otherwise, I'm inclined to close this PR as not being worth the hassle.

@chris-b1
Contributor

I'm pretty sure (hah) that's right, but I might agree with your instinct to leave this alone. x != x will always be fast; by replacing it with function calls, we're relying on the compiler to inline them properly, which seems to work for gcc and clang, but who knows on different versions, platforms, etc.

@jbrockmendel
Member Author

OK, I'm going to close this. Some of the warnings will be fixed by #21962, and I'll follow up with a PR to fix the casting-comparison warnings.
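Assuming the casting-comparison warnings are signed/unsigned comparisons (GCC's -Wsign-compare, which -Wextra enables for C), a typical fix is to rule out negative values first and then compare with an explicit cast; with -Werror in the compile flags, such a warning would fail the build instead. index_in_bounds is a hypothetical helper, not code from pandas.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bounds check. Writing `i < n` directly mixes a signed
   int64_t with an unsigned size_t; on typical 64-bit platforms i is
   then converted to unsigned, so a negative i becomes a huge value
   and -Wsign-compare warns. */
bool index_in_bounds(int64_t i, size_t n) {
    /* Exclude negatives, then compare both sides as uint64_t so the
       conversion is explicit and the warning goes away. */
    return i >= 0 && (uint64_t)i < (uint64_t)n;
}
```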

jbrockmendel deleted the isnan branch April 5, 2020 17:42
Labels
Build Library building on various platforms
3 participants