-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Accept constant memoryviews in HashTable.lookup #21688
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Accept constant memoryviews in HashTable.lookup #21688
Conversation
The Travis failure seems quite silent. Is there a way to extract more information? |
I restarted the one. I haven't seen that error before. |
Codecov Report
@@ Coverage Diff @@
## master #21688 +/- ##
==========================================
- Coverage 91.95% 91.95% -0.01%
==========================================
Files 160 160
Lines 49837 49820 -17
==========================================
- Hits 45830 45812 -18
- Misses 4007 4008 +1
Continue to review full report at Codecov.
|
Must have been a one-off failure. cc @chris-b1 if you have any thoughts on this? |
@@ -379,7 +379,7 @@ cdef class {{name}}HashTable(HashTable): | |||
self.table.vals[k] = i | |||
|
|||
@cython.boundscheck(False) | |||
def lookup(self, {{dtype}}_t[:] values): | |||
def lookup(self, const {{dtype}}_t[:] values): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we have multiple lookup routines for different drapes - is there a reason to not change them all?
why just lookup? we have many routines which couldnin theory benefit from this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can make a pass over hashtable_class_helper.pxi.in
and add const
specifiers where applicable.
Now that cython supports const memoryviews we probably should go ahead and use them everywhere possible, #18825 example of working around this. Looks like would bump our required version up to |
doc/source/whatsnew/v0.24.0.txt
Outdated
@@ -343,6 +343,8 @@ Other | |||
^^^^^ | |||
|
|||
- :meth: `~pandas.io.formats.style.Styler.background_gradient` now takes a ``text_color_threshold`` parameter to automatically lighten the text color based on the luminance of the background color. This improves readability with dark background colors without the need to limit the background colormap range. (:issue:`21258`) | |||
- Building pandas for development now requires ``cython >= 0.28`` | |||
- Require at least 0.28 version of cython to support read-only memoryviews (:issue:`21688`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Referenced the PR as issue here. Is that sufficient or would you like to have separate issues for each line for documentation purposes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so we really need to bump cython here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming we need to bump Cython, it looks like there are a few things missing in order to fully bump it: pin some CI builds to the minimum Cython version, update version number in install.rst
.
See the changes in #18623 for where I bumped to 0.24; should show all the places that need updating.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading through the changelog of Cython, we need to bump to 0.28 to support the const
specifier for memoryviews. Looking a bit through the Travis setup, we only pin Cython for the 2.7 builds and these are the ones that failed. (sadly with a weird error message).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to bump to 0.28 to support the const specifier for memoryviews
is this an actual change in cython?
I agree its nice to change this, but cython 0.28 is pretty new.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you move the building pandas to a new sub-section (like we did previously when we bumped deps, as going to do this for more deps this cycle).
also is 0.28 min, or 0.28.2?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
0.28.0
works but we should build with 0.28.2
due to the mentioned problems in sklearn
. I would at least bump the Cython version on Travis & CricleCI.
Should we then allow 0.28.0
or require 0.28.2
so that we don't get the bug reports w.r.t. problems that are fixes in .2
?
doc/source/whatsnew/v0.24.0.txt
Outdated
@@ -343,6 +343,8 @@ Other | |||
^^^^^ | |||
|
|||
- :meth: `~pandas.io.formats.style.Styler.background_gradient` now takes a ``text_color_threshold`` parameter to automatically lighten the text color based on the luminance of the background color. This improves readability with dark background colors without the need to limit the background colormap range. (:issue:`21258`) | |||
- Building pandas for development now requires ``cython >= 0.28`` | |||
- Require at least 0.28 version of cython to support read-only memoryviews (:issue:`21688`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so we really need to bump cython here?
Yes, it is a new cython feature that it can now work on read-only memoryviews (see the third bullet point here: https://github.com/cython/cython/blob/master/CHANGES.rst#028-2018-03-13). This is also a feature much wanted by sklearn: scikit-learn/scikit-learn#10624 As @chris-b1 said, if we bump the cython version requirement, we should try to simplify our code base and use memoryviews in more places (#10070 as another example where two version of a function are added, one with memoryviews and other with ndarray interface to support read-only arrays) IMO, ideally, if we bump the version, it would be good to test for a few internal cases that the const memoryviews are actually working (since this is new functionality in cython, there might be still some rough edges. In sklearn they found a bug in const memoryviews that was fixed in 0.28.2). |
That's correct. |
Happy to implement the necessary changes when we bump the Cython version but it would be nice to know whether this is the preferred way forward? |
yeah am ok with bumping the cython version i guess for 0.24 since this is not user facing really we should also again bump numpy min but that’s a different issue |
6d38e08
to
36626d8
Compare
doc/source/whatsnew/v0.24.0.txt
Outdated
@@ -343,6 +343,8 @@ Other | |||
^^^^^ | |||
|
|||
- :meth: `~pandas.io.formats.style.Styler.background_gradient` now takes a ``text_color_threshold`` parameter to automatically lighten the text color based on the luminance of the background color. This improves readability with dark background colors without the need to limit the background colormap range. (:issue:`21258`) | |||
- Building pandas for development now requires ``cython >= 0.28`` | |||
- Require at least 0.28 version of cython to support read-only memoryviews (:issue:`21688`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you move the building pandas to a new sub-section (like we did previously when we bumped deps, as going to do this for more deps this cycle).
also is 0.28 min, or 0.28.2?
pandas/tests/test_algos.py
Outdated
@@ -1079,13 +1079,17 @@ class TestHashTable(object): | |||
|
|||
def test_lookup_nan(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you make this a fixture and use this for False/True
36626d8
to
20c181f
Compare
pushed some small doc changes. ping on green for merge. |
thanks @xhochy nice patch! |
Backported from pandas-devgh-21688.
* CI: Bump NumPy to 1.9.3 Backport of gh-22499. * BLD: Fix openpyxl to 2.5.5 Backport of gh-22601. * CI: Resolve timeout issue on Travis Backported from gh-22429. * CI: Migrate to CircleCI 2.0 Backport of gh-21814. * Upgrade Cython to >=0.28.2 Backported from gh-21688. * TST: Patch locale handling Backported from gh-21739. Backport of gh-22213.
xref #10070
xref #12813
building an ExtensionArray for nullable ints using a
pyarrow.Array
, the underlying memoryview is marked as constant. The HashTable Cython code uses the memoryview in thelookup
function but in a read-only fashion. Without theconst
specifier Cython raises an error that the memoryview is read-only.Not sure how to test this in a small scope (i.e. without using
pyarrow
) or if this small change is worth mentioning in the changelog.git diff upstream/master -u -- "*.py" | flake8 --diff