Skip to content

Pypy refcheck #16193

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from
Closed

Pypy refcheck #16193

wants to merge 3 commits into from

Conversation

mattip
Copy link
Contributor

@mattip mattip commented May 2, 2017

np.resize(a, refcheck=True) relies on refcount semantics to check that no other object holds a reference to a. Since PyPy only partially mocks refcount semantics, the check is unreliable on PyPy. Unfortunately a (or rather uniques in this case) is allocated all the way out at user space, so it might be problematic to always use refcheck=False. Instead I percolated refcheck as a kwarg with default value True for safety out to the point at which uniques is allocated. See also issue #15854.

With this change, on PyPy, failing tests that use hashtables now pass

@codecov
Copy link

codecov bot commented May 2, 2017

Codecov Report

Merging #16193 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #16193      +/-   ##
==========================================
+ Coverage   90.86%   90.86%   +<.01%     
==========================================
  Files         162      162              
  Lines       50862    50862              
==========================================
+ Hits        46215    46216       +1     
+ Misses       4647     4646       -1
Flag Coverage Δ
#multiple 88.64% <100%> (ø) ⬆️
#single 40.3% <66.66%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/algorithms.py 94.41% <100%> (ø) ⬆️
pandas/core/reshape/merge.py 93.94% <100%> (ø) ⬆️
pandas/core/common.py 91.03% <0%> (+0.34%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 217864e...4acc4cd. Read the comment docs.

@codecov
Copy link

codecov bot commented May 2, 2017

Codecov Report

Merging #16193 into master will not change coverage.
The diff coverage is 100%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #16193   +/-   ##
=======================================
  Coverage   90.86%   90.86%           
=======================================
  Files         162      162           
  Lines       50862    50862           
=======================================
  Hits        46215    46215           
  Misses       4647     4647
Flag Coverage Δ
#multiple 88.64% <100%> (ø) ⬆️
#single 40.3% <75%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/algorithms.py 94.41% <100%> (ø) ⬆️
pandas/core/reshape/merge.py 93.94% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 217864e...9aaf521. Read the comment docs.

@mattip
Copy link
Contributor Author

mattip commented May 2, 2017

Fix formatting for linter

@jreback
Copy link
Contributor

jreback commented May 2, 2017

this seems like a giant change. why not just pass refcheck for object arrays only?

@mattip
Copy link
Contributor Author

mattip commented May 2, 2017

sorry for the commit bomb, but the resize call percolates out for all the Int64Vector, UInt64Vector, Float64Vector, StringVector classes, so either I must assume it is safe to always use refcheck=False (seems a bit dangerous) or percolate it out to where the {name}Vector object is initialized and may be held by another reference

The problem with the 'dangerous' option is that it will reallocate the memory of an object held by the user, which could cause inconsistent state in that object, but if that is what you recommend I will submit another pull request along those lines

@chris-b1
Copy link
Contributor

chris-b1 commented May 2, 2017

At least in the current implementation, I think forcing refcheck=False would be safe? The Vector classes allocate/own the ndarray memory, so it should always be the case that they can safely resize.

@chris-b1
Copy link
Contributor

chris-b1 commented May 2, 2017

A bit more invasive, but we could also remove the ndarray object entirely from these classes until to_array is called. Instead of calling ao.resize, realloc the underlying buffer - I think that would also have the advantage of removing the need to re-acquire the GIL on the resize calls, e.g. here.

@mattip
Copy link
Contributor Author

mattip commented May 2, 2017

The Vector classes allocate/own the ndarray memory, so it should always be the case that they can safely resize

I added a test in commit 9aaf521 to show how refcheck semantics leak to app-level, if the user takes a reference to uniques and then calls get_labels again

@chris-b1
Copy link
Contributor

chris-b1 commented May 2, 2017

Vector isn't part of the public api, but I do see your point, thanks for the example. Maybe instead enforce that to_array can only called once - roughly

cdef class {{name}}Vector:
    cdef bint external_view_exists = False
    # ...ommitted impl...

    cpdef to_array(self):
        if self.external_view_exists:
            raise ValueError("")
        else:
            # original impl
            self.external_view = True

@jreback jreback added the Compat pandas objects compatability with Numpy or Python functions label May 2, 2017
@mattip mattip mentioned this pull request May 4, 2017
@jreback
Copy link
Contributor

jreback commented May 4, 2017

superseded by #16224

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Compat pandas objects compatability with Numpy or Python functions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants