Pypy refcheck #16193

mattip · 2017-05-02T05:06:04Z

a.resize(new_shape, refcheck=True) (the in-place ndarray.resize method; the np.resize function has no refcheck argument) relies on refcount semantics to check that no other object holds a reference to a. Since PyPy only partially emulates refcount semantics, the check is unreliable on PyPy. Unfortunately a (or rather uniques, in this case) is allocated all the way out at user level, so always using refcheck=False might be problematic. Instead, I percolated refcheck as a kwarg, with a default value of True for safety, out to the point at which uniques is allocated. See also issue #15854.
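A minimal pure-Python sketch of the percolation, assuming a simplified stand-in for the Cython {name}Vector classes: the vector owns a growable int64 array, grows it in place with ndarray.resize, and receives its refcheck setting from a user-level entry point. Int64VectorSketch and unique_in_order are hypothetical names for illustration only. In the PR the kwarg is threaded through the pandas hashtable and Vector code instead.

```python
import numpy as np


class Int64VectorSketch:
    """Hypothetical stand-in for a Cython {name}Vector (not a pandas API)."""

    def __init__(self, refcheck=True):
        self.refcheck = refcheck                 # chosen by whoever allocates the vector
        self.n = 0                               # number of slots in use
        self.ao = np.empty(4, dtype=np.int64)    # internally owned buffer

    def append(self, x):
        if self.n == len(self.ao):
            # Grow in place. With refcheck=True, NumPy on CPython refuses if it
            # counts extra references to self.ao; on PyPy that count is unreliable.
            self.ao.resize(max(4, 2 * len(self.ao)), refcheck=self.refcheck)
        self.ao[self.n] = x
        self.n += 1

    def to_array(self):
        # Shrink to fit and hand out the internal array itself (no copy).
        self.ao.resize(self.n, refcheck=self.refcheck)
        return self.ao


def unique_in_order(values, refcheck=True):
    """Hypothetical user-level entry point: refcheck defaults to True for safety
    and is passed down to where the vector holding uniques is allocated."""
    seen = set()
    uniques = Int64VectorSketch(refcheck=refcheck)
    for v in values:
        if v not in seen:
            seen.add(v)
            uniques.append(v)
    return uniques.to_array()


print(unique_in_order([3, 1, 3, 2, 1]))            # [3 1 2]
print(unique_in_order(range(6), refcheck=False))   # [0 1 2 3 4 5]
```

Because to_array returns self.ao itself rather than a copy, the caller's uniques and the vector share one array object; that sharing is exactly what refcheck is meant to detect.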

With this change, the previously failing tests that use hashtables now pass on PyPy.

codecov · 2017-05-02T06:03:12Z

Codecov Report

Merging #16193 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #16193      +/-   ##
==========================================
+ Coverage   90.86%   90.86%   +<.01%     
==========================================
  Files         162      162              
  Lines       50862    50862              
==========================================
+ Hits        46215    46216       +1     
+ Misses       4647     4646       -1

Flag	Coverage Δ
#multiple	`88.64% <100%> (ø)`	⬆️
#single	`40.3% <66.66%> (ø)`	⬆️

Impacted Files	Coverage Δ
pandas/core/algorithms.py	`94.41% <100%> (ø)`	⬆️
pandas/core/reshape/merge.py	`93.94% <100%> (ø)`	⬆️
pandas/core/common.py	`91.03% <0%> (+0.34%)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 217864e...4acc4cd.

codecov · 2017-05-02T06:03:17Z

Codecov Report

Merging #16193 into master will not change coverage.
The diff coverage is 100%.

@@           Coverage Diff           @@
##           master   #16193   +/-   ##
=======================================
  Coverage   90.86%   90.86%           
=======================================
  Files         162      162           
  Lines       50862    50862           
=======================================
  Hits        46215    46215           
  Misses       4647     4647

Flag	Coverage Δ
#multiple	`88.64% <100%> (ø)`	⬆️
#single	`40.3% <75%> (ø)`	⬆️

Impacted Files	Coverage Δ
pandas/core/algorithms.py	`94.41% <100%> (ø)`	⬆️
pandas/core/reshape/merge.py	`93.94% <100%> (ø)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 217864e...9aaf521.

mattip · 2017-05-02T06:03:17Z

Fix formatting for linter

jreback · 2017-05-02T10:27:47Z

This seems like a giant change. Why not just pass refcheck for object arrays only?

mattip · 2017-05-02T13:27:40Z

Sorry for the commit bomb, but the resize call percolates out through all the Int64Vector, UInt64Vector, Float64Vector and StringVector classes. So either I must assume it is always safe to use refcheck=False (which seems a bit dangerous), or I must percolate the kwarg out to where the {name}Vector object is initialized, since that object may be held by another reference.

The problem with the 'dangerous' option is that it reallocates the memory of an object held by the user, which could leave that object in an inconsistent state. But if that is what you recommend, I will submit another pull request along those lines.
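A minimal NumPy-only illustration of this trade-off, assuming the user ends up holding the very array object the vector owns (internal and user_uniques are hypothetical names). On CPython the default refcheck=True refuses the in-place resize. With refcheck=False the resize goes through and the user's array silently changes length. If the user held a view such as internal[:2] instead, refcheck=False could leave that view pointing at reallocated memory, which is why the sketch does not try it.

```python
import numpy as np

internal = np.arange(3, dtype=np.int64)   # buffer owned by a hypothetical vector
user_uniques = internal                   # the same object, also held by user code

# Default refcheck=True: NumPy on CPython sees the extra reference and refuses.
try:
    internal.resize(6)
    refused = False
except ValueError:
    refused = True

# refcheck=False skips the check, so the object the user holds is reallocated
# in place; the new slots are zero-filled.
internal.resize(6, refcheck=False)

print(refused)        # True on CPython
print(user_uniques)   # [0 1 2 0 0 0]
```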

chris-b1 · 2017-05-02T14:13:28Z

At least in the current implementation, I think forcing refcheck=False would be safe? The Vector classes allocate/own the ndarray memory, so it should always be the case that they can safely resize.

chris-b1 · 2017-05-02T14:17:54Z

A bit more invasive, but we could also remove the ndarray object entirely from these classes until to_array is called, and realloc the underlying buffer instead of calling ao.resize. I think that would also have the advantage of removing the need to re-acquire the GIL on the resize calls, e.g. here:

pandas/pandas/_libs/hashtable_class_helper.pxi.in, line 364 (at 20fda22):

with gil:
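A rough Python analogue of this idea, assuming array.array as the raw buffer: array.array reallocates its own storage as it grows, so no ndarray exists (and no refcheck is needed) until to_array. Int64BufferVector is a hypothetical name, and copying in to_array is a simplification chosen for this sketch, not something the comment specifies. The GIL benefit cannot be shown in pure Python.

```python
import array

import numpy as np


class Int64BufferVector:
    """Hypothetical sketch: values live in a raw growable buffer, and an
    ndarray is only created when to_array() is called."""

    def __init__(self):
        self.data = array.array("q")   # signed long long (8 bytes on common platforms)

    def append(self, x):
        # array.array grows its own storage; there is no ndarray to resize yet.
        self.data.append(x)

    def to_array(self):
        # Copy out, so later appends can never move memory the caller sees.
        return np.array(self.data, dtype=np.int64)


vec = Int64BufferVector()
for i in range(5):
    vec.append(i)
first = vec.to_array()
vec.append(5)                      # safe: first is an independent copy
print(first, vec.to_array())       # [0 1 2 3 4] [0 1 2 3 4 5]
```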

mattip · 2017-05-02T15:56:40Z

Quoting chris-b1: "The Vector classes allocate/own the ndarray memory, so it should always be the case that they can safely resize."

I added a test in commit 9aaf521 that shows how refcheck semantics leak to app level: the user takes a reference to uniques and then calls get_labels again.

chris-b1 · 2017-05-02T20:21:27Z

Vector isn't part of the public API, but I do see your point; thanks for the example. Maybe instead enforce that to_array can only be called once, roughly:

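cdef class {{name}}Vector:
    # C attributes of a cdef class cannot have initializers; they start zeroed (False)
    cdef bint external_view_exists
    # ...omitted impl...

    cpdef to_array(self):
        if self.external_view_exists:
            raise ValueError("to_array() can only be called once")
        # flag the array as handed out before the original impl returns it
        self.external_view_exists = True
        # ...original impl...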
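A runnable Python analogue of this proposal, under stated assumptions: GuardedInt64Vector is a hypothetical name, and checking the flag before any growth (not only in to_array, as proposed above) is an addition of this sketch. With that check the vector itself tracks whether its array has been handed out, so it can resize with refcheck=False instead of relying on refcounts, which PyPy cannot emulate reliably.

```python
import numpy as np


class GuardedInt64Vector:
    """Hypothetical Python analogue of the proposed external_view_exists flag."""

    def __init__(self):
        self.external_view_exists = False
        self.n = 0
        self.ao = np.empty(4, dtype=np.int64)

    def append(self, x):
        if self.n == len(self.ao):
            if self.external_view_exists:
                # Growing now would reallocate an array the caller holds.
                raise ValueError("array already handed out; cannot resize")
            # In this sketch self.ao is only handed out by to_array, so no one
            # else holds it yet and the refcount heuristic can be skipped.
            self.ao.resize(2 * len(self.ao), refcheck=False)
        self.ao[self.n] = x
        self.n += 1

    def to_array(self):
        if self.external_view_exists:
            raise ValueError("to_array() can only be called once")
        self.ao.resize(self.n, refcheck=False)
        self.external_view_exists = True
        return self.ao


vec = GuardedInt64Vector()
for i in range(6):
    vec.append(i)
uniques = vec.to_array()

try:
    vec.append(6)              # would need a resize of the handed-out array
    appended = True
except ValueError:
    appended = False

try:
    vec.to_array()             # second call is rejected
    second_call_ok = True
except ValueError:
    second_call_ok = False

print(uniques, appended, second_call_ok)   # [0 1 2 3 4 5] False False
```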

jreback · 2017-05-04T10:55:45Z

superseded by #16224

COMPAT: add refcheck kwarg and percolate out to app-level for PyPy

e221dd8

mattip mentioned this pull request May 2, 2017

COMPAT: hashtable vectors depend on refcount semantics which do not work on PyPy #15854

Closed

COMPAT: add refcheck kwarg and percolate out to app-level for PyPy

4acc4cd

mattip force-pushed the pypy-refcheck branch from 80e0414 to 4acc4cd May 2, 2017 06:02

TST: show how refcheck semantics leak to user space

9aaf521

jreback added the Compat label ("pandas objects compatability with Numpy or Python functions") May 2, 2017

chris-b1 mentioned this pull request May 3, 2017

COMPAT/PERF remove arrays from Vector classes (WIP) #16222

Closed

mattip mentioned this pull request May 4, 2017

Vector array #16224

Closed

jreback closed this May 4, 2017

mattip mentioned this pull request May 5, 2017

COMPAT/TEST test, fix for unsafe Vector.resize(), which allows refche… #16258

Merged

TomAugspurger modified the milestone: No action Jun 30, 2017
