-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: memory leak on get_c_string #16060
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@jreback - this fixes the leak, but on a closer look, I'm not sure we were safe releasing the GIL in |
Here's a contrived case that creates a data race: setup def f1(arr):
return pd.unique(arr)
def f2(arr):
arr.iloc[-1] = 'a new long string'
from concurrent import futures
import time Then depending on timing, unique can change In [38]: x = pd.Series(np.arange(10**6) % 10**3).astype(str)
...: x.iloc[-1] = 'long string #1'
...:
...: with futures.ThreadPoolExecutor(max_workers=2) as exc:
...: fut1 = exc.submit(f1, x)
...: fut2 = exc.submit(f2, x)
...: print(fut1.result()[-1])
a new long string
In [39]: x = pd.Series(np.arange(10**6) % 10**3).astype(str)
...: x.iloc[-1] = 'long string #1'
...:
...: with futures.ThreadPoolExecutor(max_workers=2) as exc:
...: fut1 = exc.submit(f1, x)
...: time.sleep(2)
...: fut2 = exc.submit(f2, x)
...: print(fut1.result()[-1])
long string #1 |
hmm it should be ok u r not manipulating python objects strings are fine |
Codecov Report
@@ Coverage Diff @@
## master #16060 +/- ##
=======================================
Coverage 90.79% 90.79%
=======================================
Files 156 156
Lines 50537 50537
=======================================
Hits 45886 45886
Misses 4651 4651
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #16060 +/- ##
==========================================
+ Coverage 90.79% 90.84% +0.05%
==========================================
Files 156 159 +3
Lines 50537 50794 +257
==========================================
+ Hits 45886 46146 +260
+ Misses 4651 4648 -3
Continue to review full report at Codecov.
|
Yeah, not sure this is actually an issue, not like mutating is ever thread safe. I was thinking there could possibly be a use after free problem if one of the python strings got deleted during the |
@chris-b1 your above example doesn't actually create a race. The actual conversion (IOW, the calling of |
from https://github.com/pandas-dev/pandas/blob/648ae4f03622d8eafe1ca3b833bd6a99f56bece4/pandas/_libs/hashtable_class_helper.pxi.in it looks like the GIL is always held when calling this function. It doesn't look like you are checking NULL though, so exceptions are going unchecked for a while. Perhaps you want pandas/pandas/_libs/src/util.pxd Line 20 in 648ae4f
|
thanks @chris-b1 |
* BUG: memory leak on get_c_string * add except NULL
* BUG: memory leak on get_c_string * add except NULL
git diff upstream/master --name-only -- '*.py' | flake8 --diff