BUG: strengthen typing in get_c_string, fix StringHashTable segfault #30419

jbrockmendel · 2019-12-23T01:35:17Z

Starting with util.pxd: we can tighten the argument type of get_c_string from "object" to "str" and simplify it a bit. Of the places where get_c_string is used, the main one is StringHashTable, where set_item currently uses it incorrectly. The test added by this PR segfaults on master.

@TomAugspurger it looks like StringHashTable has relatively light testing. Should we be using it for StringArray?

WillAyd · 2019-12-23T08:25:33Z

pandas/_libs/tslibs/util.pxd

-        buf = PyUnicode_AsUTF8AndSize(py_string, length)
-    else:
-        PyBytes_AsStringAndSize(py_string, <char**>&buf, length)
+    buf = PyUnicode_AsUTF8AndSize(py_string, length)


Hmm interesting - so I guess this helper function existed in the first place for 2/3 compat? If this is all it is reduced to why not get rid of this function altogether and just replace with PyUnicode_AsUTF8AndSize?

so I guess this helper function existed in the first place for 2/3

That's my guess, yeah.

just replace with PyUnicode_AsUTF8AndSize?

Keeping it for now at least, because we still need to figure out what to do about the segfault; the implementation here might end up changing.
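For readers unfamiliar with the C-API call discussed in this thread, here is a minimal sketch of what PyUnicode_AsUTF8AndSize does, driven from plain Python through ctypes. This is only an illustration and assumes CPython; pandas calls the function from Cython, not like this, and the helper name `utf8_view` is hypothetical. It shows that the call yields a UTF-8 buffer plus its byte length for a str, and raises for a bytes argument, which is consistent with narrowing the argument type to str.

```python
import ctypes

# Bind the CPython C-API function through ctypes (CPython only).
_as_utf8 = ctypes.pythonapi.PyUnicode_AsUTF8AndSize
_as_utf8.argtypes = [ctypes.py_object, ctypes.POINTER(ctypes.c_ssize_t)]
# Use a raw pointer: c_char_p would truncate at the first NUL byte.
_as_utf8.restype = ctypes.c_void_p


def utf8_view(py_string):
    """Return (utf8_bytes, byte_length) for a str. Hypothetical helper name."""
    length = ctypes.c_ssize_t()
    buf = _as_utf8(py_string, ctypes.byref(length))
    # Copy the bytes out while py_string (which owns the buffer) is alive.
    return ctypes.string_at(buf, length.value), length.value


data, size = utf8_view("h\u00e9llo")
print(data, size)

try:
    utf8_view(b"not a str")
except TypeError as exc:
    # The C function sets a Python error for non-str input; ctypes re-raises it.
    print("rejected:", exc)
```

Note that the reported length counts UTF-8 bytes, not characters, so it can exceed `len(py_string)` for non-ASCII input.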

jreback · 2019-12-23T15:26:32Z

lgtm. not sure why the 36 build is failing though.

jbrockmendel · 2019-12-23T15:36:44Z

not sure why the 36 build is failing though.

Looks like we're passing np.str_ instead of str somewhere. I haven't looked into it more closely than that so far (kinda hoping we bump the numpy minversion and it fixes itself).

jbrockmendel · 2019-12-23T22:09:17Z

So it isn't a numpy-version problem, just a slow test that only runs in that build. We're passing an np.str_ object instead of a plain str. This is tough to check for because isinstance(some_np_str, str) is True.
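To illustrate why the isinstance check is not enough here: np.str_ is a subclass of str, so isinstance cannot tell the two apart, while an exact type comparison can. The sketch below uses a hypothetical stand-in class, `StrSubclass`, in place of np.str_ so that it runs without numpy; the helper `is_exact_str` is likewise illustrative and is not taken from this PR.

```python
class StrSubclass(str):
    """Hypothetical stand-in for a str subclass such as np.str_."""


def is_exact_str(obj):
    """True only for plain str objects, not for subclasses (illustrative helper)."""
    return type(obj) is str


value = StrSubclass("abc")

# isinstance accepts subclasses, so it cannot detect the problem.
print(isinstance(value, str))

# An exact type check does distinguish the subclass from plain str.
print(is_exact_str(value))

# Calling str() on the subclass instance yields a plain str with the same contents.
plain = str(value)
print(is_exact_str(plain), plain == value)
```

Converting with `str(...)` before handing the value to code typed as `str` is one possible way to normalize such inputs; whether that matches the compat fix in this PR is not shown in the conversation.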

jreback · 2019-12-24T14:26:21Z

lgtm.

jreback · 2019-12-24T14:26:43Z

thanks @jbrockmendel

jbrockmendel added 3 commits December 22, 2019 17:30

BUG: strengthen typing in get_c_string, fix StringHashTable segfault

430c05f

update get_item signature

d64b47a

GH ref

c94b351

WillAyd requested changes Dec 23, 2019

WillAyd added the Clean label Dec 23, 2019

jreback added this to the 1.0 milestone Dec 23, 2019

jreback added the Strings String extension data type and string data label Dec 23, 2019

Merge branch 'master' of https://github.com/pandas-dev/pandas into bu…

efb06de

…g-get_c_string2

compat for np.str_

e8b3388

jreback merged commit e745be0 into pandas-dev:master Dec 24, 2019

jbrockmendel deleted the bug-get_c_string2 branch December 24, 2019 16:51

AlexKirko pushed a commit to AlexKirko/pandas that referenced this pull request Dec 29, 2019

BUG: strengthen typing in get_c_string, fix StringHashTable segfault (p…

40826c3

…andas-dev#30419)

jorisvandenbossche mentioned this pull request Jan 31, 2020

Pandas 1.0 no longer handles numpy.str_s as catgories #31499

Closed
