Skip to content

BUG: don't allow users to move from an interned string #14494

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

llllllllll
Copy link
Contributor

  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master | flake8 --diff
  • whatsnew entry

I was thinking about code paths that give away borrowed references to strings and realized that interned strings can be accessed even when their refcount is 1. This is because the intern table holds a "weak ref" to the string. This allows the string to be garbage collected normally; however, if a user creates an interned string with exactly one reference and feeds it to move, you could later get a second reference to the same string object by calling intern with a copy of the old string. The data backing the old string may have been mutated so you will get very strange results. I am almost certain that no on has run into this since move_into_mutable_buffer was added but it doesn't hurt to add this quick check to prevent broken usage like this in the future.

This only affects python 2 because bytes objects cannot be interned in python 3.

@jreback
Copy link
Contributor

jreback commented Oct 25, 2016

I read somewhere that PY3 only interns short strings. Is that not right?

@jreback
Copy link
Contributor

jreback commented Oct 25, 2016

how did you find this issue (out of curiosity)? IOW, what broke?

@jreback jreback added the Bug label Oct 25, 2016
@llllllllll
Copy link
Contributor Author

I am not sure what gets interned by default, but users may call sys.intern on any string to intern it I believe. This is not a concern for move in py3 because strings are unicode type, and this function only accepts bytes type. I found this by thinking about ways to mutate string objects with tools like np.frombuffer and thought that I could try to craft bad inputs to this which would give me a mutable string. Nothing broke, this was just the result of some experiments.

@llllllllll
Copy link
Contributor Author

Also, it looks like only one of the py3.4 builds failed:
FAIL: test_downcast_limits (pandas.tools.tests.test_util.TestToNumeric)

I am not sure how my change would have affected this, and only in the particular configuration. Does anyone have any ideas?

@llllllllll llllllllll force-pushed the cannot-move-an-interned-string branch from 194fb53 to 1dbb494 Compare November 4, 2016 18:59
@codecov-io
Copy link

codecov-io commented Nov 4, 2016

Current coverage is 85.28% (diff: 100%)

Merging #14494 into master will not change coverage

@@             master     #14494   diff @@
==========================================
  Files           140        140          
  Lines         50693      50693          
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
  Hits          43234      43234          
  Misses         7459       7459          
  Partials          0          0          

Powered by Codecov. Last update 1d6dbb4...ae8aa64

@llllllllll llllllllll force-pushed the cannot-move-an-interned-string branch from 1dbb494 to 6a83871 Compare November 8, 2016 21:53
@jreback jreback added this to the 0.20.0 milestone Nov 11, 2016
@jreback
Copy link
Contributor

jreback commented Nov 11, 2016

this looks fine. wait till we merge #14631 then push again to force rebuilds.

@jreback jreback added the Strings String extension data type and string data label Nov 11, 2016
@llllllllll llllllllll force-pushed the cannot-move-an-interned-string branch from 6a83871 to ae8aa64 Compare November 14, 2016 17:22
@llllllllll
Copy link
Contributor Author

Tests are now passing after a rebase

@llllllllll
Copy link
Contributor Author

ping

@jreback jreback modified the milestones: 0.19.2, 0.20.0 Nov 16, 2016
@jreback jreback merged commit 726efc7 into pandas-dev:master Nov 16, 2016
@jreback
Copy link
Contributor

jreback commented Nov 16, 2016

thanks!

@llllllllll
Copy link
Contributor Author

Thank you!

jorisvandenbossche pushed a commit to jorisvandenbossche/pandas that referenced this pull request Dec 14, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Strings String extension data type and string data
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants